xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-13 16:25:46 +08:00

Author	SHA1	Message	Date
Tao He	99b3a504c5	[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-26 01:18:58 -07:00
Andrew Sansom	d48f4d6daf	perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-26 01:18:09 -07:00
Eugene Khvedchenya	392edee34a	EVS Support (Video tokens pruning) (#22980 ) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-26 11:54:54 +08:00
Ekagra Ranjan	e71b8e210d	[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-25 15:22:03 -07:00
Cyrus Leung	0ea80c87d9	[Model] Define `merge_by_field_config` MM interface (#25676 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 17:13:07 +00:00
AlonKejzman	e04a1b6b21	[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… (#24662 ) Signed-off-by: AlonKejzman <alonkeizman@gmail.com>	2025-09-25 15:40:14 +00:00
Cyrus Leung	755ed7b05b	[Misc] Simplify PoolerOutput and move to `v1/outputs` (#25629 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 06:47:03 +00:00
XuruiYang	845adb3ec6	[Model] Add LongCat-Flash (#23991 ) Signed-off-by: yangxurui <yangxurui@meituan.com> Co-authored-by: yangxurui <yangxurui@meituan.com>	2025-09-24 21:53:40 -07:00
Roger Wang	42488dae69	[Bugfix] Fix dummy video number of frames calculation (#25553 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-24 09:47:30 +00:00
Lucas Wilkinson	dc464a3d39	[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch (#25505 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-23 18:00:29 -06:00
Wentao Ye	8b8a8afc89	[CI] Fix Pre-commit Issue (#25497 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-24 04:09:37 +08:00
jiahanc	d5944d5146	[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-23 15:44:35 -04:00
Michael Goin	24fab45d96	[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE (#25444 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 15:29:26 -04:00
Lucas Wilkinson	cc1dc7ed6d	[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-23 16:02:10 +00:00
Matthew Bonanni	ac0048c0ae	[BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Chris Bamford <chrisbam4d@gmail.com>	2025-09-22 17:26:17 -07:00
Woosuk Kwon	1c3ffdbecc	[V0 Deprecation] Remove V0 sampling metadata (#25345 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-21 10:37:11 -07:00
Wenlong Wang	032d661d27	[Docs] Fix warnings in mkdocs build (continued) (#25042 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-20 11:45:18 +00:00
Chen Zhang	9607d5eb44	[Hybrid Allocator] Support full attention with different hidden size (#25101 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-19 23:43:59 -07:00
Lucas Kabela	3da17c2cc2	[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-09-19 20:27:21 -04:00
Nick Hill	14c1432789	[BugFix] Fix async scheduling CPU tensor race take 2 (#25279 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-19 16:34:07 -07:00
Jee Jee Li	2821986450	[Core] Modify the initialization parameters of the lora manager (#25249 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-19 18:01:28 +00:00
Andrew Sansom	9a4600e4dc	[CORE] Prompt Embeddings Support for v1 Engine (#24278 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-19 08:03:09 +08:00
Aziz	38db529f66	[feat]: Create interface for model-specific M-RoPE (#24194 ) Signed-off-by: AzizCode92 <azizbenothman76@gmail.com> Signed-off-by: Aziz <azizbenothman76@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-18 19:18:56 +00:00
Shanshan Shen	470484a4f5	[Structured Output][Refactor] Move `apply_grammar_bitmask()` method from `ModelRunner` to structured output utils (#21999 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-09-18 20:44:31 +08:00
Jee Jee Li	37970105fe	[Model] Improve Pooling Model (#25149 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-18 11:04:21 +00:00
Benjamin Chislett	b7433ca1a4	[Spec Decode] Efficient padded speculation (#24539 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-09-18 01:07:24 -04:00
Russell Bryant	58d4c705a8	[Core] Get num_encoder_tokens from scheduler config (#24989 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-16 20:59:07 -07:00
Nick Hill	eeb135eb87	[Core] Use `CpuGpuBuffer` for block table tensors (#24795 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-16 19:18:06 -07:00
Sage Moore	567939953b	[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-09-16 12:21:48 -04:00
cascade	17871983a2	[Bugfix] Fix sequence parallelism bug when enable pipeline parallelism (#24021 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-09-16 04:32:32 +00:00
Woosuk Kwon	3e903b6cb4	[Chore] Minor simplification for non-PP path (#24810 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-13 17:41:36 -07:00
Didier Durand	bcb06d7baf	[Doc]: fix typos in various files (#24726 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-12 06:43:12 -07:00
Flora Feng	0377802c20	[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-12 21:42:23 +08:00
Boyuan Feng	94e6b2d55f	Allow users to specify kv cache memory size (#21489 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 13:41:07 +00:00
Tao He	e93f4cc9e3	Add the support for the qwen3 next model (a hybrid attention model). (#24526 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 15:32:09 +08:00
Nick Hill	e2d8c27f68	[BugFix] Fix pipeline parallel (#24621 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-10 23:05:30 -07:00
Michael Goin	fba7856581	[Perf] Warmup FlashInfer attention during startup (#23439 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-10 15:03:17 -07:00
Russell Bryant	37e8182bfe	[v1] Add Whisper model support (encoder-decoder) (#21088 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com>	2025-09-10 13:53:35 -07:00
Nick Hill	f4f1a8df22	[BugFix] Ensure integrity of reused CPU tensors during async scheduling (#24527 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: guoze.lin <guozelin@tencent.com>	2025-09-10 21:15:14 +08:00
Lucas Wilkinson	0ae43dbf8c	[Attention] add DCP support for FLASH_ATTN_MLA backend (#24453 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-10 17:19:26 +08:00
Micah Williamson	1c63a16b65	[Core] Run garbage collector after CUDA graph capture to fix throughput regression (#24128 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-09-09 10:38:10 -04:00
Woosuk Kwon	2e5d21378d	Skip MM Encoder for non-first PP ranks (#24387 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-07 09:38:35 -07:00
youkaichao	558f0907dc	[attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-07 01:18:59 +00:00
Bangsheng Tang	848562bd49	break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265 ) Co-authored-by: Bangsheng Tang <bangsheng@meta.com>	2025-09-06 14:02:47 -07:00
Andrew Sansom	305a1cc0d2	refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-05 23:01:23 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734 ) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
Benjamin Chislett	cee182b297	[Perf][V1] Fully overlap model execution (#23569 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-09-05 18:20:17 -07:00
liuzhenwei	e599e2c65e	[XPU][P/D] Add XPU support in NixlConnector (#22436 ) Signed-off-by: zhenwei <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 21:03:12 -07:00
co63oc	1bd007f234	fix some typos (#24071 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-09-02 20:44:50 -07:00


360 Commits