xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-09 07:29:16 +08:00

Author	SHA1	Message	Date
Nick Hill	169313b9f8	[Misc] Make handling of SamplingParams clearer in n>1 case (#26032 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-01 19:31:39 -07:00
Gregory Shtrasberg	0b018d8baf	[ROCm][Bugfix] Add missing parameter to ROCm backend (#26029 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-01 19:23:14 -07:00
Wentao Ye	da554f932e	[Bug] Fix Negative Cuda Memory Usage (#25683 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-01 18:16:26 -04:00
Huamin Li	c36f0aa300	Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets (#25995 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-01 18:18:36 +00:00
Kenichi Maehashi	3b7c20a6b5	[Bugfix] Apply same sampling parameters for both `n=1` and `n>1` (#26005 ) Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>	2025-10-01 14:37:35 +00:00
Lucia Fang	001e50c92c	[Model] MTP fallback to eager for DeepSeek v32 (#25982 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-01 01:53:22 +00:00
David Ben-David	9a9f48dff7	[V1] [P/D] Add Support for KV Load Failure Recovery (#19330 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-09-30 14:57:08 -07:00
Lehua Ding	e184c9c510	[perf] Use CPU tensor to reduce GPU->CPU sync (#25884 ) Signed-off-by: Lehua Ding <lehuading@tencent.com>	2025-09-30 19:51:16 +08:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Lucas Wilkinson	23194d83e8	[BugFix] Fix DP/EP hang (#25906 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-30 04:18:59 +00:00
Aaron Pham	6a113d9aed	[V0 Deprecation] Remove `vllm.worker` and update according imports (#25901 )	2025-09-29 23:26:11 +00:00
Thomas Parnell	fea3e476aa	[Kernel] Chunk-aligned mamba2 (#24683 )	2025-09-29 23:18:25 +02:00
Adrian Abeyta	c42ff4f4fd	[BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-09-29 15:52:04 -04:00
Chenxi Yang	d0d138bc55	[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690 ) Signed-off-by: Chenxi Yang <cxyang@fb.com> Co-authored-by: Chenxi Yang <cxyang@fb.com>	2025-09-29 14:31:51 +00:00
Kunshang Ji	143844fa43	[XPU]Fix xpu spec decoding UTs, avoid using cuda graph (#25847 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-29 05:15:10 +00:00
Robert Shaw	9b44a7d926	[P/D] NIXL Updates (#25844 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-09-29 04:46:30 +00:00
Michael Goin	0307428d65	Remove redundant cudagraph dispatcher warning (#25841 )	2025-09-28 17:12:42 -04:00
Roger Wang	69311446ba	[MM] Optimize memory profiling for scattered multimodal embeddings (#25810 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-28 02:17:58 +00:00
Jialin Ouyang	c216119d64	[Core] GC Debug callback (#24829 ) Signed-off-by: Jialin Ouyang <jialino@meta.com> Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: Jialin Ouyang <jialino@meta.com>	2025-09-27 17:53:31 +00:00
Patrick C. Toulme	b65e56babe	[Core] Refactor self.model() to call a helper for subclassing. (#25084 ) Signed-off-by: Patrick Toulme <ptoulme@meta.com> Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com>	2025-09-27 08:40:59 -07:00
Cyrus Leung	27d7638b94	[Bugfix] Merge MM embeddings by index instead of token IDs (#16229 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-27 08:15:12 +00:00
WeiQing Chen	f1d53d150c	[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872 ) Signed-off-by: Junhong <liujunhong11@huawei.com> Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Junhong <liujunhong11@huawei.com> Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>	2025-09-27 03:35:47 +00:00
Zhuohan Li	8bf8f45822	[Core] Don't count preempted tokens in prefix cache hit rate (#25787 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-27 00:16:40 +00:00
Jonas M. Kübler	6f5c0931c1	[Spec decode] automatically disable mm for text-only draft models (#25667 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-27 08:10:21 +08:00
Naman Lalit	4e33a7ea85	[Bugfix] Optimize CpuGpuBuffer initialization (#25447 ) Signed-off-by: Naman Lalit <nl2688@nyu.edu>	2025-09-27 08:07:36 +08:00
Bram Wasti	dc48ba0c75	Kernel-override Determinism [1/n] (#25603 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-09-26 16:59:09 -07:00
Sage Moore	4778b42660	Reduce the Cuda Graph memory footprint when running with DBO (#25779 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-09-26 22:29:56 +00:00
qizixi	c70ac4b8ff	[spec decode] Consolidate speculative decode method name for MTP (#25232 ) Signed-off-by: zixi-qi <qizixi@meta.com>	2025-09-26 22:27:05 +00:00
fhl2000	f075693da7	[V1] address post issues related to #20059 (part 1) (#23046 ) Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-26 15:58:19 -04:00
Seiji Eicher	8d52f2b3a7	[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray (#25439 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>	2025-09-26 09:43:30 -07:00
Lucas Wilkinson	984d18498a	[BugFix] Fix using `dbo_decode_token_threshold` always (and ignoring `dbo_prefill_token_threshold`) (#25622 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-26 16:22:49 +00:00
wang.yuqi	fe6b19c314	[Bugfix] Properly abort pooling request. (#25734 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-26 05:47:34 -07:00
Chih-Chieh Yang	2b6b1d7809	[Model] Mamba2 varlen refactor (#21467 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>	2025-09-26 11:31:14 +00:00
Icey	dd70437a4f	Remove cuda hard-code in compute_causal_conv1d_metadata (#25555 ) Signed-off-by: Icey <1790571317@qq.com>	2025-09-26 01:19:20 -07:00
Tao He	99b3a504c5	[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-26 01:18:58 -07:00
Andrew Sansom	d48f4d6daf	perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-26 01:18:09 -07:00
Andrew Sansom	e84e0735c7	fix: revert cast to cpu in `MsgpackEncoder._encode_tensor` to avoid hidden performance regressions (#25738 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-26 01:18:05 -07:00
yitingdc	3edf87d25f	[CI/Build] fix doc build warning: Failed to get 'name: description' pair (#25733 ) Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io>	2025-09-26 01:18:02 -07:00
Eugene Khvedchenya	392edee34a	EVS Support (Video tokens pruning) (#22980 ) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-26 11:54:54 +08:00
Nick Hill	8b77328ffe	[Misc] Don't log shm dequeue delay warning on worker side (#25720 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-26 01:08:30 +00:00
Zhuohan Li	8c435c9bce	[Core] Enable command line logging for LLMEngine (#25610 ) Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-25 15:31:17 -07:00
Ekagra Ranjan	e71b8e210d	[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-25 15:22:03 -07:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Cyrus Leung	0ea80c87d9	[Model] Define `merge_by_field_config` MM interface (#25676 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 17:13:07 +00:00
Lucas Wilkinson	13cc7f5370	[BugFix] Fix DBO hang (#25625 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-25 17:04:48 +00:00
AlonKejzman	e04a1b6b21	[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… (#24662 ) Signed-off-by: AlonKejzman <alonkeizman@gmail.com>	2025-09-25 15:40:14 +00:00
Tyler Michael Smith	2e5df88c92	[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning (#25532 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-25 15:16:06 +00:00
Russell Bryant	532a6cfccb	[ux] Switch a warning to debug about a pytorch fallback (#23750 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-09-25 14:38:16 +00:00
Li, Jiang	eb32335e35	[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-25 13:29:11 +00:00
Jonas M. Kübler	69a8c8e99a	[torch.compile] Make Query Quantization Fusable (#24914 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-25 09:25:12 -04:00

1336 Commits