xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-18 18:27:15 +08:00

Author	SHA1	Message	Date
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Grant Holmes (Ren)	d100d78eb3	Optimize KV cache distribution for asymmetric pipeline parallelism (#25164 ) Signed-off-by: gholmes829 <g.holmes429@gmail.com>	2025-10-07 09:20:30 +00:00
Sage Moore	2111b4643c	[Core] Simplify the Dp padding/should ubatch coordination logic (#25768 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-07 01:57:49 +00:00
Yannick Schnider	6431be808f	[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 17:19:34 +00:00
Matthew Bonanni	4727a8afa7	[Attention] Remove unused reorder_batch method (#24463 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-06 13:13:39 -04:00
Cyrus Leung	391612e78b	[Frontend] Consolidate tokenizer init code (#26276 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:34:52 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
Harry Mellor	1c0c68202c	Fix per file ruff ignores related to typing (#26254 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 16:37:55 +00:00
Harry Mellor	557b2e961d	Remove all cases of `fmt: on/off` (#26253 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:14 -07:00
Harry Mellor	4e256cadc2	Remove all references to `yapf` as it's no longer used (#26251 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:11 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
22quinn	78c1d5bfd2	[Easy] Add str repr for IterationStats (#26232 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-10-05 05:00:21 +00:00
Nicolò Lucchesi	2a6dc67eb5	[Bugfix] Fix `_reqs_to_process` leak on abort (#26012 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-04 11:39:31 +00:00
Yannick Schnider	f05fea1f5e	[Core] Enable decode of context length equal to max model length (#26168 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>	2025-10-04 09:59:26 +00:00
Cyrus Leung	1838cd4860	Revert "Add batch invariant kernel override for FlashInfer backend [2/n]" (#26220 )	2025-10-04 02:45:08 -07:00
Huamin Li	7d6b03381e	[CI Failure] fix_test_auto_prefix_cache_support (#26053 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-04 02:44:49 -07:00
Bram Wasti	2f7dbc9b42	Add batch invariant kernel override for FlashInfer backend [2/n] (#25769 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-03 19:49:30 -07:00
Xiang Si	adae0c1f43	[CI/Build] do not enforce precompilation on tpu ci tests (#25992 ) Signed-off-by: Xiang Si <sixiang@google.com>	2025-10-03 13:38:42 +00:00
Yannick Schnider	8ee846c27c	[Bugfix] Re-enable prefill of max model length (#24446 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>	2025-10-03 14:13:34 +02:00
Nicolò Lucchesi	48f309029a	[NIXL][Misc] Expose metrics from NIXL for logging to CLI (#25388 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-03 10:47:59 +00:00
Matthew Bonanni	2aaa423842	[Attention] Move Backend enum into registry (#25893 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 20:32:24 -07:00
Chen Zhang	1e50f1be70	[Deepseek v3.2] Support indexer prefill chunking (#25999 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-02 10:29:12 -07:00
Lucia Fang	001e50c92c	[Model] MTP fallback to eager for DeepSeek v32 (#25982 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-01 01:53:22 +00:00
David Ben-David	9a9f48dff7	[V1] [P/D] Add Support for KV Load Failure Recovery (#19330 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-09-30 14:57:08 -07:00
Reza Barazesh	bc546f76a1	[CI] Move applicable tests to CPU (#24080 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 14:45:20 +01:00
Nicolò Lucchesi	80608ba5af	[NIXL] Add support for MLA caches with different latent dim (#25902 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-09-30 12:18:29 +00:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Simon Danielsson	e23cacda35	[Bugfix]: Clean up chunked prefill logging when using whisper (#25075 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2025-09-30 08:17:49 +00:00
Chenxi Yang	d0d138bc55	[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690 ) Signed-off-by: Chenxi Yang <cxyang@fb.com> Co-authored-by: Chenxi Yang <cxyang@fb.com>	2025-09-29 14:31:51 +00:00
Cyrus Leung	cd87bfbf37	[CI/Build] Reorganize root-level V1 tests (#25767 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-27 13:51:15 +08:00
WeiQing Chen	f1d53d150c	[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872 ) Signed-off-by: Junhong <liujunhong11@huawei.com> Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Junhong <liujunhong11@huawei.com> Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>	2025-09-27 03:35:47 +00:00
Jonas M. Kübler	6f5c0931c1	[Spec decode] automatically disable mm for text-only draft models (#25667 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-27 08:10:21 +08:00
Bram Wasti	dc48ba0c75	Kernel-override Determinism [1/n] (#25603 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-09-26 16:59:09 -07:00
qizixi	c70ac4b8ff	[spec decode] Consolidate speculative decode method name for MTP (#25232 ) Signed-off-by: zixi-qi <qizixi@meta.com>	2025-09-26 22:27:05 +00:00
fhl2000	f075693da7	[V1] address post issues related to #20059 (part 1) (#23046 ) Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-26 15:58:19 -04:00
Seiji Eicher	8d52f2b3a7	[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray (#25439 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>	2025-09-26 09:43:30 -07:00
Cyrus Leung	db1e42f627	[CI/Build] Fix some V1 tests not being run (#25569 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-26 20:52:36 +08:00
wang.yuqi	fe6b19c314	[Bugfix] Properly abort pooling request. (#25734 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-26 05:47:34 -07:00
Chauncey	2827b3f4a3	[CI] Fix test_shared_storage_connector_hashes (#25748 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-26 20:46:17 +08:00
Ekagra Ranjan	e71b8e210d	[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-25 15:22:03 -07:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Cyrus Leung	2f17117606	[mypy] Fix wrong type annotations related to tuple (#25660 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-25 13:00:45 +00:00
Jonas M. Kübler	58c360d9be	[Bug] fix import and unit test (#25558 ) Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>	2025-09-24 10:17:59 +00:00
Chengji Yao	190c45a6af	[TPU][Bugfix] fix the missing apply_model in tpu worker (#25526 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-09-24 05:18:08 +00:00
Benjamin Chislett	c30b405b8f	[Spec Decode] Enable FlashInfer Spec Decoding (#25196 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: lhsjohn <huashuoli@tencent.com>	2025-09-23 22:29:58 -04:00
Doug Smith	7ad5e50adf	Improve output when failing json.loads() on structured output test (#25483 ) Signed-off-by: dougbtv <dosmith@redhat.com>	2025-09-23 18:03:31 -06:00
kourosh hakhamaneshi	abad204be6	[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2025-09-23 15:49:09 -07:00
Jialin Ouyang	4f8c4b890a	[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-09-23 15:11:14 -07:00
jiahanc	d5944d5146	[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-23 15:44:35 -04:00
Harry Mellor	875d6def90	Add backward compatibility for `GuidedDecodingParams` (#25422 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-23 17:07:30 +01:00

570 Commits