xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-09 07:24:54 +08:00

Author	SHA1	Message	Date
Lucas Wilkinson	dc464a3d39	[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch (#25505) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-23 18:00:29 -06:00
Alexander Matveev	1210e4d95b	[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 (#25509) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-09-23 16:57:55 -07:00
Lucas Wilkinson	e0b24ea030	[Perf] Increase default max splits for FA3 full cudagraphs (#25495) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-23 16:53:34 -07:00
Juan Villamizar	bde2a1a8a4	[ROCm] Small functional changes for gptoss (#25201) Signed-off-by: jpvillam <jpvillam@amd.com> Co-authored-by: jpvillam <jpvillam@amd.com>	2025-09-23 23:39:50 +00:00
Thomas Parnell	5e25b12236	[Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configurations for `_chunk_cumsum_fwd_kernel` (#25197) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>	2025-09-23 23:23:30 +00:00
Corey Lowman	c85d75cf08	Add `VLLM_NVTX_SCOPES_FOR_PROFILING=1` to enable `nvtx.annotate` scopes (#25501) Signed-off-by: Corey Lowman <clowman1993@gmail.com>	2025-09-23 22:50:09 +00:00
kourosh hakhamaneshi	abad204be6	[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2025-09-23 15:49:09 -07:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
Andrew Xia	95bc60e4cb	[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428) Signed-off-by: Andrew Xia <axia@meta.com>	2025-09-23 15:46:46 -07:00
Michael Goin	4f2954f724	Fix triton_reshape_and_cache_flash.py triton import (#25522) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 15:26:10 -07:00
rouchenzi	eca7be9077	Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA… (#25493) Signed-off-by: rouchenzi <ruochenwen@gmail.com> Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com>	2025-09-23 22:17:49 +00:00
Thomas Parnell	969b4da3a6	[V0 Deprecation] Remove placeholder attn (#25510) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 22:12:14 +00:00
Jialin Ouyang	4f8c4b890a	[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-09-23 15:11:14 -07:00
Isotr0py	ae002924e9	[CI/Build] Fix and re-enable v1 PP test on CI (#25496) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-23 21:58:25 +00:00
Gregory Shtrasberg	690f948e4a	[Bugfix] Fix for the import error from #24588 (#25481) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-09-23 21:31:08 +00:00
Chauncey	08275ec0a2	[Build] Update Xgrammar to 0.1.25 (#25467) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-23 21:25:46 +00:00
Alec S	c828d1bf98	[Bugfix] gpt-oss container tool output bug (#25485) Signed-off-by: Alec Solder <alecs@fb.com> Co-authored-by: Alec Solder <alecs@fb.com>	2025-09-23 20:43:45 +00:00
Wentao Ye	8b8a8afc89	[CI] Fix Pre-commit Issue (#25497) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-24 04:09:37 +08:00
Ilya Markov	8bdd8b5c51	Enable symmetric memory all reduce by default only enabling for TP (#25070) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 15:53:00 -04:00
Michael Goin	a8ffc4f0f2	[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 (#25508) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 12:49:55 -07:00
jiahanc	d5944d5146	[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-23 15:44:35 -04:00
Michael Goin	24fab45d96	[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE (#25444) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 15:29:26 -04:00
ElizaWszola	63400259d0	[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: ElizaWszola <elizaw.9289@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-23 12:03:10 -07:00
Amir Samani	8c1c81a3de	[core] add nccl symmetric memory for all reduce (#24532) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 14:33:06 -04:00
Hashem Hashemi	a3a7828010	[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com> Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com>	2025-09-23 14:31:45 -04:00
Jee Jee Li	5abb117901	[Core] Ensure LoRA linear respect the base_layer's tp_size and tp_rank (#25487) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-23 18:19:25 +00:00
Ekagra Ranjan	867ecdd1c8	[Spec Decode][CI] Add e2e test for `examples/spec_decode.py` and prevent breaking Acceptance Length (#24531) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-23 10:46:40 -07:00
Weida Hong	24e8222745	[Misc] Reduce initialization time of auto_tune (#23682) Signed-off-by: Weida Hong <wdhongtw@google.com>	2025-09-23 17:34:58 +00:00
Burkhard Ringlein	100b630a60	[V1][Kernel] Add triton implementation for `reshape_and_cache_flash` (#24503) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-23 12:52:40 -04:00
Ming Yang	527821d191	Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu (#25346) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-23 09:45:39 -07:00
Wentao Ye	846197f505	[Log] Optimize kv cache memory log from Bytes to GiB (#25204) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-23 12:44:37 -04:00
rivos-shreeasish	2357480b1a	[BugFix] Fix UB in per_token_group_quant.cu (#24913) Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com>	2025-09-23 09:14:22 -07:00
bnellnm	f11e3c516b	[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 16:11:34 +00:00
Harry Mellor	875d6def90	Add backward compatibility for `GuidedDecodingParams` (#25422) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-23 17:07:30 +01:00
Lucas Wilkinson	cc1dc7ed6d	[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-23 16:02:10 +00:00
Thomas Parnell	a903669e10	[V1] Remove V0 code paths for Hybrid models (#25400) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 08:26:13 -07:00
Michael Goin	2c58742dff	[UX] Change kv-cache-memory log level to debug (#25479) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-09-23 08:01:24 -07:00
Fanli Lin	4c966e440e	[XPU] Fix MOE DP accuracy issue on XPU (#25465)	2025-09-23 14:32:57 +00:00
Peter Pan	da5e7e4329	[Docs] NixlConnector quickstart guide (#24249) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-09-23 14:23:22 +00:00
Chauncey	f05a4f0e34	[P/D] Support NIXL connector to disconnect during a clean shutdown (#24423) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-09-23 16:08:02 +02:00
Joel	61d1b35561	[BugFix] Register expert_map as named buffer for wake_up and sleep (#25458) Signed-off-by: wuxibin <wuxibin@bytedance.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-23 21:49:13 +08:00
Isotr0py	b6a136b58c	[CI/Build] Fix disabled v1 attention backend selection test (#25471) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-23 13:05:46 +00:00
vllmellm	0d9fe260dd	[docs] Benchmark Serving Incorrect Arg (#25474) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-09-23 06:05:11 -07:00
Jee Jee Li	273690a50a	[Core] Optimize LoRA weight loading (#25403) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-23 18:19:45 +08:00
Isotr0py	231c2c63e4	[Bugfix] Fix idefics3 `tie_word_embeddings` (#25454) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-23 10:06:48 +00:00
Andreas Hartel	4322c553a6	[Test]: Hermes tool parser stream output error in Qwen3 case (#25203) Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com>	2025-09-23 17:56:31 +08:00
Cyrus Leung	babad6e5dd	[Misc] Move DP for ViT code inside model executor dir (#25459) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-23 09:20:52 +00:00
Zhikaiiii	9383cd6f10	[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028) Signed-off-by: Zhikaiiii <1658973216@qq.com>	2025-09-23 16:07:27 +08:00
Ming Yang	ba8d2165b6	Handle triton kernel import exception (#25319) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-09-23 00:56:00 -07:00
Cyrus Leung	c98be0a232	[Model] Enable DP for ViT in Qwen2-VL (#25445) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-23 05:17:10 +00:00


9814 Commits