Matthew Bonanni
7ba32aa60b
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode ( #24705 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 15:45:53 -06:00
Alexandre Marques
c89ed8de43
Invert pattern order to make sure that out_proj layers are identified ( #24781 )
...
Signed-off-by: Alexandre Marques <almarque@redhat.com>
2025-09-12 14:45:29 -07:00
Wentao Ye
3beadc2f25
[Compilation Bug] Fix Inductor Graph Output with Shape Issue ( #24772 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-12 21:23:05 +00:00
Clayton Coleman
bc636f21a6
[Benchmark] Allow arbitrary headers to be passed to benchmarked endpoints ( #23937 )
...
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
2025-09-12 13:57:53 -07:00
Zhewen Li
017354c0ef
[CI] Trigger BC Linter when labels are added/removed ( #24767 )
2025-09-12 11:44:36 -07:00
Cyrus Leung
010acc6e1e
[Bugfix] Fix incompatibility between #20452 and #24548 ( #24754 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-12 11:17:29 -07:00
afeldman-nm
c8c42597ab
[CI] Speed up model unit tests in CI ( #24253 )
...
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-09-12 10:36:50 -07:00
Michael Goin
9d2a44606d
[UX] Remove AsyncLLM torch profiler disabled log ( #24609 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-12 10:08:44 -07:00
Samit
f17c075884
[Model] Switch to Fused RMSNorm in GLM-4.1V model ( #24733 )
...
Signed-off-by: SamitHuang <285365963@qq.com>
2025-09-12 09:12:23 -07:00
Lukas Geiger
b0d1213ac3
[Models] Prevent CUDA sync in Qwen2.5-VL ( #24741 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-12 16:03:55 +00:00
Lukas Geiger
57f94e88ea
[Models] Optimise and simplify _validate_and_reshape_mm_tensor ( #24742 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-12 15:37:37 +00:00
Kebe
684b6870e1
[Bugfix][Frontend] Fix --enable-log-outputs does not match the documentation ( #24626 )
...
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-09-12 08:01:24 -07:00
dongluw
a5b84f1cbf
[Core] Shared memory based object store for Multimodal data caching and IPC ( #20452 )
...
Signed-off-by: donglu <donglu@cohere.com>
2025-09-12 07:54:17 -07:00
Elvir Crnčević
9f04d9d55f
[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP ( #24739 )
...
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-12 07:54:04 -07:00
Yan Ma
4d7c1d531b
[Bugfix] Fix MRoPE dispatch on XPU ( #24724 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-09-12 21:43:56 +08:00
Hyogeun Oh (오효근)
41f17bf290
[Docs] Fix warnings in mkdocs build (continued) ( #24740 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-12 06:43:15 -07:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files ( #24726 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
Flora Feng
0377802c20
[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec ( #24548 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-12 21:42:23 +08:00
Wenlong Wang
72fc8aa412
[Multi Modal] Add FA3 in VIT ( #24347 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-12 21:27:24 +08:00
youkaichao
fdb09c77d6
[sleep mode] save memory for on-the-fly quantization ( #24731 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-12 11:25:19 +00:00
Ignacio Sica
7a1c4025f1
[Kernel] [CPU] refactor cpu_attn.py:_run_sdpa_forward for better memory access ( #24701 )
...
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
2025-09-12 19:23:07 +08:00
Jee Jee Li
60a0951924
[Bugfix] Fix BNB name match ( #24735 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 11:12:01 +00:00
Chen Zhang
64d90c3e4f
[Misc][gpt-oss] Add gpt-oss label to PRs that mention harmony or related to builtin tool call ( #24717 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-12 18:57:07 +08:00
Li, Jiang
59d5d2c736
[CI/Build] Skip prompt embeddings tests on V1-only CPU backend ( #24721 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-12 18:51:01 +08:00
wang.yuqi
d21a36f5f9
[CI] Add ci_envs for convenient local testing ( #24630 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-12 08:52:25 +00:00
Chen Zhang
561a0baee0
[CI] Fix flaky test v1/worker/test_gpu_model_runner.py::test_kv_cache_stride_order ( #24640 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-12 07:49:09 +00:00
Nick Hill
f592b3174b
[BugFix] Fix Qwen3-Next PP ( #24709 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-11 23:35:04 -07:00
Li, Jiang
7920de0a2a
[Bugfix] Fix MRoPE dispatch on CPU ( #24712 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-12 04:56:31 +00:00
Andrew Sansom
ddcec289c7
Fix implementation divergence for BLOOM models between vLLM and HuggingFace when using prompt embeds ( #24686 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-12 04:35:48 +00:00
Maximilien de Bayser
e090b7b45b
Enable conversion of multimodal models to pooling tasks ( #24451 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-09-12 03:30:41 +00:00
Gregory Shtrasberg
6a50eaa0d3
[DOCs] Update ROCm installation docs section ( #24691 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-11 20:02:53 -07:00
Jee Jee Li
12a8414d81
[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 ( #24707 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 10:06:26 +08:00
Tao He
880c741bb6
[Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases ( #24680 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
v0.10.2rc2
2025-09-11 18:16:43 -07:00
RichardoMu
40b6c9122b
[V1] feat:add engine v1 tracing ( #20372 )
...
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-09-11 17:10:39 -07:00
Lucas Wilkinson
2e6bc46821
[Startup] Make DeepGEMM warmup scale with max-num-batched-tokens ( #24693 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-11 20:10:19 -04:00
Wentao Ye
fcba05c435
[Bug] Fix Layer weight_block_size Assertion Issue ( #24674 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 19:47:59 -04:00
Zazzle516
7a30fa8708
[Doc] Clarify cudagraph capture size logic and default behavior in scheduler ( #18698 )
...
Signed-off-by: Zazzle516 <2405677060@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 23:18:09 +00:00
Chen Zhang
f82f7a8990
[Qwen3-Next] MOE configs for H100 TP4 ( #24699 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-11 15:45:52 -07:00
Michael Goin
c3aea10dc8
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel ( #23280 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
Matthew Bonanni
d4fd2768ef
[Bugfix][Attention] Fix FlashInfer MLA block size logic ( #24692 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-11 22:39:42 +00:00
Vadim Gimpelson
7a70a71892
[Qwen3-Next] Add B200 MoE configs for Qwen3-next ( #24698 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-09-11 15:34:58 -07:00
Zhewen Li
7d4651997a
[CI/Build] Add bc-linter to vLLM CI ( #21234 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-09-11 15:34:36 -07:00
Woosuk Kwon
569bf1c9c0
[Qwen3-Next] MoE configs for H200 TP=1,2,4 ( #24695 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 14:38:16 -07:00
Wentao Ye
1ec20355f5
[Bugfix] Set VLLM_ALLREDUCE_USE_SYMM_MEM default to False ( #24696 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 14:32:27 -07:00
Xiaozhu Meng
e42af78b18
[flashinfer] [kernel] support for fp8 kv cache for trtllm prefill attention ( #24197 )
...
Signed-off-by: Xiaozhu <mxz297@gmail.com>
2025-09-11 14:20:09 -07:00
Duncan Moss
074854b24f
[Kernel][B200] mxfp4 fused cutlass moe ( #23696 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 17:04:56 -04:00
Andrew Xia
79ac59f32e
Update Spec Decode metrics to include drafted and accepted token throughput ( #24127 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-11 19:58:43 +00:00
Nick Hill
b971f91504
[BugFix] Fix tokenize asyncio task leak ( #24677 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-11 19:44:04 +00:00
Woosuk Kwon
c733bd5e87
[Qwen3-Next] Add MoE Config for H200 ( #24688 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 12:40:15 -07:00
Wentao Ye
a892b259b4
[Doc] Remove Useless Comments ( #24687 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 12:25:47 -07:00