10112 Commits

Author SHA1 Message Date
Chendi.Xue
e45271b09c [BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 (#26123)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Jun Jiang
84135b1489 Fix undefined symbol: cutlass_moe_mm_sm100 (#26098)
Signed-off-by: Jun Jiang <jasl9187@hotmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Cyrus Leung
611c23b68f [Renderer] Move Processor out of LLMEngine (#26165)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Aleksandr Samarin
c40c0d9c82 [Model] Fixed stream generator for gpt-oss + spec-decoding (#26027)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Xiang Si
d8b1f9ccc3 [CI/Build] do not enforce precompilation on tpu ci tests (#25992)
Signed-off-by: Xiang Si <sixiang@google.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
whx
fac9b430ec [Model] Supplement to PR 24862: Pass param prefix to LLMHead (#25805)
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Paul Pak
c6f384dafd [backends][short_conv] CUDA graph piecewise edits (#24215)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Yannick Schnider
7faf51f1cc [Bugfix] Re-enable prefill of max model length (#24446)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Yang Liu
ff1daf6c8a [Renderer] Move Processor out of AsyncLLM (#24138)
Signed-off-by: Yang <lymailforjob@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Sage Moore
f376868620 Quick fix for IMA with the Prefix Prefill kernel during graph capture (#25983)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Egor
564233d550 [Doc] Fixed shape description for fused_batched_moe.py (#25668)
Signed-off-by: Egor <e.a.krivov@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Wenlong Wang
2bcc745042 [Multi Modal] Configurable MM Profiling (#25631)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
kyt
fa29d31f0d [openai] Fix missing tool usage check (system message) (#24768)
Signed-off-by: kyt <eluban4532@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Nicolò Lucchesi
2168fc8fae [NIXL][Misc] Expose metrics from NIXL for logging to CLI (#25388)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Thomas Parnell
8d332b3cf6 [CI] Fix distributed hybrid tests in CI (#26155)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Yannick Schnider
c634415273 [test utils] correct wrong typing (#26159)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Cyrus Leung
c81dc099a3 [Model] Use merge_by_field_config for MM models (InternVL family) (#26153)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
HUIJONG JEONG
edaae1825f add(v1): RequestStatesStats to RequestOutput (#24947)
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Jiangyun Zhu
5b80f22087 [Perf] Optimize reshape_and_cache CUDA Kernel (#25955)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Liu-congo <1502632128@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Cyrus Leung
ae03f4c010 [Input] Remove unused prompt field (#26097)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Varun Sundar Rabindranath
7e4b1861c3 [Misc] Remove typing.List (#26150)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
ahao-anyscale
d628fa1e56 [BUG] Reorder model config creation (#26124)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Harry Mellor
6b12b2ee38 FusedMoE support for the Transformers backend (#22650)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Cyrus Leung
bbeace233b [Model] Use merge_by_field_config for MM models (G) (#26117)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Zhewen Li
09b1a5676d [Bugfix] Fix import gemm_afp4wfp4 failure on AMD (#26068)
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
TJian
f35f896e3a [ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Tyler Michael Smith
218349d760 [Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Andrew Xia
79b2fe7f19 [gpt-oss] disable tool server initialization if no tool in request (#25790)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Benjamin Chislett
56d0073f2a [Bug]: Limit num_reqs in dummy_run when max_num_seqs is small (#26144)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Matthew Bonanni
a06bb9bf36 [DeepSeek] Improve performance of DS MLA cache kernel (#26132)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Michael Goin
173c8a9520 [CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper (#26138)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Matthew Bonanni
2ea7d48656 [Attention] Move Backend enum into registry (#25893)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Ekagra Ranjan
8db7b7f39c [Bug][Benchmark] Fix duplicate req in oversampling (#26140)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Wentao Ye
587b30c571 [Log] Optimize DeepGEMM Missing Log (#26106)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Michael Goin
0c76bb2de1 [Bugfix] Disable cascade attention with FlashInfer (#26130)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Matthew Bonanni
72c5dd0310 Fix MTP with deepep_low_latency (#25904)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
ElizaWszola
abc55b1fe5 [Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Chen Zhang
d737c66b95 [Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MTP (#25119)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Ekagra Ranjan
da3a188bdb EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench (#25916)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:58 -07:00
Chen Zhang
77e958752b [Deepseek v3.2] Support indexer prefill chunking (#25999)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Chenheli Hua
c5880cfa4c [Small] Prevent bypassing media domain restriction via HTTP redirects (#26035)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lucas Wilkinson
01888b5cbf [BugFix] Fix FI accuracy issue when used for MLA prefill (#26063)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
fa179abde3 [CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Michael Goin
5c8a4a2208 [CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests (#26040)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
vllmellm
06d102ecc8 [Qwen][ROCm] Flash Attention Rotary Embeddings (#24642)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
leo-pony
422f2cca4b [Platform][CI] Added OOT platform interface e2e test that running on Ascend NPU (#25470)
Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
3884dce376 [Model] Use merge_by_field_config for MM models (D-F) (#26076)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
00c0b25e82 [Model] Use merge_by_field_config for MM models (A-C) (#26073)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lucas Wilkinson
0655b90d80 [FA/Chore] Bump vllm-flash-attention (#25537)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Thomas Parnell
83fa298682 Change size of single CUDA graph for CI to 4 (#26089)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00