Gregory Shtrasberg
dd0b749bd1
RC related config changes
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-12-22 17:21:46 +00:00
Harry Mellor
72506c9834
Check for truthy rope_parameters not the existence of it ( #30983 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 19c583398aec0ef2b9fe42ba020bc3c39e7e001f)
2025-12-18 14:07:04 -08:00
Isotr0py
b2eb84de77
[Bugfix] Remove tile_size=64 for mm_prefix triton attention ( #30973 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit d2dc5dfc6ecafbd3d725c1c42dd019db2b1efd30)
2025-12-18 14:06:49 -08:00
sarathc-cerebras
ac43367ced
adds jais 2 support ( #30188 )
...
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
(cherry picked from commit 28d15ab56bd9d3fd17010bc4abaeec06988f7887)
2025-12-18 14:06:33 -08:00
Yifan Qiao
30fe765e9f
[Fix][FlexAttention] return max logical block index to handle reused blocks ( #30915 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
(cherry picked from commit 11a89cf95caaec8dec13fab1e8e3d64c9a852a08)
2025-12-18 14:06:17 -08:00
Lucas Wilkinson
2c0ee0fde8
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) ( #30910 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 30bb19a760d6d5e8c69b3a4c78c9cb7430872a61)
2025-12-17 23:56:41 -08:00
Isotr0py
55f1fc1b1b
[v1] Add PrefixLM support to TritonAttention backend ( #30386 )
...
(cherry picked from commit 74a1ac38b00a8cf502db085d1bbd77712cf47e41)
2025-12-17 19:57:52 -08:00
Varun Sundar Rabindranath
17f3988094
[BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM ( #30899 )
...
(cherry picked from commit e3fc374a9a69dddb16885d810f1e28d3fdd39ebd)
2025-12-17 19:57:33 -08:00
Nicolò Lucchesi
682c38583c
[CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio ( #30878 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 9ca8cb38fd68142627c9649756f1ddc5432c8b19)
2025-12-17 19:57:15 -08:00
Yan Ma
f124b56786
[XPU] fix broken fp8 online quantization for XPU platform ( #30831 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
(cherry picked from commit 4f735babb7353987137b85ec0465e594e9ed1384)
2025-12-17 00:30:39 -08:00
Li, Jiang
d78e128b8b
[Bugfix][CPU] Fix CPU backend ROPE dispatch for VL models ( #30829 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 0cd5353644d3d045ab33c7e8e19c182bfd7db911)
2025-12-17 00:18:02 -08:00
Lucas Wilkinson
761b730dcb
[BugFix] Fix memory spike in workspace allocation ( #30744 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
(cherry picked from commit 00a8d7628c202f580d5230eaa7fe94338a0549f5)
2025-12-17 00:17:31 -08:00
TJian
f34eca5f01
[ROCm] [Bugfix] Fix torch sdpa hallucination ( #30789 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
(cherry picked from commit 2410132bb1f9faa5b252fad3f2b83dc926946b08)
2025-12-16 17:16:25 -08:00
Wentao Ye
4cd332f3cf
[CI] Skip ci failure test ( #30804 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
(cherry picked from commit b6ec077e058e15e5b853793924e6643ec6c579aa)
2025-12-16 17:16:08 -08:00
Roger Wang
16484d394c
[Core][MM] Optimize encoder cache manager by operating with embeddings only ( #30475 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
(cherry picked from commit f5f51e5931ffd99afe69696b60765b88d3eb13f2)
2025-12-16 17:15:49 -08:00
Isotr0py
e397bd6592
[CI/Build] Skip broken ViT backend functionality test temporarily ( #30782 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit 4de08ad698674560be7abebd9437d698d1216872)
2025-12-16 17:15:26 -08:00
Isotr0py
6a88d590bb
[Bugfix] Fix broken ViT attention selection for Blackwell device ( #30731 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit e94384bbadbaf99dea24c4af4de6a8c897f830e7)
2025-12-16 17:13:54 -08:00
Shanshan Shen
ad8c073131
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic ( #29873 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
(cherry picked from commit 3bd9c491583d45ae9f3f24b10e99626f502014b4)
2025-12-16 17:13:23 -08:00
Kevin Musgrave
c01d589813
[Benchmarks] auto_tune.sh: Use hostname variable for server requests ( #30529 )
...
Signed-off-by: Kevin Musgrave <kevin.musgrave@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 22:00:29 +00:00
Matthew Bonanni
60dbf7d8f1
Update batch invariant to use attention config ( #30704 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 15:24:16 -05:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args ( #30708 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-15 20:18:02 +00:00
Fadi Arafeh
b2191abdca
[docs][fix] Update Arm CPU vLLM wheel installation docs ( #30594 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-15 19:46:25 +00:00
Matthew Bonanni
51e5b3e3c4
[Bugfix] Fix ViT with FlashAttention on ROCm ( #30703 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-15 19:45:21 +00:00
Isotr0py
ec154c36ee
[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform ( #30212 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 17:36:07 +00:00
Harry Mellor
970713d4a4
Remove SkipValidation from ModelConfig ( #30695 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-15 17:34:08 +00:00
mondaylord
17fec3af09
[Bugfix] Fix missing first token in tool calls during reasoning-to-tool transition ( #30671 )
...
Signed-off-by: mondaylord <20212010046@fudan.edu.cn>
2025-12-15 16:13:37 +00:00
yjc9696
855b101d75
[Frontend] add tools for dsv32 developer role ( #30040 )
...
Signed-off-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-15 15:08:47 +00:00
Robert Shaw
d0502b4928
[MoE][Refactor 1/N] Separate Online Quantization ( #30627 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-15 06:54:53 -08:00
Max Hu
3f175f18a2
[Bugfix] Fix multimodal configuration for Qwen3VL MOE model ( #30670 )
...
Signed-off-by: Max Hu <hyoung2991@gmail.com>
2025-12-15 14:06:01 +00:00
Cyrus Leung
ed586e7724
[Refactor] [3/N] Move tool parser tests and run on CPU ( #30693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-15 13:45:36 +00:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory ( #30675 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
Nicolò Lucchesi
185c22bf2f
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector ( #29805 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-15 11:17:58 +00:00
duke
e4806d973a
[BugFix] Add embed_input_ids method to make QWenLMHeadModel a vllm model ( #30674 )
...
Signed-off-by: root <iwzbi@zju.edu.cn>
Co-authored-by: root <iwzbi@zju.edu.cn>
2025-12-15 10:38:29 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model ( #30666 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-15 08:13:00 +00:00
ゆり
33278073d6
typing: Add type hints to TurnMetrics class in context.py ( #30552 )
...
Co-authored-by: zkexorability <zkexorability@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 23:00:39 -08:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) ( #28439 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-15 14:58:23 +08:00
Kunshang Ji
e3a1cd1c59
[XPU] fix Dockerfile.xpu, avoid wheel conflicts ( #30662 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-12-15 13:32:06 +08:00
Wentao Ye
3778673ea8
[Feat] Refactor for parallel_config in FusedMoEModularKernel ( #30282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-15 04:21:36 +00:00
Seokhyun An
b337647aa0
[Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template ( #30648 )
...
Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
2025-12-15 04:21:12 +00:00
Jee Jee Li
a524d1ba0a
[Bugfix] Fix deepseek_v32 tokenizer_mode ( #30658 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-15 04:20:31 +00:00
Shanshan Shen
87b4d1557d
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. ( #30125 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-15 11:13:32 +08:00
Wenqi Glantz
84e23d103d
additional protection for CVE-2025-62164 ( #30649 )
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-15 03:07:10 +00:00
Shanshan Shen
738648fb81
[CustomOp] Support object-level enable for CustomOp ( #30547 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-12-15 11:02:09 +08:00
Boyuan Feng
917fdae5b2
[Log] Skip piecewise cudagraph warn when using full cudagraph ( #30657 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-15 02:49:45 +00:00
Robert Shaw
e2ed238885
Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" ( #30653 )
2025-12-14 19:33:41 -05:00
Or Ozeri
174e39ead7
CPU KV Offloading: Use more CUDA streams ( #29013 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-14 23:50:45 +00:00
RioS
9ccbf6b692
[responsesAPI] add extra body parameters ( #30532 )
...
Signed-off-by: Ri0S <aa248424@gmail.com>
2025-12-14 19:25:45 +00:00
Chendi.Xue
ae2e503dda
[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 ( #30420 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-14 15:38:28 +00:00
Tsukasa OI
9e33a1a75b
[Model][Quantization] Override HF defaults to GGUF ones (incl. Qwen3 MoE) ( #30118 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-14 15:01:42 +00:00
Vensen
add4b0ca44
[Bugfix][benchmarks] Fix input token calculation for rerank benchmark metrics ( #30596 )
...
Signed-off-by: vensen <vensenmu@gmail.com>
2025-12-14 14:57:15 +00:00