11928 Commits

Author SHA1 Message Date
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags (#29994)
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 09:15:04 +00:00
Xu Wenqing
ffdd18111b
Add DeepSeek-V3.2 tool parser. (#29848)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-12-04 08:46:34 +00:00
Ye (Charlotte) Qi
b8a6ae4158
[ROCm] add fallback for aiter fp8 decode mla (#30005)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-04 08:45:57 +00:00
Mark McLoughlin
899e2ef558
[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-04 16:22:03 +08:00
Cyrus Leung
68eb5c8d97
[Misc] Move functions into PoolingMetadata (#30027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 08:21:19 +00:00
Micah Williamson
5430e110c0
[CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI (#30006)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 16:20:54 +08:00
TJian
3f1b03739a
[ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni (#29974)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-04 08:20:24 +00:00
Charlie Fu
9aa33a74b0
[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
2025-12-04 07:52:28 +00:00
CYJiang
fd68e909db
[docs] Remove _total from counter metrics names (#30028)
In Prometheus, Counters always expose their actual numeric value with a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API.

Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>
2025-12-04 07:46:15 +00:00
daniel-salib
404fc4bfc0
[Frontend] refactor harmony utils output message parsing (#29820)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-04 15:36:57 +08:00
Chauncey
82a64b3d8f
[Bugfix] fixed deepseekv32 tool calling error (#30025)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 15:12:12 +08:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing (#29970)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
Jianwei Mao
80f8af4b2f
Fix error while downloading dependencies for CPU backend (#29797)
Signed-off-by: Jianwei Mao <maojianwei2016@126.com>
2025-12-04 06:04:44 +00:00
Kuntai Du
8aaa81b35f
[KVConnector] remove unused code (the model aware kv ops class) (#29709)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-12-04 06:00:52 +00:00
Benjamin Bartels
fca3f46658
[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk (#29971)
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-04 05:50:27 +00:00
gausah01
28097d5638
[Bugfix][CPU] Fix CPU KV cache fallback memory allocation (#29604)
Signed-off-by: Gauri Sahnan <gauri.sahnan@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-12-04 13:01:15 +08:00
Jee Jee Li
dd38ba3a26
[Bugfix] Fix adapter_enabled IMA (#29977)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 12:51:15 +08:00
Li Wang
5f91cdda75
[Misc] Add docker build env for Ascend NPU (#30015)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-03 19:53:00 -08:00
Iceber Gu
33a3d6c798
fix LoRA-related examples (#29956)
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2025-12-04 11:48:30 +08:00
Zhewen Li
c493b9d092
[CI/Build] Add MM code path to Examples Test (#29986)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-12-03 19:21:45 -08:00
Xieyang Xu
ad32e3e19c
enable multi-node in external launcher mode (#29833)
2025-12-03 17:02:02 -08:00
Shengqi Chen
1109f98288
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-03 14:08:19 -08:00
Elizabeth Thomas
b5407869c8
[Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
2025-12-03 22:00:52 +00:00
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 20:49:00 +00:00
Wentao Ye
ac1886588f
[CI] Fix re import error (#29973)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-03 15:16:54 -05:00
Yongtao Huang
2fc5d6e0d7
Fix LLMEngine.del dp_group cleanup condition (#29954)
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-03 12:14:44 -08:00
elvischenv
afe9eb408e
[Bugfix] Fix flashinfer ar+norm kernel not available issue (#29960)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-12-03 18:50:53 +00:00
Varun Sundar Rabindranath
19bee6d12d
[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 18:04:59 +00:00
avigny
dd5d1ef780
[Bugfix] Mistral tool parser streaming update (#19425)
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Jeff Cook <jeff@jeffcook.io>
Co-authored-by: sfbemerk <benjaminmerkel@mail.de>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 17:45:31 +00:00
Micah Williamson
d1f7392c5f
[ROCm][CI] Fix v1/logits_processors failure on ROCm (#29927)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 01:17:07 +08:00
Yu Jiaqi
9ae3c55b10
SigLIP example add chat_template (#29902)
Signed-off-by: piood <2477084691@qq.com>
2025-12-03 16:12:58 +00:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-12-03 16:06:57 +00:00
rasmith
5aa9b09040
[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29839)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-03 22:56:35 +08:00
ioana ghiban
1bb17ecb39
[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 13:33:50 +00:00
ioana ghiban
15b1511a15
[GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. (#29962)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 12:56:47 +00:00
Chauncey
b78772c433
[Frontend] supports deepseekv32 chat template (#29837)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 20:53:44 +08:00
Amr Mahdi
f5d3d93c40
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2025-12-03 11:41:53 +00:00
Fadi Arafeh
78f4bb0ba8
[DOC] Add Arm to list of compute resources providers (#29894)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-03 11:36:58 +00:00
HDCharles
b294e28db2
[refactor] CTMoEMethods to use QuantizationArgs (#28871)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 11:00:56 +00:00
Roger Wang
787b84a9fc
[Bugfix] Follow-up fix on MediaWithBytes (#29951)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-12-03 10:42:49 +00:00
Tsukasa OI
42c1949643
[Bugfix][Quantization] Support BF16 tensors on GGUF (#29948)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-03 10:33:46 +00:00
Isotr0py
cc4e296ea6
[CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests (#29907)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:27:36 +00:00
Isotr0py
a21cd9ed23
[Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True (#29950)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:05:10 +00:00
WeiQing Chen
7fe9c1a223
[CI] Add Async Eplb nightly CI tests (#29385)
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 09:51:08 +00:00
Chauncey
3f42b05fbc
[Refactor] [1/N] to simplify the vLLM serving architecture (#28040)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 01:26:39 -08:00
Yong Hoon Shin
69520bc695
Add logging for cudagraph related info (#29825)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-12-03 01:01:48 -08:00
Andrew Xia
3a7751485b
[responsesAPI] support input output messages for non harmony models (#29549)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 23:59:23 -08:00
Cyrus Leung
bbfb55c29e
[Misc] Allow fetch_* utils to access local files by default (#29932)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-03 15:49:34 +08:00
JackieWu
0bec63fa31
[BugFix] fix imgs_pos in hunyuan_vl (#29879)
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 06:20:37 +00:00
elvischenv
c719c40540
[Bugfix] Defunctionalize TRTLLM AR+Norm op for avoiding extra clone kernel before it (#29631)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-03 05:15:50 +00:00