Noa Neria
6366c098d7
Validating Runai Model Streamer Integration with S3 Object Storage ( #29320 )
...
Signed-off-by: Noa Neria <noa@run.ai>
2025-12-04 18:04:43 +08:00
dtc
842aba501d
[P/D] Introduce Mooncake Transfer Engine as kv_connector ( #24718 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-12-04 09:51:36 +00:00
rasmith
f2f4cea6cc
[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER ( #29995 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-04 09:30:22 +00:00
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags ( #29994 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 09:15:04 +00:00
Xu Wenqing
ffdd18111b
Add DeepSeek-V3.2 tool parser. ( #29848 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-12-04 08:46:34 +00:00
Ye (Charlotte) Qi
b8a6ae4158
[ROCm] add fallback for aiter fp8 decode mla ( #30005 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-04 08:45:57 +00:00
Mark McLoughlin
899e2ef558
[Core] Fix standalone runs of test_reset_prefix_cache_e2e ( #29899 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-04 16:22:03 +08:00
Cyrus Leung
68eb5c8d97
[Misc] Move functions into PoolingMetadata ( #30027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 08:21:19 +00:00
Micah Williamson
5430e110c0
[CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI ( #30006 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 16:20:54 +08:00
TJian
3f1b03739a
[ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni ( #29974 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-04 08:20:24 +00:00
Charlie Fu
9aa33a74b0
[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model ( #30001 )
...
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
2025-12-04 07:52:28 +00:00
CYJiang
fd68e909db
[docs] Remove _total from counter metrics names ( #30028 )
...
In Prometheus Counters always expose their actual numeric value with a metric name that ends in _total. We should document the base name, as this what appears in the get_metrics() API.
Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>
2025-12-04 07:46:15 +00:00
daniel-salib
404fc4bfc0
[Frontend] refactor harmony utils output message parsing ( #29820 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-04 15:36:57 +08:00
Chauncey
82a64b3d8f
[Bugfix] fixed deepseekv32 tool calling error ( #30025 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 15:12:12 +08:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing ( #29970 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
Jianwei Mao
80f8af4b2f
Fix error while downloading dependencies for CPU backend ( #29797 )
...
Signed-off-by: Jianwei Mao <maojianwei2016@126.com>
2025-12-04 06:04:44 +00:00
Kuntai Du
8aaa81b35f
[KVConnector] remove unused code (the model aware kv ops class) ( #29709 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-12-04 06:00:52 +00:00
Benjamin Bartels
fca3f46658
[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk ( #29971 )
...
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-04 05:50:27 +00:00
gausah01
28097d5638
[Bugfix][CPU] Fix CPU KV cache fallback memory allocation ( #29604 )
...
Signed-off-by: Gauri Sahnan <gauri.sahnan@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-12-04 13:01:15 +08:00
Jee Jee Li
dd38ba3a26
[Bugfix] Fix adapter_enabled IMA ( #29977 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 12:51:15 +08:00
Li Wang
5f91cdda75
[Misc] Add docker build env for Ascend NPU ( #30015 )
...
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-03 19:53:00 -08:00
Iceber Gu
33a3d6c798
fix LoRA-related examples ( #29956 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2025-12-04 11:48:30 +08:00
Zhewen Li
c493b9d092
[CI/Build] Add MM code path to Examples Test ( #29986 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-12-03 19:21:45 -08:00
Xieyang Xu
ad32e3e19c
enable multi-node in external launcher mode ( #29833 )
2025-12-03 17:02:02 -08:00
Shengqi Chen
1109f98288
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels ( #29930 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-03 14:08:19 -08:00
Elizabeth Thomas
b5407869c8
[Bugfix] Respect VLLM_CONFIGURE_LOGGING value ( #28671 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
2025-12-03 22:00:52 +00:00
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton ( #29929 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 20:49:00 +00:00
Wentao Ye
ac1886588f
[CI] Fix re import error ( #29973 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-03 15:16:54 -05:00
Yongtao Huang
2fc5d6e0d7
Fix LLMEngine.del dp_group cleanup condition ( #29954 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-03 12:14:44 -08:00
elvischenv
afe9eb408e
[Bugfix] Fix flashinfer ar+norm kernel not available issue ( #29960 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-12-03 18:50:53 +00:00
Varun Sundar Rabindranath
19bee6d12d
[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel ( #29470 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 18:04:59 +00:00
avigny
dd5d1ef780
[Bugfix] Mistral tool parser streaming update ( #19425 )
...
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Jeff Cook <jeff@jeffcook.io>
Co-authored-by: sfbemerk <benjaminmerkel@mail.de>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 17:45:31 +00:00
Micah Williamson
d1f7392c5f
[ROCm][CI] Fix v1/logits_processors failure on ROCm ( #29927 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 01:17:07 +08:00
Yu Jiaqi
9ae3c55b10
SigLIP example add chat_template ( #29902 )
...
Signed-off-by: piood <2477084691@qq.com>
2025-12-03 16:12:58 +00:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching ( #29163 )
...
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-12-03 16:06:57 +00:00
rasmith
5aa9b09040
[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() ( #29839 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-03 22:56:35 +08:00
ioana ghiban
1bb17ecb39
[CPU Backend] [Doc]: Update Installation Docs for CPUs ( #29868 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 13:33:50 +00:00
ioana ghiban
15b1511a15
[GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. ( #29962 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
2025-12-03 12:56:47 +00:00
Chauncey
b78772c433
[Frontend] supports deepseekv32 chat template ( #29837 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 20:53:44 +08:00
Amr Mahdi
f5d3d93c40
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds ( #29452 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2025-12-03 11:41:53 +00:00
Fadi Arafeh
78f4bb0ba8
[DOC] Add Arm to list of compute resouces providers ( #29894 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-03 11:36:58 +00:00
HDCharles
b294e28db2
[refactor] CTMoEMethods to use QuantizationArgs ( #28871 )
...
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 11:00:56 +00:00
Roger Wang
787b84a9fc
[Bugfix] Follow-up fix on MediaWithBytes ( #29951 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-12-03 10:42:49 +00:00
Tsukasa OI
42c1949643
[Bugfix][Quantization] Support BF16 tensors on GGUF ( #29948 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-03 10:33:46 +00:00
Isotr0py
cc4e296ea6
[CI/Build] Avoid duplicate empty inputs test for common multimodal generation tests ( #29907 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:27:36 +00:00
Isotr0py
a21cd9ed23
[Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True ( #29950 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-03 10:05:10 +00:00
WeiQing Chen
7fe9c1a223
[CI] Add Async Eplb nightly CI tests ( #29385 )
...
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 09:51:08 +00:00
Chauncey
3f42b05fbc
[Refactor] [1/N] to simplify the vLLM serving architecture ( #28040 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 01:26:39 -08:00
Yong Hoon Shin
69520bc695
Add logging for cudagraph related info ( #29825 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-12-03 01:01:48 -08:00
Andrew Xia
3a7751485b
[responsesAPI] support input output messages for non harmony models ( #29549 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 23:59:23 -08:00