Cyrus Leung
63c56cbb25
[Misc] Factor out common _apply_feature_select_strategy ( #26003 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Wenlong Wang
25e5b9ccec
[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker ( #26004 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
nadathurv
b9ed8c9679
[Doc] updating torch.compile doc link ( #25989 )
...
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lucia Fang
9506409fc6
[Misc]allow disable pynccl ( #25421 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Harry Mellor
fda819837e
Update to Transformers v4.56.2 ( #24638 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lucas Wilkinson
7c795fdf41
[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 ( #25988 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Zhewen Li
6444f65a2b
[Bugfix] Fix __syncwarp on ROCM ( #25996 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Roger Wang
4c094b339e
[MM] Add text-only mode for Qwen3-VL ( #26000 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Param
cd0bbf5de2
Fix INT8 quantization error on Blackwell GPUs (SM100+) ( #25935 )
...
Signed-off-by: padg9912 <phone.and.desktop@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Wentao Ye
2b6b859916
[Log] Optimize Log for FP8MOE ( #25709 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Salvatore Cena
04cb503fda
Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning ( #25843 )
...
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lucia Fang
d437ba32fd
[Model] MTP fallback to eager for DeepSeek v32 ( #25982 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lucas Wilkinson
e734a2a085
[Misc] Make EP kernels install script support uv ( #25785 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Andrew Xia
fd56f2e644
[gpt-oss] use vLLM instead of openai types for streaming ( #25186 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Harry Mellor
1690954497
[Docs] Remove API Reference from search index ( #25949 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Harry Mellor
b3e1846da6
Add explicit pooling classes for the Transformers backend ( #25322 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
David Ben-David
8328d39d40
[V1] [P/D] Add Support for KV Load Failure Recovery ( #19330 )
...
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Jee Jee Li
ef318228e7
[Bench] Add DeepSeekV32 to MoE benchmark ( #25962 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
cjackal
8ecccdd15f
[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding ( #25889 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Or Ozeri
bb2e04e41e
OffloadingConnector: Fix GPU block tracking bug ( #25856 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
bnellnm
6083b4d926
[Docs] Add moe kernel features doc ( #25297 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
493acdb7e2
[Doc] Improve MM Pooling model documentation ( #25966 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Wentao Ye
3c75d3b00c
[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' ( #25958 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
youkaichao
206ab1f0df
[bugfix][deepseek] fix flashmla kernel selection ( #25956 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
e33579cd96
[Bugfix] Token type and position embeddings fail to be applied to inputs_embeds ( #25922 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Pavani Majety
8c52fccb1a
[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging ( #25895 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Anion
ea6144a019
[Bugfix][Model] Fix inference for Hunyuan dense models ( #25354 )
...
Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Sergio Paniego Blanco
b6ea29b721
Add Hugging Face Inference Endpoints guide to Deployment docs ( #25886 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Asaf Joseph Gardin
d9f8ded136
[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 ( #25858 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
ihb2032
02776c0386
[Fix] Improve CPU backend compatibility for RISC-V ( #25816 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Reza Barazesh
8914d52869
[CI] Move applicable tests to CPU ( #24080 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Nicolò Lucchesi
bf8bb7e250
[NIXL] Add support for MLA caches with different latent dim ( #25902 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lehua Ding
eea2536a35
[perf] Use CPU tensor to reduce GPU->CPU sync ( #25884 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
a1898466a6
[Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs ( #25938 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
CSWYF3634076
9dce93e07c
[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 ( #25936 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Sergio Paniego Blanco
c0734fc51a
Updated TRL integration docs ( #25684 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
a120092009
034f3a4980
[Doc] Add Cambricon MLU support ( #25942 )
...
Signed-off-by: a120092009 <zhaoty0121@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Yongye Zhu
0230cd0afb
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Simon Danielsson
da71651386
[Bugfix]: Clean up chunked prefill logging when using whisper ( #25075 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Zhou Jiahao
0da98ff2eb
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not ( #25925 )
...
Signed-off-by: zhoukz <me@zhoukz.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
acisseJZhong
db4a03e2e2
[BugFix] Pass config_format via try_get_generation_config ( #25912 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Lucas Wilkinson
e165f980d9
[BugFix] Fix DP/EP hang ( #25906 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Harry Mellor
ea7cf8db35
Move VllmConfig from config/__init__.py to config/vllm.py ( #25271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Zhuohan Li
1108ffb3e6
[Benchmark] Support benchmark throughput for external launcher DP ( #25913 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Wentao Ye
0c7cc69e29
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 ( #25909 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Andrew Sansom
6941d53c0c
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models ( #25717 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Aaron Pham
97f1312f8c
[V0 Deprecation] Remove vllm.worker and update according imports ( #25901 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Nicolò Lucchesi
09b01cd395
[NIXL] Increase default KV block eviction timeout on P ( #25897 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Zhuohan Li
4deb9c88ca
[Doc] Polish example for torchrun dp ( #25899 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Thomas Parnell
b7973eabe5
[Kernel] Chunk-aligned mamba2 ( #24683 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00