Harry Mellor
b3e1846da6
Add explicit pooling classes for the Transformers backend ( #25322 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
David Ben-David
8328d39d40
[V1] [P/D] Add Support for KV Load Failure Recovery ( #19330 )
...
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Jee Jee Li
ef318228e7
[Bench] Add DeepSeekV32 to MoE benchmark ( #25962 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
cjackal
8ecccdd15f
[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding ( #25889 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Or Ozeri
bb2e04e41e
OffloadingConnector: Fix GPU block tracking bug ( #25856 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
bnellnm
6083b4d926
[Docs] Add moe kernel features doc ( #25297 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
493acdb7e2
[Doc] Improve MM Pooling model documentation ( #25966 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Wentao Ye
3c75d3b00c
[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' ( #25958 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
youkaichao
206ab1f0df
[bugfix][deepseek] fix flashmla kernel selection ( #25956 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
e33579cd96
[Bugfix] Token type and position embeddings fail to be applied to inputs_embeds ( #25922 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Pavani Majety
8c52fccb1a
[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging ( #25895 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Anion
ea6144a019
[Bugfix][Model] Fix inference for Hunyuan dense models ( #25354 )
...
Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Sergio Paniego Blanco
b6ea29b721
Add Hugging Face Inference Endpoints guide to Deployment docs ( #25886 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Asaf Joseph Gardin
d9f8ded136
[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 ( #25858 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
ihb2032
02776c0386
[Fix] Improve CPU backend compatibility for RISC-V ( #25816 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Reza Barazesh
8914d52869
[CI] Move applicable tests to CPU ( #24080 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Nicolò Lucchesi
bf8bb7e250
[NIXL] Add support for MLA caches with different latent dim ( #25902 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Lehua Ding
eea2536a35
[perf] Use CPU tensor to reduce GPU->CPU sync ( #25884 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
Cyrus Leung
a1898466a6
[Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs ( #25938 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:57 -07:00
CSWYF3634076
9dce93e07c
[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 ( #25936 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Sergio Paniego Blanco
c0734fc51a
Updated TRL integration docs ( #25684 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
a120092009
034f3a4980
[Doc] Add Cambricon MLU support ( #25942 )
...
Signed-off-by: a120092009 <zhaoty0121@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Yongye Zhu
0230cd0afb
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Simon Danielsson
da71651386
[Bugfix]: Clean up chunked prefill logging when using whisper ( #25075 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Zhou Jiahao
0da98ff2eb
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not ( #25925 )
...
Signed-off-by: zhoukz <me@zhoukz.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
acisseJZhong
db4a03e2e2
[BugFix] Pass config_format via try_get_generation_config ( #25912 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Lucas Wilkinson
e165f980d9
[BugFix] Fix DP/EP hang ( #25906 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Harry Mellor
ea7cf8db35
MoveVllmConfig from config/__init__.py to config/vllm.py ( #25271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Zhuohan Li
1108ffb3e6
[Benchmark] Support benchmark throughput for external launcher DP ( #25913 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Wentao Ye
0c7cc69e29
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 ( #25909 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Andrew Sansom
6941d53c0c
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models ( #25717 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Aaron Pham
97f1312f8c
[V0 Deprecation] Remove vllm.worker and update according imports ( #25901 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Nicolò Lucchesi
09b01cd395
[NIXL] Increase default KV block eviction timeout on P ( #25897 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Zhuohan Li
4deb9c88ca
[Doc] Polish example for torchrun dp ( #25899 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Thomas Parnell
b7973eabe5
[Kernel] Chunk-aligned mamba2 ( #24683 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Gregory Shtrasberg
e7203c2338
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so ( #25605 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Naman Lalit
ae0c35923f
[Doc] Add documentation for vLLM continuous benchmarking and profiling ( #25819 )
...
Signed-off-by: Naman Lalit <nl2688@nyu.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Adrian Abeyta
c692506e10
[BugFix][torch.compile] KV scale calculation issues with FP8 quantization ( #25513 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Lee Nau
9555929e13
[Bugfix] Use correct key "ignore" for config.json non-quantized layers ( #25706 )
...
Signed-off-by: Lee Nau <lnau@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Jee Jee Li
2405817748
[Model] Remove MotifForCausalLM ( #25866 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Isotr0py
616bce15ce
[CI/Build] Include Transformers backend test in nightly transformers test ( #25885 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Rahul Tuli
c33992154a
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue ( #25883 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Chenxi Yang
f84b2a0dd0
[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) ( #24690 )
...
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Jiangyun Zhu
9f78b9ca84
[torch.compile] serialize cudagraph_mode as its enum name instead of value ( #25868 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Zhou Jiahao
4e2774f5c3
[Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models ( #25854 )
...
Signed-off-by: zhoukz <me@zhoukz.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Yingjun Mou
85d4306047
[Bugfix] Fix requirements paths in install instructions ( #25827 )
...
Signed-off-by: yingjun-mou <renzomou@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
youkaichao
770a2cf7ae
update to latest deepgemm for dsv3.2 ( #25871 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Cyrus Leung
ea55445b8d
[Misc] Remove more get_input_embeddings_v0 ( #25857 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Isotr0py
b765adccd7
[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge ( #25331 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Roger Wang
4079a63a86
[Bugfix] Fallback ViT attn backend to SDPA for blackwell ( #25851 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00