a120092009
8d0afa9b42
[Doc] Add Cambricon MLU support (#25942)
Signed-off-by: a120092009 <zhaoty0121@gmail.com>
2025-09-30 17:59:47 +08:00

Yongye Zhu
fa7e254a7f
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-30 17:14:41 +08:00

Simon Danielsson
e23cacda35
[Bugfix]: Clean up chunked prefill logging when using whisper (#25075)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
2025-09-30 08:17:49 +00:00

Zhou Jiahao
2e1b8bc2b6
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not (#25925)
Signed-off-by: zhoukz <me@zhoukz.com>
2025-09-30 08:15:23 +00:00

acisseJZhong
e47433b3c1
[BugFix] Pass config_format via try_get_generation_config (#25912)
2025-09-30 05:09:50 +00:00

Lucas Wilkinson
23194d83e8
[BugFix] Fix DP/EP hang (#25906)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-30 04:18:59 +00:00

Harry Mellor
61aedb5ffe
Move VllmConfig from config/__init__.py to config/vllm.py (#25271)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-29 19:49:49 -07:00

Zhuohan Li
d3bd171123
[Benchmark] Support benchmark throughput for external launcher DP (#25913)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-30 01:43:57 +00:00

Wentao Ye
89e4050af4
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-30 09:15:19 +08:00

Andrew Sansom
78a47f87ce
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models (#25717)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-30 08:10:58 +08:00

Aaron Pham
6a113d9aed
[V0 Deprecation] Remove vllm.worker and update imports accordingly (#25901)
2025-09-29 23:26:11 +00:00

Nicolò Lucchesi
2e4fe48c37
[NIXL] Increase default KV block eviction timeout on P (#25897)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-29 21:35:14 +00:00

Zhuohan Li
8eb0a1d906
[Doc] Polish example for torchrun dp (#25899)
2025-09-29 21:31:34 +00:00

Thomas Parnell
fea3e476aa
[Kernel] Chunk-aligned mamba2 (#24683)
2025-09-29 23:18:25 +02:00

Gregory Shtrasberg
61a3431613
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so (#25605)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-29 17:01:50 -04:00

Naman Lalit
9bedac9623
[Doc] Add documentation for vLLM continuous benchmarking and profiling (#25819)
Signed-off-by: Naman Lalit <nl2688@nyu.edu>
2025-09-29 20:49:49 +00:00

Adrian Abeyta
c42ff4f4fd
[BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513)
Signed-off-by: adabeyta <aabeyta@redhat.com>
2025-09-29 15:52:04 -04:00

Lee Nau
d5ab28511c
[Bugfix] Use correct key "ignore" for config.json non-quantized layers (#25706)
Signed-off-by: Lee Nau <lnau@nvidia.com>
2025-09-29 15:07:29 -04:00

Jee Jee Li
e61eb5e09d
[Model] Remove MotifForCausalLM (#25866)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-30 00:36:30 +08:00

Isotr0py
0899ba5b42
[CI/Build] Include Transformers backend test in nightly transformers test (#25885)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-29 09:33:39 -07:00

Rahul Tuli
145ac73317
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue (#25883)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-09-29 11:37:20 -04:00

Chenxi Yang
d0d138bc55
[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690)
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
2025-09-29 14:31:51 +00:00

Jiangyun Zhu
43227236ec
[torch.compile] serialize cudagraph_mode as its enum name instead of value (#25868)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-29 13:54:52 +00:00

Zhou Jiahao
8616300ae2
[Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models (#25854)
Signed-off-by: zhoukz <me@zhoukz.com>
2025-09-29 10:59:04 +00:00

Yingjun Mou
edbaadd91f
[Bugfix] Fix requirements paths in install instructions (#25827)
Signed-off-by: yingjun-mou <renzomou@gmail.com>
2025-09-29 03:49:35 -07:00

youkaichao
9360d34fa1
update to latest deepgemm for dsv3.2 (#25871)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-29 17:51:43 +08:00

Cyrus Leung
1b67b04656
[Misc] Remove more get_input_embeddings_v0 (#25857)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-29 08:03:37 +00:00

Isotr0py
bd51f78e39
[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge (#25331)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-09-29 14:09:18 +08:00

Roger Wang
65ecb4f134
[Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-29 06:03:51 +00:00

Kunshang Ji
143844fa43
[XPU] Fix XPU spec decoding UTs, avoid using CUDA graph (#25847)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-29 05:15:10 +00:00

Thomas Parnell
219cfbe7f6
Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS (#25832)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-29 05:08:17 +00:00

Robert Shaw
9b44a7d926
[P/D] NIXL Updates (#25844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-09-29 04:46:30 +00:00

Juechen Liu
a3ae45a38c
[Misc] fix tests failure by using current_platform (#25825)
Signed-off-by: Juechen Liu <jueliu@meta.com>
2025-09-29 04:18:57 +00:00

Michael Goin
0307428d65
Remove redundant cudagraph dispatcher warning (#25841)
2025-09-28 17:12:42 -04:00

JJJYmmm
471997adf6
[Bugfix] fix Qwen3VLMoe load when pp > 1 (#25838)
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
2025-09-28 17:56:12 +00:00

Yuxuan Zhang
b1ded114b9
Update GLM-4.5 Doc transformers version (#25830)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-09-28 12:05:51 +00:00

weiliang
f4e4088c99
Fix random dataset mismatched token length with config. (#24937)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-28 08:23:44 +00:00

Isotr0py
0efd540dbc
[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling (#25557)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-28 04:21:01 +00:00

Roger Wang
6144754014
[Bugfix] Fix Qwen3-VL regression from #24982 (#25814)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-28 03:21:09 +00:00

Roger Wang
69311446ba
[MM] Optimize memory profiling for scattered multimodal embeddings (#25810)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-28 02:17:58 +00:00

Nicolò Lucchesi
da63274d9f
[Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-27 15:17:35 -04:00

Jialin Ouyang
c216119d64
[Core] GC Debug callback (#24829)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: Jialin Ouyang <jialino@meta.com>
2025-09-27 17:53:31 +00:00

Clayton Coleman
5546acb463
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
2025-09-27 13:36:28 -04:00

Jiangyun Zhu
c0ec81836f
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable (#25651)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-27 16:09:00 +00:00

Patrick C. Toulme
b65e56babe
[Core] Refactor self.model() to call a helper for subclassing. (#25084)
Signed-off-by: Patrick Toulme <ptoulme@meta.com>
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com>
2025-09-27 08:40:59 -07:00

Peter Pan
49996cd597
[env] default nixl side port conflicts with kv-event zmq port (#25056)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-09-27 15:02:40 +00:00

yyzxw
ecb37e276a
[docs] transcriptions API audio upload (#25446)
Signed-off-by: zxw <1020938856@qq.com>
2025-09-27 15:00:35 +00:00

Tyler Michael Smith
a5354b3ed2
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 14:22:28 +00:00

Tyler Michael Smith
f9df8b4ad7
[Bugfix] Fix triton import precommit failure (#25803)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 07:13:11 -07:00

Harry Mellor
ec152c8748
Fix GPTQ model loading in Transformers backend (#25770)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-27 12:18:20 +00:00