Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release ( #30396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
Laith Sakka
87aee9ed2b
Add evaluate_guards option to DynamicShapesConfig ( #27432 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-08 10:46:15 -05:00
Wentao Ye
17eb25e327
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement ( #29558 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 04:44:50 +00:00
Ilya Markov
4e26d3b09e
[Compile] Conditional compilation. Introduce compile_ranges ( #24252 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-12-05 18:17:32 +00:00
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags ( #29994 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 09:15:04 +00:00
Arpit Khandelwal
d7284a2604
[Core] Rename PassConfig flags as per RFC #27995 ( #29646 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-12-03 03:38:55 +00:00
Nengjun Ma
eaf81485ed
[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode ( #28935 )
...
Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-12-01 15:02:18 -05:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): Set up -O infrastructure ( #26847 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 01:55:58 -08:00
Harry Mellor
51fc9e017a
Scheduled removal of CompilationConfig.use_inductor ( #29323 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 12:55:42 +00:00
Icey
888152bf87
Allow oot custom compiler extension via CompilerInterface ( #28623 )
...
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-11-25 15:25:15 +08:00
Laith Sakka
7a228b5305
Add option to use unbacked and backed size obl dynamic shapes for more sound compilation. ( #26199 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-24 10:12:41 -05:00
Lucas Wilkinson
30d6466238
[BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens ( #29102 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-22 00:47:05 +00:00
Boyuan Feng
8c25f9cfb6
[BugFix] skip combo kernel on cpu ( #29129 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-21 11:50:59 +08:00
Lucas Wilkinson
8f4f77a727
[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 ( #29036 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-19 13:43:54 -08:00
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default ( #26468 )
...
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 06:13:54 -08:00
Lucas Wilkinson
64e39d667c
[BugFix] Temporary fix for IMA with MTP = 2 and full-cg ( #28315 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-17 09:41:22 -05:00
Roger Wang
d3387750f1
[Misc] Turn off encoder torch compile by default ( #28634 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-13 08:38:08 -08:00
Harry Mellor
a742134cc5
Remove deprecated fields from CompilationConfig ( #27593 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 16:10:28 +00:00
TJian
edb59a9470
[ROCm] [Bugfix] Fix fused_qknorm_rope_kernel rocm compatibility ( #28500 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-12 05:01:14 -08:00
Yanan Cao
48c879369f
[Frontend] Change CompilationMode to a proper Enum ( #28165 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-11 19:46:18 -05:00
zhrrr
68c09efc37
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model ( #27165 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2025-11-11 12:00:31 -05:00
Ilya Markov
d17ecc6b19
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds ( #24248 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-10 18:33:11 -05:00
Harry Mellor
c0a4b95d64
Fix issues from #28242 ( #28257 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 04:23:17 +00:00
Lucas Kabela
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile ( #28242 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-11-07 00:16:03 +00:00
Vadim Gimpelson
b6a248bdd7
[PERF] Decouple projections from GDN custom op. Attempt 2 ( #28083 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-05 17:01:12 -08:00
Vadim Gimpelson
d4e547bb7e
Revert "[PERF] Decouple projections from GDN custom op" ( #28080 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-04 15:58:23 -08:00
Vadim Gimpelson
5fd8f02ea9
[PERF] Decouple projections from GDN custom op ( #27512 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-04 08:11:41 -08:00
ahao-anyscale
cac4c10ef0
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile ( #27616 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2025-11-03 11:13:51 -05:00
Zhiyuan Li
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM ( #27809 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
2025-10-30 21:02:27 +08:00
Lucas Kabela
94666612a9
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model ( #23207 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>
2025-10-28 22:36:43 +00:00
fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements ( #26016 )
...
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-24 05:11:05 -07:00
Andy Lo
b63f2143f8
[LoRA] LoRA cuda graph specialization ( #25914 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-20 04:21:09 +00:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Cyrus Leung
4d4d6bad19
[Chore] Separate out vllm.utils.importlib ( #27022 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 00:48:59 +00:00
Morrison Turnansky
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): name change compilation level to compilation mode, deprecation compilation level ( #26355 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-15 02:51:16 +00:00
Luka Govedič
2dcd12d357
[torch.compile] Fix tests for torch==2.9 inductor partition ( #26116 )
...
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2025-10-14 19:55:02 -04:00
Boyuan Feng
ca683a2a72
use combo kernel to fuse qk-norm and qk-rope ( #26682 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-14 09:40:59 -04:00
Morrison Turnansky
e3fdb627d9
[FrontEnd] UNREVERT CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26502 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2025-10-13 22:47:16 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Angela Yi
01653a917b
[compile] Fix inductor partition config ( #26645 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 21:03:14 +00:00
baonudesifeizhai
cddce79fda
[torch.compile] Make inductor partition rules respect splitting_ops #25691 ( #25845 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-10 16:35:28 +00:00
Lucas Wilkinson
29255cfc3b
[Spec-Decode] Support piecewise cudagraphs for Eagle head ( #25109 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-10-10 01:20:31 -04:00
Jiangyun Zhu
5728da11ea
Revert #26113 "[Frontend] CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops" ( #26472 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-09 05:43:55 -07:00
Naveenraj Kamalakannan
e614ab7806
Separate MLAAttention class from Attention ( #25103 )
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-08 17:11:11 -07:00
Morrison Turnansky
0c824fc46f
[Frontend] CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26113 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2025-10-07 12:53:43 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Yongye Zhu
fa7e254a7f
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-30 17:14:41 +08:00
Jiangyun Zhu
43227236ec
[torch.compile] serialize cudagraph_mode as its enum name instead of value ( #25868 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-29 13:54:52 +00:00
Jiangyun Zhu
c0ec81836f
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable ( #25651 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-27 16:09:00 +00:00
fhl2000
f075693da7
[V1] address post issues related to #20059 (part 1) ( #23046 )
...
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-26 15:58:19 -04:00