177 Commits

Author SHA1 Message Date
Copilot
61728cd1df
Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 13:32:19 -05:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Yanan Cao
2c8b9182b5
[CI] Reorganize compile tests so new tests are automatically included in CI (#28625)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-19 06:13:50 -08:00
Angela Yi
f36292dbee
[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-11-15 11:46:12 +00:00
Laith Sakka
2e0ad629b0
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-14 14:11:10 -08:00
Boyuan Feng
fd75d3e8c0
[Minor] avoid register new custom and just import silly_attn (#28578)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-14 09:32:31 +00:00
Yanan Cao
262d263f6c
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-13 15:09:05 -05:00
Roger Wang
d3387750f1
[Misc] Turn off encoder torch compile by default (#28634)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-13 08:38:08 -08:00
Harry Mellor
a742134cc5
Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 16:10:28 +00:00
TJian
edb59a9470
[ROCm] [Bugfix] Fix fused_qknorm_rope_kernel rocm compatibility (#28500)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-12 05:01:14 -08:00
Yanan Cao
48c879369f
[Frontend] Change CompilationMode to a proper Enum (#28165)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-11 19:46:18 -05:00
Adrian Abeyta
d23539549a
Use FLASHINFER MLA backend when testing fp8_kv_scale_compile (#28491)
Signed-off-by: adabeyta <aabeyta@redhat.com>
2025-11-12 00:34:58 +00:00
zhrrr
68c09efc37
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2025-11-11 12:00:31 -05:00
jvlunteren
533b018f72
[BugFix] Fix Failing Ruff Check (#28469)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-11-11 06:41:43 -08:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Adrian Abeyta
a5a790eea6
[Bugfix] Ensure calculated KV scales are applied in attention. (#27232)
Signed-off-by: adabeyta <aabeyta@redhat.com>
2025-11-10 23:42:37 +00:00
Ilya Markov
d17ecc6b19
[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-10 18:33:11 -05:00
Boyuan Feng
b158df2813
remove resolve_op_overloads and use splitting_ops directly (#28081)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-08 01:13:13 +00:00
Copilot
a736e5ff77
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074) 2025-11-07 15:58:16 +08:00
Lucas Kabela
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-11-07 00:16:03 +00:00
gmagogsfm
bde5039325
[CI] Add compile/test_multimodal_compile.py to CI (#28151)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-06 05:41:47 +00:00
Lucas Kabela
94666612a9
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>
2025-10-28 22:36:43 +00:00
fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements (#26016)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-24 05:11:05 -07:00
fhl2000
85fee74b33
[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder (#27427)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
2025-10-23 20:31:14 -07:00
dongbo910220
a0003b56b0
[Chore] Separate out system utilities from vllm.utils (#27201)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 20:25:25 +00:00
Harry Mellor
8f18feb191
Remove last level references not removed in #26355 (#27260)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-22 09:18:17 +00:00
Jiangyun Zhu
ab3e80042e
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-22 00:22:39 -04:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Luka Govedič
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-17 08:10:23 -06:00
Boyuan Feng
17c540a993
[torch.compile] fix simple inductor graph partition test (#27050)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-16 21:09:36 -04:00
Lucia Fang
11ae016bd7
[torch.compile] Passing only necessary compilation config to inductor pass config (#27041)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-10-17 00:01:52 +00:00
Richard Zou
9b6504c307
[BugFix] Work around graph partition x torch.compile cache issue (#26956)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-10-15 20:06:11 -07:00
Adrian Abeyta
0a9ef0cfce
Move query quantization to attention layer for Flashinfer & Triton. (#26534)
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-15 19:01:38 -04:00
Boyuan Feng
f57438338d
[BugFix] Patch inductor memory plan logic (#26878)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-15 12:51:45 +00:00
Boyuan Feng
f0862eae43
[Graph Partition] pass tests for decorator (#26831)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-15 06:39:48 +00:00
Morrison Turnansky
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-15 02:51:16 +00:00
Luka Govedič
2dcd12d357
[torch.compile] Fix tests for torch==2.9 inductor partition (#26116)
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2025-10-14 19:55:02 -04:00
Michael Goin
c3a722fcb2
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e (#26816)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-14 18:38:59 +00:00
Morrison Turnansky
e3fdb627d9
[FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2025-10-13 22:47:16 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Angela Yi
a25f2adee9
[compile] Add patched_fused_scaled_matmul_reduce_scatter (#26604)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-11 05:44:43 -07:00
Zhengxu Chen
eef921f45e
AOT Compilation for torch.compile (Bundled) (#24274)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-10-10 19:02:11 -04:00
baonudesifeizhai
cddce79fda
[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-10 16:35:28 +00:00
Andy Lo
67661375fa
[BugFix] Fix noop elimination edge case (#26394)
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-10-10 13:33:04 +00:00
Boyuan Feng
b545a0b207
fix test_simple_inductor_graph_partition (#26522)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-10 06:39:19 +00:00
Jiangyun Zhu
5728da11ea
Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-09 05:43:55 -07:00
Morrison Turnansky
0c824fc46f
[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26113)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
2025-10-07 12:53:43 -07:00
Cyrus Leung
1e4ecca1d0
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-07 15:42:31 +00:00
Gregory Shtrasberg
f231e5bc21
[ROCm] Split AITER unified attention into its own backend (#25507)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-10-06 22:49:23 +00:00
Harry Mellor
6c04638214
Fix per file ruff ignores related to line length (#26262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-06 05:12:40 +00:00