Lucia Fang
|
c7fc6b1354
|
fix incompatibililty with non cuda platform for nvfp4 (#23478)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-08-24 15:35:41 -07:00 |
|
elvischenv
|
24d0c9e6ed
|
[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-22 22:09:05 +00:00 |
|
Didier Durand
|
22cf679aad
|
[Doc]: fix various typos in multiple files (#23179)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-22 10:38:46 -07:00 |
|
Yong Hoon Shin
|
dfd2382039
|
[torch.compile] Support conditional torch.compile per module (#22269)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-20 16:52:59 +00:00 |
|
elvischenv
|
03752dba8f
|
[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-19 08:22:15 -04:00 |
|
Xiao
|
a4454e9401
|
chore: disable enable_cpp_symbolic_shape_guards (#23048)
Signed-off-by: Xiao Liu <xiszishu@gmail.com>
|
2025-08-18 23:08:05 -04:00 |
|
fhl2000
|
74f441f4b5
|
[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-08-15 10:01:39 -04:00 |
|
Gregory Shtrasberg
|
031ca762d7
|
[ROCm][Bugfix] Compilation passes fix (#22202)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-08-04 19:12:28 -07:00 |
|
Xiao
|
554df8a6a2
|
Revert "[compile][startup] Disable C++ compilation of symbolic shapes" (#22122)
Signed-off-by: Xiao Liu <xiszishu@gmail.com>
|
2025-08-02 09:03:30 -07:00 |
|
Animesh Jain
|
9659bc7f27
|
[compile][startup] Disable C++ compilation of symbolic shapes (#20836)
Signed-off-by: Animesh Jain <anijain@umich.edu>
|
2025-08-01 10:38:52 -07:00 |
|
Richard Zou
|
8026a335a1
|
[BugFix] Update AttnFusionPass cache key (#21947)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-08-01 07:11:29 -07:00 |
|
TJian
|
26b5f7bd2a
|
[BUG] [ROCm] Fix import bug on ROCm (#22083)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-08-01 05:25:20 -07:00 |
|
Ilya Markov
|
6e672daf62
|
Add FlashInfer allreduce RMSNorm Quant fusion (#21069)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-31 13:58:38 -07:00 |
|
Zhengxu Chen
|
7349d5268b
|
[ez] Remove a trailing space from compilation/decorators.py (#22028)
|
2025-07-31 09:46:07 -07:00 |
|
cascade
|
287f527f54
|
[Feature] Add async tensor parallelism for scaled mm (#20155)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-07-30 17:23:41 -04:00 |
|
Richard Zou
|
04e38500ee
|
[Bugfix] VLLM_V1 supports passing other compilation levels (#19340)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-29 09:35:58 -04:00 |
|
Chaojun Zhang
|
d9f9a3fd96
|
[XPU] Conditionally import CUDA-specific passes to avoid import errors on xpu platform (#21036)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-07-24 23:23:36 +08:00 |
|
Yong Hoon Shin
|
4ac7713e32
|
Add test case for compiling multiple graphs (#21044)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-23 11:00:47 -07:00 |
|
Xin Li
|
ae268b6326
|
Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num (#21325)
Signed-off-by: XIn Li <xinli@nvidia.com>
|
2025-07-22 12:42:31 -07:00 |
|
Ilya Markov
|
37a7d5d74a
|
[Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-15 06:57:40 +00:00 |
|
Boyuan Feng
|
91b3d190ae
|
[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-07-15 13:02:17 +08:00 |
|
Richard Zou
|
ba8c300018
|
[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-15 01:26:18 +00:00 |
|
Yong Hoon Shin
|
61e20828da
|
Fall back if flashinfer comm module not found (#20936)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-14 23:11:18 +00:00 |
|
Boyuan Feng
|
c1c8ca57ff
|
[cold start time] add envs.VLLM_COMPILE_DEPYF to guard decompile (#20790)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-07-11 23:06:13 -07:00 |
|
Ilya Markov
|
fc0f41d10a
|
Integration SM100 FlashInfer fused allreduce RMSNorm (#20691)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-11 18:58:15 -07:00 |
|
Luka Govedič
|
762be26a8e
|
[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
|
2025-07-11 00:15:22 -07:00 |
|
Luka Govedič
|
31d5c1797f
|
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 04:56:28 +00:00 |
|
Kyle Yu
|
d2e841a10a
|
[Misc] Improve logging for dynamic shape cache compilation (#20573)
Signed-off-by: kyolebu <kyu@redhat.com>
|
2025-07-08 00:48:09 +00:00 |
|
Jee Jee Li
|
1caca5a589
|
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-04 07:40:42 +00:00 |
|
Boyuan Feng
|
c01d1c5aba
|
use .dev for version comparison with pytorch nightly release (#20031)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-06-24 21:52:16 +00:00 |
|
cascade
|
e6327c9b3e
|
[Feature] Support sequence parallelism for static fp8 quantization (#19181)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-06-23 16:09:02 -04:00 |
|
Richard Zou
|
ed33349738
|
[BugFix] Fix use_cudagraph=False (#19612)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-19 08:23:12 +08:00 |
|
Luka Govedič
|
3597b06a4f
|
[CUDA] Enable full cudagraph for FlashMLA (#18581)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-13 18:12:26 +00:00 |
|
youkaichao
|
d70bc7c029
|
[torch.compile] reorganize the cache directory to support compiling multiple models (#19064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-06-13 15:23:25 +08:00 |
|
Boyuan Feng
|
ce688ad46e
|
use base version for version comparison (#19587)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-06-13 15:09:34 +08:00 |
|
Luka Govedič
|
f98548b9da
|
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-12 08:31:04 -07:00 |
|
Richard Zou
|
eaa2e51088
|
[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-08 08:56:12 +08:00 |
|
Li, Jiang
|
4555143ea7
|
[CPU] V1 support for the CPU backend (#16441)
|
2025-06-03 18:43:01 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Michael Goin
|
cc977286e7
|
Reduce logs in CLI scripts and plugin loader (#18970)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 06:00:45 +00:00 |
|
Richard Zou
|
84ec470fca
|
Improve "failed to get the hash of the compiled graph" error (#18956)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-30 15:00:54 +00:00 |
|
Richard Zou
|
a521ef06e5
|
Use standalone_compile by default in torch >= 2.8.0 (#18846)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-30 06:41:58 +08:00 |
|
Richard Zou
|
26b4fa45be
|
Add ability to use CUDAGraphs with use_inductor=False (#17345)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-29 10:16:52 +08:00 |
|
Richard Zou
|
aa42561e40
|
Fix PiecewiseCompileInterpreter (#17338)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-05-28 08:40:53 +00:00 |
|
Hyogeun Oh (오효근)
|
a68e293cb9
|
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking (#18663)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-05-27 01:44:20 -07:00 |
|
cascade
|
71ea614d4a
|
[Feature]Add async tensor parallelism using compilation pass (#17882)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-05-23 01:03:34 -07:00 |
|
Mengqing Cao
|
f8d2cc5f55
|
[Compile][Platform] Make PiecewiseBackend pluggable and extendable (#18076)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-05-22 12:11:53 -07:00 |
|
Charlie Fu
|
7b2f28deba
|
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-05-13 22:13:56 -07:00 |
|
Harry Mellor
|
19324d660c
|
Update deprecated type hinting in vllm/compilation (#18072)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-13 08:32:48 -07:00 |
|
Aaron Pham
|
cb528d0585
|
[Fix] check to make sure processor has chat templates (#18047)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-05-13 03:04:10 -07:00 |
|