Sage Moore
e283eff060
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/dbo-full-cudagraphs
2025-07-31 21:24:57 +00:00
Sage Moore
ba17d955a0
Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/dbo-full-cudagraphs
2025-07-31 21:17:21 +00:00
Ilya Markov
6e672daf62
Add FlashInfer allreduce RMSNorm Quant fusion ( #21069 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-31 13:58:38 -07:00
Zhengxu Chen
7349d5268b
[ez] Remove a trailing space from compilation/decorators.py ( #22028 )
2025-07-31 09:46:07 -07:00
cascade
287f527f54
[Feature] Add async tensor parallelism for scaled mm ( #20155 )
...
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-07-30 17:23:41 -04:00
Richard Zou
04e38500ee
[Bugfix] VLLM_V1 supports passing other compilation levels ( #19340 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-29 09:35:58 -04:00
Sage Moore
1c41175b2a
full cudagraphs
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-07-25 20:08:05 +00:00
Chaojun Zhang
d9f9a3fd96
[XPU] Conditionally import CUDA-specific passes to avoid import errors on xpu platform ( #21036 )
...
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-07-24 23:23:36 +08:00
Yong Hoon Shin
4ac7713e32
Add test case for compiling multiple graphs ( #21044 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-23 11:00:47 -07:00
Xin Li
ae268b6326
Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num ( #21325 )
...
Signed-off-by: XIn Li <xinli@nvidia.com>
2025-07-22 12:42:31 -07:00
Ilya Markov
37a7d5d74a
[Misc] Refactor AllReduceFusionPass. Remove parameter ( #20918 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-15 06:57:40 +00:00
Boyuan Feng
91b3d190ae
[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir ( #20940 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-07-15 13:02:17 +08:00
Richard Zou
ba8c300018
[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache ( #20942 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-15 01:26:18 +00:00
Yong Hoon Shin
61e20828da
Fall back if flashinfer comm module not found ( #20936 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-14 23:11:18 +00:00
Boyuan Feng
c1c8ca57ff
[cold start time] add envs.VLLM_COMPILE_DEPYF to guard decompile ( #20790 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-07-11 23:06:13 -07:00
Ilya Markov
fc0f41d10a
Integration SM100 FlashInfer fused allreduce RMSNorm ( #20691 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-11 18:58:15 -07:00
Luka Govedič
762be26a8e
[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging ( #20777 )
...
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
2025-07-11 00:15:22 -07:00
Luka Govedič
31d5c1797f
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf ( #19830 )
...
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-11 04:56:28 +00:00
Kyle Yu
d2e841a10a
[Misc] Improve logging for dynamic shape cache compilation ( #20573 )
...
Signed-off-by: kyolebu <kyu@redhat.com>
2025-07-08 00:48:09 +00:00
Jee Jee Li
1caca5a589
[Misc] Add SPDX-FileCopyrightText ( #20428 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-04 07:40:42 +00:00
Boyuan Feng
c01d1c5aba
use .dev for version comparison with pytorch nightly release ( #20031 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-06-24 21:52:16 +00:00
cascade
e6327c9b3e
[Feature] Support sequence parallelism for static fp8 quantization ( #19181 )
...
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-06-23 16:09:02 -04:00
Richard Zou
ed33349738
[BugFix] Fix use_cudagraph=False ( #19612 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-06-19 08:23:12 +08:00
Luka Govedič
3597b06a4f
[CUDA] Enable full cudagraph for FlashMLA ( #18581 )
...
Signed-off-by: luka <luka@neuralmagic.com>
2025-06-13 18:12:26 +00:00
youkaichao
d70bc7c029
[torch.compile] reorganize the cache directory to support compiling multiple models ( #19064 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-06-13 15:23:25 +08:00
Boyuan Feng
ce688ad46e
use base version for version comparison ( #19587 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-06-13 15:09:34 +08:00
Luka Govedič
f98548b9da
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass ( #16756 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2025-06-12 08:31:04 -07:00
Richard Zou
eaa2e51088
[Bugfix] Re-enable use_cudagraph in vLLM v1 ( #19299 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-06-08 08:56:12 +08:00
Li, Jiang
4555143ea7
[CPU] V1 support for the CPU backend ( #16441 )
2025-06-03 18:43:01 -07:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Michael Goin
cc977286e7
Reduce logs in CLI scripts and plugin loader ( #18970 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-03 06:00:45 +00:00
Richard Zou
84ec470fca
Improve "failed to get the hash of the compiled graph" error ( #18956 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-30 15:00:54 +00:00
Richard Zou
a521ef06e5
Use standalone_compile by default in torch >= 2.8.0 ( #18846 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-30 06:41:58 +08:00
Richard Zou
26b4fa45be
Add ability to use CUDAGraphs with use_inductor=False ( #17345 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-29 10:16:52 +08:00
Richard Zou
aa42561e40
Fix PiecewiseCompileInterpreter ( #17338 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-28 08:40:53 +00:00
Hyogeun Oh (오효근)
a68e293cb9
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking ( #18663 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-27 01:44:20 -07:00
cascade
71ea614d4a
[Feature]Add async tensor parallelism using compilation pass ( #17882 )
...
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-05-23 01:03:34 -07:00
Mengqing Cao
f8d2cc5f55
[Compile][Platform] Make PiecewiseBackend pluggable and extendable ( #18076 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-05-22 12:11:53 -07:00
Charlie Fu
7b2f28deba
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm ( #18082 )
...
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-13 22:13:56 -07:00
Harry Mellor
19324d660c
Update deprecated type hinting in vllm/compilation ( #18072 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 08:32:48 -07:00
Aaron Pham
cb528d0585
[Fix] check to make sure processor has chat templates ( #18047 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-13 03:04:10 -07:00
Harry Mellor
4b2ed7926a
Improve configs - the rest! ( #17562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-09 15:18:44 -07:00
Richard Zou
ea2236bf95
Add option to use torch._inductor.standalone_compile ( #17057 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-09 12:59:04 -07:00
Richard Zou
edbf2d609e
[easy] Fix logspam on PiecewiseBackend errors ( #17138 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-05 23:46:11 -07:00
Harry Mellor
d6484ef3c3
Add full API docs and improve the UX of navigating them ( #17485 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-03 19:42:43 -07:00
Sage Moore
460a2b1100
[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations ( #10867 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-05-01 07:59:28 -07:00
Keyun Tong
26bc4bbcd8
Avoid overwriting vllm_compile_cache.py ( #17418 )
...
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
2025-05-01 07:30:57 +00:00
Bryan Lu
70788bdbdc
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE ( #17211 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-04-29 21:10:00 +00:00
Richard Zou
165cb56329
Ignore '<string>' filepath ( #17330 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-28 19:23:29 -07:00
cascade
690fe019f0
[Feature] support sequence parallelism using compilation pass ( #16155 )
...
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-04-27 06:29:35 -07:00