328 Commits

Author SHA1 Message Date
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 06:13:54 -08:00
Didier Durand
7ed27f3cb5
[Doc]: fix typos in various files (#28945)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-18 22:52:30 -08:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Jialin Ouyang
40b6b38f2c
[Core] Switch Flat logprob control from environment variable to SamplingParams (#28914)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-19 02:10:02 +00:00
Ning Xie
ac1daf3233
fix comment typo (#28802)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-16 17:03:21 +00:00
Laith Sakka
2e0ad629b0
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-14 14:11:10 -08:00
Alexander Matveev
69d0e90313
[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-11-12 23:37:24 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
4ca5cd5740
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
2025-11-12 15:24:12 -08:00
QiliangCui
3eb0c2673e
[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR (#28487)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-11-12 22:31:14 +00:00
Andreas Karatzas
9f0247cfa4
VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>
2025-11-11 18:34:36 -08:00
wangxiyuan
e1710393c4
[V0 Deprecation] Remove VLLM_USE_V1 env (#28204)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-11 18:22:16 -07:00
Ilya Markov
1788aa1efb
[BugFix] Graceful handling of torch symm mem errors. (#27671)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-11 17:41:54 -07:00
Max Hu
412e153df5
[Feature] Allow configuring FlashInfer workspace size (#28269)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-11 23:32:20 +00:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Zhuohan Li
8d706cca90
[Misc] FlattenLogprobs -> FlatLogprobs (#28335)
2025-11-11 03:41:23 +00:00
Wentao Ye
de540c0354
[Feature] Add env var VLLM_MOE_USE_DEEP_GEMM (#28422)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-11 02:29:48 +00:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
Yong Hoon Shin
de2b78305f
[ROCm] Add env to enable/disable aiter triton gemm (#28321)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-11-08 22:27:00 -08:00
Benjamin Chislett
975676d174
[Feat] Drop-in Torch CUDA Profiler (#27841)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-08 14:07:37 -08:00
Pavani Majety
72b1c2ae2c
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-11-07 04:18:39 -08:00
Jialin Ouyang
ccd98b59c1
[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-07 00:27:12 -08:00
Wentao Ye
90189c71a9
[Bug] Fix env string "0" being treated as True (#28159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-05 17:04:20 -08:00
Chenheli Hua
1fb4217a05
[Multimodal] Make MediaConnector extensible. (#27759)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-11-04 18:28:01 +00:00
ahao-anyscale
cac4c10ef0
[BUG] Make 'binary' the default option for saving torch compile artifacts when using standalone_compile (#27616)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2025-11-03 11:13:51 -05:00
Paul Zhang
e7acb20076
[Feature] Batch invariant torch.compile (#27660)
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-30 13:11:29 -07:00
Wentao Ye
a8141fa649
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK (#27750)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-30 15:32:39 -04:00
Boyuan Feng
a9fe0793f2
use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#27698)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-29 17:08:54 +00:00
Alec S
ab2eb27b74
[Frontend] [gpt-oss] Fix MCP type bug (#27689)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-29 10:01:32 +00:00
Alec S
3c7fefdeba
[Frontend] [gpt-oss] Retry on tool JSON call parsing errors (#27675)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-29 09:42:44 +00:00
Wentao Ye
d3ab240f39
[Bug] Fix deepep low latency use nvlink by default (#27677)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-28 23:53:12 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
936643a868
[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache (#27294)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-28 10:22:28 -04:00
Pengchao Wang
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 (#27328)
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-10-24 14:16:44 -07:00
Ming Yang
0f67d4d962
[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek (#26397)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-10-24 10:24:08 -07:00
Richard Zou
cd390b609d
[compile] Turn standalone_compile back on (#27460)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-10-24 16:30:27 +00:00
Alexander Matveev
344a0017c0
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE (#26440)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-10-21 21:38:29 +00:00
Nick Hill
647214f3d5
[V0 Deprecation] Remove V0 executors (#27142)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 11:09:37 -07:00
Woosuk Kwon
fb860670da
[Minor] Remove unused env variable (#27161)
2025-10-18 18:48:35 -07:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Patrick von Platen
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-17 08:24:42 -07:00
kliuae
1317034379
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097)
Signed-off-by: chenjun <junchen2@amd.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-10-16 10:41:34 +08:00
Wentao Ye
e5b438a247
[Bug] Temporarily Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default (#26925)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-15 16:18:50 -04:00
Kaixi Hou
de92d916fe
[NVIDIA] Add support for cudnn fp4 gemm via flashinfer (#26107)
Signed-off-by: kaixih <kaixih@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-15 13:53:00 -04:00
Jialin Ouyang
380f17527c
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-14 17:03:21 -04:00
Wentao Ye
6d87a2838c
[Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH (#26743)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-14 11:47:49 -04:00
Jialin Ouyang
cfded80793
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-14 01:46:44 +00:00
Michael Goin
0d21b9b51e
[UX] Speedup DeepGEMM warmup with heuristics (#25619)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-13 07:59:27 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Zhengxu Chen
eef921f45e
AOT Compilation for torch.compile (Bundled) (#24274)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-10-10 19:02:11 -04:00
Rui Qiao
757fa4a4da
[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-09 19:53:43 -07:00
Wentao Ye
f8607863d8
[Feature] Enable E8M0 by Default on Hopper for DeepGEMM, 5% E2E throughput improvement (#26197)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-08 15:33:56 +08:00