Shanshan Shen
e9ba99f296
[V1][Structured Output] Add supports_structured_output() method to Platform ( #16148 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-07 11:06:24 +00:00
Ilya Markov
ef608c37a7
[Distributed] [ROCM] Fix custom allreduce enable checks ( #16010 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-04-04 09:39:08 -07:00
Li, Jiang
2386803f2a
[CPU] Change default block_size for CPU backend ( #16002 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-04 09:39:05 -07:00
Aleksandr Malyshev
57a810db9c
[ROCM][V0] PA kernel selection when no sliding window provided ( #15982 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-04-03 05:28:44 +00:00
Aleksandr Malyshev
e73ff24e31
[ROCM][KERNEL] Paged attention for V1 ( #15720 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-02 19:48:00 -07:00
Ilya Markov
b7b7676d67
[Distributed] Add custom allreduce support for ROCM ( #14125 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-03-31 22:49:12 -07:00
Kebe
4e0f6076be
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. ( #14948 )
...
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 10:13:41 +08:00
Joe Runde
5f063a80bd
[bugfix] add supports_v1 platform interface ( #15417 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-25 15:00:32 -04:00
Thien Tran
4f044b1d67
[Kernel][CPU] CPU MLA ( #14744 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
Cyrus Leung
6dd55af6c9
[Doc] Update docs on handling OOM ( #15357 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 14:29:34 -07:00
Lucas Wilkinson
dccf535f8e
[V1] Enable V1 Fp8 cache for FA3 in the oracle ( #15191 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-23 15:07:04 -07:00
Russell Bryant
b877031d80
Remove openvino support in favor of external plugin ( #15339 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 14:06:39 -07:00
Isotr0py
f8a08cb90d
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs ( #14071 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-21 03:14:19 +00:00
Richard Liu
a8f12a63fd
Fix env vars for running Ray distributed backend on GKE ( #15166 )
...
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-20 14:59:33 +00:00
Mickaël Seznec
a597a57595
[Attention] Flash Attention 3 - fp8 ( #14570 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-03-20 01:14:20 -04:00
Yan Ma
9b87a579aa
[Misc][XPU] Use None as device capacity for XPU ( #14932 )
...
Signed-off-by: yan ma <yan.ma@intel.com>
2025-03-17 01:22:14 -07:00
Lucas Wilkinson
1e799b7ec1
[BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context ( #14910 )
2025-03-17 03:35:37 +00:00
Li, Jiang
a2ae496589
[CPU] Support FP8 KV cache ( #14741 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
Alexander Matveev
7888e1d0a3
[V1] TPU - Enable prefix caching by default ( #14773 )
2025-03-13 20:40:05 -07:00
Siyuan Liu
1bc3b739c4
[V1][TPU] Add assertion on multi-step-scheduler ( #14707 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-12 21:37:58 -07:00
Li, Jiang
ff47aab056
[CPU] Upgrade CPU backend to torch-2.6 ( #13381 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-12 10:41:13 +00:00
Jeff Daily
a1c8f3796c
dynamic dispatch of fp8 kernels ( #14245 )
...
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00
gnovack
d6123170d5
[Neuron] Add Neuron device communicator for vLLM v1 ( #14085 )
2025-03-10 18:37:04 -07:00
Harry Mellor
3b352a2f92
Correct capitalisation: VLLM -> vLLM ( #14562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00
Lucas Wilkinson
b0d541947a
[Attention] Default to FlashMLA backend for MLA ( #14451 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 18:18:39 -08:00
youkaichao
6eaf93020d
[platforms] improve rocm debugging info ( #14257 )
2025-03-04 21:32:18 -08:00
Tyler Michael Smith
72c62eae5f
[V1] EP/TP MoE + DP Attention ( #13931 )
2025-03-04 21:27:26 -08:00
Michael Goin
6247bae6c6
[Bugfix] Restrict MacOS CPU detection ( #14210 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-04 22:25:27 +08:00
youkaichao
ac65bc92df
[platform] add debug logging during inferring the device type ( #14195 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 18:39:16 +08:00
Cody Yu
989f4f430c
[Misc] Remove lru_cache in NvmlCudaPlatform ( #14156 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-04 11:09:34 +08:00
Mengqing Cao
b87c21fc89
[Misc][Platform] Move use allgather to platform ( #14010 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-03 15:40:04 +08:00
Woosuk Kwon
3b5567a209
[V1][Minor] Do not print attn backend twice ( #13985 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-01 07:09:14 +00:00
Lucas Wilkinson
2e94b9cfbb
[Attention] Flash MLA for V1 ( #13867 )
...
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Yang Chen <yangche@fb.com>
2025-02-27 23:03:41 +00:00
Yang Chen
58d1b2aa77
[Attention] MLA support for V1 ( #13789 )
...
Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-27 13:14:17 -05:00
Lucas Wilkinson
f95903909f
[Kernel] FlashMLA integration ( #13747 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00
cjackal
51010a1807
[Misc] set single whitespace between log sentences ( #13771 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-02-25 10:26:12 +08:00
Alex Brooks
9621667874
[Misc] Warn if the vLLM version can't be retrieved ( #13501 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-20 06:24:48 +00:00
Cyrus Leung
435b502a6e
[ROCm] Make amdsmi import optional for other platforms ( #13460 )
2025-02-18 03:15:56 -08:00
Divakar Verma
7c7adf81fc
[ROCm] fix get_device_name for rocm ( #13438 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-02-18 04:07:12 +00:00
Yan Ma
30513d1cb6
[Bugfix] fix xpu communicator ( #13368 )
...
Signed-off-by: yan ma <yan.ma@intel.com>
2025-02-17 20:59:18 +08:00
Mengqing Cao
238dfc8ac3
[MISC] tiny fixes ( #13378 )
2025-02-17 00:57:13 -08:00
Isotr0py
d67cc21b78
[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case ( #13358 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-16 18:55:27 +00:00
youkaichao
a0231b7c25
[platform] add base class for communicators ( #13208 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-16 22:14:22 +08:00
Lily Liu
80f63a3966
[V1][Spec Decode] Ngram Spec Decode ( #12193 )
...
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-02-15 18:05:11 -08:00
Alexander Matveev
45f90bcbba
[WIP] TPU V1 Support Refactored ( #13049 )
2025-02-14 00:21:53 -08:00
Sage Moore
ba59b78a9c
[ROCm][V1] Add initial ROCm support to V1 ( #12790 )
2025-02-13 22:21:50 -08:00
Li, Jiang
565c1efa65
[CI/Build][Bugfix] Fix CPU backend default threads num ( #13077 )
2025-02-11 16:55:56 +00:00
wangxiyuan
2e3b969ec0
[Platform] add pre_register_and_update function ( #12432 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 22:06:46 +08:00
Gregory Shtrasberg
7539bbc6a6
[ROCm] Using a more precise memory profiling ( #12624 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-02-11 21:47:10 +08:00
Russell Bryant
c320ca8edd
[Core] Don't do platform detection at import time ( #12933 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-11 07:25:25 +00:00