Lucas Wilkinson
|
cabaf4eff3
|
[Attention] MLA decode optimizations (#12528)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-01-30 23:49:37 -08:00 |
|
Robert Shaw
|
5f671cb4c3
|
[V1] Improve Error Message for Unsupported Config (#12535)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-01-29 04:56:56 +00:00 |
|
Harry Mellor
|
823ab79633
|
Update pre-commit hooks (#12475)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-01-27 17:23:08 -07:00 |
|
Konrad Zawora
|
96f6a7596f
|
[Bugfix] Fix HPU multiprocessing executor (#12167)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2025-01-23 02:07:07 +08:00 |
|
Kevin H. Luu
|
64ea24d0b3
|
[ci/lint] Add back default arg for pre-commit (#12279)
Signed-off-by: kevin <kevin@anyscale.com>
|
2025-01-22 01:15:27 +00:00 |
|
Mengqing Cao
|
c64612802b
|
[Platform] improve platforms getattr (#12264)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-01-21 14:42:41 +00:00 |
|
Işık
|
af69a6aded
|
fix: update platform detection for M-series arm based MacBook processors (#12227)
Signed-off-by: isikhi <huseyin.isik000@gmail.com>
|
2025-01-20 22:23:28 +00:00 |
|
youkaichao
|
2b83503227
|
[misc] fix cross-node TP (#12166)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-18 10:53:27 +08:00 |
|
kewang-xlnx
|
de0526f668
|
[Misc][Quark] Upstream Quark format to VLLM (#10765)
Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2025-01-15 11:05:15 -05:00 |
|
youkaichao
|
ad34c0df0f
|
[core] platform agnostic executor via collective_rpc (#11256)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-15 13:45:21 +08:00 |
|
Shanshan Shen
|
9ddac56311
|
[Platform] move current_memory_usage() into platform (#11369)
Signed-off-by: Shanshan Shen <467638484@qq.com>
|
2025-01-15 03:38:25 +00:00 |
|
Chen Zhang
|
a2d2acb4c8
|
[Bugfix][Kernel] Give unique name to BlockSparseFlashAttention (#12040)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-01-14 15:45:05 +00:00 |
|
Shanshan Shen
|
a7d59688fb
|
[Platform] Move get_punica_wrapper() function to Platform (#11516)
Signed-off-by: Shanshan Shen <467638484@qq.com>
|
2025-01-13 13:12:10 +00:00 |
|
youkaichao
|
458e63a2c6
|
[platform] add device_control env var (#12009)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-13 20:59:09 +08:00 |
|
youkaichao
|
89ce62a316
|
[platform] add ray_device_key (#11948)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-01-13 16:20:52 +08:00 |
|
wangxiyuan
|
20410b2fda
|
[platform] support custom torch.compile backend key (#11318)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-01-10 23:46:51 +08:00 |
|
wangxiyuan
|
ef725feafc
|
[platform] support pytorch custom op pluggable (#11328)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-01-10 10:02:38 +00:00 |
|
Kunshang Ji
|
61af633256
|
[BUGFIX] Fix UnspecifiedPlatform package name (#11916)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-01-10 16:20:46 +08:00 |
|
wangxiyuan
|
405eb8e396
|
[platform] Allow platform specify attention backend (#11609)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
|
2025-01-09 21:46:50 +08:00 |
|
Robert Shaw
|
56fe4c297c
|
[TPU][Quantization] TPU W8A8 (#11785)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-01-08 19:33:29 +00:00 |
|
Cyrus Leung
|
ee77fdb5de
|
[Doc][2/N] Reorganize Models and Usage sections (#11755)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-06 21:40:31 +08:00 |
|
youkaichao
|
b12e87f942
|
[platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-30 20:24:45 +08:00 |
|
Mengqing Cao
|
6c6f7fe8a8
|
[Platform] Move model arch check to platform (#11503)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2024-12-27 08:45:25 +00:00 |
|
Rafael Vasquez
|
32aa2059ad
|
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-12-23 22:35:38 +00:00 |
|
wangxiyuan
|
e88db68cf5
|
[Platform] platform agnostic for EngineArgs initialization (#11225)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-16 22:11:06 -08:00 |
|
Chenguang Li
|
d1fa714cb1
|
[Refactor]A simple device-related refactor (#11163)
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
|
2024-12-13 13:39:00 +00:00 |
|
Gene Der Su
|
82c73fd510
|
[Bugfix] cuda error running llama 3.2 (#11047)
|
2024-12-10 07:41:11 +00:00 |
|
Tyler Michael Smith
|
28b3a1c7e5
|
[V1] Multiprocessing Tensor Parallel Support for v1 (#9856)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-10 06:28:14 +00:00 |
|
Gregory Shtrasberg
|
b63ba84832
|
[ROCm][bugfix] scpecilative decoding worker class (#11035)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2024-12-09 14:00:29 -08:00 |
|
wangxiyuan
|
aea2fc38c3
|
[Platform] Move async output check to platform (#10768)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-09 17:24:46 +00:00 |
|
Cyrus Leung
|
aa39a8e175
|
[Doc] Create a new "Usage" section (#10827)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 11:19:35 +08:00 |
|
Michael Goin
|
7090c27bb2
|
[Bugfix] Only require XGrammar on x86 (#10865)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 10:32:21 -08:00 |
|
wangxiyuan
|
661175bc82
|
[platform] Add verify_quantization in platform. (#10757)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-11-29 15:22:21 +00:00 |
|
Chendi.Xue
|
0a71900bc9
|
Remove hard-dependencies of Speculative decode to CUDA workers (#10587)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2024-11-26 17:57:11 -08:00 |
|
Conroy Cheers
|
f5792c7c4a
|
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson (#9735)
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
|
2024-11-26 10:26:28 -08:00 |
|
Isotr0py
|
04668ebe7a
|
[Bugfix] Avoid import AttentionMetadata explicitly in Mllama (#10593)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-23 18:12:20 +00:00 |
|
JiHuazhong
|
86a44fb896
|
[Platforms] Refactor openvino code (#10573)
Signed-off-by: statelesshz <hzji210@gmail.com>
|
2024-11-22 22:23:12 -08:00 |
|
youkaichao
|
eebad39f26
|
[torch.compile] support all attention backends (#10558)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-22 14:04:42 -08:00 |
|
youkaichao
|
a111d0151f
|
[platforms] absorb worker cls difference into platforms folder (#10555)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2024-11-21 21:00:32 -08:00 |
|
youkaichao
|
cf656f5a02
|
[misc] improve error message (#10553)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-21 13:13:17 -08:00 |
|
youkaichao
|
aaddce5d26
|
[platforms] improve error message for unspecified platforms (#10520)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-20 23:07:56 -08:00 |
|
Mengqing Cao
|
9d827170a3
|
[Platforms] Add device_type in Platform (#10508)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2024-11-21 04:44:20 +00:00 |
|
youkaichao
|
388ee3de66
|
[torch.compile] limit inductor threads and lazy import quant (#10482)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-20 18:36:33 -08:00 |
|
youkaichao
|
772a66732d
|
[platforms] restore xpu check for parallel config (#10479)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-20 17:13:28 +00:00 |
|
Li, Jiang
|
63f1fde277
|
[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-11-20 10:57:39 +00:00 |
|
Mengqing Cao
|
d5b28447e0
|
[Platforms] Refactor xpu code (#10468)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2024-11-19 22:52:13 -08:00 |
|
youkaichao
|
803f37eaaa
|
[6/N] torch.compile rollout to users (#10437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-19 10:09:03 -08:00 |
|
Mengqing Cao
|
8c1fb50705
|
[Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2024-11-19 11:22:26 +08:00 |
|
youkaichao
|
51bb12d17b
|
[4/N][torch.compile] clean up set_torch_compile_backend (#10401)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-17 23:57:20 -08:00 |
|
youkaichao
|
8d74b5aee9
|
[platforms] refactor cpu code (#10402)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-16 23:14:23 -08:00 |
|