Ralf Gommers
7c1ed45848
[CI/Build]: make it possible to build with a free-threaded interpreter ( #29241 )
...
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 15:21:46 -08:00
Abolfazl Shahbazi
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build ( #28026 )
...
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-11-08 14:17:35 +08:00
Fadi Arafeh
a663f6ae64
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 ( #27415 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-27 11:14:55 +00:00
Johnny
5234dc7451
[NVIDIA] Blackwell Family ( #24673 )
...
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnync13@gmail.com>
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Co-authored-by: Salvatore Cena <cena@cenas.it>
2025-10-01 10:50:54 -07:00
FengjinChen
79cbcab871
Force use C++17 globally to avoid compilation error ( #24823 )
...
Signed-off-by: chenfengjin <1871653365@qq.com>
2025-09-14 19:30:10 +00:00
Gregory Shtrasberg
5d5d419ca6
[Bugfix][CI/Build][ROCm] Make sure to use the headers from the build folder on ROCm ( #22264 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-08-05 20:39:32 -07:00
Huy Do
6c9837a761
Fix cuda_archs_loose_intersection when handling sm_*a ( #20207 )
...
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-06-29 16:52:34 -07:00
Lu Fang
8d1e89d946
[Misc][ROCm] Enforce no unused variable in ROCm C++ files ( #19796 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 20:25:15 -07:00
Luka Govedič
a3896c7f02
[Build] Fixes for CMake install ( #18570 )
2025-05-27 20:49:24 -04:00
Lucas Wilkinson
c7852a6d9b
[Build] Allow shipping PTX on a per-file basis ( #18155 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-15 16:41:55 -07:00
Kaixi Hou
4fc5c23bb6
[NVIDIA] Support nvfp4 quantization ( #12784 )
2025-02-12 19:51:51 -08:00
Lucas Wilkinson
103bd17ac5
[Build] Only build 9.0a for scaled_mm and sparse kernels ( #12339 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-01-27 10:40:00 -05:00
tvirolai-amd
cd9d06fb8d
Allow hip sources to be directly included when compiling for rocm. ( #12087 )
2025-01-15 16:46:03 -05:00
bnellnm
3cb07a36a2
[Misc] Upgrade to pytorch 2.5 ( #9588 )
...
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-27 09:44:24 +00:00
Lucas Wilkinson
aeb37c2a72
[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) ( #8845 )
2024-10-03 22:55:25 -04:00
Luka Govedič
71c60491f2
[Kernel] Build flash-attn from source ( #8245 )
2024-09-20 23:27:10 -07:00
bnellnm
de6f90a13d
[Misc] guard against change in cuda library name ( #8609 )
2024-09-20 06:36:30 +08:00
bnellnm
73202dbe77
[Kernel][Misc] register ops to prevent graph breaks ( #6917 )
...
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
Jee Jee Li
f80ab3521c
Clean up remaining Punica C information ( #7027 )
2024-08-04 15:37:08 -07:00
Matt Wong
dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes ( #5422 )
2024-06-25 15:56:15 -07:00
Hongxia Yang
f758aed0e8
[Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices ( #5641 )
2024-06-18 23:21:29 -07:00
bnellnm
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops ( #5047 )
2024-06-09 16:23:30 -04:00
Cody Yu
c833101740
[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support ( #4535 )
2024-05-09 18:04:17 -06:00
Matt Wong
59a6abf3c9
[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations ( #3782 )
2024-04-08 14:31:02 -07:00
Adrian Abeyta
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) ( #3290 )
...
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
mawong-amd
b6d103542c
[Kernel] Layernorm performance optimization ( #3662 )
2024-03-30 14:26:38 -07:00
Simon Mo
51c31bc10c
CMake build elf without PTX ( #3739 )
2024-03-30 01:53:08 +00:00
bnellnm
3ad438c66f
Fix build when nvtools is missing ( #3698 )
2024-03-29 18:52:39 -07:00
bnellnm
9fdf3de346
Cmake based build system ( #2830 )
2024-03-18 15:38:33 -07:00