Akash kaothalkar
86d15bfd8d
[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version ( #28535 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-11-13 13:32:21 +00:00
Radu Salavat
d44fbbab0e
[build][cmake]: Bundle static ACL and torch libgomp for CPU extension builds ( #28059 )
...
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
2025-11-13 05:43:08 +00:00
Li, Jiang
7f829be7d3
[CPU] Refactor CPU attention backend ( #27954 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-12 09:43:06 +08:00
Jonas M. Kübler
9c84ca8293
[FA/Chore] Bump FA version for FP8 two-level accumulation ( #27889 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-11-10 12:06:04 -08:00
Abolfazl Shahbazi
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build ( #28026 )
...
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-11-08 14:17:35 +08:00
Fadi Arafeh
a663f6ae64
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 ( #27415 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-27 11:14:55 +00:00
Matthew Bonanni
b4fda58a2d
[MLA] Bump FlashMLA ( #27354 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-22 15:48:37 -07:00
Tao He
250fb1b8ea
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. ( #27144 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-21 18:27:03 +00:00
Fadi Arafeh
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 ( #27183 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Michael Yang <Michael.Yang@arm.com>
2025-10-21 02:02:58 +00:00
Lucas Wilkinson
9f020f4f31
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] ( #27111 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-18 12:44:39 -06:00
zhrrr
e471d7ca7e
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR ( #26773 )
...
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-15 04:09:44 +00:00
ihb2032
086609de64
fix(nix): Allow local oneDNN path to fix vLLM CPU build failure ( #26401 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-10-11 09:12:16 +00:00
Nishidha Panpaliya
8f8474fbe3
[CI/Build] Fix ppc64le CPU build and tests ( #22443 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-10-11 13:04:42 +08:00
Roberto L. Castro
96ad65b7fe
[Transform] [Quantization] Add QuTLASS support to vLLM ( #24440 )
...
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-10 09:43:40 -07:00
Ming Yang
3b736e1c38
[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 ( #25049 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-10-09 08:06:29 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Fadi Arafeh
9705fba7b7
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack ( #25948 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-10-04 12:16:38 +08:00
Lucas Wilkinson
418d111f8c
[FA/Chore] Bump vllm-flash-attention ( #25537 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-02 11:06:14 -04:00
Johnny
5234dc7451
[NVIDIA] Blackwell Family ( #24673 )
...
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnync13@gmail.com>
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Co-authored-by: Salvatore Cena <cena@cenas.it>
2025-10-01 10:50:54 -07:00
Yongye Zhu
fa7e254a7f
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-30 17:14:41 +08:00
chenlang
1e9a77e037
[Hardware][RISC-V] Add riscv64 support for vLLM with scalar ( #22112 )
...
Signed-off-by: chenlang <chen.lang5@zte.com.cn>
Co-authored-by: chenlang <10346245@zte.com.cn>
2025-09-25 20:46:11 +08:00
Nikhil Gupta
359d293006
[fix]: add Arm 4bit fused moe support ( #23809 )
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2025-09-24 01:32:22 +00:00
FengjinChen
79cbcab871
Force use C++17 globally to avoid compilation error ( #24823 )
...
Signed-off-by: chenfengjin <1871653365@qq.com>
2025-09-14 19:30:10 +00:00
Ignacio Sica
369a079568
[Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon ( #24200 )
...
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-09-04 02:48:25 -07:00
Lucas Wilkinson
402759d472
[Attention] FlashAttn MLA ( #14258 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-04 02:47:59 -07:00
En Ouyang
b78bed1bc5
[Hardware][Mac] Fix the installation fail for Apple Silicon (CPU) ( #23565 )
...
Signed-off-by: oye93 <en.ouyang93@outlook.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-08-26 13:04:25 +00:00
Matthew Bonanni
19fe1a0510
[Kernel] Add FP8 support with FlashMLA backend ( #22668 )
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-08-22 02:26:32 +00:00
Li, Jiang
7be5d113d8
[CPU] Refactor CPU W8A8 scaled_mm ( #23071 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-21 09:34:24 +08:00
Lucas Wilkinson
292084e72a
[BugFix] Fix for IMA in FA3 varlen combine ( #22967 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-17 08:52:04 -07:00
Lucas Wilkinson
177e55e3bd
[Attention] FA3 Attention Sinks Perf Boost ( #22478 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-15 17:41:07 -04:00
Thomas Parnell
bd875d2eb7
[Bugfix] Update FA commit hash ( #22546 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-08 16:10:25 -07:00
Lucas Wilkinson
cd9b9de1fb
[BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA ( #21691 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-08 16:09:42 -07:00
Lucas Wilkinson
2cb6ef8996
[BugFix] Fix FA2 RuntimeError when sinks is provided ( #22365 )
...
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
2025-08-06 08:03:03 -07:00
Woosuk Kwon
e3c876dca3
Upgrade FA3 for attention sink ( #22313 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 21:36:21 -07:00
Gregory Shtrasberg
5d5d419ca6
[Bugfix][CI/Build][ROCm] Make sure to use the headers from the build folder on ROCm ( #22264 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-08-05 20:39:32 -07:00
Ignacio Sica
5140f54b89
[CI/Build] fix cpu_extension for apple silicon ( #21195 )
...
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
2025-07-24 22:53:59 -07:00
nishith-fujitsu
c7753a9809
[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU ( #14129 )
...
Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com>
2025-07-10 15:59:04 +00:00
Li, Jiang
7f0367109e
[CI/Build][CPU] Enable cross compilation in CPU release pipeline ( #20423 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-03 05:26:12 -07:00
Woosuk Kwon
8acb4badee
[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling ( #20301 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-01 09:07:36 -07:00
Li, Jiang
6cc1e7d96d
[CPU] Update custom ops for the CPU backend ( #20255 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-01 07:25:03 +00:00
Huy Do
6c9837a761
Fix cuda_archs_loose_intersection when handling sm_*a ( #20207 )
...
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-06-29 16:52:34 -07:00
Eli Uriegas
0d06b533a0
cmake: Update vllm_flash_attn for vllm_kernels ( #20032 )
...
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
2025-06-24 22:44:10 +00:00
Lucas Wilkinson
a045b7e89a
[Perf] Improve/Fix-regression for FA3 in High QPS regimes ( #19463 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-06-24 13:09:01 -04:00
Lu Fang
8d1e89d946
[Misc][ROCm] Enforce no unused variable in ROCm C++ files ( #19796 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 20:25:15 -07:00
Lucas Wilkinson
07334959d8
[Wheel Size] Only build FA2 8.0+PTX ( #19336 )
2025-06-17 12:32:49 +09:00
Akash kaothalkar
b9a1791e2c
[Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection ( #19082 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-06-08 09:17:14 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Luka Govedič
a3896c7f02
[Build] Fixes for CMake install ( #18570 )
2025-05-27 20:49:24 -04:00
Lucas Wilkinson
c7852a6d9b
[Build] Allow shipping PTX on a per-file basis ( #18155 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-15 16:41:55 -07:00
Akash kaothalkar
e515668edf
[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER ( #17153 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-05-07 22:35:03 -07:00