Matthew Bonanni
369f47aa0f
[DeepSeek v3.2] Remove unnecessary syncwarps ( #31047 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-23 21:33:30 -08:00
rongfu.leng
4ed11105d7
[Misc] Remove unused custom ops copy_blocks and copy_blocks_mla ( #30967 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-12-23 18:22:35 -08:00
danielafrimi
b94f80ffb8
[FIX] FP4 quantization kernel padding initialization bug ( #31097 )
Signed-off-by: <>
Co-authored-by: root <root@gpu-193.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: root <root@gpu-951.slurm-workers-slurm.slurm.svc.cluster.local>
2025-12-23 08:45:18 -08:00
TJian
022f3cea53
[ROCm] [Critical]: Remove unused variable ( #31156 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-22 08:28:22 -08:00
Jee Jee Li
097978a15d
[Kernel] Enable fused_qknorm_rope_kernel supports partial rope ( #30821 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-21 18:39:22 -08:00
Michael Goin
06d490282f
[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size ( #30897 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-21 09:41:57 -08:00
Robert Shaw
83a317f650
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) ( #30990 )
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-19 13:09:54 -08:00
Nishidha Panpaliya
bd2b52fc2d
[CPU][Bugfix] Fix ppc64le CPU build ( #30871 )
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-12-19 12:26:35 +00:00
Li, Jiang
f90d3636e2
[Bugfix][CPU] Fix Mac CPU build ( #30955 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-18 01:38:22 -08:00
Li, Jiang
e3ab93c896
[CPU] Refactor CPU fused MOE ( #30531 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-18 14:36:49 +08:00
Sheng Lin
f4e884f222
[NIXL][Bugfix] Fix NIXL/RDMA registration failure over CuMemAllocator ( #29569 )
Signed-off-by: Somoku <linsh0@protonmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2025-12-17 01:52:58 -08:00
Michael Goin
0a1ab1e565
[Perf][Kernels] Vectorize csrc/activations_kernels.cu ( #29512 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-16 14:56:02 -08:00
Jinzhen Lin
ce96857fdd
[Kernel][Quantization][MoE] add marlin kernel support for turing (sm75) ( #29901 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-12-16 14:35:28 -08:00
Daniel Cámpora
eaa82a709a
[Bugfix][DSV32] Fix overflow in topk. ( #30754 )
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-12-16 14:21:17 -08:00
Wentao Ye
f21f5ea38c
[Refactor] Small refactor for group topk ( #30562 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-16 14:50:59 -05:00
Wentao Ye
1e6b115300
[Refactor] Reduce duplicate code in per_token_group_quant cuda kernels ( #30496 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-12 16:45:23 -05:00
Lucas Wilkinson
3e41992fec
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 ( #27532 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-12 05:57:47 -08:00
Bhanu Prakash Voutharoja
6a6fc41c79
gptq marlin quantization support for fused moe with lora ( #30254 )
Signed-off-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
2025-12-12 02:27:22 +00:00
Wentao Ye
61249b177d
[Refactor] Remove useless syncwarp ( #30510 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 17:43:41 -05:00
Aditya Tewari
cebda2a4af
[CPU] Support for Whisper ( #30062 )
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
2025-12-10 04:58:42 -08:00
Wilson Wu
3bdd426636
Fix typos in comments across multiple files ( #30345 )
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-09 20:05:28 -08:00
Hashem Hashemi
2e7054da06
Improve wvsplitK tile and balance heuristics. ( #29937 )
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2025-12-09 23:51:32 +00:00
czhu-cohere
f6227c22ab
[Kernel]Support W4A8 Grouped GEMM on Hopper ( #29691 )
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-12-08 19:29:06 -08:00
gnovack
ea657f2078
Lora MoE Align Improvements ( #29257 )
Signed-off-by: gnovack <gnovack@amazon.com>
2025-12-09 10:35:16 +08:00
Wentao Ye
0ee6416f67
[Perf] Optimize group_topk kernel, 1.9% Throughput improvement, 2.1% TPOT improvement ( #30159 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-08 19:44:01 -05:00
Daniel Cámpora
184076c3fe
[DeepSeek v3.2] Make top-k work for any logit values. ( #27568 )
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-08 06:55:58 -08:00
ElizaWszola
af0444bf40
[Performance] Fused blockwise quant RMS norm ( #27883 )
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 16:38:04 +00:00
Wentao Ye
541a2ef892
[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. ( #29546 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 20:31:14 +08:00
Jinzhen Lin
879ddb09c3
[Kernel][MoE] optimize moe_align_block_size ( #29642 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-07 01:58:47 -08:00
Elham
9843e332da
[CPU][Perf] Add fast vectorized exp impl from Arm Optimized Routines ( #30068 )
Signed-off-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>
Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com>
Co-authored-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>
2025-12-05 13:09:20 +00:00
Zhang Xiangze
13ea39bc09
[CPU]Parallelize over tokens in int4 moe ( #29600 )
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-12-02 06:21:39 +00:00
Hendrik Holtmann
c0dfc89485
SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm ( #29711 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-12-01 17:24:18 -08:00
Jinzhen Lin
1656ad3704
[Kernel][Quantization] add w4a8 support for marlin kernel ( #24722 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
2025-11-29 07:19:33 -08:00
Li, Jiang
e2f56c309d
[CPU] Update torch 2.9.1 for CPU backend ( #29664 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-28 13:37:54 +00:00
Jinzhen Lin
a67dec7cba
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel ( #28619 )
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-26 19:02:21 -08:00
Pleaplusone
d9d342d214
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek ( #27457 )
Signed-off-by: ganyi <ygan@amd.com>
2025-11-26 12:45:28 +08:00
Michael Goin
e502098643
[Kernel] Add NVFP4 MoE CUTLASS support for SM120 ( #29242 )
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-11-25 06:59:07 -08:00
Pleaplusone
77e10c9cab
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence ( #28029 )
Signed-off-by: ganyi <ygan@amd.com>
2025-11-24 19:05:46 -07:00
R3hankhan
4de87866a8
[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x ( #28926 )
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-11-24 12:08:09 +00:00
Fadi Arafeh
730bd35378
[perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs with NEON ( #29193 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-22 09:04:36 -08:00
Jane (Yuan) Xu
e6309acdba
Simplify from_blob usage in get_cuda_view_from_cpu_tensor ( #29027 )
Signed-off-by: Jane Xu <janeyx@meta.com>
2025-11-22 10:35:32 +00:00
skaraban3807
f1805db1a6
[Perf] These changes enhance the NUMA functionality of vllm for systems with more than one NUMA nodes per socket ( #25559 )
Signed-off-by: Siddappa Karabannavar <siddappa.karabannavar@amd.com>
2025-11-21 14:13:52 +00:00
zhrrr
a982f5b5ea
[kernel][perf] support uncontiguous input for rms_norm kernel ( #28103 )
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-20 19:39:09 -08:00
Pleaplusone
06c20c9904
[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA ( #26670 )
Signed-off-by: ganyi <ygan@amd.com>
2025-11-20 02:54:01 -08:00
Vensen
fb8851f254
[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu ( #28760 )
Signed-off-by: vensen <vensenmu@gmail.com>
Signed-off-by: Vensenmu <vensenmu@gmail.com>
2025-11-20 02:52:02 -08:00
Boyuan Feng
a903d59ffa
cleanup at::Tag::needs_fixed_stride_order ( #28974 )
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 02:51:36 -08:00
j20120307
bbc6c2f1e5
[CI/Build] Fix broken build on Apple M1 ( #28999 )
Signed-off-by: Kan Zhu <j20120307@gmail.com>
2025-11-19 11:07:22 +00:00
ihb2032
8151609583
refactor(cpu_types_scalar.hpp): Unify scalar loop implementations using unroll_loop ( #28847 )
Signed-off-by: ihb2032 <1355790728@qq.com>
Co-authored-by: lyd1992 <liuyudong@iscas.ac.cn>
2025-11-19 11:05:44 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 ( #28826 )
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
tiehexue
e42bd8c2e3
Cast return value to int64_t for cache size ( #28814 )
Signed-off-by: tiehexue <tiehexue@hotmail.com>
2025-11-17 16:02:32 +00:00