Wentao Ye
a59cd9d9f7
[Refactor] Fix Compile Warning #1444-D ( #21462 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-01 06:10:30 -07:00
Eric Curtin
b876860c62
[Hardware][CPU] Build fix for ARM without BF16 ( #21848 )
...
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-07-30 06:22:00 -07:00
Wentao Ye
1b0a155534
[Perf] Using __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant ( #21867 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-29 21:50:46 -06:00
Harry Mellor
ba5c5e5404
[Docs] Switch to better markdown linting pre-commit hook ( #21851 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 19:45:08 -07:00
Gregory Shtrasberg
12a223ef9b
[AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM ( #21766 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-29 03:35:37 +00:00
lyrisz
c6c9122d50
[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning ( #20396 )
...
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Duncan Moss <djm.moss@gmail.com>
2025-07-28 23:13:58 +00:00
Caleb_Du
57c22e57f9
Fix CUDA permute/unpermute for use with DeepGemm Moe ( #17934 )
...
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
2025-07-27 07:08:00 -07:00
Yeju Zhou
9094d11c5d
[Bugfix][Apple Silicon] fix missing symbols when build from source on Mac with Apple Silicon ( #21380 )
...
Signed-off-by: Yeju Zhou <yejuzhou@outlook.com>
2025-07-26 07:09:57 -07:00
Wentao Ye
75d29cf4e1
[Perf] Cuda Kernel for Int8 Per Token Group Quant ( #21476 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-25 17:07:07 -07:00
czhu-cohere
136d750f5f
[Kernel] Improve machete memory bound perf ( #21556 )
...
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-07-25 06:53:21 -07:00
Wentao Ye
e8cb0d0495
[Bug] Fix Compressed Tensor NVFP4 cutlass_fp4_group_mm illegal memory access ( #21465 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-24 08:13:24 -07:00
Gregory Shtrasberg
90eeea8f85
[Bugfix][ROCm] Fix for warp_size uses on host ( #21205 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-24 00:37:19 -07:00
Gregory Shtrasberg
3ec7170ff1
[Bugfix][ROCm][Build] Fix build regression on ROCm ( #21393 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-22 20:27:41 -07:00
Wentao Ye
226b452a20
Revert "[Refactor] Fix Compile Warning #1444-D ( #21208 )" ( #21384 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-22 08:22:10 -07:00
Wentao Ye
774d0c014b
[Perf] Cuda Kernel for Per Token Group Quant ( #21083 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-22 07:27:15 -07:00
Duncan Moss
2c8db17cfd
[feat]: add SM100 support for cutlass FP8 groupGEMM ( #20447 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-22 07:27:12 -07:00
Mickaël Seznec
4fb56914c5
[perf] Add fused MLA QKV + strided layernorm ( #21116 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-22 07:07:44 -07:00
Wentao Ye
6e5b5ca580
[Refactor] Fix Compile Warning #1444-D ( #21208 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-21 23:33:51 -07:00
Ming Yang
e7b2042681
Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE ( #20762 ) ( #21334 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-07-21 21:49:01 -07:00
Himanshu Jaju
0ec82edda5
[perf] Speed up align sum kernels ( #21079 )
...
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
2025-07-21 11:19:23 -07:00
Li, Jiang
a15a50fc17
[CPU] Enable shared-memory based pipeline parallel for CPU backend ( #21289 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-21 09:07:08 -07:00
Richard Zou
b2eb2b5ad7
[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 ( #19346 )
...
Signed-off-by: rzou <zou3519@gmail.com>
2025-07-18 14:10:21 -04:00
shixianc
5780121c95
[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm ( #20911 )
...
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-07-18 04:34:43 +00:00
ElizaWszola
9fb2d22032
[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE ( #20762 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-07-17 09:56:44 -04:00
Lucas Wilkinson
d31a647124
[BugFix] Fix import error on non-blackwell machines ( #21020 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-15 22:27:29 -07:00
Peter Pan
1eb2b9c102
[CI] update typos config for CI pre-commit and fix some spells ( #20919 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-15 21:12:40 -07:00
Gregory Shtrasberg
ed10f3cea1
[ROCm] warpSize is being made non constexpr in ROCm 7.0 ( #20330 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-15 14:01:44 -04:00
Alexander Matveev
8cdc371217
SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP ( #20769 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-07-15 01:06:38 +00:00
TJian
c488b928a7
[ROCm] [Bugfix] [Critical]: Fix mamba compilation bug ( #20883 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-07-14 15:23:28 +08:00
Congcong Chen
2c11a738b3
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning ( #20702 )
...
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
2025-07-12 06:02:10 -07:00
Michael Goin
d47661f0cd
[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM ( #20646 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-11 10:05:33 -06:00
Duncan Moss
5923ab9524
[fix]: disable cutlass block scaled group gemm for EP ( #20781 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
2025-07-11 02:39:18 +00:00
nishith-fujitsu
c7753a9809
[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU ( #14129 )
...
Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com>
2025-07-10 15:59:04 +00:00
Tuan, Hoang-Trong
47043eb678
[Kernel] Triton implementation of causal-conv1d for Mamba-based models ( #18218 )
...
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-09 12:53:55 -07:00
Wenxin Cheng
5eaf570050
Replace multiply_add with homogeneous_multiply_add to Address Clang Template Parameter Issue ( #20142 )
...
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-09 00:30:18 +00:00
Ming Yang
c438183e99
[Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE ( #20166 )
...
Signed-off-by: Ming Yang <yming@meta.com>
2025-07-08 23:10:57 +00:00
Lucas Wilkinson
40b86aa05e
[BugFix] Fix: ImportError when building on hopper systems ( #20513 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-06 12:17:30 +08:00
Vadim Gimpelson
f73d02aadc
[BUG] Fix #20484 . Support empty sequence in cuda penalty kernel ( #20491 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
2025-07-05 19:38:02 -07:00
Duncan Moss
3d184b95b8
[feat]: CUTLASS block scaled group gemm for SM100 ( #19757 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
2025-07-04 12:58:04 -06:00
Wentao Ye
783921d889
[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels ( #20331 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-04 15:06:24 +08:00
Joonchen Liau
9e5552aa13
[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) ( #17280 )
...
Signed-off-by: kaln27 <liaojuncheng123@foxmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-02 06:47:19 -06:00
Tyler Michael Smith
3be8d312a2
[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 ( #20324 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-01 18:05:47 -07:00
周周周
9290de5667
remove unused variables in marlin_template.h ( #20236 )
2025-07-02 00:51:52 +00:00
Li, Jiang
6cc1e7d96d
[CPU] Update custom ops for the CPU backend ( #20255 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-01 07:25:03 +00:00
Richard Barnes
86debab54c
Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 ( #17082 )
...
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-01 06:48:10 +00:00
Tyler Michael Smith
e8c3bd2cd1
[Bugfix] Fix some narrowing conversion warnings ( #20141 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-27 09:01:28 -07:00
Hosang
94a55c7681
[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 ( #19891 )
...
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
2025-06-27 07:14:44 -07:00
li haoyang
0740e29b66
[Feature] add quick all reduce ( #19744 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-06-26 20:54:24 -07:00
Michael Goin
44d2e6af63
[Bugfix] Build moe_data for both sm100 and sm90 ( #20086 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-26 20:50:12 -07:00
Ilya Markov
2d7779f888
[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler ( #20071 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-06-26 20:50:09 -07:00