535 Commits

Author SHA1 Message Date
chenlang
1e9a77e037
[Hardware][RISC-V] Add riscv64 support for vLLM with scalar (#22112)
Signed-off-by: chenlang <chen.lang5@zte.com.cn>
Co-authored-by: chenlang <10346245@zte.com.cn>
2025-09-25 20:46:11 +08:00
XuruiYang
845adb3ec6
[Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
2025-09-24 21:53:40 -07:00
Shiyan Deng
5c1e496a75
[MISC] replace c10::optional with std::optional (#25602)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
2025-09-24 16:56:21 -07:00
Nikhil Gupta
359d293006
[fix]: add Arm 4bit fused moe support (#23809)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2025-09-24 01:32:22 +00:00
Wentao Ye
bf68fd76a9
[Compile] Fix AMD Compile Error (#25518)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-24 00:42:48 +00:00
Alexander Matveev
1210e4d95b
[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 (#25509)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-09-23 16:57:55 -07:00
Hashem Hashemi
a3a7828010
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com>
2025-09-23 14:31:45 -04:00
Ming Yang
527821d191
Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu (#25346)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-23 09:45:39 -07:00
rivos-shreeasish
2357480b1a
[BugFix] Fix UB in per_token_group_quant.cu (#24913)
Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com>
2025-09-23 09:14:22 -07:00
Wentao Ye
711e912946
[Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM (#25193)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-19 16:23:19 -06:00
Elvir Crnčević
abdfcd4f3d
silu-v1: Fix EPS not being used during max-reduction (#25069)
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-18 10:25:12 +00:00
ihb2032
4f02b77de4
Fix: Add explicit #include <omp.h> for OpenMP compatibility on certain toolchains (#24951)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-09-18 17:43:23 +08:00
Lumina
81b16a2bc9
[Kernel] Better inf handling for grouped topk cu (#24886)
Signed-off-by: lumina37 <starry.qvq@gmail.com>
2025-09-18 05:53:55 +00:00
elvischenv
e6585ddb45
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-17 16:37:23 -07:00
Alexander Matveev
fedb75fa27
[Bugfix][B200] Fix cutlass_mla hang (#24966)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-17 18:06:38 -04:00
czhu-cohere
3c068c637b
[Kernel] Faster pre-processing time for W4A8 (#23972)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-09-17 14:35:32 -07:00
Matthew Bonanni
8f3616f422
Remove old cutlass mla (#23961)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-17 14:31:43 +00:00
Aidyn-A
bfe9380161
Apply fixes for CUDA 13 (#24599)
Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
2025-09-17 09:15:42 -04:00
Li, Jiang
9fccd04e30
[Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check (#25046)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-17 05:54:02 -07:00
Wentao Ye
e757a629e7
[Bug] Fix Cutlass Scaled MM Compilation Error (#24887)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 17:21:17 -04:00
Kyle Sayers
a0b26701c9
[Transform] Deterministic Hadacore Transforms (#24106)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-15 12:59:31 -06:00
xiao-llm
01413e0cf5
Fp8 paged attention update (#22222)
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao Yu <xiao.yu@metamaterial.com>
Co-authored-by: Xiao Yu <xiao.yu@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
2025-09-15 10:43:26 -04:00
Michael Goin
59d7ffc17f
[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-13 07:29:19 +00:00
Elvir Crnčević
98229db244
[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054)
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-13 00:17:27 -07:00
elvischenv
dbeee3844c
[Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (#24757)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-13 00:16:24 -07:00
Woosuk Kwon
5febdc8750
[Chore] Remove unused batched RoPE op & kernel (#24789)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-13 00:08:20 -07:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files (#24726)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
Michael Goin
c3aea10dc8
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
Ilya Markov
1fdd5c42d7
[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 09:45:31 -07:00
TaehyunKim
9bd831f501
[Model] New model support for Motif-1-Tiny (#23414)
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 23:29:40 -07:00
Ming Yang
86173ad593
[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-08 09:27:12 +08:00
mohankku
0eadaeff7e
[Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335)
Signed-off-by: Mohan Kumar Kumar <mohan.cbein@gmail.com>
Signed-off-by: mohankku <mohan.cbein@gmail.com>
2025-09-06 08:17:03 -07:00
yzds
ac201a0eaf
[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-06 13:24:05 +08:00
Didier Durand
35bf193864
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-05 19:41:12 -07:00
elvischenv
adc3ddb430
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-04 14:25:45 -07:00
Li, Jiang
57b1ce94f7
[CPU] Refactor CPU unquantized linear (#24150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-04 14:28:45 +08:00
Qiming Zhang
e919d6f549
[Kernel][Bugfix] Fix grouped topk cu (#24146)
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
2025-09-04 12:37:37 +08:00
Matthew Bonanni
a742322092
[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-03 14:05:24 -04:00
Wentao Ye
c4ed78b14f
[Compile] Fix Compile Warning for w4a8_mm_entry.cu (#23660)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-02 20:45:52 -07:00
co63oc
1bd007f234
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
Asaf Joseph Gardin
2b41cbbf03
[V1][Mamba1] - FP32 SSM Kernel Support (#23506)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
2025-09-01 20:53:00 -07:00
yzds
0dc9532065
[BUGFIX ] fix undefined silu_and_mul_nvfp4_quant (#23929)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
2025-08-29 09:36:39 -07:00
Charlie Fu
006477e60b
[ROCm][Fix] Fix rocm build caused by #23791 (#23847)
Signed-off-by: charlifu <charlifu@amd.com>
2025-08-28 19:52:27 -07:00
elvischenv
16a45b3a28
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
2025-08-28 19:36:50 +00:00
yzds
186aced5ff
[Kernel] cuda kernels for upcoming decode context parallel feature (#23791)
Co-authored-by: hongchao <hongchao@msh.team>
2025-08-28 15:29:11 +08:00
Luka Govedič
4f35be10a9
[BugFix] Fix topk_softmax assert (#19764)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
2025-08-27 09:47:28 -07:00
Xin Yang
8a3cd90af5
[Kernel] Add fused grouped_topk kernel for MoE (#23274)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-25 11:47:52 -07:00
czhu-cohere
e76e233540
[kernel] Support W4A8 on Hopper (#23198)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
2025-08-24 06:18:04 +00:00
Matthew Bonanni
19fe1a0510
[Kernel] Add FP8 support with FlashMLA backend (#22668)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-08-22 02:26:32 +00:00
Elvir Crnčević
044931f97b
Make sure that vectorize_with_alignment produced vectorized global loads (#23182) 2025-08-21 20:06:54 +00:00