18 Commits

Author SHA1 Message Date
zhrrr
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
2025-10-17 07:30:35 +00:00
Bram Wasti
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
2025-10-13 10:24:53 -04:00
Bram Wasti
dc48ba0c75
Kernel-override Determinism [1/n] (#25603)
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-09-26 16:59:09 -07:00
Aidyn-A
bfe9380161
Apply fixes for CUDA 13 (#24599)
Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
2025-09-17 09:15:42 -04:00
Luka Govedič
4f35be10a9
[BugFix] Fix topk_softmax assert (#19764)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
2025-08-27 09:47:28 -07:00
kliuae
c6cd5ca3d3
[ROCm][Bugfix] Fix compilation error in topk softmax fused kernel (#22819)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
2025-08-13 13:45:03 -07:00
shixianc
4c558cf62e
[Perf] Support topk softmax fused kernel for broader num_experts (#22211)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-08-12 21:34:47 -07:00
Wentao Ye
a59cd9d9f7
[Refactor] Fix Compile Warning #1444-D (#21462)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-01 06:10:30 -07:00
Gregory Shtrasberg
90eeea8f85
[Bugfix][ROCm] Fix for warp_size uses on host (#21205)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-24 00:37:19 -07:00
Wentao Ye
226b452a20
Revert "[Refactor] Fix Compile Warning #1444-D (#21208)" (#21384)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-22 08:22:10 -07:00
Wentao Ye
6e5b5ca580
[Refactor] Fix Compile Warning #1444-D (#21208)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-21 23:33:51 -07:00
Richard Barnes
86debab54c
Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 (#17082)
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-01 06:48:10 +00:00
Ning Xie
2f1c19b245
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-11 19:57:10 -07:00
Varun Sundar Rabindranath
fa98d77773
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-03 12:30:02 -07:00
bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956) 2025-05-14 13:11:54 -07:00
bnellnm
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
Divakar Verma
a66cf40b20
[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927)
This PR enables the fused topk_softmax kernel used in moe layer for HIP
2024-06-02 14:13:26 -07:00
Woosuk Kwon
f0d4e14557
Add fused top-K softmax kernel for MoE (#2769) 2024-02-05 17:38:02 -08:00