vllm/moe at c0c77472cbdd624b4fc8fe2f608f9e6618ccdee2 - vllm - 丝路新云-代码仓

xinyun/vllm

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-21 03:05:38 +08:00

zhrrr 75c7ad9918

[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>

2025-10-17 07:30:35 +00:00

marlin_moe_wna16

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

permute_unpermute_kernels

Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)

2025-07-27 07:08:00 -07:00

dynamic_4bit_int_moe_cpu.cpp

[fix]: add Arm 4bit fused moe support (#23809)

2025-09-24 01:32:22 +00:00

grouped_topk_kernels.cu

Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu (#25346)

2025-09-23 09:45:39 -07:00

moe_align_sum_kernels.cu

[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997)

2025-10-16 12:53:11 -07:00

moe_ops.h

[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)

2025-10-17 07:30:35 +00:00

moe_permute_unpermute_op.cu

[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)

2025-08-20 10:35:26 -04:00

moe_wna16_utils.h

pre-commit autoupdate (#17380)

2025-04-29 06:46:55 -07:00

moe_wna16.cu

…

topk_softmax_kernels.cu

[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)

2025-10-17 07:30:35 +00:00

torch_bindings.cpp

[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)

2025-10-17 07:30:35 +00:00