vllm/moe at 24d0c9e6edc4299f62053ee0cb0154ce86b08cb8

xinyun/vllm

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-07-29 05:27:53 +08:00


shixianc b17109beea

[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)

Signed-off-by: Shixian Cui <shixian@amazon.com>

2025-08-20 10:35:26 -04:00


marlin_moe_wna16

[Kernel] [Quantization] Add MXFP4 and bias support for marlin kernel (#22428)

2025-08-14 11:23:22 -07:00

permute_unpermute_kernels

Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)

2025-07-27 07:08:00 -07:00

moe_align_sum_kernels.cu

[perf] Speed up align sum kernels (#21079)

2025-07-21 11:19:23 -07:00

moe_ops.h

[Perf] Optimize moe_align_block_size CUDA kernel (#19572)

2025-06-17 11:49:26 -07:00
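The commit on moe_ops.h refers to the moe_align_block_size CUDA kernel. As commonly described for vLLM's fused MoE path, this step groups the flattened (token, top-k slot) indices by expert and pads each expert's group up to a multiple of the GEMM block size, so that every block of rows belongs to exactly one expert. The pure-Python sketch below illustrates only that idea; the function name `align_block_size`, its return layout, and the use of the total slot count as the padding sentinel are assumptions for illustration, not the kernel's actual signature.

```python
def align_block_size(topk_ids, num_experts, block_size):
    """Hypothetical reference for block-size alignment (not vLLM's API).

    topk_ids: list of rows, one per token, each holding the token's expert ids.
    Returns (sorted_ids, expert_ids, num_padded_slots).
    """
    # Flatten (token, k) pairs into slot indices 0..numel-1.
    flat = [expert for row in topk_ids for expert in row]
    pad_value = len(flat)  # assumed sentinel: one past the last valid slot

    # Bucket slot indices by the expert they are routed to.
    buckets = [[] for _ in range(num_experts)]
    for slot, expert in enumerate(flat):
        buckets[expert].append(slot)

    sorted_ids, expert_ids = [], []
    for expert, slots in enumerate(buckets):
        # Round the group size up to the next multiple of block_size.
        padded = -(-len(slots) // block_size) * block_size
        sorted_ids.extend(slots + [pad_value] * (padded - len(slots)))
        # One expert id per block of block_size rows.
        expert_ids.extend([expert] * (padded // block_size))
    return sorted_ids, expert_ids, len(sorted_ids)
```

For example, `align_block_size([[0, 2], [2, 1], [0, 2]], 3, 2)` yields sorted ids `[0, 4, 3, 6, 1, 2, 5, 6]` and expert ids `[0, 1, 2, 2]`, where `6` marks padding slots.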

moe_permute_unpermute_op.cu

[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)

2025-08-20 10:35:26 -04:00
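moe_permute_unpermute_op.cu provides the permute/unpermute steps that, per the commit above, the CUTLASS FP8 MoE path now uses. Conceptually, permute gathers each token's hidden state once per selected expert so that every expert's inputs are contiguous, and unpermute scatters the expert outputs back to their tokens and combines them with the routing weights. The list-based sketch below is a hypothetical reference for that data movement only; `permute`, `unpermute`, and the `origin` bookkeeping are illustrative names, not the CUDA op's API.

```python
def permute(hidden, topk_ids):
    """Reorder token rows so each expert's inputs are contiguous (sketch)."""
    # One entry per (token, k) pair; sorting groups rows by expert id,
    # then by token index and k, giving a deterministic order.
    order = sorted(
        (expert, t, k)
        for t, row in enumerate(topk_ids)
        for k, expert in enumerate(row)
    )
    permuted = [list(hidden[t]) for _, t, _ in order]
    origin = [(t, k) for _, t, k in order]  # where each permuted row came from
    return permuted, origin


def unpermute(expert_out, origin, topk_weights, num_tokens):
    """Scatter expert outputs back and sum them with routing weights (sketch)."""
    hidden_dim = len(expert_out[0]) if expert_out else 0
    out = [[0.0] * hidden_dim for _ in range(num_tokens)]
    for row, (t, k) in zip(expert_out, origin):
        weight = topk_weights[t][k]
        for j, value in enumerate(row):
            out[t][j] += weight * value
    return out
```

With an identity "expert" and weights that sum to 1 per token, unpermuting the permuted rows recovers the original hidden states, which is a handy sanity check for any real implementation.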

moe_wna16_utils.h

pre-commit autoupdate (#17380)

2025-04-29 06:46:55 -07:00

moe_wna16.cu

[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801)

2025-04-17 22:13:29 -07:00
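moe_wna16.cu and moe_wna16_utils.h serve MoE layers with low-bit (e.g. int4) weights and 16-bit activations, and the llama4 fix above concerns how the quantization scales were cast. The core arithmetic of group-wise dequantization, where each weight is reconstructed as (quantized value minus zero point) times its group's scale, can be sketched as below. The low-nibble-first packing, the fixed zero point of 8, and the function name `dequant_int4` are assumptions for illustration, not the layout the CUDA kernel actually uses.

```python
def dequant_int4(packed, scales, group_size, zero_point=8):
    """Hypothetical group-wise int4 dequantization (not vLLM's kernel layout).

    packed: bytes-like sequence, two unsigned 4-bit values per byte,
            low nibble first (assumed packing).
    scales: one float scale per group of `group_size` consecutive weights.
    """
    values = []
    for byte in packed:
        values.append(byte & 0xF)          # low nibble
        values.append((byte >> 4) & 0xF)   # high nibble
    # Shift to a signed range via the zero point, then apply the group scale.
    return [(q - zero_point) * scales[i // group_size] for i, q in enumerate(values)]
```

For instance, `dequant_int4([0x98, 0x0F], [0.5, 2.0], 2)` unpacks `[8, 9, 15, 0]` and returns `[0.0, 0.5, 14.0, -16.0]`; applying a scale with the wrong type or group index changes every weight in that group, which is why scale handling matters for accuracy.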

topk_softmax_kernels.cu

[ROCm][Bugfix] Fix compilation error in topk softmax fused kernel (#22819)

2025-08-13 13:45:03 -07:00
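topk_softmax_kernels.cu fuses the router's softmax with top-k expert selection. A plain-Python equivalent of that computation, assuming gating logits laid out as [num_tokens][num_experts], is sketched below. The `renormalize` flag mirrors the optional renormalization of the selected weights used by some MoE models and is an assumption here, not necessarily something the fused kernel itself does.

```python
import math


def topk_softmax(gating_logits, k, renormalize=False):
    """Reference softmax + top-k routing per token (sketch, not the kernel API)."""
    weights, ids = [], []
    for row in gating_logits:
        m = max(row)
        exps = [math.exp(x - m) for x in row]  # subtract max for numerical stability
        total = sum(exps)
        probs = [e / total for e in exps]
        # Indices of the k largest probabilities; ties keep the lower index first.
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        selected = [probs[i] for i in top]
        if renormalize:
            s = sum(selected)
            selected = [w / s for w in selected]
        weights.append(selected)
        ids.append(top)
    return weights, ids
```

Computing the softmax over all experts before selecting the top k keeps the returned weights comparable across tokens; renormalizing afterwards makes each token's selected weights sum to 1.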

torch_bindings.cpp

[Kernel] [Quantization] Add MXFP4 and bias support for marlin kernel (#22428)

2025-08-14 11:23:22 -07:00