vllm/moe at 13ea39bc09cf4c102ba4ad308df379dc5abc3ba4 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-24 01:37:13 +08:00

Zhang Xiangze 13ea39bc09

[CPU]Parallelize over tokens in int4 moe (#29600)

Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>

2025-12-02 06:21:39 +00:00

marlin_moe_wna16

[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)

2025-11-29 07:19:33 -08:00

permute_unpermute_kernels

Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)

2025-07-27 07:08:00 -07:00

dynamic_4bit_int_moe_cpu.cpp

[CPU]Parallelize over tokens in int4 moe (#29600)

2025-12-02 06:21:39 +00:00

grouped_topk_kernels.cu

[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124)

2025-11-07 18:20:55 -08:00

moe_align_sum_kernels.cu

[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997)

2025-10-16 12:53:11 -07:00

moe_lora_align_sum_kernels.cu

Early exit for MoE LoRA kernels (#27131)

2025-11-03 20:22:17 +08:00

moe_ops.h

[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124)

2025-11-07 18:20:55 -08:00

moe_permute_unpermute_op.cu

[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)

2025-08-20 10:35:26 -04:00

moe_wna16_utils.h

pre-commit autoupdate (#17380)

2025-04-29 06:46:55 -07:00

moe_wna16.cu

[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801)

2025-04-17 22:13:29 -07:00

topk_softmax_kernels.cu

[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)

2025-10-17 07:30:35 +00:00

torch_bindings.cpp

[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)

2025-11-29 07:19:33 -08:00