vllm / model_executor / layers / fused_moe
Latest commit: 27b78c73ca by Jinzhen Lin, 2025-01-29 09:07:09 -05:00
[Kernel] add triton fused moe kernel for gptq/awq (#12185)
configs/                  [ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408)  2025-01-25 12:17:19 +08:00
__init__.py               [torch.compile] support moe models (#9632)  2024-10-27 21:58:04 -07:00
fused_marlin_moe.py       [optimization] remove python function call for custom op (#11750)  2025-01-07 17:04:28 +00:00
fused_moe.py              [Kernel] add triton fused moe kernel for gptq/awq (#12185)  2025-01-29 09:07:09 -05:00
layer.py                  [BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order (#11528)  2025-01-23 21:40:33 +00:00
moe_pallas.py             [Hardware][TPU] Support MoE with Pallas GMM kernel (#6457)  2024-07-16 09:56:28 -07:00
moe_torch_iterative.py    [Hardware][TPU] workaround fix for MoE on TPU (#11764)  2025-01-12 10:53:51 -05:00
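
For orientation, the sketch below shows the unfused computation that moe_torch_iterative.py and the fused Triton/Marlin/Pallas kernels in this directory all implement in some form: top-k routing over the router logits, a per-expert MLP, and a routing-weighted sum of expert outputs. The function name, argument shapes, and plain-SiLU experts here are illustrative assumptions, not vLLM's actual fused_moe signature; the fused kernels exist precisely to avoid this per-expert Python loop and its gather/scatter overhead.

```python
# Minimal, unfused reference of the computation a fused MoE kernel replaces.
# Illustrative sketch only: the function name, argument layout, and
# plain-SiLU experts are assumptions, not vLLM's fused_moe API.
import torch
import torch.nn.functional as F


def reference_moe(hidden_states: torch.Tensor,   # [num_tokens, hidden_size]
                  w_up: torch.Tensor,            # [num_experts, intermediate_size, hidden_size]
                  w_down: torch.Tensor,          # [num_experts, hidden_size, intermediate_size]
                  gating_output: torch.Tensor,   # [num_tokens, num_experts] router logits
                  topk: int,
                  renormalize: bool = True) -> torch.Tensor:
    # Route each token to its top-k experts.
    routing_weights = torch.softmax(gating_output, dim=-1)
    topk_weights, topk_ids = torch.topk(routing_weights, topk, dim=-1)
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)

    out = torch.zeros_like(hidden_states)
    for expert_id in range(w_up.shape[0]):
        # Tokens whose top-k selection includes this expert.
        token_ids, slot = torch.where(topk_ids == expert_id)
        if token_ids.numel() == 0:
            continue
        x = hidden_states[token_ids]
        # Simple two-matmul expert MLP; real experts typically use a gated activation.
        y = F.silu(x @ w_up[expert_id].t()) @ w_down[expert_id].t()
        # Weight each expert's output by its routing probability and accumulate.
        contribution = topk_weights[token_ids, slot].unsqueeze(-1) * y
        out.index_add_(0, token_ids, contribution.to(out.dtype))
    return out
```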