Jinzhen Lin 750f4cabfa
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-20 16:42:16 -08:00