vllm/moe at af0444bf40b7db2f3fb9fe1508d25ceba24cac87 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-23 10:57:20 +08:00

[Kernel][MoE] optimize moe_align_block_size (#29642)

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

2025-12-07 01:58:47 -08:00

modular_kernel_tools

[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929)

2025-12-03 20:49:00 +00:00

__init__.py

…

parallel_utils.py

[Chore] Separate out optional dependency checks from vllm.utils (#27207)

2025-10-22 10:44:21 -04:00

test_batched_deepgemm.py

…

test_batched_moe.py

[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)

2025-11-21 20:34:33 -07:00

test_block_fp8.py

[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)

2025-11-21 20:34:33 -07:00

test_block_int8.py

[Misc] Make SchedulerConfig.max_model_len init-only (#28733)

2025-11-15 01:59:31 -08:00

test_count_expert_num_tokens.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

test_cutedsl_moe.py

[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990)

2025-11-19 13:29:06 -08:00

test_cutlass_grouped_gemm.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_cutlass_moe.py

[Misc] Make SchedulerConfig.max_model_len init-only (#28733)

2025-11-15 01:59:31 -08:00

test_deepep_deepgemm_moe.py

[Performance][DeepGEMM] Estimate expected_m (#28694)

2025-11-15 13:52:14 +08:00

test_deepep_moe.py

[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)

2025-11-13 10:16:55 -08:00

test_deepgemm.py

kernels/moe test pruning (#27053)

2025-10-30 12:10:34 +08:00

test_flashinfer_moe.py

[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)

2025-11-30 11:02:40 -05:00

test_flashinfer.py

[MoE][Refactor] Make select_experts a non-static method (#29067)

2025-11-24 13:38:04 -05:00

test_gpt_oss_triton_kernels.py

[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)

2025-11-21 20:34:33 -07:00

test_grouped_topk.py

kernels/moe test pruning (#27053)

2025-10-30 12:10:34 +08:00

test_modular_kernel_combinations.py

[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)

2025-11-21 20:34:33 -07:00

test_modular_oai_triton_moe.py

[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)

2025-11-30 10:37:25 +08:00

test_moe_align_block_size.py

[Kernel][MoE] optimize moe_align_block_size (#29642)

2025-12-07 01:58:47 -08:00

test_moe_permute_unpermute.py

[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)

2025-11-21 20:34:33 -07:00

test_moe.py

[Kernel][MoE] optimize moe_align_block_size (#29642)

2025-12-07 01:58:47 -08:00

test_nvfp4_moe.py

kernels/moe test pruning (#27053)

2025-10-30 12:10:34 +08:00

test_ocp_mx_moe.py

[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)

2025-10-16 16:20:25 -07:00

test_pplx_cutlass_moe.py

[Misc] Make SchedulerConfig.max_model_len init-only (#28733)

2025-11-15 01:59:31 -08:00

test_pplx_moe.py

[Misc] Make SchedulerConfig.max_model_len init-only (#28733)

2025-11-15 01:59:31 -08:00

test_rocm_aiter_topk.py

…

test_silu_mul_fp8_quant_deep_gemm.py

[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)

2025-11-21 20:34:33 -07:00

test_silu_mul_per_token_group_quant_fp8_colmajor.py

[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470)

2025-12-03 18:04:59 +00:00

test_triton_moe_ptpc_fp8.py

[CI/Build] Only use supported types and features on ROCm in MoE kernel tests (#29149)

2025-11-21 20:34:33 -07:00

utils.py

[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)

2025-11-30 11:02:40 -05:00