Relax Transformers modeling backend MoE experts check (#28952)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Author: Harry Mellor
Date: 2025-11-19 14:50:30 +01:00
Committed by: GitHub
Parent: 09540cd918
Commit: 4f5299f717
2 changed files with 11 additions and 2 deletions


@@ -79,7 +79,9 @@ To make your model compatible with the Transformers modeling backend, it needs:
     1. Add `is_causal = False` to `MyAttention`.
 - If your model is mixture-of-experts (MoE):
     1. Your sparse MoE block must have an attribute called `experts`.
-    2. The class of `experts` (`MyExperts`) must inherit from `nn.ModuleList`.
+    2. The class of `experts` (`MyExperts`) must either:
+        - Inherit from `nn.ModuleList` (naive).
+        - Or contain all 3D `nn.Parameters` (packed).
     3. `MyExperts.forward` must accept `hidden_states`, `top_k_index`, `top_k_weights`.
 2. `MyAttention` must use `ALL_ATTENTION_FUNCTIONS` to call attention.
 3. `MyModel` must contain `_supports_attention_backend = True`.
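The two accepted `experts` layouts can be sketched as follows. This is only an illustration of the documented requirements (the `experts` attribute, the `nn.ModuleList` or all-3D-parameter layout, and the `forward` signature); the class names, tensor sizes, and the dense routing loop below are illustrative and not taken from the diff.

# Minimal sketch of the two accepted `experts` layouts, reusing the
# `MyExperts` naming convention from the docs above. Names and sizes
# are illustrative only.
import torch
import torch.nn as nn


class MyNaiveExperts(nn.ModuleList):
    """Naive layout: one sub-module per expert, stored in an nn.ModuleList."""

    def forward(self, hidden_states, top_k_index, top_k_weights):
        # Dense reference combination for illustration: run every expert and
        # weight its output by the routing weight of the tokens that chose it.
        out = torch.zeros_like(hidden_states)
        for expert_id, expert in enumerate(self):
            selected = top_k_index == expert_id                        # (tokens, top_k)
            weight = (top_k_weights * selected).sum(-1, keepdim=True)  # (tokens, 1)
            out = out + weight * expert(hidden_states)
        return out


class MyPackedExperts(nn.Module):
    """Packed layout: every parameter is a 3D tensor indexed by expert."""

    def __init__(self, num_experts, hidden_size, intermediate_size):
        super().__init__()
        # Shapes mirror the comment added in the code change below.
        self.gate_up_proj = nn.Parameter(
            torch.empty(num_experts, 2 * intermediate_size, hidden_size))
        self.down_proj = nn.Parameter(
            torch.empty(num_experts, intermediate_size, hidden_size))

    def forward(self, hidden_states, top_k_index, top_k_weights):
        # Expert math elided; only the 3D parameter layout matters for the check.
        raise NotImplementedError

Both classes pass the relaxed check: the first because it is an `nn.ModuleList`, the second because all of its parameters are 3D.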


@@ -256,7 +256,14 @@ class MoEMixin(MixtureOfExperts):
         def _recursive_replace(module: nn.Module, prefix: str):
             for child_name, child_module in module.named_children():
                 qual_name = maybe_prefix(prefix, child_name)
-                if child_name == "experts" and isinstance(child_module, nn.ModuleList):
+                # Naive implementations will have experts as ModuleList
+                is_modulelist = isinstance(child_module, nn.ModuleList)
+                # Packed implementations will have experts as 3D tensors of shapes like:
+                # gate_up_proj = (num_experts, 2 * intermediate_size, hidden_size)
+                # down_proj = (num_experts, intermediate_size, hidden_size)
+                params = list(child_module.parameters())
+                is_3d = len(params) > 0 and all(p.ndim == 3 for p in params)
+                if child_name == "experts" and (is_modulelist or is_3d):
                     # Alias for readability
                     mlp = module
                     experts = child_module
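To see what the relaxed condition now accepts, here is a hypothetical standalone version of just the check. The `looks_like_experts` helper is not vLLM API; it only mirrors the condition in the hunk above so it can be exercised on toy modules.

# Standalone mirror of the relaxed check (illustrative helper, not vLLM API).
import torch
import torch.nn as nn


def looks_like_experts(child_name: str, child_module: nn.Module) -> bool:
    # Naive implementations keep experts in an nn.ModuleList.
    is_modulelist = isinstance(child_module, nn.ModuleList)
    # Packed implementations keep every parameter as a 3D (num_experts, ..., ...) tensor.
    params = list(child_module.parameters())
    is_3d = len(params) > 0 and all(p.ndim == 3 for p in params)
    return child_name == "experts" and (is_modulelist or is_3d)


naive = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])

packed = nn.Module()
packed.gate_up_proj = nn.Parameter(torch.empty(4, 2 * 16, 8))
packed.down_proj = nn.Parameter(torch.empty(4, 16, 8))

print(looks_like_experts("experts", naive))             # True  (nn.ModuleList)
print(looks_like_experts("experts", packed))            # True  (all parameters are 3D)
print(looks_like_experts("experts", nn.Linear(8, 8)))   # False (2D weight, 1D bias)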