[Docs] Add some details about what the MoE block needs for the Transformers backend (#28588)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent 8832fff972
commit 3226283461
@@ -75,7 +75,12 @@ This section details the necessary modifications to make to a Transformers compa
 To make your model compatible with the Transformers backend, it needs:
 
 1. `kwargs` passed down through all modules from `MyModel` to `MyAttention`.
-    1. If your model is encoder-only, you must also add `is_causal = False` to `MyAttention`.
+    - If your model is encoder-only:
+        1. Add `is_causal = False` to `MyAttention`.
+    - If your model is mixture-of-experts (MoE):
+        1. Your sparse MoE block must have an attribute called `experts`.
+        2. The class of `experts` (`MyExperts`) must inherit from `nn.ModuleList`.
+        3. `MyExperts.forward` must accept `hidden_states`, `top_k_index`, `top_k_weights`.
 2. `MyAttention` must use `ALL_ATTENTION_FUNCTIONS` to call attention.
 3. `MyModel` must contain `_supports_attention_backend = True`.
 
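For readers looking at this hunk in isolation, the sketch below shows how the three numbered requirements fit together, assuming the standard Transformers attention-interface pattern. `MyDecoderLayer`, the projection layers, and the config fields are illustrative placeholders, not code taken from this commit or from the vLLM docs:

```python
from typing import Optional

import torch
from torch import nn
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS


class MyAttention(nn.Module):
    # Encoder-only models would set is_causal = False so a bidirectional mask
    # is used instead of a causal one.
    is_causal = True

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.head_dim = config.hidden_size // config.num_attention_heads
        self.scaling = self.head_dim**-0.5
        self.q_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.k_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.v_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.o_proj = nn.Linear(config.hidden_size, config.hidden_size)

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        **kwargs,  # requirement 1: kwargs from MyModel arrive here unchanged
    ) -> torch.Tensor:
        batch, seq_len, _ = hidden_states.shape
        shape = (batch, seq_len, -1, self.head_dim)
        query = self.q_proj(hidden_states).view(shape).transpose(1, 2)
        key = self.k_proj(hidden_states).view(shape).transpose(1, 2)
        value = self.v_proj(hidden_states).view(shape).transpose(1, 2)

        # Requirement 2: dispatch through ALL_ATTENTION_FUNCTIONS instead of
        # hard-coding a particular attention implementation.
        attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
        attn_output, _ = attention_interface(
            self,
            query,
            key,
            value,
            attention_mask,
            scaling=self.scaling,
            **kwargs,
        )
        attn_output = attn_output.reshape(batch, seq_len, -1).contiguous()
        return self.o_proj(attn_output)


class MyDecoderLayer(nn.Module):
    # Hypothetical intermediate module, shown only to illustrate requirement 1:
    # every module between MyModel and MyAttention forwards **kwargs.
    def __init__(self, config):
        super().__init__()
        self.self_attn = MyAttention(config)

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        return self.self_attn(hidden_states, attention_mask, **kwargs)
```

Roughly speaking, when vLLM loads a model through this backend it registers its own attention function with Transformers and points `config._attn_implementation` at it, so the dictionary lookup above is what lets vLLM take over the attention computation.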
@@ -102,6 +107,23 @@ class MyAttention(nn.Module):
         )
         ...
 
+# Only do this for mixture-of-experts models
+class MyExperts(nn.ModuleList):
+    def forward(self, hidden_states, top_k_index, top_k_weights):
+        ...
+
+# Only do this for mixture-of-experts models
+class MySparseMoEBlock(nn.Module):
+    def __init__(self, config):
+        ...
+        self.experts = MyExperts(config)
+        ...
+
+    def forward(self, hidden_states: torch.Tensor):
+        ...
+        hidden_states = self.experts(hidden_states, top_k_index, top_k_weights)
+        ...
+
 class MyModel(PreTrainedModel):
     _supports_attention_backend = True
 ```
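The skeleton added above leaves the bodies as `...`. A minimal, hypothetical filling-in is sketched below to make the three MoE requirements concrete; the per-expert MLP, the router, and the config fields (`num_experts`, `num_experts_per_tok`, `hidden_size`, `intermediate_size`) are assumptions for illustration, not code from this commit:

```python
import torch
from torch import nn


class MyMLP(nn.Module):
    """Hypothetical per-expert MLP; the real expert module is model specific."""

    def __init__(self, config):
        super().__init__()
        self.up_proj = nn.Linear(config.hidden_size, config.intermediate_size)
        self.down_proj = nn.Linear(config.intermediate_size, config.hidden_size)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act(self.up_proj(x)))


class MyExperts(nn.ModuleList):
    """Must subclass nn.ModuleList and expose the three-argument forward."""

    def __init__(self, config):
        super().__init__(MyMLP(config) for _ in range(config.num_experts))

    def forward(
        self,
        hidden_states: torch.Tensor,  # (num_tokens, hidden_size)
        top_k_index: torch.Tensor,    # (num_tokens, top_k) chosen expert ids
        top_k_weights: torch.Tensor,  # (num_tokens, top_k) routing weights
    ) -> torch.Tensor:
        output = torch.zeros_like(hidden_states)
        for expert_id, expert in enumerate(self):
            # Tokens (and which of their top-k slots) routed to this expert.
            token_idx, k_idx = torch.where(top_k_index == expert_id)
            if token_idx.numel() == 0:
                continue
            expert_out = expert(hidden_states[token_idx])
            weights = top_k_weights[token_idx, k_idx].unsqueeze(-1)
            output.index_add_(0, token_idx, expert_out * weights)
        return output


class MySparseMoEBlock(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
        self.top_k = config.num_experts_per_tok
        self.experts = MyExperts(config)  # the attribute must be named `experts`

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        batch, seq_len, hidden = hidden_states.shape
        flat = hidden_states.view(-1, hidden)
        router_probs = self.gate(flat).softmax(dim=-1)
        top_k_weights, top_k_index = torch.topk(router_probs, self.top_k, dim=-1)
        out = self.experts(flat, top_k_index, top_k_weights)
        return out.view(batch, seq_len, hidden)
```

The per-token loop is only the simplest correct formulation; what the backend relies on is the structure itself (an `experts` attribute, `nn.ModuleList` inheritance, and the three-argument `forward`), presumably so that vLLM can substitute its own fused MoE implementation without changing the surrounding block.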
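Not part of this commit, but for context: once a model satisfies the checklist above, it can be loaded through the Transformers backend. A hedged usage sketch, with a placeholder model id:

```python
from vllm import LLM

# model_impl="transformers" forces the Transformers backend explicitly; by
# default vLLM falls back to it when it has no native implementation.
llm = LLM(
    model="my-org/my-custom-model",  # hypothetical placeholder model id
    model_impl="transformers",
    trust_remote_code=True,          # only needed for custom modeling code
)
print(llm.generate("Hello, my name is"))
```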