diff --git a/docs/usage/v1_guide.md b/docs/usage/v1_guide.md
index 7fc615d4c042..64bd0d9bf507 100644
--- a/docs/usage/v1_guide.md
+++ b/docs/usage/v1_guide.md
@@ -111,11 +111,10 @@ Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaFor
 Models that combine Mamba-2 and Mamba-1 layers with standard attention layers are also supported (e.g.,
 `BambaForCausalLM`, `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and
 `GraniteMoeHybridForCausalLM`, `JambaForCausalLM`). Please note that
-these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
+these models currently require disabling prefix caching in V1.
 
 Hybrid models with mechanisms different to Mamba are also supported (e.g., `MiniMaxText01ForCausalLM`, `MiniMaxM1ForCausalLM`).
-Please note that these models currently require disabling prefix caching, enforcing eager mode, and using the FlashInfer
-attention backend in V1.
+Please note that these models currently require disabling prefix caching and enforcing eager mode in V1.
 
 #### Encoder-Decoder Models
 
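
For context, here is a minimal sketch (not part of the diff) of how the constraints described above translate to vLLM's offline `LLM` API. The model IDs are illustrative assumptions; `enable_prefix_caching` and `enforce_eager` are the standard `LLM` constructor arguments for disabling prefix caching and enforcing eager mode.

```python
# Sketch of running hybrid models under the documented V1 constraints.
# Model IDs below are illustrative, not prescribed by the guide.
from vllm import LLM, SamplingParams

# Mamba + attention hybrid (e.g., Bamba): prefix caching must be disabled in V1.
llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # illustrative hybrid model ID
    enable_prefix_caching=False,
)

# Non-Mamba hybrid (e.g., MiniMax) additionally requires eager mode:
# llm = LLM(
#     model="MiniMaxAI/MiniMax-Text-01",  # illustrative model ID
#     enable_prefix_caching=False,
#     enforce_eager=True,
# )

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(max_tokens=16),
)
print(outputs[0].outputs[0].text)
```

Note that after this change, the FlashInfer attention backend no longer needs to be selected for either family of hybrid models; the prefix-caching (and, for non-Mamba hybrids, eager-mode) restrictions still apply.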