diff --git a/docs/usage/v1_guide.md b/docs/usage/v1_guide.md
index 12150cf2a82e6..498ff3da0ca31 100644
--- a/docs/usage/v1_guide.md
+++ b/docs/usage/v1_guide.md
@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i
 
 Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
 Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
 (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
-enforcing eager mode and disabling prefix caching in V1.
+disabling prefix caching in V1.
 
 Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
-these models currently require enforcing eager mode, disabling prefix caching, and using the FlashInfer attention
-backend in V1.
+these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
 
 #### Encoder-Decoder Models
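
The settings this patch documents for hybrid Mamba-2/attention models could be applied roughly as below. This is a minimal sketch, not a definitive recipe: it assumes vLLM's offline `LLM` API with its `enable_prefix_caching` argument and the `VLLM_ATTENTION_BACKEND` environment variable, and the checkpoint name is a hypothetical placeholder. Actually constructing the engine requires a GPU and model weights, so that call is left commented out.

```python
import os

# The docs say hybrid Mamba-2/attention models need the FlashInfer attention
# backend in V1; vLLM selects the backend via this environment variable.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

# Per the patched docs, prefix caching must be disabled for these models
# (eager mode is no longer required). The model name below is a placeholder.
engine_kwargs = {
    "model": "example-org/bamba-style-model",  # hypothetical checkpoint
    "enable_prefix_caching": False,
}

# from vllm import LLM
# llm = LLM(**engine_kwargs)  # needs a GPU and model weights; omitted here

print(engine_kwargs["enable_prefix_caching"])
```

Passing the same intent via the CLI would mean the corresponding server flag for disabling prefix caching together with the same environment variable.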