Update FAQ on interleaving sliding windows support (#29796)
Signed-off-by: Finbarr Timbers <finbarrtimbers@gmail.com>
This commit is contained in:
parent
cabc77cc86
commit
38caf7fa1a
@@ -113,8 +113,6 @@ See [this page](registration.md) for instructions on how to register your new mo
### How to support models with interleaving sliding windows?
For models with interleaving sliding windows (e.g. `google/gemma-2-2b-it` and `mistralai/Ministral-8B-Instruct-2410`), the scheduler treats the model as a full-attention model, i.e., the KV cache of all tokens is not dropped. This ensures prefix caching works with these models. The sliding window only appears as a parameter to the attention kernel computation.
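The distinction above can be sketched in a few lines: the KV cache keeps every token regardless of layer type, and a sliding-window layer merely narrows which cached positions a query attends to. This is a minimal illustration, not vLLM's actual kernel code; `window` stands in for the per-layer window size.

```python
from typing import Optional


def visible_positions(query_pos: int, num_cached: int,
                      window: Optional[int]) -> list[int]:
    """Positions in the (full) KV cache this query may attend to.

    The cache holds all `num_cached` tokens either way; only the
    attention mask differs between full and sliding-window layers.
    """
    if window is None:  # full-attention layer: see everything
        return list(range(num_cached))
    # sliding-window layer: only the last `window` tokens ending at query_pos
    start = max(0, query_pos - window + 1)
    return list(range(start, query_pos + 1))


print(visible_positions(9, 10, None))  # full attention: positions 0..9
print(visible_positions(9, 10, 4))     # window of 4: positions 6..9
```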
To support a model with interleaving sliding windows, we need to take care of the following details:
- Make sure the model's `config.json` contains `layer_types`.
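As a hedged example of what that check looks for: recent Hugging Face configs describe the per-layer attention pattern with a `layer_types` list. The excerpt below is a hypothetical four-layer config in that shape (the `"sliding_attention"`/`"full_attention"` values follow the convention used by models such as Gemma); a quick sanity check is that the list length matches `num_hidden_layers`.

```python
import json

# Hypothetical config.json excerpt for a model that interleaves
# sliding-window and full-attention layers.
config_json = """
{
  "num_hidden_layers": 4,
  "sliding_window": 4096,
  "layer_types": [
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention"
  ]
}
"""

config = json.loads(config_json)
# One entry per transformer layer:
assert len(config["layer_types"]) == config["num_hidden_layers"]
print(config["layer_types"])
```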