[Docs] Fix True->true in supported_models.md (#17141)
commit 649818995f
parent 7a0a9da72b
@@ -1082,7 +1082,7 @@ See [this page](#generative-models) for more information on how to use generative models.
:::{important}
Pan-and-scan image pre-processing is currently supported on V0 (but not V1).
-You can enable it by passing `--mm-processor-kwargs '{"do_pan_and_scan": True}'`.
+You can enable it by passing `--mm-processor-kwargs '{"do_pan_and_scan": true}'`.
:::
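For offline inference, the same option is passed as a Python dict rather than a JSON string, so the Python boolean `True` is the correct spelling there; the lowercase `true` above is needed only because the CLI flag is parsed as JSON. Below is a minimal sketch under that assumption; the model name is an example chosen for illustration, not taken from this page.

```python
# Minimal offline-inference sketch (not from the docs): the Python API takes a
# plain dict, so the Python boolean `True` is used, while the CLI flag above
# takes a JSON string and therefore needs lowercase `true`.
from vllm import LLM

llm = LLM(
    model="google/gemma-3-4b-it",  # assumed example of a pan-and-scan-capable model
    mm_processor_kwargs={"do_pan_and_scan": True},
)
```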
:::{warning}
@@ -1097,7 +1097,7 @@ V0 correctly implements the model's attention pattern:
V1 currently uses a simplified attention pattern:
- Uses causal attention for all tokens, including image tokens
-- Generates reasonable outputs but does not match the original model's attention for text + image inputs, especially when `{"do_pan_and_scan": True}`
+- Generates reasonable outputs but does not match the original model's attention for text + image inputs, especially when `{"do_pan_and_scan": true}`
- Will be updated in the future to support the correct behavior
This limitation exists because the model's mixed attention pattern (bidirectional for images, causal otherwise) is not yet supported by vLLM's attention backends.
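To make the limitation concrete, the sketch below is purely illustrative (it is not vLLM or model code): it builds a mask that is causal everywhere except that image-token positions may also attend to each other bidirectionally, which is the pattern a plain causal mask cannot express.

```python
# Illustrative sketch only, not vLLM code: a mask that is causal for text
# tokens but bidirectional among image tokens.
import torch

def mixed_attention_mask(is_image: torch.Tensor) -> torch.Tensor:
    """is_image: bool tensor of shape (seq_len,) marking image-token positions.
    Returns a (seq_len, seq_len) bool mask; True means position i may attend to j."""
    n = is_image.numel()
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))      # default causal mask
    image_block = is_image.unsqueeze(1) & is_image.unsqueeze(0)  # image <-> image pairs
    return causal | image_block

# Example: 2 text tokens, 3 image tokens, then 2 more text tokens.
mask = mixed_attention_mask(torch.tensor([0, 0, 1, 1, 1, 0, 0], dtype=torch.bool))
print(mask.int())
```

Real processors are stricter (typically only tokens of the same image attend bidirectionally), but the sketch shows why a single causal mask cannot reproduce the pattern.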
@@ -1129,7 +1129,7 @@ To use Qwen2.5-Omni, you have to install Hugging Face Transformers library from source via
`pip install git+https://github.com/huggingface/transformers.git`.
Read audio from video pre-processing is currently supported on V0 (but not V1), because overlapping modalities is not yet supported in V1.
-`--mm-processor-kwargs '{"use_audio_in_video": True}'`.
+`--mm-processor-kwargs '{"use_audio_in_video": true}'`.
:::
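As with pan-and-scan above, the offline Python API takes the processor kwargs as a dict, so the Python boolean is used there. A minimal sketch, assuming `Qwen/Qwen2.5-Omni-7B` as the checkpoint and a source install of Transformers as described above:

```python
# Minimal offline sketch; the checkpoint name is an assumed example. Per the
# note above, reading audio from video is currently a V0-only feature.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-Omni-7B",  # assumed example checkpoint
    mm_processor_kwargs={"use_audio_in_video": True},
)
```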
### Pooling Models