Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-03-24 06:33:32 +08:00)
Move forward

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

parent 5e78570cce
commit 1cb35461fc
@@ -275,6 +275,10 @@ The new format of `--lora-modules` is mainly to support the display of parent mo
}
```
## LoRA Support for Tower and Connector of Multi-Modal Model
Currently, vLLM experimentally supports LoRA for the Tower and Connector components of multi-modal models. To enable this feature, you need to implement the corresponding token helper functions for the tower and connector. For more details on the rationale behind this approach, please refer to [PR 26674](https://github.com/vllm-project/vllm/pull/26674). We welcome contributions that extend LoRA support to the tower and connector components of additional models.
## Default LoRA Models For Multimodal Models
Some models, e.g., [Granite Speech](https://huggingface.co/ibm-granite/granite-speech-3.3-8b) and [Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct), contain LoRA adapter(s) that are expected to always be applied when a given modality is present. This can be tedious to manage with the above approaches, as it requires the user to send the `LoRARequest` (offline) or to filter requests between the base model and the LoRA model (server) depending on the content of the request's multimodal data.
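Without default adapters, the server-side filtering described above amounts to routing each request by its multimodal content. A minimal sketch of that routing, using hypothetical model names and a simplified request shape (this is an illustration of the pattern, not a vLLM API):

```python
# Sketch of the request routing that default multimodal LoRAs make
# unnecessary. The model names and request shape are hypothetical
# illustrations, not part of the vLLM API.

def pick_model(request: dict) -> str:
    """Route a request to the LoRA model when audio is attached,
    otherwise to the base model."""
    multimodal = request.get("multi_modal_data") or {}
    if "audio" in multimodal:
        return "speech-lora"  # adapter that must be applied to audio inputs
    return "base-model"       # plain-text requests go to the base model

print(pick_model({"prompt": "hi"}))                                        # base-model
print(pick_model({"prompt": "hi", "multi_modal_data": {"audio": b"..."}})) # speech-lora
```

With a default LoRA registered for the modality, this branching disappears: the engine applies the adapter whenever the modality is present.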
@@ -347,8 +351,11 @@ vllm serve ibm-granite/granite-speech-3.3-2b \
--max-lora-rank 64
```
Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.
## Usage Tips
### Configuring `max_lora_rank`
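`max_lora_rank` must be at least as large as the highest adapter rank (the `r` value in an adapter's `adapter_config.json`) among the adapters you intend to load. A small sketch of that constraint, using a hypothetical helper name:

```python
# Hypothetical helper (not a vLLM API): check that every adapter's rank
# fits under the server's --max-lora-rank setting. An adapter's rank is
# the "r" field in its adapter_config.json.

def fits_max_lora_rank(adapter_ranks: list[int], max_lora_rank: int) -> bool:
    """Return True when all adapters can be loaded with this setting."""
    return max(adapter_ranks, default=0) <= max_lora_rank

# The serve example above passes --max-lora-rank 64, which accommodates
# adapters of rank 64 but would reject a rank-128 adapter.
print(fits_max_lora_rank([64], 64))       # True
print(fits_max_lora_rank([64, 128], 64))  # False
```

Setting `max_lora_rank` higher than any adapter actually needs works but wastes memory, since the LoRA buffers are sized for the configured maximum.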