Move forward

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
This commit is contained in:
Jee Jee Li 2025-12-12 07:00:24 +00:00
parent 5e78570cce
commit 1cb35461fc

View File

@ -275,6 +275,10 @@ The new format of `--lora-modules` is mainly to support the display of parent mo
}
```
## LoRA Support for Tower and Connector of Multi-Modal Model
Currently, vLLM experimentally supports LoRA for the Tower and Connector components of multi-modal models. To enable this feature, you need to implement the corresponding token helper functions for the tower and connector. For more details on the rationale behind this approach, please refer to [PR 26674](https://github.com/vllm-project/vllm/pull/26674). We welcome contributions to extend LoRA support to additional models' tower and connector.
## Default LoRA Models For Multimodal Models
Some models, e.g., [Granite Speech](https://huggingface.co/ibm-granite/granite-speech-3.3-8b) and [Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) multimodal, contain LoRA adapter(s) that are expected to always be applied when a given modality is present. This can be a bit tedious to manage with the above approaches, as it requires the user to send the `LoRARequest` (offline) or to filter requests between the base model and LoRA model (server) depending on the content of the request's multimodal data.
@ -347,8 +351,11 @@ vllm serve ibm-granite/granite-speech-3.3-2b \
--max-lora-rank 64
```
Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.
## Using Tips
### Configuring `max_lora_rank`