Move forward

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-06-26 18:37:17 +08:00 · 2025-12-12 07:00:24 +00:00 · 2025-12-12 07:00:24 +00:00 · 1cb35461fc
commit 1cb35461fc
parent 5e78570cce
1 changed files with 7 additions and 0 deletions
--- a/docs/features/lora.md
+++ b/docs/features/lora.md
@@ -275,6 +275,10 @@ The new format of `--lora-modules` is mainly to support the display of parent mo
    }
    ```

+## LoRA Support for Tower and Connector of Multi-Modal Model
+
+Currently, vLLM experimentally supports LoRA for the Tower and Connector components of multi-modal models. To enable this feature for a model, you need to implement the corresponding token helper functions for its tower and connector. For the rationale behind this approach, see [PR 26674](https://github.com/vllm-project/vllm/pull/26674). We welcome contributions that extend tower and connector LoRA support to additional models.
+
 ## Default LoRA Models For Multimodal Models

 Some models, e.g., [Granite Speech](https://huggingface.co/ibm-granite/granite-speech-3.3-8b) and [Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) multimodal, contain LoRA adapter(s) that are expected to always be applied when a given modality is present. This can be a bit tedious to manage with the above approaches, as it requires the user to send the `LoRARequest` (offline) or to filter requests between the base model and LoRA model (server) depending on the content of the request's multimodal data.
@@ -347,8 +351,11 @@ vllm serve ibm-granite/granite-speech-3.3-2b \
    --max-lora-rank 64
 ```

+
+
 Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.

+
 ## Using Tips

 ### Configuring `max_lora_rank`