[Doc] Add max_lora_rank configuration guide (#22782)

Signed-off-by: chiliu <cliu_whu@yeah.net>
2025-12-10 03:26:12 +08:00 · 2025-08-13 19:10:07 +08:00 · 2025-08-13 19:10:07 +08:00 · 3f52738dce
commit 3f52738dce
parent a01e0018b5
1 changed files with 19 additions and 0 deletions
--- a/docs/features/lora.md
+++ b/docs/features/lora.md
@@ -351,3 +351,22 @@ vllm serve ibm-granite/granite-speech-3.3-2b \
 ```
 Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.
 ## Usage Tips
 ### Configuring `max_lora_rank`
 The `--max-lora-rank` parameter controls the maximum rank allowed for LoRA adapters. This setting affects memory allocation and performance:
 - **Set it to the maximum rank** among all LoRA adapters you plan to use.
 - **Avoid setting it too high.** A value much larger than needed wastes memory and can degrade performance.
 For example, if your LoRA adapters have ranks [16, 32, 64], use `--max-lora-rank 64` rather than 256:
 ```bash
 # Good: matches actual maximum rank
 vllm serve model --enable-lora --max-lora-rank 64
 # Bad: unnecessarily high, wastes memory
 vllm serve model --enable-lora --max-lora-rank 256
 ```
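
If you are unsure which ranks your adapters use, you can read them from the adapter files before choosing a value. The following is a minimal sketch, not part of vLLM. It assumes each adapter was saved with Hugging Face PEFT, which records the rank as `r` in `adapter_config.json`. The helper name `max_adapter_rank` and the demo directory names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def max_adapter_rank(adapter_dirs):
    """Return the largest LoRA rank found across the given adapter directories.

    Assumption: each directory holds a PEFT-style `adapter_config.json`
    whose `r` field is the adapter's rank.
    """
    ranks = []
    for adapter_dir in adapter_dirs:
        config_path = Path(adapter_dir) / "adapter_config.json"
        with config_path.open() as f:
            config = json.load(f)
        ranks.append(int(config["r"]))
    if not ranks:
        raise ValueError("no adapter directories given")
    return max(ranks)


if __name__ == "__main__":
    # Demo with three fake adapters (ranks 16, 32, 64) in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        dirs = []
        for name, rank in [("adapter-a", 16), ("adapter-b", 32), ("adapter-c", 64)]:
            adapter_dir = Path(tmp) / name
            adapter_dir.mkdir()
            (adapter_dir / "adapter_config.json").write_text(json.dumps({"r": rank}))
            dirs.append(adapter_dir)
        # Print the flag to pass to `vllm serve`.
        print(f"--max-lora-rank {max_adapter_rank(dirs)}")
```

Taking the maximum rather than, say, the most common rank matters because an adapter whose rank exceeds the configured limit is not covered by the setting described above.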