From 3f52738dce57360ccc92c9993c5adcaaec1f5ac2 Mon Sep 17 00:00:00 2001
From: 633WHU
Date: Wed, 13 Aug 2025 19:10:07 +0800
Subject: [PATCH] [Doc] Add max_lora_rank configuration guide (#22782)

Signed-off-by: chiliu
---
 docs/features/lora.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/features/lora.md b/docs/features/lora.md
index a4e05dae11c2..668460a368a7 100644
--- a/docs/features/lora.md
+++ b/docs/features/lora.md
@@ -351,3 +351,22 @@ vllm serve ibm-granite/granite-speech-3.3-2b \
 ```
 
 Note: Default multimodal LoRAs are currently only available for `.generate` and chat completions.
+
+## Usage Tips
+
+### Configuring `max_lora_rank`
+
+The `--max-lora-rank` parameter controls the maximum rank allowed for LoRA adapters. This setting affects memory allocation and performance:
+
+- **Set it to the maximum rank** among all LoRA adapters you plan to use.
+- **Avoid setting it too high** - a value much larger than needed wastes memory and can degrade performance.
+
+For example, if your LoRA adapters have ranks [16, 32, 64], use `--max-lora-rank 64` rather than 256:
+
+```bash
+# Good: matches the actual maximum rank
+vllm serve model --enable-lora --max-lora-rank 64
+
+# Bad: unnecessarily high, wastes memory
+vllm serve model --enable-lora --max-lora-rank 256
+```
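As a companion to the guidance in this patch, a small helper can determine the value to pass to `--max-lora-rank` instead of guessing. This is a hypothetical sketch, not part of the patch or the vLLM API; it assumes PEFT-style adapter directories, where each adapter's rank is stored under the `"r"` key of its `adapter_config.json`:

```python
import json
from pathlib import Path


def max_adapter_rank(adapter_dirs):
    """Return the largest LoRA rank among PEFT-style adapter directories.

    Each directory is expected to contain an adapter_config.json whose
    "r" field holds that adapter's rank. The result is the smallest
    safe value for --max-lora-rank.
    """
    ranks = []
    for d in adapter_dirs:
        config = json.loads((Path(d) / "adapter_config.json").read_text())
        ranks.append(int(config["r"]))
    return max(ranks)
```

For adapters with ranks [16, 32, 64], the helper returns 64, which is exactly the value the guide recommends passing to `--max-lora-rank`.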