docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153)

Signed-off-by: Chen Wang <Chen.Wang1@ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-05-24 07:37:53 +08:00 · 2025-11-14 13:08:30 -05:00 · 2025-11-14 13:08:30 -05:00 · 9261eb3dc1
commit 9261eb3dc1
parent cdd7025961
4 changed files with 226 additions and 17 deletions
--- a/.markdownlint.yaml
+++ b/.markdownlint.yaml
@@ -3,6 +3,8 @@ MD007:
 MD013: false
 MD024:
  siblings_only: true
 MD031:
  list_items: false
 MD033: false
 MD045: false
 MD046: false
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -46,7 +46,10 @@ nav:
      - contributing/model/multimodal.md
      - contributing/model/transcription.md
    - CI: contributing/ci
-    - Design Documents: design
+    - Design Documents:
      - Plugins:
        - design/*plugin*.md
      - design/*
  - API Reference:
    - api/README.md
    - api/vllm
--- a/docs/design/lora_resolver_plugins.md
+++ b/docs/design/lora_resolver_plugins.md
@@ -0,0 +1,220 @@
 # LoRA Resolver Plugins
 This document describes vLLM's LoRA resolver plugins, which are built on the `LoRAResolver` framework.
 They automatically discover and load LoRA adapters from a specified local storage path, eliminating the need for manual configuration or server restarts.
 ## Overview
 LoRA Resolver Plugins provide a flexible way to dynamically load LoRA adapters at runtime. When vLLM
 receives a request for a LoRA adapter that hasn't been loaded yet, the resolver plugins will attempt
 to locate and load the adapter from their configured storage locations. This enables:
 - **Dynamic LoRA Loading**: Load adapters on-demand without server restarts
 - **Multiple Storage Backends**: The built-in `lora_filesystem_resolver` requires a local storage path. Other sources, such as S3, can be supported by implementing a custom resolver.
 - **Automatic Discovery**: Seamless integration with existing LoRA workflows
 - **Scalable Deployment**: Centralized adapter management across multiple vLLM instances
 ## Prerequisites
 Before using LoRA Resolver Plugins, ensure the following environment variables are configured:
 ### Required Environment Variables
 1. **`VLLM_ALLOW_RUNTIME_LORA_UPDATING`**: Must be set to `true` or `1` to enable dynamic LoRA loading
   ```bash
   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
   ```
 2. **`VLLM_PLUGINS`**: When set, must include the desired resolver plugins (comma-separated list)
   ```bash
   export VLLM_PLUGINS=lora_filesystem_resolver
   ```
 3. **`VLLM_LORA_RESOLVER_CACHE_DIR`**: Must be set to a valid directory path for the filesystem resolver
   ```bash
   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
   ```
 ### Optional Environment Variables
 - **`VLLM_PLUGINS`**: If not set, all available plugins are loaded, so resolvers do not need to be listed explicitly. If set to an empty string, no plugins are loaded.
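The requirements above can be checked before starting the server. The following is a minimal preflight sketch, not part of vLLM: the function name `check_resolver_env` is hypothetical, and accepting `true` case-insensitively is an assumption.

```python
import os
from typing import Mapping


def check_resolver_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []

    # Runtime LoRA updating must be enabled with "true" or "1".
    # Assumption: the value is compared case-insensitively.
    if env.get("VLLM_ALLOW_RUNTIME_LORA_UPDATING", "").lower() not in ("true", "1"):
        problems.append("VLLM_ALLOW_RUNTIME_LORA_UPDATING is not 'true' or '1'")

    # An unset VLLM_PLUGINS loads all plugins; a set value must name the resolver.
    plugins = env.get("VLLM_PLUGINS")
    if plugins is not None:
        names = [name.strip() for name in plugins.split(",") if name.strip()]
        if "lora_filesystem_resolver" not in names:
            problems.append("VLLM_PLUGINS is set but omits lora_filesystem_resolver")

    # The filesystem resolver needs an existing directory.
    cache_dir = env.get("VLLM_LORA_RESOLVER_CACHE_DIR")
    if not cache_dir or not os.path.isdir(cache_dir):
        problems.append("VLLM_LORA_RESOLVER_CACHE_DIR is not an existing directory")

    return problems


if __name__ == "__main__":
    for problem in check_resolver_env(os.environ):
        print("WARNING:", problem)
```

Passing the environment as a mapping keeps the check easy to exercise with a plain dictionary instead of the real process environment.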
 ## Available Resolvers
 ### lora_filesystem_resolver
 The filesystem resolver is installed with vLLM by default and enables loading LoRA adapters from a local directory structure.
 #### Setup Steps
 1. **Create the LoRA adapter storage directory**:
   ```bash
   mkdir -p /path/to/lora/adapters
   ```
 2. **Set environment variables**:
   ```bash
   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
   export VLLM_PLUGINS=lora_filesystem_resolver
   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
   ```
 3. **Start vLLM server**:
   Replace `your-base-model` with your base model, for example `meta-llama/Llama-2-7b-hf`. For gated models, set your Hugging Face token first with `export HF_TOKEN=<your-token>`.
   ```bash
   python -m vllm.entrypoints.openai.api_server \
       --model your-base-model \
       --enable-lora
   ```
 #### Directory Structure Requirements
 The filesystem resolver expects LoRA adapters to be organized in the following structure:
 ```text
 /path/to/lora/adapters/
 ├── adapter1/
 │   ├── adapter_config.json
 │   ├── adapter_model.bin
 │   └── tokenizer files (if applicable)
 ├── adapter2/
 │   ├── adapter_config.json
 │   ├── adapter_model.bin
 │   └── tokenizer files (if applicable)
 └── ...
 ```
 Each adapter directory must contain:
 - **`adapter_config.json`**: Required configuration file with the following structure:
  ```json
  {
    "peft_type": "LORA",
    "base_model_name_or_path": "your-base-model-name",
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "bias": "none",
    "modules_to_save": null,
    "use_rslora": false,
    "use_dora": false
  }
  ```
 - **`adapter_model.bin`**: The LoRA adapter weights file
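To audit a whole storage directory against this layout, a small scan can help. This is a minimal sketch under the assumptions stated in this section (the two file names above are required); `scan_adapter_root` is a hypothetical helper, not a vLLM function.

```python
import tempfile
from pathlib import Path

# File names taken from the layout described above.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.bin")


def scan_adapter_root(root: str) -> dict[str, list[str]]:
    """Map each adapter subdirectory name to the required files it lacks."""
    report = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():  # plain files next to the adapters are ignored
            report[entry.name] = [
                name for name in REQUIRED_FILES if not (entry / name).is_file()
            ]
    return report


if __name__ == "__main__":
    # Build a throwaway layout: one complete adapter and one without weights.
    with tempfile.TemporaryDirectory() as root:
        layout = {"adapter1": REQUIRED_FILES, "adapter2": REQUIRED_FILES[:1]}
        for adapter_name, file_names in layout.items():
            (Path(root) / adapter_name).mkdir()
            for file_name in file_names:
                (Path(root) / adapter_name / file_name).touch()
        print(scan_adapter_root(root))
```

An adapter whose list is empty has both required files; a non-empty list names what is missing.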
 #### Usage Example
 1. **Prepare your LoRA adapter**:
   ```bash
   # Assuming you have a LoRA adapter in /tmp/my_lora_adapter
   cp -r /tmp/my_lora_adapter /path/to/lora/adapters/my_sql_adapter
   ```
 2. **Verify the directory structure**:
   ```bash
   ls -la /path/to/lora/adapters/my_sql_adapter/
   # Should show: adapter_config.json, adapter_model.bin, etc.
   ```
 3. **Make a request using the adapter**:
   ```bash
   curl http://localhost:8000/v1/completions \
       -H "Content-Type: application/json" \
       -d '{
           "model": "my_sql_adapter",
           "prompt": "Generate a SQL query for:",
           "max_tokens": 50,
           "temperature": 0.1
       }'
   ```
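The same request can be issued from Python using only the standard library. This sketch mirrors the `curl` call above; the helper names `build_completion_request` and `send` are hypothetical, and the response handling assumes the OpenAI-style completions response shape.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str, adapter_name: str, prompt: str
) -> urllib.request.Request:
    """Build a POST request that selects the LoRA adapter via the "model" field."""
    payload = {
        "model": adapter_name,  # the adapter directory name, as in the curl example
        "prompt": prompt,
        "max_tokens": 50,
        "temperature": 0.1,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request to a running server and return the first completion text."""
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return body["choices"][0]["text"]
```

With a server running locally, `send(build_completion_request("http://localhost:8000", "my_sql_adapter", "Generate a SQL query for:"))` corresponds to the `curl` command.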
 #### How It Works
 1. vLLM receives a request for a LoRA adapter that is not yet loaded, for example `my_sql_adapter`.
 2. The filesystem resolver checks whether `/path/to/lora/adapters/my_sql_adapter/` exists.
 3. If the directory exists, the resolver validates its `adapter_config.json` file.
 4. If the configuration is valid and matches the base model, the adapter is loaded.
 5. The request is processed normally with the newly loaded adapter.
 6. The adapter remains available for future requests.
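The lookup in steps 2 to 4 can be sketched in plain Python. This illustrates the described flow and is not vLLM's actual implementation: `resolve_adapter_path` is a hypothetical name, and treating "matches the base model" as exact string equality is an assumption.

```python
import json
import os
import tempfile
from typing import Optional


def resolve_adapter_path(
    cache_dir: str, base_model_name: str, lora_name: str
) -> Optional[str]:
    """Return the adapter directory if it is usable with the base model, else None."""
    # Step 2: look for a subdirectory named after the requested adapter.
    adapter_dir = os.path.join(cache_dir, lora_name)
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    if not os.path.isfile(config_path):
        return None

    # Step 3: the configuration must be readable JSON describing a LoRA adapter.
    try:
        with open(config_path) as config_file:
            config = json.load(config_file)
    except json.JSONDecodeError:
        return None
    if not isinstance(config, dict) or config.get("peft_type") != "LORA":
        return None

    # Step 4: the adapter must target the served base model (assumed exact match).
    if config.get("base_model_name_or_path") != base_model_name:
        return None
    return adapter_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        os.mkdir(os.path.join(cache_dir, "my_sql_adapter"))
        config = {"peft_type": "LORA", "base_model_name_or_path": "your-base-model"}
        with open(os.path.join(cache_dir, "my_sql_adapter", "adapter_config.json"), "w") as f:
            json.dump(config, f)
        print(resolve_adapter_path(cache_dir, "your-base-model", "my_sql_adapter"))
```

Returning `None` rather than raising lets the caller treat "not found" and "not compatible" the same way.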
 ## Advanced Configuration
 ### Multiple Resolvers
 You can configure multiple resolver plugins to load adapters from different sources. In the example below, `lora_s3_resolver` stands for a custom resolver that you would need to implement yourself:
 ```bash
 export VLLM_PLUGINS=lora_filesystem_resolver,lora_s3_resolver
 ```
 All listed resolvers are enabled; at request time, vLLM tries them in order until one succeeds.
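The "try in order until one succeeds" behavior can be pictured with a small asynchronous loop. This is an illustrative sketch only: the `Resolver` type, `resolve_first`, and the two stub resolvers are hypothetical stand-ins, not vLLM classes, and the bucket name is made up.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical stand-in: a resolver is an async callable taking
# (base_model_name, lora_name) and returning a location or None.
Resolver = Callable[[str, str], Awaitable[Optional[str]]]


async def resolve_first(
    resolvers: Sequence[Resolver], base_model_name: str, lora_name: str
) -> Optional[str]:
    """Ask each resolver in order and stop at the first non-None result."""
    for resolver in resolvers:
        result = await resolver(base_model_name, lora_name)
        if result is not None:
            return result
    return None  # no resolver could locate the adapter


async def filesystem_stub(base_model_name: str, lora_name: str) -> Optional[str]:
    # Pretends only "my_sql_adapter" exists on local disk.
    if lora_name == "my_sql_adapter":
        return "/path/to/lora/adapters/my_sql_adapter"
    return None


async def s3_stub(base_model_name: str, lora_name: str) -> Optional[str]:
    # Pretends only "remote_adapter" exists in a made-up bucket.
    if lora_name == "remote_adapter":
        return "s3://example-bucket/remote_adapter"
    return None


if __name__ == "__main__":
    chain = [filesystem_stub, s3_stub]
    print(asyncio.run(resolve_first(chain, "your-base-model", "remote_adapter")))
```

Because the loop stops at the first hit, a resolver placed earlier in the list takes precedence when two resolvers could both serve the same adapter name.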
 ### Custom Resolver Implementation
 To implement your own resolver plugin:
 1. **Create a new resolver class**:
   ```python
   from typing import Optional
   from vllm.lora.resolver import LoRAResolver, LoRAResolverRegistry
   from vllm.lora.request import LoRARequest
   class CustomResolver(LoRAResolver):
       async def resolve_lora(self, base_model_name: str, lora_name: str) -> Optional[LoRARequest]:
           # Your custom resolution logic here: return a LoRARequest when the
           # adapter is found, or None when this resolver cannot locate it.
           return None
   ```
 2. **Register the resolver**:
   ```python
   def register_custom_resolver():
       resolver = CustomResolver()
       LoRAResolverRegistry.register_resolver("Custom Resolver", resolver)
   ```
 ## Troubleshooting
 ### Common Issues
 1. **"VLLM_LORA_RESOLVER_CACHE_DIR must be set to a valid directory"**
   - Ensure the directory exists and is accessible
   - Check file permissions on the directory
 2. **"LoRA adapter not found"**
   - Verify the adapter directory name matches the requested model name
   - Check that `adapter_config.json` exists and is valid JSON
   - Ensure `adapter_model.bin` exists in the directory
 3. **"Invalid adapter configuration"**
   - Verify `peft_type` is set to "LORA"
   - Check that `base_model_name_or_path` matches your base model
   - Ensure `target_modules` is properly configured
 4. **"LoRA rank exceeds maximum"**
   - Check that the `r` value in `adapter_config.json` doesn't exceed the `max_lora_rank` setting
 ### Debugging Tips
 1. **Enable debug logging**:
   ```bash
   export VLLM_LOGGING_LEVEL=DEBUG
   ```
 2. **Verify environment variables**:
   ```bash
   echo $VLLM_ALLOW_RUNTIME_LORA_UPDATING
   echo $VLLM_PLUGINS
   echo $VLLM_LORA_RESOLVER_CACHE_DIR
   ```
 3. **Test adapter configuration**:
   ```bash
   python -c "
   import json
   with open('/path/to/lora/adapters/my_adapter/adapter_config.json') as f:
       config = json.load(f)
   print('Config valid:', config)
   "
   ```
--- a/vllm/plugins/lora_resolvers/README.md
+++ b/vllm/plugins/lora_resolvers/README.md
@@ -1,16 +0,0 @@
 # LoRA Resolver Plugins
 This directory contains vLLM general plugins for dynamically discovering and loading LoRA adapters
 via the LoRAResolver plugin framework.
 Note that `VLLM_ALLOW_RUNTIME_LORA_UPDATING` must be set to true to allow LoRA resolver plugins
 to work, and `VLLM_PLUGINS` must be set to include the desired resolver plugins.
 ## lora_filesystem_resolver
 This LoRA Resolver is installed with vLLM by default.
 To use, set `VLLM_PLUGIN_LORA_CACHE_DIR` to a local directory. When vLLM receives a request
 for a LoRA adapter `foobar` it doesn't currently recognize, it will look in that local directory
 for a subdirectory `foobar` containing a LoRA adapter. If such an adapter exists, it will
 load that adapter, and then service the request as normal. That adapter will then be available
 for future requests as normal.