docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153)

Signed-off-by: Chen Wang <Chen.Wang1@ibm.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-16 13:37:23 +08:00 · 2025-11-14 13:08:30 -05:00 · 2025-11-14 13:08:30 -05:00 · 9261eb3dc1
commit 9261eb3dc1
parent cdd7025961
4 changed files with 226 additions and 17 deletions
--- a/.markdownlint.yaml
+++ b/.markdownlint.yaml
@ -3,6 +3,8 @@ MD007:
 MD013: false
 MD024:
  siblings_only: true
+MD031:
+  list_items: false
 MD033: false
 MD045: false
 MD046: false
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@ -46,7 +46,10 @@ nav:
      - contributing/model/multimodal.md
      - contributing/model/transcription.md
    - CI: contributing/ci
-    - Design Documents: design
+    - Design Documents:
+      - Plugins:
+        - design/*plugin*.md
+      - design/*
  - API Reference:
    - api/README.md
    - api/vllm
--- a/docs/design/lora_resolver_plugins.md
+++ b/docs/design/lora_resolver_plugins.md
@ -0,0 +1,220 @@
+# LoRA Resolver Plugins
+
+This directory contains vLLM's LoRA resolver plugins built on the `LoRAResolver` framework.
+They automatically discover and load LoRA adapters from a specified local storage path, eliminating the need for manual configuration or server restarts.
+
+## Overview
+
+LoRA Resolver Plugins provide a flexible way to dynamically load LoRA adapters at runtime. When vLLM
+receives a request for a LoRA adapter that hasn't been loaded yet, the resolver plugins will attempt
+to locate and load the adapter from their configured storage locations. This enables:
+
+- **Dynamic LoRA Loading**: Load adapters on-demand without server restarts
+- **Multiple Storage Backends**: Support for filesystem, S3, and custom backends. The built-in `lora_filesystem_resolver` requires a local storage path, but custom resolvers can be implemented to fetch from any source.
+- **Automatic Discovery**: Seamless integration with existing LoRA workflows
+- **Scalable Deployment**: Centralized adapter management across multiple vLLM instances
+
+## Prerequisites
+
+Before using LoRA Resolver Plugins, ensure the following environment variables are configured:
+
+### Required Environment Variables
+
+1. **`VLLM_ALLOW_RUNTIME_LORA_UPDATING`**: Must be set to `true` or `1` to enable dynamic LoRA loading
+   ```bash
+   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
+   ```
+
+2. **`VLLM_PLUGINS`**: Must include the desired resolver plugins (comma-separated list)
+   ```bash
+   export VLLM_PLUGINS=lora_filesystem_resolver
+   ```
+
+3. **`VLLM_LORA_RESOLVER_CACHE_DIR`**: Must be set to a valid directory path for filesystem resolver
+   ```bash
+   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
+   ```
+
+### Optional Environment Variables
+
+- **`VLLM_PLUGINS`**: If not set, all available plugins will be loaded. If set to empty string, no plugins will be loaded.
+
+## Available Resolvers
+
+### lora_filesystem_resolver
+
+The filesystem resolver is installed with vLLM by default and enables loading LoRA adapters from a local directory structure.
+
+#### Setup Steps
+
+1. **Create the LoRA adapter storage directory**:
+   ```bash
+   mkdir -p /path/to/lora/adapters
+   ```
+
+2. **Set environment variables**:
+   ```bash
+   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
+   export VLLM_PLUGINS=lora_filesystem_resolver
+   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
+   ```
+
+3. **Start vLLM server**:
+   Your base model can be `meta-llama/Llama-2-7b-hf`. Please make sure you set up the Hugging Face token in your env var `export HF_TOKEN=xxx235`.
+   ```bash
+   python -m vllm.entrypoints.openai.api_server \
+       --model your-base-model \
+       --enable-lora
+   ```
+
+#### Directory Structure Requirements
+
+The filesystem resolver expects LoRA adapters to be organized in the following structure:
+
+```text
+/path/to/lora/adapters/
+├── adapter1/
+│   ├── adapter_config.json
+│   ├── adapter_model.bin
+│   └── tokenizer files (if applicable)
+├── adapter2/
+│   ├── adapter_config.json
+│   ├── adapter_model.bin
+│   └── tokenizer files (if applicable)
+└── ...
+```
+
+Each adapter directory must contain:
+
+- **`adapter_config.json`**: Required configuration file with the following structure:
+  ```json
+  {
+    "peft_type": "LORA",
+    "base_model_name_or_path": "your-base-model-name",
+    "r": 16,
+    "lora_alpha": 32,
+    "target_modules": ["q_proj", "v_proj"],
+    "bias": "none",
+    "modules_to_save": null,
+    "use_rslora": false,
+    "use_dora": false
+  }
+  ```
+
+- **`adapter_model.bin`**: The LoRA adapter weights file
+
+#### Usage Example
+
+1. **Prepare your LoRA adapter**:
+   ```bash
+   # Assuming you have a LoRA adapter in /tmp/my_lora_adapter
+   cp -r /tmp/my_lora_adapter /path/to/lora/adapters/my_sql_adapter
+   ```
+
+2. **Verify the directory structure**:
+   ```bash
+   ls -la /path/to/lora/adapters/my_sql_adapter/
+   # Should show: adapter_config.json, adapter_model.bin, etc.
+   ```
+
+3. **Make a request using the adapter**:
+   ```bash
+   curl http://localhost:8000/v1/completions \
+       -H "Content-Type: application/json" \
+       -d '{
+           "model": "my_sql_adapter",
+           "prompt": "Generate a SQL query for:",
+           "max_tokens": 50,
+           "temperature": 0.1
+       }'
+   ```
+
+#### How It Works
+
+1. When vLLM receives a request for a LoRA adapter named `my_sql_adapter`
+2. The filesystem resolver checks if `/path/to/lora/adapters/my_sql_adapter/` exists
+3. If found, it validates the `adapter_config.json` file
+4. If the configuration matches the base model and is valid, the adapter is loaded
+5. The request is processed normally with the newly loaded adapter
+6. The adapter remains available for future requests
+
+## Advanced Configuration
+
+### Multiple Resolvers
+
+You can configure multiple resolver plugins to load adapters from different sources:
+
+'lora_s3_resolver' is an example of a custom resolver you would need to implement
+
+```bash
+export VLLM_PLUGINS=lora_filesystem_resolver,lora_s3_resolver
+```
+
+All listed resolvers are enabled; at request time, vLLM tries them in order until one succeeds.
+
+### Custom Resolver Implementation
+
+To implement your own resolver plugin:
+
+1. **Create a new resolver class**:
+   ```python
+   from vllm.lora.resolver import LoRAResolver, LoRAResolverRegistry
+   from vllm.lora.request import LoRARequest
+   
+   class CustomResolver(LoRAResolver):
+       async def resolve_lora(self, base_model_name: str, lora_name: str) -> Optional[LoRARequest]:
+           # Your custom resolution logic here
+           pass
+   ```
+
+2. **Register the resolver**:
+   ```python
+   def register_custom_resolver():
+       resolver = CustomResolver()
+       LoRAResolverRegistry.register_resolver("Custom Resolver", resolver)
+   ```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **"VLLM_LORA_RESOLVER_CACHE_DIR must be set to a valid directory"**
+   - Ensure the directory exists and is accessible
+   - Check file permissions on the directory
+
+2. **"LoRA adapter not found"**
+   - Verify the adapter directory name matches the requested model name
+   - Check that `adapter_config.json` exists and is valid JSON
+   - Ensure `adapter_model.bin` exists in the directory
+
+3. **"Invalid adapter configuration"**
+   - Verify `peft_type` is set to "LORA"
+   - Check that `base_model_name_or_path` matches your base model
+   - Ensure `target_modules` is properly configured
+
+4. **"LoRA rank exceeds maximum"**
+   - Check that `r` value in `adapter_config.json` doesn't exceed `max_lora_rank` setting
+
+### Debugging Tips
+
+1. **Enable debug logging**:
+   ```bash
+   export VLLM_LOGGING_LEVEL=DEBUG
+   ```
+
+2. **Verify environment variables**:
+   ```bash
+   echo $VLLM_ALLOW_RUNTIME_LORA_UPDATING
+   echo $VLLM_PLUGINS
+   echo $VLLM_LORA_RESOLVER_CACHE_DIR
+   ```
+
+3. **Test adapter configuration**:
+   ```bash
+   python -c "
+   import json
+   with open('/path/to/lora/adapters/my_adapter/adapter_config.json') as f:
+       config = json.load(f)
+   print('Config valid:', config)
+   "
+   ```
--- a/vllm/plugins/lora_resolvers/README.md
+++ b/vllm/plugins/lora_resolvers/README.md
@ -1,16 +0,0 @@
-# LoRA Resolver Plugins
-
-This directory contains vLLM general plugins for dynamically discovering and loading LoRA adapters
-via the LoRAResolver plugin framework.
-
-Note that `VLLM_ALLOW_RUNTIME_LORA_UPDATING` must be set to true to allow LoRA resolver plugins
-to work, and `VLLM_PLUGINS` must be set to include the desired resolver plugins.
-
-## lora_filesystem_resolver
-
-This LoRA Resolver is installed with vLLM by default.
-To use, set `VLLM_PLUGIN_LORA_CACHE_DIR` to a local directory. When vLLM receives a request
-for a LoRA adapter `foobar` it doesn't currently recognize, it will look in that local directory
-for a subdirectory `foobar` containing a LoRA adapter. If such an adapter exists, it will
-load that adapter, and then service the request as normal. That adapter will then be available
-for future requests as normal.