# LoRA Resolver Plugins
This directory contains vLLM's LoRA resolver plugins built on the LoRAResolver framework.
They automatically discover and load LoRA adapters from a specified local storage path, eliminating the need for manual configuration or server restarts.
## Overview
LoRA Resolver Plugins provide a flexible way to dynamically load LoRA adapters at runtime. When vLLM receives a request for a LoRA adapter that hasn't been loaded yet, the resolver plugins will attempt to locate and load the adapter from their configured storage locations. This enables:
- Dynamic LoRA Loading: Load adapters on-demand without server restarts
- Multiple Storage Backends: Support for filesystem, S3, and custom backends. The built-in `lora_filesystem_resolver` requires a local storage path, but custom resolvers can be implemented to fetch from any source.
- Automatic Discovery: Seamless integration with existing LoRA workflows
- Scalable Deployment: Centralized adapter management across multiple vLLM instances
## Prerequisites
Before using LoRA Resolver Plugins, ensure the following environment variables are configured:
### Required Environment Variables
- `VLLM_ALLOW_RUNTIME_LORA_UPDATING`: Must be set to `true` or `1` to enable dynamic LoRA loading

    ```bash
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
    ```

- `VLLM_PLUGINS`: Must include the desired resolver plugins (comma-separated list)

    ```bash
    export VLLM_PLUGINS=lora_filesystem_resolver
    ```

- `VLLM_LORA_RESOLVER_CACHE_DIR`: Must be set to a valid directory path for the filesystem resolver

    ```bash
    export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
    ```
### Optional Environment Variables
- `VLLM_PLUGINS`: If not set, all available plugins will be loaded. If set to an empty string, no plugins will be loaded.
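The environment variables above can be sanity-checked before launching the server. The sketch below is illustrative, not a vLLM API; the helper name `check_lora_resolver_env` is invented here, and it only encodes the rules stated in this section:

```python
import os


def check_lora_resolver_env(env=None):
    """Return a list of problems with the resolver-related environment
    variables described above (an empty list means the setup looks usable)."""
    env = os.environ if env is None else env
    problems = []
    if env.get("VLLM_ALLOW_RUNTIME_LORA_UPDATING", "").lower() not in ("1", "true"):
        problems.append("VLLM_ALLOW_RUNTIME_LORA_UPDATING must be 'true' or '1'")
    plugins = env.get("VLLM_PLUGINS")
    # Unset means "load all available plugins"; if set, the filesystem
    # resolver must be listed explicitly.
    if plugins is not None and "lora_filesystem_resolver" not in plugins.split(","):
        problems.append("VLLM_PLUGINS does not include lora_filesystem_resolver")
    cache_dir = env.get("VLLM_LORA_RESOLVER_CACHE_DIR")
    if not cache_dir or not os.path.isdir(cache_dir):
        problems.append("VLLM_LORA_RESOLVER_CACHE_DIR is unset or not an existing directory")
    return problems
```

Running the check against the current shell (`check_lora_resolver_env()`) before starting vLLM surfaces misconfiguration early instead of at the first adapter request.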
## Available Resolvers
### `lora_filesystem_resolver`
The filesystem resolver is installed with vLLM by default and enables loading LoRA adapters from a local directory structure.
#### Setup Steps

1. Create the LoRA adapter storage directory:

    ```bash
    mkdir -p /path/to/lora/adapters
    ```

2. Set environment variables:

    ```bash
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
    export VLLM_PLUGINS=lora_filesystem_resolver
    export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
    ```

3. Start the vLLM server. Your base model can be `meta-llama/Llama-2-7b-hf`; make sure the Hugging Face token is set in your environment (`export HF_TOKEN=xxx`).

    ```bash
    python -m vllm.entrypoints.openai.api_server \
        --model your-base-model \
        --enable-lora
    ```
#### Directory Structure Requirements

The filesystem resolver expects LoRA adapters to be organized in the following structure:

```text
/path/to/lora/adapters/
├── adapter1/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
├── adapter2/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
└── ...
```
Each adapter directory must contain:

- `adapter_config.json`: Required configuration file with the following structure:

    ```json
    {
      "peft_type": "LORA",
      "base_model_name_or_path": "your-base-model-name",
      "r": 16,
      "lora_alpha": 32,
      "target_modules": ["q_proj", "v_proj"],
      "bias": "none",
      "modules_to_save": null,
      "use_rslora": false,
      "use_dora": false
    }
    ```

- `adapter_model.bin`: The LoRA adapter weights file
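The layout requirements above can be checked programmatically before pointing vLLM at a directory. This is a minimal sketch, not part of vLLM; the helper name `validate_adapter_dir` is invented here, and it only checks the two required files and the `peft_type` field described above:

```python
import json
import os

# Files every adapter directory must contain, per the layout above.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.bin")


def validate_adapter_dir(path):
    """Check one adapter directory against the required layout; return a
    list of problems (an empty list means the directory looks loadable)."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(path, name))
    ]
    config_path = os.path.join(path, "adapter_config.json")
    if os.path.isfile(config_path):
        try:
            with open(config_path) as f:
                config = json.load(f)
        except json.JSONDecodeError:
            return problems + ["adapter_config.json is not valid JSON"]
        if config.get("peft_type") != "LORA":
            problems.append('peft_type must be "LORA"')
    return problems
```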
## Usage Example

1. Prepare your LoRA adapter:

    ```bash
    # Assuming you have a LoRA adapter in /tmp/my_lora_adapter
    cp -r /tmp/my_lora_adapter /path/to/lora/adapters/my_sql_adapter
    ```

2. Verify the directory structure:

    ```bash
    ls -la /path/to/lora/adapters/my_sql_adapter/
    # Should show: adapter_config.json, adapter_model.bin, etc.
    ```

3. Make a request using the adapter:

    ```bash
    curl http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "my_sql_adapter",
            "prompt": "Generate a SQL query for:",
            "max_tokens": 50,
            "temperature": 0.1
        }'
    ```
## How It Works

1. vLLM receives a request for a LoRA adapter named `my_sql_adapter`
2. The filesystem resolver checks whether `/path/to/lora/adapters/my_sql_adapter/` exists
3. If found, it validates the `adapter_config.json` file
4. If the configuration matches the base model and is valid, the adapter is loaded
5. The request is processed normally with the newly loaded adapter
6. The adapter remains available for future requests
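The lookup steps above can be sketched as a plain function. This is an illustrative re-implementation of the flow, not vLLM's actual resolver code; the function name and the path-based return value are assumptions for the sketch:

```python
import json
import os


def resolve_lora_from_filesystem(cache_dir, lora_name, base_model):
    """Walk the lookup flow described above: find <cache_dir>/<lora_name>,
    validate its config, and return the adapter path (or None if it can't
    be resolved)."""
    adapter_dir = os.path.join(cache_dir, lora_name)
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    if not os.path.isfile(config_path):
        return None  # adapter directory or its config file not found
    with open(config_path) as f:
        config = json.load(f)
    # The config must describe a LoRA adapter built for this base model.
    if config.get("peft_type") != "LORA":
        return None
    if config.get("base_model_name_or_path") != base_model:
        return None
    return adapter_dir
```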
## Advanced Configuration

### Multiple Resolvers

You can configure multiple resolver plugins to load adapters from different sources:

```bash
# 'lora_s3_resolver' is an example of a custom resolver you would need to implement
export VLLM_PLUGINS=lora_filesystem_resolver,lora_s3_resolver
```

All listed resolvers are enabled; at request time, vLLM tries them in order until one succeeds.
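The "first resolver to succeed wins" behavior can be sketched with plain callables standing in for vLLM's resolver objects (the function below is illustrative, not vLLM's dispatch code):

```python
def resolve_with_fallback(resolvers, base_model_name, lora_name):
    """Try each resolver in configured order; the first non-None result
    wins, mirroring the ordering behavior described above."""
    for resolver in resolvers:
        result = resolver(base_model_name, lora_name)
        if result is not None:
            return result
    return None  # no resolver could locate the adapter
```

For example, a filesystem resolver that only knows local adapters can be listed before an S3 resolver that acts as a catch-all fallback.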
### Custom Resolver Implementation

To implement your own resolver plugin:

1. Create a new resolver class:

    ```python
    from typing import Optional

    from vllm.lora.request import LoRARequest
    from vllm.lora.resolver import LoRAResolver, LoRAResolverRegistry


    class CustomResolver(LoRAResolver):
        async def resolve_lora(
            self, base_model_name: str, lora_name: str
        ) -> Optional[LoRARequest]:
            # Your custom resolution logic here
            ...
    ```

2. Register the resolver:

    ```python
    def register_custom_resolver():
        resolver = CustomResolver()
        LoRAResolverRegistry.register_resolver("Custom Resolver", resolver)
    ```
## Troubleshooting

### Common Issues

- "VLLM_LORA_RESOLVER_CACHE_DIR must be set to a valid directory"
    - Ensure the directory exists and is accessible
    - Check file permissions on the directory
- "LoRA adapter not found"
    - Verify the adapter directory name matches the requested model name
    - Check that `adapter_config.json` exists and is valid JSON
    - Ensure `adapter_model.bin` exists in the directory
- "Invalid adapter configuration"
    - Verify `peft_type` is set to `"LORA"`
    - Check that `base_model_name_or_path` matches your base model
    - Ensure `target_modules` is properly configured
- "LoRA rank exceeds maximum"
    - Check that the `r` value in `adapter_config.json` doesn't exceed the `max_lora_rank` setting
### Debugging Tips

1. Enable debug logging:

    ```bash
    export VLLM_LOGGING_LEVEL=DEBUG
    ```

2. Verify environment variables:

    ```bash
    echo $VLLM_ALLOW_RUNTIME_LORA_UPDATING
    echo $VLLM_PLUGINS
    echo $VLLM_LORA_RESOLVER_CACHE_DIR
    ```

3. Test adapter configuration:

    ```bash
    python -c "
    import json
    with open('/path/to/lora/adapters/my_adapter/adapter_config.json') as f:
        config = json.load(f)
    print('Config valid:', config)
    "
    ```