mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2026-01-24 00:14:34 +08:00
docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153)
Signed-off-by: Chen Wang <Chen.Wang1@ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent cdd7025961
commit 9261eb3dc1
@@ -3,6 +3,8 @@ MD007:
MD013: false
MD024:
  siblings_only: true
MD031:
  list_items: false
MD033: false
MD045: false
MD046: false
@@ -46,7 +46,10 @@ nav:
      - contributing/model/multimodal.md
      - contributing/model/transcription.md
    - CI: contributing/ci
    - Design Documents: design
    - Design Documents:
      - Plugins:
        - design/*plugin*.md
      - design/*
  - API Reference:
    - api/README.md
    - api/vllm
220
docs/design/lora_resolver_plugins.md
Normal file
@@ -0,0 +1,220 @@
# LoRA Resolver Plugins

This directory contains vLLM's LoRA resolver plugins built on the `LoRAResolver` framework.
They automatically discover and load LoRA adapters from a specified local storage path, eliminating the need for manual configuration or server restarts.

## Overview

LoRA Resolver Plugins provide a flexible way to dynamically load LoRA adapters at runtime. When vLLM
receives a request for a LoRA adapter that hasn't been loaded yet, the resolver plugins will attempt
to locate and load the adapter from their configured storage locations. This enables:

- **Dynamic LoRA Loading**: Load adapters on-demand without server restarts
- **Multiple Storage Backends**: The built-in `lora_filesystem_resolver` reads from a local storage path; custom resolvers can be implemented to fetch from other sources, such as S3.
- **Automatic Discovery**: Seamless integration with existing LoRA workflows
- **Scalable Deployment**: Centralized adapter management across multiple vLLM instances

## Prerequisites

Before using LoRA Resolver Plugins, ensure the following environment variables are configured:

### Required Environment Variables

1. **`VLLM_ALLOW_RUNTIME_LORA_UPDATING`**: Must be set to `true` or `1` to enable dynamic LoRA loading

   ```bash
   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
   ```

2. **`VLLM_PLUGINS`**: Must include the desired resolver plugins (comma-separated list)

   ```bash
   export VLLM_PLUGINS=lora_filesystem_resolver
   ```

3. **`VLLM_LORA_RESOLVER_CACHE_DIR`**: Must be set to a valid directory path for the filesystem resolver

   ```bash
   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
   ```

### Optional Environment Variables

- **`VLLM_PLUGINS`**: Setting this explicitly (as above) restricts loading to the listed plugins. If it is not set at all, all available plugins are loaded. If it is set to an empty string, no plugins are loaded.
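
The three `VLLM_PLUGINS` cases described above (unset, empty, comma-separated list) can be sketched as a small parser. This is an illustration only: `selected_plugins` is a hypothetical helper, not vLLM's actual parsing code, and it assumes whitespace around names should be ignored.

```python
from typing import Mapping, Optional


def selected_plugins(env: Mapping[str, str]) -> Optional[list[str]]:
    """Interpret VLLM_PLUGINS the way the text above describes it.

    Returns None when the variable is unset (meaning: load every available
    plugin), an empty list when it is set to an empty string (load nothing),
    and otherwise the listed names in the order they were written.
    Hypothetical helper for illustration; not part of vLLM.
    """
    raw = env.get("VLLM_PLUGINS")
    if raw is None:
        return None
    # Split on commas and drop empty entries such as a trailing comma.
    return [name.strip() for name in raw.split(",") if name.strip()]
```

Calling `selected_plugins(os.environ)` before starting the server is one way to confirm which of the three cases your shell environment is in.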

## Available Resolvers

### lora_filesystem_resolver

The filesystem resolver is installed with vLLM by default and enables loading LoRA adapters from a local directory structure.

#### Setup Steps

1. **Create the LoRA adapter storage directory**:

   ```bash
   mkdir -p /path/to/lora/adapters
   ```

2. **Set environment variables**:

   ```bash
   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
   export VLLM_PLUGINS=lora_filesystem_resolver
   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
   ```

3. **Start vLLM server**:

   Replace `your-base-model` with your base model, for example `meta-llama/Llama-2-7b-hf`. For gated models such as this one, make sure your Hugging Face token is set in the `HF_TOKEN` environment variable (for example, `export HF_TOKEN=xxx235`).

   ```bash
   python -m vllm.entrypoints.openai.api_server \
       --model your-base-model \
       --enable-lora
   ```

#### Directory Structure Requirements

The filesystem resolver expects LoRA adapters to be organized in the following structure:

```text
/path/to/lora/adapters/
├── adapter1/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
├── adapter2/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
└── ...
```

Each adapter directory must contain:

- **`adapter_config.json`**: Required configuration file with the following structure:

  ```json
  {
    "peft_type": "LORA",
    "base_model_name_or_path": "your-base-model-name",
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "bias": "none",
    "modules_to_save": null,
    "use_rslora": false,
    "use_dora": false
  }
  ```

- **`adapter_model.bin`**: The LoRA adapter weights file

#### Usage Example

1. **Prepare your LoRA adapter**:

   ```bash
   # Assuming you have a LoRA adapter in /tmp/my_lora_adapter
   cp -r /tmp/my_lora_adapter /path/to/lora/adapters/my_sql_adapter
   ```

2. **Verify the directory structure**:

   ```bash
   ls -la /path/to/lora/adapters/my_sql_adapter/
   # Should show: adapter_config.json, adapter_model.bin, etc.
   ```

3. **Make a request using the adapter**:

   ```bash
   curl http://localhost:8000/v1/completions \
       -H "Content-Type: application/json" \
       -d '{
           "model": "my_sql_adapter",
           "prompt": "Generate a SQL query for:",
           "max_tokens": 50,
           "temperature": 0.1
       }'
   ```
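
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the `curl` call above; `build_completion_request` is a hypothetical helper name, and the URL and payload values are simply the ones from the example.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str, adapter_name: str, prompt: str
) -> urllib.request.Request:
    """Build the POST request that the curl example sends.

    The adapter directory name is passed as the "model" field, which is
    what prompts the resolver to look the adapter up.
    """
    payload = {
        "model": adapter_name,
        "prompt": prompt,
        "max_tokens": 50,
        "temperature": 0.1,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request(
    "http://localhost:8000", "my_sql_adapter", "Generate a SQL query for:"
)
# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```

Building the request separately from sending it keeps the payload easy to inspect when an adapter name is not being resolved as expected.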

#### How It Works

1. vLLM receives a request for a LoRA adapter named `my_sql_adapter` that has not been loaded yet
2. The filesystem resolver checks whether `/path/to/lora/adapters/my_sql_adapter/` exists
3. If found, it validates the `adapter_config.json` file
4. If the configuration is valid and matches the base model, the adapter is loaded
5. The request is processed normally with the newly loaded adapter
6. The adapter remains available for future requests
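
Steps 2 to 4 can be sketched as a lookup function. This is a simplified illustration under stated assumptions, not the resolver's actual source: `find_adapter` is a hypothetical name, it returns a path instead of a `LoRARequest`, and "matches the base model" is assumed to mean equality with `base_model_name_or_path`.

```python
import json
import os
from typing import Optional


def find_adapter(cache_dir: str, base_model_name: str, lora_name: str) -> Optional[str]:
    """Return the adapter directory if it passes the checks above, else None."""
    adapter_dir = os.path.join(cache_dir, lora_name)
    config_path = os.path.join(adapter_dir, "adapter_config.json")

    # Step 2: the adapter directory (and its config file) must exist.
    if not os.path.isfile(config_path):
        return None

    # Step 3: the config must be readable JSON describing a LoRA adapter.
    try:
        with open(config_path) as f:
            config = json.load(f)
    except json.JSONDecodeError:
        return None
    if not isinstance(config, dict) or config.get("peft_type") != "LORA":
        return None

    # Step 4: the adapter must have been trained for the served base model.
    if config.get("base_model_name_or_path") != base_model_name:
        return None

    return adapter_dir
```

Returning `None` on any failed check, rather than raising, matches the idea that a resolver which cannot find an adapter simply declines the request.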

## Advanced Configuration

### Multiple Resolvers

You can configure multiple resolver plugins to load adapters from different sources. In the example below, `lora_s3_resolver` stands for a custom resolver that you would need to implement yourself:

```bash
export VLLM_PLUGINS=lora_filesystem_resolver,lora_s3_resolver
```

All listed resolvers are enabled; at request time, vLLM tries them in order until one succeeds.
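
The "try in order until one succeeds" behaviour can be sketched as a loop over async resolvers. This is a standalone illustration, not vLLM's dispatch code: the `Resolver` type alias, `resolve_in_order`, and the two stub resolvers are hypothetical, and results are plain strings rather than `LoRARequest` objects.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# A resolver takes (base_model_name, lora_name) and returns a result or None.
Resolver = Callable[[str, str], Awaitable[Optional[str]]]


async def resolve_in_order(
    resolvers: Sequence[Resolver], base_model_name: str, lora_name: str
) -> Optional[str]:
    """Return the first non-None result, asking resolvers in list order."""
    for resolver in resolvers:
        result = await resolver(base_model_name, lora_name)
        if result is not None:
            return result
    return None  # No resolver could find the adapter.


async def filesystem_stub(base_model_name: str, lora_name: str) -> Optional[str]:
    # Stand-in for a resolver that does not have the adapter.
    return None


async def remote_stub(base_model_name: str, lora_name: str) -> Optional[str]:
    # Stand-in for a second resolver that does have it.
    return f"remote:{lora_name}"


winner = asyncio.run(
    resolve_in_order([filesystem_stub, remote_stub], "your-base-model", "my_sql_adapter")
)
```

Because the first success wins, the order of names in `VLLM_PLUGINS` matters when two sources hold an adapter with the same name.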

### Custom Resolver Implementation

To implement your own resolver plugin:

1. **Create a new resolver class**:

   ```python
   from typing import Optional

   from vllm.lora.request import LoRARequest
   from vllm.lora.resolver import LoRAResolver, LoRAResolverRegistry


   class CustomResolver(LoRAResolver):
       async def resolve_lora(
           self, base_model_name: str, lora_name: str
       ) -> Optional[LoRARequest]:
           # Your custom resolution logic here: return a LoRARequest when
           # the adapter is found, or None when it is not.
           pass
   ```

2. **Register the resolver**:

   ```python
   def register_custom_resolver():
       resolver = CustomResolver()
       LoRAResolverRegistry.register_resolver("Custom Resolver", resolver)
   ```

## Troubleshooting

### Common Issues

1. **"VLLM_LORA_RESOLVER_CACHE_DIR must be set to a valid directory"**
   - Ensure the directory exists and is accessible
   - Check file permissions on the directory

2. **"LoRA adapter not found"**
   - Verify the adapter directory name matches the requested model name
   - Check that `adapter_config.json` exists and is valid JSON
   - Ensure `adapter_model.bin` exists in the directory

3. **"Invalid adapter configuration"**
   - Verify `peft_type` is set to "LORA"
   - Check that `base_model_name_or_path` matches your base model
   - Ensure `target_modules` is properly configured

4. **"LoRA rank exceeds maximum"**
   - Check that the `r` value in `adapter_config.json` doesn't exceed the `max_lora_rank` setting
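
The checks in items 2 to 4 can be run ahead of time with a small script. The sketch below is a hypothetical helper, not a vLLM utility: `diagnose_adapter` and its message strings are made up for illustration, and `max_lora_rank` is passed in by the caller because the script has no access to the server's configuration.

```python
import json
import os


def diagnose_adapter(adapter_dir: str, base_model_name: str, max_lora_rank: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    if not os.path.isdir(adapter_dir):
        return [f"adapter directory not found: {adapter_dir}"]

    problems = []
    if not os.path.isfile(os.path.join(adapter_dir, "adapter_model.bin")):
        problems.append("adapter_model.bin is missing")

    try:
        with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
            config = json.load(f)
    except FileNotFoundError:
        return problems + ["adapter_config.json is missing"]
    except json.JSONDecodeError as exc:
        return problems + [f"adapter_config.json is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return problems + ["adapter_config.json must contain a JSON object"]

    if config.get("peft_type") != "LORA":
        problems.append("peft_type is not 'LORA'")
    if config.get("base_model_name_or_path") != base_model_name:
        problems.append("base_model_name_or_path does not match the base model")
    if not config.get("target_modules"):
        problems.append("target_modules is empty or missing")

    rank = config.get("r")
    if isinstance(rank, int) and rank > max_lora_rank:
        problems.append(f"rank r={rank} exceeds max_lora_rank={max_lora_rank}")
    return problems
```

For example, `diagnose_adapter("/path/to/lora/adapters/my_sql_adapter", "your-base-model", 16)` returns an empty list when none of the issues above apply.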

### Debugging Tips

1. **Enable debug logging**:

   ```bash
   export VLLM_LOGGING_LEVEL=DEBUG
   ```

2. **Verify environment variables**:

   ```bash
   echo $VLLM_ALLOW_RUNTIME_LORA_UPDATING
   echo $VLLM_PLUGINS
   echo $VLLM_LORA_RESOLVER_CACHE_DIR
   ```

3. **Test adapter configuration**:

   ```bash
   python -c "
   import json
   with open('/path/to/lora/adapters/my_adapter/adapter_config.json') as f:
       config = json.load(f)
   print('Config valid:', config)
   "
   ```
@@ -1,16 +0,0 @@
# LoRA Resolver Plugins

This directory contains vLLM general plugins for dynamically discovering and loading LoRA adapters
via the LoRAResolver plugin framework.

Note that `VLLM_ALLOW_RUNTIME_LORA_UPDATING` must be set to true to allow LoRA resolver plugins
to work, and `VLLM_PLUGINS` must be set to include the desired resolver plugins.

## lora_filesystem_resolver

This LoRA Resolver is installed with vLLM by default.
To use, set `VLLM_PLUGIN_LORA_CACHE_DIR` to a local directory. When vLLM receives a request
for a LoRA adapter `foobar` it doesn't currently recognize, it will look in that local directory
for a subdirectory `foobar` containing a LoRA adapter. If such an adapter exists, it will
load that adapter, and then service the request as normal. That adapter will then be available
for future requests as normal.