mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2026-05-24 07:37:53 +08:00
docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153)
Signed-off-by: Chen Wang <Chen.Wang1@ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent cdd7025961
commit 9261eb3dc1
@@ -3,6 +3,8 @@ MD007:
 MD013: false
 MD024:
   siblings_only: true
+MD031:
+  list_items: false
 MD033: false
 MD045: false
 MD046: false
@@ -46,7 +46,10 @@ nav:
 - contributing/model/multimodal.md
 - contributing/model/transcription.md
 - CI: contributing/ci
-- Design Documents: design
+- Design Documents:
+  - Plugins:
+    - design/*plugin*.md
+  - design/*
 - API Reference:
   - api/README.md
   - api/vllm
220
docs/design/lora_resolver_plugins.md
Normal file
@@ -0,0 +1,220 @@
# LoRA Resolver Plugins

This page describes vLLM's LoRA resolver plugins, which are built on the `LoRAResolver` framework.
They automatically discover and load LoRA adapters from a specified local storage path, eliminating the need for manual configuration or server restarts.

## Overview

LoRA Resolver Plugins provide a flexible way to dynamically load LoRA adapters at runtime. When vLLM
receives a request for a LoRA adapter that hasn't been loaded yet, the resolver plugins attempt
to locate and load the adapter from their configured storage locations. This enables:

- **Dynamic LoRA Loading**: Load adapters on demand without server restarts
- **Multiple Storage Backends**: The built-in `lora_filesystem_resolver` requires a local storage path; custom resolvers can be implemented to fetch from any other source, such as S3.
- **Automatic Discovery**: Adapters are located by name in the configured storage path, so existing LoRA workflows need no per-adapter configuration
- **Scalable Deployment**: Centralized adapter management across multiple vLLM instances

## Prerequisites

Before using LoRA Resolver Plugins, ensure the following environment variables are configured:

### Required Environment Variables

1. **`VLLM_ALLOW_RUNTIME_LORA_UPDATING`**: Must be set to `true` or `1` to enable dynamic LoRA loading

    ```bash
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
    ```

2. **`VLLM_PLUGINS`**: When set, must include the desired resolver plugins (comma-separated list)

    ```bash
    export VLLM_PLUGINS=lora_filesystem_resolver
    ```

3. **`VLLM_LORA_RESOLVER_CACHE_DIR`**: Must be set to a valid directory path for the filesystem resolver

    ```bash
    export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
    ```

### Optional Environment Variables

- **`VLLM_PLUGINS`**: If not set, all available plugins are loaded, including the resolver plugins. If set to an empty string, no plugins are loaded, so the resolvers are disabled.

## Available Resolvers

### lora_filesystem_resolver

The filesystem resolver is installed with vLLM by default and enables loading LoRA adapters from a local directory structure.

#### Setup Steps

1. **Create the LoRA adapter storage directory**:

    ```bash
    mkdir -p /path/to/lora/adapters
    ```

2. **Set environment variables**:

    ```bash
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
    export VLLM_PLUGINS=lora_filesystem_resolver
    export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
    ```

3. **Start vLLM server**:

    Replace `your-base-model` with your base model, for example `meta-llama/Llama-2-7b-hf`. For gated models, set your Hugging Face token first with `export HF_TOKEN=<your-token>`.

    ```bash
    python -m vllm.entrypoints.openai.api_server \
        --model your-base-model \
        --enable-lora
    ```

#### Directory Structure Requirements

The filesystem resolver expects LoRA adapters to be organized in the following structure:

```text
/path/to/lora/adapters/
├── adapter1/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
├── adapter2/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
└── ...
```

Each adapter directory must contain:

- **`adapter_config.json`**: Required configuration file with the following structure:

    ```json
    {
      "peft_type": "LORA",
      "base_model_name_or_path": "your-base-model-name",
      "r": 16,
      "lora_alpha": 32,
      "target_modules": ["q_proj", "v_proj"],
      "bias": "none",
      "modules_to_save": null,
      "use_rslora": false,
      "use_dora": false
    }
    ```

- **`adapter_model.bin`**: The LoRA adapter weights file

#### Usage Example

1. **Prepare your LoRA adapter**:

    ```bash
    # Assuming you have a LoRA adapter in /tmp/my_lora_adapter
    cp -r /tmp/my_lora_adapter /path/to/lora/adapters/my_sql_adapter
    ```

2. **Verify the directory structure**:

    ```bash
    ls -la /path/to/lora/adapters/my_sql_adapter/
    # Should show: adapter_config.json, adapter_model.bin, etc.
    ```

3. **Make a request using the adapter**:

    ```bash
    curl http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "my_sql_adapter",
            "prompt": "Generate a SQL query for:",
            "max_tokens": 50,
            "temperature": 0.1
        }'
    ```

#### How It Works

1. vLLM receives a request for a LoRA adapter named `my_sql_adapter` that is not yet loaded
2. The filesystem resolver checks whether `/path/to/lora/adapters/my_sql_adapter/` exists
3. If found, it validates the `adapter_config.json` file
4. If the configuration is valid and matches the base model, the adapter is loaded
5. The request is processed normally with the newly loaded adapter
6. The adapter remains available for future requests

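The steps above can be sketched in plain Python. This illustrates the described flow and is not vLLM's actual implementation: the `resolve_adapter` function and the `loaded` cache are hypothetical, and the base-model check is assumed to be an exact comparison against `base_model_name_or_path`.

```python
import json
import os
from typing import Optional

# Hypothetical record of adapters that were already resolved (step 6).
loaded: dict[str, str] = {}


def resolve_adapter(cache_dir: str, base_model: str, lora_name: str) -> Optional[str]:
    """Return the adapter directory for lora_name, or None if it cannot be used."""
    if lora_name in loaded:  # resolved by an earlier request
        return loaded[lora_name]

    # Step 2: look for a subdirectory named after the requested adapter.
    adapter_dir = os.path.join(cache_dir, lora_name)
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    if not os.path.isfile(config_path):
        return None

    # Step 3: the configuration must be readable JSON holding an object.
    try:
        with open(config_path) as f:
            config = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(config, dict):
        return None

    # Step 4: accept only LoRA adapters built for the running base model.
    if config.get("peft_type") != "LORA":
        return None
    if config.get("base_model_name_or_path") != base_model:
        return None

    # Steps 5-6: remember the adapter so later requests reuse it.
    loaded[lora_name] = adapter_dir
    return adapter_dir
```

Returning `None` rather than raising mirrors the idea that a resolver which cannot find an adapter simply declines, leaving the decision about what to do next to the caller.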
## Advanced Configuration

### Multiple Resolvers

You can configure multiple resolver plugins to load adapters from different sources.
In the example below, `lora_s3_resolver` is a custom resolver that you would need to implement yourself:

```bash
export VLLM_PLUGINS=lora_filesystem_resolver,lora_s3_resolver
```

All listed resolvers are enabled; at request time, vLLM tries them in order until one succeeds.

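The "try each resolver in order until one succeeds" behavior can be pictured with a small `asyncio` sketch. The `DictResolver` class and the `resolve_in_order` helper are hypothetical stand-ins rather than vLLM code, and plain strings stand in for the `LoRARequest` objects a real resolver would return.

```python
import asyncio
from typing import Optional


class DictResolver:
    """Stand-in resolver that knows a fixed mapping of adapter names."""

    def __init__(self, adapters: dict[str, str]) -> None:
        self.adapters = adapters

    async def resolve_lora(self, base_model_name: str, lora_name: str) -> Optional[str]:
        # Return a location on success, or None when the adapter is unknown.
        return self.adapters.get(lora_name)


async def resolve_in_order(resolvers, base_model_name: str, lora_name: str) -> Optional[str]:
    """Ask each resolver in turn and return the first non-None result."""
    for resolver in resolvers:
        result = await resolver.resolve_lora(base_model_name, lora_name)
        if result is not None:
            return result
    return None


# The first resolver plays the filesystem role, the second a custom backend.
resolvers = [
    DictResolver({"my_sql_adapter": "/path/to/lora/adapters/my_sql_adapter"}),
    DictResolver({"my_sql_adapter": "remote/my_sql_adapter", "other": "remote/other"}),
]
first = asyncio.run(resolve_in_order(resolvers, "your-base-model", "my_sql_adapter"))
second = asyncio.run(resolve_in_order(resolvers, "your-base-model", "other"))
print(first)   # the earlier resolver wins when both know the adapter
print(second)  # falls through to the second resolver
```

In this sketch the position in the list decides which source wins when two resolvers know the same adapter name, so the order in which resolvers are listed matters.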
### Custom Resolver Implementation

To implement your own resolver plugin:

1. **Create a new resolver class**:

    ```python
    from typing import Optional

    from vllm.lora.request import LoRARequest
    from vllm.lora.resolver import LoRAResolver, LoRAResolverRegistry


    class CustomResolver(LoRAResolver):
        async def resolve_lora(self, base_model_name: str, lora_name: str) -> Optional[LoRARequest]:
            # Your custom resolution logic here: return a LoRARequest if the
            # adapter is found, or None so that other resolvers can be tried.
            return None
    ```

2. **Register the resolver**:

    ```python
    def register_custom_resolver():
        resolver = CustomResolver()
        LoRAResolverRegistry.register_resolver("Custom Resolver", resolver)
    ```

## Troubleshooting

### Common Issues

1. **"VLLM_LORA_RESOLVER_CACHE_DIR must be set to a valid directory"**
    - Ensure the directory exists and is accessible
    - Check file permissions on the directory

2. **"LoRA adapter not found"**
    - Verify the adapter directory name matches the requested model name
    - Check that `adapter_config.json` exists and is valid JSON
    - Ensure `adapter_model.bin` exists in the directory

3. **"Invalid adapter configuration"**
    - Verify `peft_type` is set to `"LORA"`
    - Check that `base_model_name_or_path` matches your base model
    - Ensure `target_modules` is properly configured

4. **"LoRA rank exceeds maximum"**
    - Check that the `r` value in `adapter_config.json` doesn't exceed the `max_lora_rank` setting

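The checks listed above can be automated with a short diagnostic script. This is a sketch under assumptions: the `diagnose_adapter` helper is hypothetical, its messages are not vLLM's own error strings, and `max_lora_rank` is passed in by hand (the default of 16 used here is an assumed value that you should replace with your server's setting).

```python
import json
import os


def diagnose_adapter(adapter_dir: str, base_model: str, max_lora_rank: int = 16) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    if not os.path.isdir(adapter_dir):
        return [f"adapter directory not found: {adapter_dir}"]

    problems = []
    if not os.path.isfile(os.path.join(adapter_dir, "adapter_model.bin")):
        problems.append("adapter_model.bin is missing")

    try:
        with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
            config = json.load(f)
    except FileNotFoundError:
        return problems + ["adapter_config.json is missing"]
    except json.JSONDecodeError as exc:
        return problems + [f"adapter_config.json is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return problems + ["adapter_config.json must contain a JSON object"]

    if config.get("peft_type") != "LORA":
        problems.append('peft_type must be "LORA"')
    if config.get("base_model_name_or_path") != base_model:
        problems.append("base_model_name_or_path does not match the base model")
    if not config.get("target_modules"):
        problems.append("target_modules is empty or missing")
    rank = config.get("r")
    if isinstance(rank, int) and rank > max_lora_rank:
        problems.append(f"r={rank} exceeds max_lora_rank={max_lora_rank}")
    return problems
```

For example, `diagnose_adapter("/path/to/lora/adapters/my_sql_adapter", "your-base-model")` returns an empty list when the directory passes every check in this sketch.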
### Debugging Tips

1. **Enable debug logging**:

    ```bash
    export VLLM_LOGGING_LEVEL=DEBUG
    ```

2. **Verify environment variables**:

    ```bash
    echo $VLLM_ALLOW_RUNTIME_LORA_UPDATING
    echo $VLLM_PLUGINS
    echo $VLLM_LORA_RESOLVER_CACHE_DIR
    ```

3. **Test adapter configuration**:

    ```bash
    python -c "
    import json
    with open('/path/to/lora/adapters/my_adapter/adapter_config.json') as f:
        config = json.load(f)
    print('Config valid:', config)
    "
    ```
@@ -1,16 +0,0 @@
# LoRA Resolver Plugins

This directory contains vLLM general plugins for dynamically discovering and loading LoRA adapters
via the LoRAResolver plugin framework.

Note that `VLLM_ALLOW_RUNTIME_LORA_UPDATING` must be set to true to allow LoRA resolver plugins
to work, and `VLLM_PLUGINS` must be set to include the desired resolver plugins.

## lora_filesystem_resolver

This LoRA Resolver is installed with vLLM by default.
To use, set `VLLM_PLUGIN_LORA_CACHE_DIR` to a local directory. When vLLM receives a request
for a LoRA adapter `foobar` it doesn't currently recognize, it will look in that local directory
for a subdirectory `foobar` containing a LoRA adapter. If such an adapter exists, it will
load that adapter, and then service the request as normal. That adapter will then be available
for future requests as normal.