# LoRA Resolver Plugins

This directory contains vLLM's LoRA resolver plugins built on the `LoRAResolver` framework.
They automatically discover and load LoRA adapters from a specified local storage path, eliminating the need for manual configuration or server restarts.

## Overview

LoRA Resolver Plugins provide a flexible way to dynamically load LoRA adapters at runtime. When vLLM
receives a request for a LoRA adapter that hasn't been loaded yet, the resolver plugins will attempt
to locate and load the adapter from their configured storage locations. This enables:

- **Dynamic LoRA Loading**: Load adapters on demand without server restarts
- **Multiple Storage Backends**: Support for filesystem, S3, and custom backends. The built-in `lora_filesystem_resolver` requires a local storage path, but custom resolvers can be implemented to fetch from any source.
- **Automatic Discovery**: Seamless integration with existing LoRA workflows
- **Scalable Deployment**: Centralized adapter management across multiple vLLM instances

## Prerequisites

Before using LoRA Resolver Plugins, ensure the following environment variables are configured:

### Required Environment Variables

1. **`VLLM_ALLOW_RUNTIME_LORA_UPDATING`**: Must be set to `true` or `1` to enable dynamic LoRA loading
   ```bash
   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
   ```

2. **`VLLM_PLUGINS`**: Must include the desired resolver plugins (comma-separated list)
   ```bash
   export VLLM_PLUGINS=lora_filesystem_resolver
   ```

3. **`VLLM_LORA_RESOLVER_CACHE_DIR`**: Must be set to a valid directory path for the filesystem resolver
   ```bash
   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
   ```
### Optional Environment Variables

- **`VLLM_PLUGINS`**: If unset, vLLM loads all installed plugins, so the filesystem resolver is available by default. If set, only the listed plugins are loaded, and an empty string disables all plugins.

## Available Resolvers

### lora_filesystem_resolver

The filesystem resolver is installed with vLLM by default and enables loading LoRA adapters from a local directory structure.

#### Setup Steps

1. **Create the LoRA adapter storage directory**:
   ```bash
   mkdir -p /path/to/lora/adapters
   ```

2. **Set environment variables**:
   ```bash
   export VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
   export VLLM_PLUGINS=lora_filesystem_resolver
   export VLLM_LORA_RESOLVER_CACHE_DIR=/path/to/lora/adapters
   ```

3. **Start the vLLM server**:
   For example, you can use `meta-llama/Llama-2-7b-hf` as the base model. If the base model is gated, make sure your Hugging Face token is available in the environment (`export HF_TOKEN=<your_token>`).
   ```bash
   python -m vllm.entrypoints.openai.api_server \
       --model your-base-model \
       --enable-lora
   ```

#### Directory Structure Requirements

The filesystem resolver expects LoRA adapters to be organized in the following structure:

```text
/path/to/lora/adapters/
├── adapter1/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
├── adapter2/
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   └── tokenizer files (if applicable)
└── ...
```

Each adapter directory must contain:

- **`adapter_config.json`**: Required configuration file with the following structure:
  ```json
  {
      "peft_type": "LORA",
      "base_model_name_or_path": "your-base-model-name",
      "r": 16,
      "lora_alpha": 32,
      "target_modules": ["q_proj", "v_proj"],
      "bias": "none",
      "modules_to_save": null,
      "use_rslora": false,
      "use_dora": false
  }
  ```

- **`adapter_model.bin`** (or `adapter_model.safetensors`): The LoRA adapter weights file
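
A quick way to sanity-check an adapter directory before pointing the resolver at it (a minimal sketch; the `adapter_dir` path is a placeholder):

```python
import json
import os

adapter_dir = "/path/to/lora/adapters/my_sql_adapter"

# The resolver needs the config file plus a weights file.
config_path = os.path.join(adapter_dir, "adapter_config.json")
assert os.path.isfile(config_path), "missing adapter_config.json"
weights = [f for f in os.listdir(adapter_dir) if f.startswith("adapter_model.")]
assert weights, "missing adapter weights (adapter_model.bin / .safetensors)"

with open(config_path) as f:
    config = json.load(f)
assert config.get("peft_type") == "LORA", "peft_type must be LORA"
print("Adapter directory looks valid; weights:", weights)
```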

#### Usage Example

1. **Prepare your LoRA adapter**:
   ```bash
   # Assuming you have a LoRA adapter in /tmp/my_lora_adapter
   cp -r /tmp/my_lora_adapter /path/to/lora/adapters/my_sql_adapter
   ```

2. **Verify the directory structure**:
   ```bash
   ls -la /path/to/lora/adapters/my_sql_adapter/
   # Should show: adapter_config.json, adapter_model.bin, etc.
   ```

3. **Make a request using the adapter**:
   ```bash
   curl http://localhost:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "my_sql_adapter",
       "prompt": "Generate a SQL query for:",
       "max_tokens": 50,
       "temperature": 0.1
     }'
   ```
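
The same request can be made with the official `openai` Python client (a sketch; assumes the server runs on its default address and the `openai` package is installed):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server accepts any API key unless one is configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="my_sql_adapter",  # resolved and loaded on first use
    prompt="Generate a SQL query for:",
    max_tokens=50,
    temperature=0.1,
)
print(completion.choices[0].text)
```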

#### How It Works

1. vLLM receives a request for a LoRA adapter named `my_sql_adapter` that is not yet loaded
2. The filesystem resolver checks whether `/path/to/lora/adapters/my_sql_adapter/` exists
3. If found, it validates the `adapter_config.json` file
4. If the configuration matches the base model and is valid, the adapter is loaded
5. The request is processed normally with the newly loaded adapter
6. The adapter remains available for future requests
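
Conceptually, the filesystem resolver's lookup amounts to something like the following (a simplified sketch, not vLLM's actual implementation):

```python
import json
import os
from typing import Optional

from vllm.lora.request import LoRARequest

async def resolve_from_filesystem(
    cache_dir: str, base_model_name: str, lora_name: str
) -> Optional[LoRARequest]:
    lora_path = os.path.join(cache_dir, lora_name)
    config_path = os.path.join(lora_path, "adapter_config.json")
    if not os.path.isfile(config_path):
        return None  # unknown adapter; other resolvers may still find it
    with open(config_path) as f:
        config = json.load(f)
    # Only accept LoRA adapters trained against this server's base model.
    if (config.get("peft_type") != "LORA"
            or config.get("base_model_name_or_path") != base_model_name):
        return None
    # lora_int_id just needs to be a stable positive identifier for the sketch.
    return LoRARequest(lora_name=lora_name,
                       lora_int_id=abs(hash(lora_name)),
                       lora_path=lora_path)
```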

## Advanced Configuration

### Multiple Resolvers

You can configure multiple resolver plugins to load adapters from different sources.
In the example below, `lora_s3_resolver` is a custom resolver that you would need to implement yourself:

```bash
export VLLM_PLUGINS=lora_filesystem_resolver,lora_s3_resolver
```

All listed resolvers are enabled; at request time, vLLM tries them in order until one succeeds.
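
Conceptually, that lookup loop behaves like this (a simplified sketch built on the `LoRAResolverRegistry` API; not vLLM's literal code):

```python
from typing import Optional

from vllm.lora.request import LoRARequest
from vllm.lora.resolver import LoRAResolverRegistry

async def try_all_resolvers(base_model_name: str, lora_name: str) -> Optional[LoRARequest]:
    # Resolvers are consulted in registration order; the first one to
    # return a non-None LoRARequest wins.
    for name in LoRAResolverRegistry.get_supported_resolvers():
        resolver = LoRAResolverRegistry.get_resolver(name)
        lora_request = await resolver.resolve_lora(base_model_name, lora_name)
        if lora_request is not None:
            return lora_request
    return None
```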

### Custom Resolver Implementation

To implement your own resolver plugin:

1. **Create a new resolver class**:
   ```python
   from typing import Optional

   from vllm.lora.request import LoRARequest
   from vllm.lora.resolver import LoRAResolver

   class CustomResolver(LoRAResolver):
       async def resolve_lora(self, base_model_name: str, lora_name: str) -> Optional[LoRARequest]:
           # Look up `lora_name` in your storage backend and return a
           # LoRARequest pointing at a local copy of the adapter, or
           # None if this resolver cannot find it.
           ...
   ```

2. **Register the resolver**:
   ```python
   from vllm.lora.resolver import LoRAResolverRegistry

   def register_custom_resolver():
       resolver = CustomResolver()
       LoRAResolverRegistry.register_resolver("Custom Resolver", resolver)
   ```
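
3. **Expose the registration function as a plugin entry point** so vLLM can discover it. vLLM's plugin loader scans the `vllm.general_plugins` entry-point group at startup. A minimal `setup.py` sketch (the package name `my_lora_plugin` is hypothetical):
   ```python
   from setuptools import setup

   setup(
       name="my_lora_plugin",
       version="0.1",
       packages=["my_lora_plugin"],
       entry_points={
           # vLLM calls each entry point in this group at startup; the
           # callable should run the registration shown in step 2.
           "vllm.general_plugins": [
               "register_custom_resolver = my_lora_plugin:register_custom_resolver"
           ]
       },
   )
   ```

   After installing the package (for example with `pip install -e .`), include `register_custom_resolver` in `VLLM_PLUGINS`, or leave `VLLM_PLUGINS` unset to load all installed plugins.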

## Troubleshooting

### Common Issues

1. **"VLLM_LORA_RESOLVER_CACHE_DIR must be set to a valid directory"**
   - Ensure the directory exists and is accessible
   - Check file permissions on the directory

2. **"LoRA adapter not found"**
   - Verify the adapter directory name matches the requested model name
   - Check that `adapter_config.json` exists and is valid JSON
   - Ensure `adapter_model.bin` exists in the directory

3. **"Invalid adapter configuration"**
   - Verify `peft_type` is set to "LORA"
   - Check that `base_model_name_or_path` matches your base model
   - Ensure `target_modules` is properly configured

4. **"LoRA rank exceeds maximum"**
   - Check that the `r` value in `adapter_config.json` doesn't exceed the server's `max_lora_rank` setting (configurable via the `--max-lora-rank` flag)

### Debugging Tips

1. **Enable debug logging**:
   ```bash
   export VLLM_LOGGING_LEVEL=DEBUG
   ```

2. **Verify environment variables**:
   ```bash
   echo $VLLM_ALLOW_RUNTIME_LORA_UPDATING
   echo $VLLM_PLUGINS
   echo $VLLM_LORA_RESOLVER_CACHE_DIR
   ```

3. **Test adapter configuration**:
   ```bash
   python -c "
   import json
   with open('/path/to/lora/adapters/my_adapter/adapter_config.json') as f:
       config = json.load(f)
   print('Config valid:', config)
   "
   ```