mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2025-12-22 07:35:01 +08:00
Without this PR
---------------
Quantizing models with llm-compressor and a recipe that explicitly lists
names of layers produces a model that is not loadable by vLLM (i.e.
`vllm serve <model>` fails with `raise ValueError(f"Unable to find
matching target for {module} in the ...`).
Example recipe:
```
recipe = """
quantization_stage:
run_type: oneshot
quantization_modifiers:
GPTQModifier:
ignore: ["lm_head"]
config_groups:
group_0:
weights:
num_bits: 4
type: "int"
symmetric: true
strategy: "group"
group_size: 128
targets: [
"model.layers.0.mlp.down_proj",
"model.layers.2.mlp.down_proj",
"model.layers.3.mlp.down_proj",
"model.layers.4.mlp.down_proj",
"model.layers.5.mlp.down_proj",
"model.layers.6.mlp.down_proj",
"model.layers.7.mlp.down_proj",
"model.layers.8.mlp.down_proj",
"model.layers.9.mlp.down_proj",
"model.layers.10.mlp.down_proj",
"model.layers.11.mlp.down_proj",
"model.layers.12.mlp.down_proj",
"model.layers.13.mlp.down_proj",
"model.layers.14.mlp.down_proj",
"model.layers.15.mlp.down_proj",
"model.layers.16.mlp.down_proj",
"model.layers.17.mlp.down_proj",
"model.layers.19.mlp.down_proj",
"model.layers.21.mlp.down_proj",
"model.layers.22.mlp.down_proj",
.
.
.
]
"""
```
To reproduce the vLLM error:
```bash
vllm serve nm-testing/eldar-test
```
With this PR
------------
Models are loaded correctly without any errors.