[Doc] Add documentation for specifying model architecture (#12105)
commit f8ef146f03 · parent fa0050db08
@@ -31,6 +31,59 @@ Please refer to the above pages for more details about each API.
This section lists the most common options for running the vLLM engine.
For a full list, refer to the [Engine Arguments](#engine-args) page.

### Model resolution

vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in the `config.json` of the model repository
and finding the corresponding implementation that is registered to vLLM.
However, model resolution may fail for any of the following reasons:

- The `config.json` of the model repository lacks the `architectures` field.
- Unofficial repositories refer to a model using alternative names that are not recorded in vLLM.
- The same architecture name is used for multiple models, creating ambiguity as to which model should be loaded.
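
Before relying on resolution, you can check which value vLLM will actually see. The sketch below reads the repository's `config.json` through `transformers.AutoConfig`; using `transformers` here is an illustrative shortcut, not vLLM's internal code path:

```python
from transformers import AutoConfig

# Download and parse config.json from the model repository.
config = AutoConfig.from_pretrained("cerebras/Cerebras-GPT-1.3B")

# vLLM resolves the implementation from this list; `None` or an
# unrecognized name means resolution will fail.
print(getattr(config, "architectures", None))
```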
If resolution fails for one of these reasons, vLLM may throw an error like:

```text
Traceback (most recent call last):
...
  File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable
```

or:

```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
```

:::{note}
The above error is distinct from the following similar-looking error:

```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] failed to be inspected. Please check the logs for more details.
```

This error means that vLLM failed to import the model file, usually due to missing dependencies or outdated
binaries in the vLLM build. Please read the logs carefully to determine the real cause of the error.
:::
To fix this, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
For example:

```python
from vllm import LLM

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # Cerebras-GPT uses the GPT-2 architecture
)
```
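
As a quick check that the override resolves and the model loads, you can run a short generation against the returned `LLM` (the prompt is arbitrary):

```python
# Assumes `model` from the example above.
outputs = model.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```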
Our [list of supported models](#supported-models) shows the model architectures that are recognized by vLLM.
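
You can also list the architectures registered in your installed version programmatically. This is a small sketch assuming the `ModelRegistry.get_supported_archs()` helper exposed by recent vLLM builds:

```python
from vllm import ModelRegistry

# Print every architecture name this vLLM build can resolve.
print(sorted(ModelRegistry.get_supported_archs()))
```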
### Reducing memory usage
Large models might cause your machine to run out of memory (OOM). Here are some options that help alleviate this problem.
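
As a starting point, the sketch below combines a few commonly used memory-reducing options; the values are illustrative assumptions, not recommendations:

```python
from vllm import LLM

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    max_model_len=2048,           # cap context length to shrink the KV cache
    gpu_memory_utilization=0.8,   # fraction of GPU memory vLLM may claim
    enforce_eager=True,           # skip CUDA graph capture to save memory
)
```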