Modify README to include info on loading LLaMA (#18)
commit e3f00d191e (parent 09e9245478)

README.md (16 additions)
@@ -53,3 +53,19 @@ python -m cacheflow.http_frontend.fastapi_frontend
# At another terminal
python -m cacheflow.http_frontend.gradio_webserver
```

## Load LLaMA weights

Since the LLaMA weights are not fully public, we cannot download them directly from Hugging Face. Instead, follow the steps below to load the LLaMA weights.
1. Convert the LLaMA weights to the Hugging Face format with [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py).
```bash
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path/llama-7b
```
Please make sure that `llama` is included in the output directory name (see the sketch after step 2 for why this matters).
2. For all the commands above, specify the model with `--model /output/path/llama-7b` to load the model. For example:
```bash
python simple_server.py --model /output/path/llama-7b
python -m cacheflow.http_frontend.fastapi_frontend --model /output/path/llama-7b
```
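Why must `llama` appear in the output directory name? Most likely because the server picks the model implementation by matching substrings in the model path. The sketch below only illustrates that kind of dispatch; it is not CacheFlow's actual loader, and the function name and the recognized model families are assumptions.

```python
# Hypothetical illustration of path-based model selection (not CacheFlow's
# actual code): the model family is guessed from substrings in the path.
def infer_model_family(model_path: str) -> str:
    name = model_path.lower()
    if "llama" in name:
        return "llama"
    if "opt" in name:
        return "opt"
    raise ValueError(f"Cannot infer the model family from path: {model_path}")

print(infer_model_family("/output/path/llama-7b"))  # -> "llama"
# A path without a recognized family name (e.g. "/output/path/model-7b") would
# raise ValueError, which is why "llama" must appear in the directory name.
```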
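To check that the conversion in step 1 succeeded before starting any of the servers, the output directory can be loaded like any other Hugging Face checkpoint. This is a minimal sanity-check sketch, assuming the `/output/path/llama-7b` directory produced above; it does not use CacheFlow at all.

```python
# Minimal sanity check of the converted checkpoint using Hugging Face transformers.
# The path is the --output_dir from step 1; adjust it to your setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/output/path/llama-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If this loads and generates text, the converted weights are ready to be passed to the commands above via `--model /output/path/llama-7b`.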