Modify README to include info on loading LLaMA (#18)
commit e3f00d191e (parent 09e9245478)

README.md (16 additions)
@@ -53,3 +53,19 @@ python -m cacheflow.http_frontend.fastapi_frontend
# At another terminal
python -m cacheflow.http_frontend.gradio_webserver
```

## Load LLaMA weights

Since the LLaMA weights are not fully public, we cannot download them directly from Hugging Face. Instead, follow the steps below to load the LLaMA weights.
1. Convert the LLaMA weights to the Hugging Face format with [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py).
```bash
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path/llama-7b
```
Please make sure that `llama` is included in the output directory name (see the sketch after step 2 for why this matters).
2. For all the commands above, specify the model with `--model /output/path/llama-7b` to load the model. For example:
```bash
python simple_server.py --model /output/path/llama-7b
python -m cacheflow.http_frontend.fastapi_frontend --model /output/path/llama-7b
```
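Why must `llama` appear in the output directory name? Most likely because the server picks the model implementation by matching substrings in the model path. The sketch below only illustrates that kind of dispatch; it is not CacheFlow's actual loader, and the function name and the recognized model families are assumptions.

```python
# Hypothetical illustration of path-based model selection (not CacheFlow's
# actual code): the model family is guessed from substrings in the path.
def infer_model_family(model_path: str) -> str:
    name = model_path.lower()
    if "llama" in name:
        return "llama"
    if "opt" in name:
        return "opt"
    raise ValueError(f"Cannot infer the model family from path: {model_path}")

print(infer_model_family("/output/path/llama-7b"))  # -> "llama"
# A path without a recognized family name (e.g. "/output/path/model-7b") would
# raise ValueError, which is why "llama" must appear in the directory name.
```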
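To check that the conversion in step 1 succeeded before starting any of the servers, the output directory can be loaded like any other Hugging Face checkpoint. This is a minimal sanity-check sketch, assuming the `/output/path/llama-7b` directory produced above; it does not use CacheFlow at all.

```python
# Minimal sanity check of the converted checkpoint using Hugging Face transformers.
# The path is the --output_dir from step 1; adjust it to your setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/output/path/llama-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If this loads and generates text, the converted weights are ready to be passed to the commands above via `--model /output/path/llama-7b`.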