# Llama Stack
vLLM is also available via Llama Stack.
To install Llama Stack, run:

```console
pip install llama-stack -q
```
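After installation, the `llama` command-line tool used to build and run Llama Stack distributions should be available. A quick check, assuming the package installed into your active environment:

```console
# lists the available llama subcommands if the install succeeded
llama --help
```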
## Inference using OpenAI-Compatible API
Then start the Llama Stack server and configure it to point to your vLLM server with the following settings:
```yaml
inference:
  - provider_id: vllm0
    provider_type: remote::vllm
    config:
      url: http://127.0.0.1:8000
```
Please refer to this guide for more details on this remote vLLM provider.
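The `url` above assumes a vLLM OpenAI-compatible server is already listening on port 8000. A minimal sketch of starting one, assuming the model `meta-llama/Llama-3.1-8B-Instruct` (any model you have access to works; 8000 is vLLM's default port):

```console
# model name is an example; substitute the model you want to serve
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```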
## Inference using Embedded vLLM
An inline vLLM provider is also available. Here is a sample configuration using that method:
```yaml
inference:
  - provider_type: vllm
    config:
      model: Llama3.1-8B-Instruct
      tensor_parallel_size: 4
```
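With the inline provider, Llama Stack loads vLLM in the same process, so the host running Llama Stack must have the GPUs implied by `tensor_parallel_size: 4` (the model is sharded across four GPUs). A hedged sketch of launching the Llama Stack server with such a configuration saved to a run file (the file path is an assumption):

```console
# ./run.yaml is a placeholder for your Llama Stack run configuration file
llama stack run ./run.yaml
```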