# Llama Stack

vLLM is also available via Llama Stack.

To install Llama Stack, run:

```console
pip install llama-stack -q
```

## Inference using OpenAI-Compatible API

Then start the Llama Stack server and configure it to point to your vLLM server with the following settings:

```yaml
inference:
  - provider_id: vllm0
    provider_type: remote::vllm
    config:
      url: http://127.0.0.1:8000
```
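
This configuration assumes a vLLM OpenAI-compatible server is already listening at that URL, for example one started with `vllm serve`. As a quick sanity check, you can query the same endpoint directly with the `openai` client; the model name below is illustrative and should match whatever model your vLLM server is actually serving:

```python
# Sketch: verify that the server behind `url` speaks the OpenAI-compatible
# API that the remote::vllm provider expects. Assumes `pip install openai`
# and a vLLM server running at http://127.0.0.1:8000.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative; use your served model
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```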

Please refer to this guide for more details on this remote vLLM provider.

## Inference using Embedded vLLM

An inline vLLM provider is also available. Here is a sample configuration using that method:

```yaml
inference:
  - provider_type: vllm
    config:
      model: Llama3.1-8B-Instruct
      tensor_parallel_size: 4
```
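
Conceptually, the inline provider runs vLLM inside the Llama Stack process instead of calling a remote server. The sketch below shows a rough in-process equivalent using vLLM's Python API; the Hugging Face model identifier is an assumption corresponding to the `Llama3.1-8B-Instruct` name in the config above:

```python
# Sketch: rough in-process equivalent of the inline provider configuration,
# using vLLM's Python API directly. tensor_parallel_size=4 assumes 4 GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed HF id for Llama3.1-8B-Instruct
    tensor_parallel_size=4,
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is Llama Stack?"], params)
print(outputs[0].outputs[0].text)
```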