Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

2025-05-23 02:09:53 -07:00

---
title: NVIDIA Triton
---

{ #deployment-triton }

The Triton Inference Server project hosts a tutorial that shows how to quickly deploy a small `facebook/opt-125m` model with vLLM. See Deploying a vLLM model in Triton for details.
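
Once a model is deployed, a client can query it over HTTP. The following is a minimal sketch, not taken from this page: it assumes Triton's HTTP endpoint is reachable at `localhost:8000`, that the model is registered under the name `vllm_model`, and that the server exposes Triton's generate endpoint with `text_input` and `text_output` fields. Adjust these to match your deployment.

```python
import json
import urllib.request

# Assumptions (not stated in this page): server address and model name.
TRITON_URL = "http://localhost:8000"
MODEL_NAME = "vllm_model"


def build_generate_request(prompt, temperature=0.0, stream=False):
    """Return (url, body_bytes) for a request to the generate endpoint."""
    url = f"{TRITON_URL}/v2/models/{MODEL_NAME}/generate"
    payload = {
        "text_input": prompt,
        # Sampling options are passed through the "parameters" object.
        "parameters": {"stream": stream, "temperature": temperature},
    }
    return url, json.dumps(payload).encode("utf-8")


def generate(prompt, timeout=30.0):
    """Send the prompt to a running server and return the generated text."""
    url, body = build_generate_request(prompt)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # "text_output" is the assumed name of the response field.
        return json.loads(response.read())["text_output"]
```

With the server running, `generate("What is Triton Inference Server?")` should return the model's completion as a string. Request construction is kept separate from the network call so the payload can be inspected without a live server.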