Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2025-12-11 00:35:01 +08:00)
Add documentation to Triton server tutorial (#983)

commit 6f2dd6c37e
parent bc0644574c
@@ -64,6 +64,7 @@ Documentation
 
    serving/distributed_serving
    serving/run_on_sky
+   serving/deploying_with_triton
 
 .. toctree::
    :maxdepth: 1
docs/source/serving/deploying_with_triton.rst (new file, 6 lines)
@@ -0,0 +1,6 @@
+.. _deploying_with_triton:
+
+Deploying with NVIDIA Triton
+============================
+
+The `Triton Inference Server <https://github.com/triton-inference-server>`_ hosts a tutorial demonstrating how to quickly deploy a simple `facebook/opt-125m <https://huggingface.co/facebook/opt-125m>`_ model using vLLM. Please see `Deploying a vLLM model in Triton <https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton>`_ for more details.
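For context, a minimal client sketch of what the linked tutorial sets up: once the model is running behind Triton's vLLM backend, a prompt can be sent to Triton's HTTP ``generate`` endpoint. The ``vllm_model`` name, the ``localhost:8000`` address, and the ``text_input``/``text_output`` field names below follow the tutorial's defaults and are assumptions, not details fixed by this commit.

.. code-block:: python

    # Minimal sketch: query a vLLM model served by Triton over HTTP.
    # Assumptions (tutorial defaults, adjust to your deployment):
    #   - Triton is listening on localhost:8000
    #   - the model is registered under the name "vllm_model"
    #   - the model config exposes "text_input" / "text_output" fields
    import requests

    TRITON_GENERATE_URL = "http://localhost:8000/v2/models/vllm_model/generate"

    payload = {
        "text_input": "What is the Triton Inference Server?",
        "parameters": {"stream": False, "temperature": 0.0, "max_tokens": 64},
    }

    response = requests.post(TRITON_GENERATE_URL, json=payload)
    response.raise_for_status()

    # The response body is JSON containing the generated continuation.
    print(response.json()["text_output"])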