[doc][faq] add warning to download models on every node (#5783)

2025-12-10 00:06:06 +08:00 · 2024-06-24 00:37:42 -07:00 · 2024-06-24 00:37:42 -07:00 · c246212952
commit c246212952
parent edd5fe5fa2
1 changed files with 4 additions and 1 deletions
--- a/docs/source/serving/distributed_serving.rst
+++ b/docs/source/serving/distributed_serving.rst
@@ -36,3 +36,6 @@ To scale vLLM beyond a single machine, install and start a `Ray runtime <https:/
    $ ray start --address=<ray-head-address>

 After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node and setting :code:`tensor_parallel_size` to the total number of GPUs across all machines.
+
+.. warning::
+    Make sure the model is downloaded to every node, or to a distributed file system that all nodes can access.
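
The two requirements in this hunk (a :code:`tensor_parallel_size` equal to the total GPU count, and a model that is present on every node) can be checked with a small script before launching. The following is only a minimal sketch and is not part of this commit: the helper names ``total_gpus`` and ``model_is_local`` are hypothetical, and treating a directory that contains ``config.json`` as a downloaded model is an assumption that fits Hugging Face style checkpoints but may not fit other layouts.

```python
import os


def total_gpus(gpus_per_node):
    """Sum the GPU counts of all nodes in the cluster.

    Hypothetical helper: the result is the value the docs say to use
    for tensor_parallel_size.
    """
    return sum(gpus_per_node)


def model_is_local(model_dir):
    """Return True if model_dir contains a config.json file.

    Assumption: a downloaded model directory holds a config.json,
    as Hugging Face style checkpoints do.
    """
    return os.path.isfile(os.path.join(model_dir, "config.json"))


if __name__ == "__main__":
    # Example: two machines with 8 GPUs each.
    print(total_gpus([8, 8]))
    # Run this check on every node (or once against the shared file system).
    # The path is a placeholder, not a real location.
    print(model_is_local("/path/to/model"))
```

Running ``model_is_local`` on each node before starting the vLLM process on the head node would surface a missing download early, rather than as a load failure on a worker.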