From 1b1e8e05ff3c26b98e4161bd3c8671e86fb145f4 Mon Sep 17 00:00:00 2001
From: Reid <61492567+reidliu41@users.noreply.github.com>
Date: Tue, 20 May 2025 16:53:27 +0800
Subject: [PATCH] [doc] update env variable export (#18391)

Signed-off-by: reidliu41
Co-authored-by: reidliu41
---
 docs/source/getting_started/quickstart.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/source/getting_started/quickstart.md b/docs/source/getting_started/quickstart.md
index 298ba59f7d8b..42468ff73c2c 100644
--- a/docs/source/getting_started/quickstart.md
+++ b/docs/source/getting_started/quickstart.md
@@ -82,6 +82,11 @@ llm = LLM(model="facebook/opt-125m")
 
 :::{note}
 By default, vLLM downloads models from [Hugging Face](https://huggingface.co/). If you would like to use models from [ModelScope](https://www.modelscope.cn), set the environment variable `VLLM_USE_MODELSCOPE` before initializing the engine.
+
+```shell
+export VLLM_USE_MODELSCOPE=True
+```
+
 :::
 
 Now, the fun part! The outputs are generated using `llm.generate`. It adds the input prompts to the vLLM engine's waiting queue and executes the vLLM engine to generate the outputs with high throughput. The outputs are returned as a list of `RequestOutput` objects, which include all of the output tokens.
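
For context, the quickstart flow this patch documents can be sketched roughly as follows. The prompts and sampling settings below are illustrative, not part of the patch; the environment variable can equally be exported in the shell as the added doc snippet shows.

```python
import os

# Illustrative: enable the setting the patched note describes, so weights are
# pulled from ModelScope instead of Hugging Face. It must be set before the
# engine is initialized (here, before importing/constructing the LLM).
os.environ["VLLM_USE_MODELSCOPE"] = "True"

from vllm import LLM, SamplingParams

# Example prompts and sampling settings (assumed values, not from the patch).
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="facebook/opt-125m")

# llm.generate enqueues the prompts on the engine's waiting queue, runs the
# engine, and returns one RequestOutput per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```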