diff --git a/docs/source/getting_started/quickstart.rst b/docs/source/getting_started/quickstart.rst
index 0abc357939e1f..3083538e1dcff 100644
--- a/docs/source/getting_started/quickstart.rst
+++ b/docs/source/getting_started/quickstart.rst
@@ -40,6 +40,16 @@ Initialize vLLM's engine for offline inference with the ``LLM`` class and the `O
 
     llm = LLM(model="facebook/opt-125m")
 
+To use a model from www.modelscope.cn, set the ``VLLM_USE_MODELSCOPE`` environment variable and load the model by its ModelScope ID:
+
+.. code-block:: shell
+
+    export VLLM_USE_MODELSCOPE=True
+
+.. code-block:: python
+
+    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)
+
 Call ``llm.generate`` to generate the outputs. It adds the input prompts to vLLM engine's waiting queue and executes the vLLM engine to generate the outputs with high throughput. The outputs are returned as a list of ``RequestOutput`` objects, which include all the output tokens.
 
 .. code-block:: python
@@ -67,6 +77,16 @@ Start the server:
 
     $ python -m vllm.entrypoints.api_server
 
+To serve a model from www.modelscope.cn:
+
+.. code-block:: console
+
+    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.api_server \
+    $ --model="qwen/Qwen-7B-Chat" \
+    $ --revision="v1.1.8" \
+    $ --trust-remote-code
+
+
 By default, this command starts the server at ``http://localhost:8000`` with the OPT-125M model.
 
 Query the model in shell:
@@ -95,6 +115,13 @@ Start the server:
 
     $ python -m vllm.entrypoints.openai.api_server \
     $ --model facebook/opt-125m
 
+To serve a model from www.modelscope.cn:
+
+.. code-block:: console
+
+    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.openai.api_server \
+    $ --model="qwen/Qwen-7B-Chat" --revision="v1.1.8" --trust-remote-code
+
 By default, it starts the server at ``http://localhost:8000``. You can specify the address with ``--host`` and ``--port`` arguments. The server currently hosts one model at a time (OPT-125M in the above command) and implements `list models `_ and `create completion `_ endpoints. We are actively adding support for more endpoints.
 
 This server can be queried in the same format as OpenAI API. For example, list the models:
diff --git a/docs/source/models/supported_models.rst b/docs/source/models/supported_models.rst
index 1d69d6fd6afda..bebec8f9bfc6c 100644
--- a/docs/source/models/supported_models.rst
+++ b/docs/source/models/supported_models.rst
@@ -81,4 +81,18 @@ Alternatively, you can raise an issue on our `GitHub
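
The quickstart additions above can be exercised end to end. Below is a minimal offline-inference sketch based on the snippets in the diff; the prompt text and sampling values are illustrative, and it assumes the ``modelscope`` package is installed alongside vLLM.

.. code-block:: python

    import os

    # Set before importing vLLM so the ModelScope download path is selected
    # (mirrors ``export VLLM_USE_MODELSCOPE=True`` in the docs above).
    os.environ["VLLM_USE_MODELSCOPE"] = "True"

    from vllm import LLM, SamplingParams

    # Model ID, revision, and trust_remote_code come from the diff above.
    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)

    # Illustrative sampling settings, not part of the documented change.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["San Francisco is a"], sampling_params)
    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")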
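
The plain ``vllm.entrypoints.api_server`` started above can also be queried from Python. This is a sketch that assumes the demo server's default ``/generate`` route and the third-party ``requests`` package; the JSON fields shown are forwarded to the engine's sampling parameters and are illustrative.

.. code-block:: python

    import requests

    # Assumed ``/generate`` route of the demo server; prompt and sampling
    # values are illustrative.
    response = requests.post(
        "http://localhost:8000/generate",
        json={"prompt": "San Francisco is a", "n": 1, "temperature": 0.0, "max_tokens": 32},
    )
    response.raise_for_status()
    # The response body is expected to carry the generated text(s).
    print(response.json()["text"])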
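
The OpenAI-compatible server can be queried against the two endpoints the diff mentions, list models and create completion. A sketch using ``requests``; the model name must match whatever the server was launched with.

.. code-block:: python

    import requests

    base_url = "http://localhost:8000/v1"

    # List the models hosted by the server (the "list models" endpoint).
    print(requests.get(f"{base_url}/models").json())

    # Create a completion in the OpenAI API request format.
    completion = requests.post(
        f"{base_url}/completions",
        json={
            "model": "qwen/Qwen-7B-Chat",
            "prompt": "San Francisco is a",
            "max_tokens": 32,
        },
    ).json()
    print(completion["choices"][0]["text"])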