Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2025-12-16 02:35:26 +08:00)
[Minor] Fix the format in quick start guide related to Model Scope (#2425)
parent 6549aef245
commit f745847ef7
@@ -11,6 +11,14 @@ This guide shows how to use vLLM to:
 
 Be sure to complete the :ref:`installation instructions <installation>` before continuing with this guide.
 
+.. note::
+
+    By default, vLLM downloads model from `HuggingFace <https://huggingface.co/>`_. If you would like to use models from `ModelScope <https://www.modelscope.cn>`_ in the following examples, please set the environment variable:
+
+    .. code-block:: shell
+
+        export VLLM_USE_MODELSCOPE=True
+
 Offline Batched Inference
 -------------------------
 
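The note added above boils down to a single environment variable. For readers following along in Python rather than a shell, here is a minimal sketch (not part of this commit) of the equivalent setup, reusing the ModelScope model named in the hunks below; it assumes ``VLLM_USE_MODELSCOPE`` is read when the model weights are resolved, so the variable must be set before the ``LLM`` is constructed:

.. code-block:: python

    import os

    # Assumption: vLLM checks VLLM_USE_MODELSCOPE when resolving model weights,
    # so it must be set before the engine is created.
    os.environ["VLLM_USE_MODELSCOPE"] = "True"

    from vllm import LLM

    # Model ID and revision taken from the blocks removed below.
    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)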
@@ -40,16 +48,6 @@ Initialize vLLM's engine for offline inference with the ``LLM`` class and the `O
 
     llm = LLM(model="facebook/opt-125m")
 
-Use model from www.modelscope.cn
-
-.. code-block:: shell
-
-    export VLLM_USE_MODELSCOPE=True
-
-.. code-block:: python
-
-    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)
-
 Call ``llm.generate`` to generate the outputs. It adds the input prompts to vLLM engine's waiting queue and executes the vLLM engine to generate the outputs with high throughput. The outputs are returned as a list of ``RequestOutput`` objects, which include all the output tokens.
 
 .. code-block:: python
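The hunk ends just before the body of the ``.. code-block:: python`` that illustrates ``llm.generate``. As a reference, here is a minimal sketch of that usage built on the public ``LLM``/``SamplingParams`` API (the prompts and sampling values are illustrative, not taken from the file):

.. code-block:: python

    from vllm import LLM, SamplingParams

    # Illustrative prompts and sampling settings.
    prompts = ["Hello, my name is", "The capital of France is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(model="facebook/opt-125m")

    # generate() enqueues the prompts and runs the engine until every request
    # finishes, returning one RequestOutput per prompt.
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")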
@@ -77,16 +75,6 @@ Start the server:
 
     $ python -m vllm.entrypoints.api_server
 
-Use model from www.modelscope.cn
-
-.. code-block:: console
-
-    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.api_server \
-    $ --model="qwen/Qwen-7B-Chat" \
-    $ --revision="v1.1.8" \
-    $ --trust-remote-code
-
-
 By default, this command starts the server at ``http://localhost:8000`` with the OPT-125M model.
 
 Query the model in shell:
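The hunk stops right after ``Query the model in shell:``, so the query itself is not shown. Here is a hedged sketch of the same request from Python, assuming the demo ``api_server`` exposes a ``/generate`` endpoint that accepts a prompt plus sampling parameters as JSON:

.. code-block:: python

    import requests

    # Assumption: POST /generate takes the prompt and sampling parameters as a
    # JSON body and returns the generated text as JSON.
    payload = {
        "prompt": "San Francisco is a",  # illustrative prompt
        "max_tokens": 16,
        "temperature": 0.0,
    }
    response = requests.post("http://localhost:8000/generate", json=payload)
    response.raise_for_status()
    print(response.json())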
@@ -116,13 +104,6 @@ Start the server:
     $ python -m vllm.entrypoints.openai.api_server \
     $ --model facebook/opt-125m
 
-Use model from www.modelscope.cn
-
-.. code-block:: console
-
-    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.openai.api_server \
-    $ --model="qwen/Qwen-7B-Chat" --revision="v1.1.8" --trust-remote-code
-
 By default, the server uses a predefined chat template stored in the tokenizer. You can override this template by using the ``--chat-template`` argument:
 
 .. code-block:: console
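The final hunk ends before showing how to query the OpenAI-compatible server. As a companion sketch, here is a minimal completions request against it, assuming the standard ``/v1/completions`` route and the ``facebook/opt-125m`` model started above (prompt and parameters are illustrative):

.. code-block:: python

    import requests

    # The OpenAI-compatible server mirrors the OpenAI REST API, so a plain
    # completions request is enough to smoke-test it.
    payload = {
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",  # illustrative prompt
        "max_tokens": 16,
        "temperature": 0.0,
    }
    response = requests.post("http://localhost:8000/v1/completions", json=payload)
    response.raise_for_status()
    print(response.json()["choices"][0]["text"])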