mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2025-12-18 05:35:01 +08:00)

[Doc] Move guide for multimodal model and other improvements (#6168)

parent 175c43eca4
commit 9389380015
@@ -5,10 +5,10 @@ Input Processing
 
 .. currentmodule:: vllm.inputs
 
-vLLM provides a mechanism for defining input processors for each model so that the inputs are processed
-in :class:`~vllm.LLMEngine` before they are passed to model executors.
+Each model can override parts of vLLM's :ref:`input processing pipeline <input_processing_pipeline>` via
+:data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
 
-Currently, this mechanism is only utilized in :ref:`multi-modal models <multi_modality>` for preprocessing multi-modal input
+Currently, this mechanism is only utilized in :ref:`multi-modal <multi_modality>` models for preprocessing multi-modal input
 data in addition to input prompt, but it can be extended to text-only language models when needed.
 
 Guides
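The per-model override mechanism this hunk describes can be sketched with a toy stand-in. This is illustrative only, not vLLM's actual `INPUT_REGISTRY` implementation; the class and method names below mirror the documented API but the internals are assumed.

```python
# Toy sketch of a per-model input-processor registry with an identity fallback.
# Not vLLM's real INPUT_REGISTRY; names are borrowed for illustration only.
class InputRegistry:
    def __init__(self):
        self._processors = {}

    def register_input_processor(self, processor):
        """Decorator that attaches an input processor to a model class."""
        def wrapper(model_cls):
            self._processors[model_cls] = processor
            return model_cls
        return wrapper

    def process_input(self, model_cls, inputs):
        # Fall back to the identity pipeline when no override is registered.
        return self._processors.get(model_cls, lambda x: x)(inputs)

INPUT_REGISTRY = InputRegistry()

@INPUT_REGISTRY.register_input_processor(lambda inputs: {**inputs, "processed": True})
class MyModel:
    pass

print(INPUT_REGISTRY.process_input(MyModel, {"prompt": "hi"}))
# {'prompt': 'hi', 'processed': True}
```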
@@ -7,25 +7,17 @@ Multi-Modality
 
 vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.
 
-:class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data``
-which allows you to pass in multi-modal input alongside text and token prompts.
+Multi-modal input can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
+via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`.
 
 .. note::
     ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through
-    :class:`vllm.multimodal.MULTIMODAL_REGISTRY`.
+    the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
 
-By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model <adding_a_new_multimodal_model>`.
+To implement a new multi-modal model in vLLM, please follow :ref:`this guide <enabling_multimodal_inputs>`.
 
-# TODO: Add more instructions on how to do that once embeddings is in.
+..
+    TODO: Add more instructions on how to add new plugins once embeddings is in.
 
-Guides
-++++++
-
-.. toctree::
-   :maxdepth: 1
-
-   adding_multimodal_model
-
 Module Contents
 +++++++++++++++
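The new wording above says multi-modal input rides along in a ``multi_modal_data`` field of the prompt. A minimal sketch of that prompt shape, assuming the dict form of ``PromptStrictInputs`` (the ``image`` key and the placeholder value are illustrative; a real call would pass e.g. a ``PIL.Image``):

```python
# Hedged sketch of the prompt shape: text plus a multi_modal_data payload.
# The "image" key and the string placeholder stand in for real image data.
prompt = {
    "prompt": "USER: <image>\nWhat is in this picture?\nASSISTANT:",
    "multi_modal_data": {"image": "<a PIL.Image.Image would go here>"},
}

def has_multi_modal_data(p: dict) -> bool:
    """Return True if the prompt carries any multi-modal payload."""
    return bool(p.get("multi_modal_data"))

print(has_multi_modal_data(prompt))  # True
```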
@@ -92,6 +92,7 @@ Documentation
 
    models/supported_models
    models/adding_model
+   models/enabling_multimodal_inputs
    models/engine_args
    models/lora
    models/vlm
@@ -116,6 +117,7 @@ Documentation
    automatic_prefix_caching/details
 
 .. toctree::
+   :maxdepth: 2
    :caption: Developer Documentation
 
    dev/sampling_params
@@ -10,6 +10,10 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
 The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
 However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
 
+.. note::
+    By default, vLLM models do not support multi-modal inputs. To enable multi-modal support,
+    please follow :ref:`this guide <enabling_multimodal_inputs>` after implementing the model here.
+
 .. tip::
     If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
     We will be happy to help you out!
@@ -1,26 +1,21 @@
-.. _adding_a_new_multimodal_model:
+.. _enabling_multimodal_inputs:
 
-Adding a New Multimodal Model
-=============================
+Enabling Multimodal Inputs
+==========================
 
-This document provides a high-level guide on integrating a :ref:`multi-modal model <multi_modality>` into vLLM.
+This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal <multi_modality>` inputs.
 
-.. note::
-    The complexity of adding a new model depends heavily on the model's architecture.
-    The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
-    However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
-
-.. tip::
-    If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
-    We will be happy to help you out!
+.. seealso::
+    :ref:`adding_a_new_model`
 
-1. Set up the base vLLM model
+1. Update the base vLLM model
 -----------------------------
 
-As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model in vLLM, but note the following:
+It is assumed that you have already implemented the model in vLLM according to :ref:`these steps <adding_a_new_model>`.
+Further update the model as follows:
 
-- You should additionally implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.
+- Implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.
 
   .. code-block:: diff
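The ``SupportsVision`` interface the step above asks for is essentially a marker that vLLM can check at runtime. A simplified, self-contained stand-in (the real interface lives in ``vllm.model_executor.models.interfaces``; the model class here is hypothetical):

```python
from typing import ClassVar, Protocol, runtime_checkable

# Simplified stand-in for vLLM's SupportsVision marker interface:
# a runtime-checkable protocol keyed on a class-level flag.
@runtime_checkable
class SupportsVision(Protocol):
    supports_vision: ClassVar[bool]

class MyVisionLanguageModel:  # hypothetical model class for illustration
    supports_vision: ClassVar[bool] = True

print(isinstance(MyVisionLanguageModel(), SupportsVision))  # True
```

Because the protocol is ``runtime_checkable``, ``isinstance`` only verifies that the attribute exists, which is all a marker interface needs.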
@@ -33,7 +28,7 @@ As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model
   The model class does not have to be named :code:`*ForCausalLM`.
   Check out `the HuggingFace Transformers documentation <https://huggingface.co/docs/transformers/model_doc/auto#multimodal>`__ for some examples.
 
-- While implementing the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter
+- If you haven't already done so, reserve a keyword parameter in :meth:`~torch.nn.Module.forward`
   for each input tensor that corresponds to a multi-modal input, as shown in the following example:
 
   .. code-block:: diff
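A sketch of what "reserve a keyword parameter per multi-modal tensor" looks like in a ``forward`` signature. The surrounding parameter names (``input_ids``, ``positions``, ``kv_caches``, ``attn_metadata``) and ``pixel_values`` are illustrative of the pattern, not an exact vLLM signature:

```python
import inspect

# Hedged sketch: forward() reserves a keyword-only parameter for each
# multi-modal input tensor; here "pixel_values" stands in for an image batch.
class MyVisionLanguageModel:
    def forward(self, input_ids, positions, kv_caches, attn_metadata,
                *, pixel_values=None):
        # pixel_values holds the image tensor batch when images are supplied,
        # and stays None for text-only requests.
        ...

sig = inspect.signature(MyVisionLanguageModel.forward)
print("pixel_values" in sig.parameters)  # True
```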
@@ -68,8 +63,8 @@ A default mapper is available for each modality in the core vLLM library. This i
    :ref:`input_processing_pipeline`
 
 
-3. Register maximum number of multimodal tokens
-----------------------------------------------------------
+3. Register maximum number of multi-modal tokens
+------------------------------------------------
 
 For each modality type that the model accepts as input, calculate the maximum possible number of tokens
 and register it via :meth:`INPUT_REGISTRY.register_max_multimodal_tokens <vllm.inputs.registry.InputRegistry.register_max_multimodal_tokens>`.
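The calculation this step asks for is usually a small closed-form expression. A hedged sketch, assuming a ViT-style encoder where each image patch becomes one placeholder token (real models may add CLS tokens, apply downsampling, or use other defaults; the sizes here match a common 336px / patch-14 configuration but are only an example):

```python
# Hedged sketch: maximum placeholder tokens one image can occupy, assuming a
# plain ViT-style patchification with no extra tokens. Values are examples.
def get_max_image_tokens(image_size: int = 336, patch_size: int = 14) -> int:
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2

print(get_max_image_tokens())  # 576
```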
@@ -192,7 +192,7 @@ Vision Language Models
     -
 
 If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
-Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Adding a New Multimodal Model <adding_a_new_multimodal_model>`
+Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Enabling Multimodal Inputs <enabling_multimodal_inputs>`
 for instructions on how to implement support for your model.
 Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.
@@ -141,7 +141,7 @@ class InputRegistry:
         The model is identified by ``model_config``.
 
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture
@@ -162,8 +162,8 @@ class MultiModalPlugin(ABC):
             If `None` is provided, then the default input mapper is used instead.
 
         See also:
-            :ref:`input_processing_pipeline`
-            :ref:`adding_a_new_multimodal_model`
+            - :ref:`input_processing_pipeline`
+            - :ref:`enabling_multimodal_inputs`
         """
 
         def wrapper(model_cls: N) -> N:
@@ -192,7 +192,8 @@ class MultiModalPlugin(ABC):
             TypeError: If the data type is not supported.
 
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            - :ref:`input_processing_pipeline`
+            - :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture
@@ -230,7 +231,7 @@ class MultiModalPlugin(ABC):
             If `None` is provided, then the default calculation is used instead.
 
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            :ref:`enabling_multimodal_inputs`
         """
 
         def wrapper(model_cls: N) -> N:
@@ -260,7 +261,7 @@ class MultiModalPlugin(ABC):
         The model is identified by ``model_config``.
 
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture