[Doc] Move guide for multimodal model and other improvements (#6168)

Cyrus Leung 2024-07-06 17:18:59 +08:00 committed by GitHub
parent 175c43eca4
commit 9389380015
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
8 changed files with 61 additions and 67 deletions

View File

@@ -5,10 +5,10 @@ Input Processing
 .. currentmodule:: vllm.inputs
-vLLM provides a mechanism for defining input processors for each model so that the inputs are processed
-in :class:`~vllm.LLMEngine` before they are passed to model executors.
+Each model can override parts of vLLM's :ref:`input processing pipeline <input_processing_pipeline>` via
+:data:`~vllm.inputs.INPUT_REGISTRY` and :data:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
-Currently, this mechanism is only utilized in :ref:`multi-modal models <multi_modality>` for preprocessing multi-modal input
+Currently, this mechanism is only utilized in :ref:`multi-modal <multi_modality>` models for preprocessing multi-modal input
 data in addition to input prompt, but it can be extended to text-only language models when needed.
 Guides
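For a concrete sense of how a model hooks into this pipeline, a minimal sketch of registering a custom input processor follows. The hook names (``INPUT_REGISTRY.register_input_processor``, ``InputContext``, ``LLMInputs``) follow the registries referenced above, but the exact import paths, the processor signature, and ``MyModelForCausalLM`` are illustrative assumptions rather than verbatim API:

.. code-block:: python

    from torch import nn

    from vllm.inputs import INPUT_REGISTRY, LLMInputs
    from vllm.inputs.registry import InputContext


    def my_input_processor(ctx: InputContext, llm_inputs: LLMInputs) -> LLMInputs:
        # Inspect or rewrite the prompt / token IDs before they reach the model executor.
        # This placeholder simply passes the inputs through unchanged.
        return llm_inputs


    @INPUT_REGISTRY.register_input_processor(my_input_processor)
    class MyModelForCausalLM(nn.Module):  # hypothetical model class
        ...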

View File

@@ -7,25 +7,17 @@ Multi-Modality
 vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.
-:class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data``
-which allows you to pass in multi-modal input alongside text and token prompts.
+Multi-modal input can be passed alongside text and token prompts to :ref:`supported models <supported_vlms>`
+via the ``multi_modal_data`` field in :class:`vllm.inputs.PromptStrictInputs`.
 .. note::
    ``multi_modal_data`` can accept keys and values beyond the builtin ones, as long as a customized plugin is registered through
-   :class:`vllm.multimodal.MULTIMODAL_REGISTRY`.
+   the :class:`~vllm.multimodal.MULTIMODAL_REGISTRY`.
-By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model, please follow :ref:`the guide for adding a new multimodal model. <adding_a_new_multimodal_model>`.
+To implement a new multi-modal model in vLLM, please follow :ref:`this guide <enabling_multimodal_inputs>`.
 ..
-  # TODO: Add more instructions on how to do that once embeddings is in.
+  TODO: Add more instructions on how to add new plugins once embeddings is in.
-Guides
-++++++
-.. toctree::
-   :maxdepth: 1
-   adding_multimodal_model
 Module Contents
 +++++++++++++++
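For context, a minimal end-to-end sketch of passing multi-modal data alongside a text prompt might look as follows. The model checkpoint, prompt template, and image path are placeholders, and depending on the vLLM version the vision model may need additional engine arguments that are not shown here:

.. code-block:: python

    from PIL import Image

    from vllm import LLM

    # Any model from the supported vision-language model list would do;
    # LLaVA-1.5 is used purely as an example.
    llm = LLM(model="llava-hf/llava-1.5-7b-hf")

    image = Image.open("example.jpg")  # placeholder image path

    outputs = llm.generate({
        "prompt": "USER: <image>\nWhat is shown in this image?\nASSISTANT:",
        "multi_modal_data": {"image": image},
    })
    print(outputs[0].outputs[0].text)

The key point is that the prompt is a dict conforming to ``PromptStrictInputs``, with the extra modality data keyed by type under ``multi_modal_data``.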

View File

@@ -92,6 +92,7 @@ Documentation
    models/supported_models
    models/adding_model
+   models/enabling_multimodal_inputs
    models/engine_args
    models/lora
    models/vlm
@@ -116,6 +117,7 @@ Documentation
    automatic_prefix_caching/details
 .. toctree::
+   :maxdepth: 2
    :caption: Developer Documentation
    dev/sampling_params

View File

@@ -10,6 +10,10 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
 The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
 However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
+.. note::
+   By default, vLLM models do not support multi-modal inputs. To enable multi-modal support,
+   please follow :ref:`this guide <enabling_multimodal_inputs>` after implementing the model here.
 .. tip::
    If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
    We will be happy to help you out!

View File

@@ -1,26 +1,21 @@
-.. _adding_a_new_multimodal_model:
+.. _enabling_multimodal_inputs:
-Adding a New Multimodal Model
-=============================
+Enabling Multimodal Inputs
+==========================
-This document provides a high-level guide on integrating a :ref:`multi-modal model <multi_modality>` into vLLM.
+This document walks you through the steps to extend a vLLM model so that it accepts :ref:`multi-modal <multi_modality>` inputs.
-.. note::
-   The complexity of adding a new model depends heavily on the model's architecture.
-   The process is considerably straightforward if the model shares a similar architecture with an existing model in vLLM.
-   However, for models that include new operators (e.g., a new attention mechanism), the process can be a bit more complex.
-.. tip::
-   If you are encountering issues while integrating your model into vLLM, feel free to open an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ repository.
-   We will be happy to help you out!
+.. seealso::
+   :ref:`adding_a_new_model`
-1. Set up the base vLLM model
+1. Update the base vLLM model
 -----------------------------
-As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model in vLLM, but note the following:
+It is assumed that you have already implemented the model in vLLM according to :ref:`these steps <adding_a_new_model>`.
+Further update the model as follows:
-- You should additionally implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.
+- Implement the :class:`~vllm.model_executor.models.interfaces.SupportsVision` interface.
 .. code-block:: diff
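As a rough sketch of the interface implementation referenced in the bullet above (``YourModelForImage2Seq`` is a placeholder name; only the added base class matters):

.. code-block:: python

    from torch import nn

    from vllm.model_executor.models.interfaces import SupportsVision


    class YourModelForImage2Seq(nn.Module, SupportsVision):
        """Placeholder model class; the point is the extra SupportsVision base."""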
@@ -33,7 +28,7 @@ As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model
    The model class does not have to be named :code:`*ForCausalLM`.
    Check out `the HuggingFace Transformers documentation <https://huggingface.co/docs/transformers/model_doc/auto#multimodal>`__ for some examples.
-- While implementing the :meth:`~torch.nn.Module.forward` method, reserve a keyword parameter
+- If you haven't already done so, reserve a keyword parameter in :meth:`~torch.nn.Module.forward`
   for each input tensor that corresponds to a multi-modal input, as shown in the following example:
 .. code-block:: diff
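As a rough sketch of this keyword-parameter convention, assuming a single image input mapped to ``pixel_values`` (the parameter name, the ``AttentionMetadata`` import path, and the signature details are indicative only):

.. code-block:: python

    from typing import List, Optional

    import torch
    from torch import nn

    from vllm.attention import AttentionMetadata
    from vllm.model_executor.models.interfaces import SupportsVision


    class YourModelForImage2Seq(nn.Module, SupportsVision):

        def forward(
            self,
            input_ids: torch.Tensor,
            positions: torch.Tensor,
            kv_caches: List[torch.Tensor],
            attn_metadata: AttentionMetadata,
            # Keyword parameter reserved for the multi-modal (image) input:
            pixel_values: Optional[torch.Tensor] = None,
        ) -> torch.Tensor:
            ...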
@@ -68,8 +63,8 @@ A default mapper is available for each modality in the core vLLM library. This i
    :ref:`input_processing_pipeline`
-3. Register maximum number of multimodal tokens
-----------------------------------------------------------
+3. Register maximum number of multi-modal tokens
+------------------------------------------------
 For each modality type that the model accepts as input, calculate the maximum possible number of tokens
 and register it via :meth:`INPUT_REGISTRY.register_dummy_data <vllm.inputs.registry.InputRegistry.register_max_multimodal_tokens>`.
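A minimal sketch of this registration step follows, assuming the multi-modal registry exposes a per-modality convenience decorator along the lines of ``MULTIMODAL_REGISTRY.register_max_image_tokens``; the decorator name, the token count, and the model class are assumptions for illustration:

.. code-block:: python

    from torch import nn

    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.multimodal import MULTIMODAL_REGISTRY


    # 576 is a placeholder: register the largest number of placeholder tokens
    # that a single image can occupy in the prompt for your model.
    @MULTIMODAL_REGISTRY.register_max_image_tokens(576)
    class YourModelForImage2Seq(nn.Module, SupportsVision):
        ...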

View File

@@ -192,7 +192,7 @@ Vision Language Models
      -
 If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
-Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Adding a New Multimodal Model <adding_a_new_multimodal_model>`
+Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` and :ref:`Enabling Multimodal Inputs <enabling_multimodal_inputs>`
 for instructions on how to implement support for your model.
 Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.

View File

@@ -141,7 +141,7 @@ class InputRegistry:
         The model is identified by ``model_config``.
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture

View File

@@ -162,8 +162,8 @@ class MultiModalPlugin(ABC):
             If `None` is provided, then the default input mapper is used instead.
         See also:
-            :ref:`input_processing_pipeline`
-            :ref:`adding_a_new_multimodal_model`
+            - :ref:`input_processing_pipeline`
+            - :ref:`enabling_multimodal_inputs`
         """
         def wrapper(model_cls: N) -> N:
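As a usage sketch of the input-mapper registration documented in this docstring, assuming the registry exposes an image-specific convenience ``register_image_input_mapper`` and accepts a plain dict of tensors as the mapper's return value (both assumptions); the mapper body and model class are placeholders:

.. code-block:: python

    import torch
    from torch import nn

    from vllm.inputs.registry import InputContext
    from vllm.model_executor.models.interfaces import SupportsVision
    from vllm.multimodal import MULTIMODAL_REGISTRY


    def my_image_input_mapper(ctx: InputContext, data: object) -> dict:
        # Turn the raw multi-modal data (e.g. a PIL image) into the keyword
        # arguments that the model's forward() expects.
        pixel_values = torch.zeros(1, 3, 336, 336)  # placeholder preprocessing
        return {"pixel_values": pixel_values}


    @MULTIMODAL_REGISTRY.register_image_input_mapper(my_image_input_mapper)
    class YourModelForImage2Seq(nn.Module, SupportsVision):
        ...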
@@ -192,7 +192,8 @@ class MultiModalPlugin(ABC):
             TypeError: If the data type is not supported.
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            - :ref:`input_processing_pipeline`
+            - :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture
@@ -230,7 +231,7 @@ class MultiModalPlugin(ABC):
             If `None` is provided, then the default calculation is used instead.
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            :ref:`enabling_multimodal_inputs`
         """
         def wrapper(model_cls: N) -> N:
@@ -260,7 +261,7 @@ class MultiModalPlugin(ABC):
         The model is identified by ``model_config``.
         See also:
-            :ref:`adding_a_new_multimodal_model`
+            :ref:`enabling_multimodal_inputs`
         """
         # Avoid circular import
         from vllm.model_executor.model_loader import get_model_architecture