[Doc][V1] Add V1 support column for multimodal models (#10998)

Signed-off-by: Roger Wang <ywang@roblox.com>
Roger Wang 2024-12-08 22:29:16 -08:00 committed by GitHub
parent 46004e83a2
commit af7c4a92e6


@ -495,7 +495,7 @@ Text Generation
--------------- ---------------
.. list-table:: .. list-table::
:widths: 25 25 15 25 5 5 :widths: 25 25 15 20 5 5 5
:header-rows: 1 :header-rows: 1
* - Architecture * - Architecture
@ -504,144 +504,168 @@ Text Generation
- Example HF Models - Example HF Models
- :ref:`LoRA <lora>` - :ref:`LoRA <lora>`
- :ref:`PP <distributed_serving>` - :ref:`PP <distributed_serving>`
- V1
* - :code:`AriaForConditionalGeneration` * - :code:`AriaForConditionalGeneration`
- Aria - Aria
- T + I - T + I
- :code:`rhymes-ai/Aria` - :code:`rhymes-ai/Aria`
- -
- ✅︎ - ✅︎
-
* - :code:`Blip2ForConditionalGeneration` * - :code:`Blip2ForConditionalGeneration`
- BLIP-2 - BLIP-2
- T + I\ :sup:`E` - T + I\ :sup:`E`
- :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc. - :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
- -
- ✅︎ - ✅︎
-
* - :code:`ChameleonForConditionalGeneration` * - :code:`ChameleonForConditionalGeneration`
- Chameleon - Chameleon
- T + I - T + I
- :code:`facebook/chameleon-7b` etc. - :code:`facebook/chameleon-7b` etc.
- -
- ✅︎ - ✅︎
-
* - :code:`FuyuForCausalLM` * - :code:`FuyuForCausalLM`
- Fuyu - Fuyu
- T + I - T + I
- :code:`adept/fuyu-8b` etc. - :code:`adept/fuyu-8b` etc.
- -
- ✅︎ - ✅︎
-
* - :code:`ChatGLMModel` * - :code:`ChatGLMModel`
- GLM-4V - GLM-4V
- T + I - T + I
- :code:`THUDM/glm-4v-9b` etc. - :code:`THUDM/glm-4v-9b` etc.
- ✅︎ - ✅︎
- ✅︎ - ✅︎
-
* - :code:`H2OVLChatModel` * - :code:`H2OVLChatModel`
- H2OVL - H2OVL
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`h2oai/h2ovl-mississippi-800m`, :code:`h2oai/h2ovl-mississippi-2b`, etc. - :code:`h2oai/h2ovl-mississippi-800m`, :code:`h2oai/h2ovl-mississippi-2b`, etc.
- -
- ✅︎ - ✅︎
-
* - :code:`Idefics3ForConditionalGeneration` * - :code:`Idefics3ForConditionalGeneration`
- Idefics3 - Idefics3
- T + I - T + I
- :code:`HuggingFaceM4/Idefics3-8B-Llama3` etc. - :code:`HuggingFaceM4/Idefics3-8B-Llama3` etc.
- ✅︎ - ✅︎
- -
-
* - :code:`InternVLChatModel` * - :code:`InternVLChatModel`
- InternVL 2.5, Mono-InternVL, InternVL 2.0 - InternVL 2.5, Mono-InternVL, InternVL 2.0
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`OpenGVLab/InternVL2_5-4B`, :code:`OpenGVLab/Mono-InternVL-2B`, :code:`OpenGVLab/InternVL2-4B`, etc. - :code:`OpenGVLab/InternVL2_5-4B`, :code:`OpenGVLab/Mono-InternVL-2B`, :code:`OpenGVLab/InternVL2-4B`, etc.
- -
- ✅︎ - ✅︎
- ✅︎
* - :code:`LlavaForConditionalGeneration` * - :code:`LlavaForConditionalGeneration`
- LLaVA-1.5 - LLaVA-1.5
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`llava-hf/llava-1.5-7b-hf`, :code:`TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc. - :code:`llava-hf/llava-1.5-7b-hf`, :code:`TIGER-Lab/Mantis-8B-siglip-llama3` (see note), etc.
- -
- ✅︎ - ✅︎
- ✅︎
* - :code:`LlavaNextForConditionalGeneration` * - :code:`LlavaNextForConditionalGeneration`
- LLaVA-NeXT - LLaVA-NeXT
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc. - :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
- -
- ✅︎ - ✅︎
-
* - :code:`LlavaNextVideoForConditionalGeneration` * - :code:`LlavaNextVideoForConditionalGeneration`
- LLaVA-NeXT-Video - LLaVA-NeXT-Video
- T + V - T + V
- :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc. - :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
- -
- ✅︎ - ✅︎
-
* - :code:`LlavaOnevisionForConditionalGeneration` * - :code:`LlavaOnevisionForConditionalGeneration`
- LLaVA-Onevision - LLaVA-Onevision
- T + I\ :sup:`+` + V\ :sup:`+` - T + I\ :sup:`+` + V\ :sup:`+`
- :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc. - :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
- -
- ✅︎ - ✅︎
-
* - :code:`MiniCPMV` * - :code:`MiniCPMV`
- MiniCPM-V - MiniCPM-V
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc. - :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
- ✅︎ - ✅︎
- ✅︎ - ✅︎
-
* - :code:`MllamaForConditionalGeneration` * - :code:`MllamaForConditionalGeneration`
- Llama 3.2 - Llama 3.2
- T + I\ :sup:`+` - T + I\ :sup:`+`
- :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc. - :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
- -
- -
-
* - :code:`MolmoForCausalLM` * - :code:`MolmoForCausalLM`
- Molmo - Molmo
- T + I - T + I
- :code:`allenai/Molmo-7B-D-0924`, :code:`allenai/Molmo-72B-0924`, etc. - :code:`allenai/Molmo-7B-D-0924`, :code:`allenai/Molmo-72B-0924`, etc.
- -
- ✅︎ - ✅︎
- ✅︎
* - :code:`NVLM_D_Model` * - :code:`NVLM_D_Model`
- NVLM-D 1.0 - NVLM-D 1.0
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`nvidia/NVLM-D-72B`, etc. - :code:`nvidia/NVLM-D-72B`, etc.
- -
- ✅︎ - ✅︎
- ✅︎
* - :code:`PaliGemmaForConditionalGeneration` * - :code:`PaliGemmaForConditionalGeneration`
- PaliGemma - PaliGemma
- T + I\ :sup:`E` - T + I\ :sup:`E`
- :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc. - :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
- -
- ✅︎ - ✅︎
-
* - :code:`Phi3VForCausalLM` * - :code:`Phi3VForCausalLM`
- Phi-3-Vision, Phi-3.5-Vision - Phi-3-Vision, Phi-3.5-Vision
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc. - :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc.
- -
- ✅︎ - ✅︎
- ✅︎
* - :code:`PixtralForConditionalGeneration` * - :code:`PixtralForConditionalGeneration`
- Pixtral - Pixtral
- T + I\ :sup:`+` - T + I\ :sup:`+`
- :code:`mistralai/Pixtral-12B-2409`, :code:`mistral-community/pixtral-12b` etc. - :code:`mistralai/Pixtral-12B-2409`, :code:`mistral-community/pixtral-12b` etc.
- -
- ✅︎ - ✅︎
- ✅︎
* - :code:`QWenLMHeadModel` * - :code:`QWenLMHeadModel`
- Qwen-VL - Qwen-VL
- T + I\ :sup:`E+` - T + I\ :sup:`E+`
- :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc. - :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc.
- ✅︎ - ✅︎
- ✅︎ - ✅︎
-
* - :code:`Qwen2AudioForConditionalGeneration` * - :code:`Qwen2AudioForConditionalGeneration`
- Qwen2-Audio - Qwen2-Audio
- T + A\ :sup:`+` - T + A\ :sup:`+`
- :code:`Qwen/Qwen2-Audio-7B-Instruct` - :code:`Qwen/Qwen2-Audio-7B-Instruct`
- -
- ✅︎ - ✅︎
-
* - :code:`Qwen2VLForConditionalGeneration` * - :code:`Qwen2VLForConditionalGeneration`
- Qwen2-VL - Qwen2-VL
- T + I\ :sup:`E+` + V\ :sup:`E+` - T + I\ :sup:`E+` + V\ :sup:`E+`
- :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc. - :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
- ✅︎ - ✅︎
- ✅︎ - ✅︎
-
* - :code:`UltravoxModel` * - :code:`UltravoxModel`
- Ultravox - Ultravox
- T + A\ :sup:`E+` - T + A\ :sup:`E+`
- :code:`fixie-ai/ultravox-v0_3` - :code:`fixie-ai/ultravox-v0_3`
- -
- ✅︎ - ✅︎
-
| :sup:`E` Pre-computed embeddings can be inputted for this modality. | :sup:`E` Pre-computed embeddings can be inputted for this modality.
| :sup:`+` Multiple items can be inputted per text prompt for this modality. | :sup:`+` Multiple items can be inputted per text prompt for this modality.
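
The V1 column added by this diff can also be read programmatically. The following is a minimal sketch, assuming a check mark in the new column means the architecture is supported on V1 as of this commit and a blank cell means it is not yet marked. The names ``V1_SUPPORTED`` and ``supports_v1`` are hypothetical and are not part of vLLM; the set is transcribed from the new side of the table above.

```python
# Hypothetical helper, not a vLLM API: architectures whose V1 cell
# carries a check mark in the new side of the table above.
V1_SUPPORTED = frozenset({
    "InternVLChatModel",
    "LlavaForConditionalGeneration",
    "MolmoForCausalLM",
    "NVLM_D_Model",
    "Phi3VForCausalLM",
    "PixtralForConditionalGeneration",
})


def supports_v1(architecture: str) -> bool:
    """Return True if the architecture is marked in the V1 column."""
    return architecture in V1_SUPPORTED


if __name__ == "__main__":
    for arch in ("LlavaForConditionalGeneration",
                 "Qwen2VLForConditionalGeneration"):
        status = "marked for V1" if supports_v1(arch) else "not marked for V1"
        print(f"{arch}: {status}")
```

A lookup like this reflects only the state of the table at this commit; later revisions of the documentation may mark additional architectures.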