diff --git a/docs/deployment/integrations/production-stack.md b/docs/deployment/integrations/production-stack.md
index ffec679207fd8..497f9f1a92a5d 100644
--- a/docs/deployment/integrations/production-stack.md
+++ b/docs/deployment/integrations/production-stack.md
@@ -41,7 +41,8 @@
 vllm-deployment-router-859d8fb668-2x2b7        1/1     Running   0          2m38
 vllm-opt125m-deployment-vllm-84dfc9bd7-vb9bs   1/1     Running   0          2m38s
 ```
 
-**NOTE**: It may take some time for the containers to download the Docker images and LLM weights.
+!!! note
+    It may take some time for the containers to download the Docker images and LLM weights.
 
 ### Send a Query to the Stack
@@ -149,6 +150,8 @@ In this YAML configuration:
 * **`requestGPU`**: Specifies the number of GPUs required.
 * **`pvcStorage`**: Allocates persistent storage for the model.
 
-**NOTE:** If you intend to set up two pods, please refer to this [YAML file](https://github.com/vllm-project/production-stack/blob/main/tutorials/assets/values-01-2pods-minimal-example.yaml).
+!!! note
+    If you intend to set up two pods, please refer to this [YAML file](https://github.com/vllm-project/production-stack/blob/main/tutorials/assets/values-01-2pods-minimal-example.yaml).
 
-**NOTE:** vLLM production stack offers many more features (*e.g.* CPU offloading and a wide range of routing algorithms). Please check out these [examples and tutorials](https://github.com/vllm-project/production-stack/tree/main/tutorials) and our [repo](https://github.com/vllm-project/production-stack) for more details!
+!!! tip
+    vLLM production stack offers many more features (*e.g.* CPU offloading and a wide range of routing algorithms). Please check out these [examples and tutorials](https://github.com/vllm-project/production-stack/tree/main/tutorials) and our [repo](https://github.com/vllm-project/production-stack) for more details!
diff --git a/docs/features/tool_calling.md b/docs/features/tool_calling.md
index c68b3aef58286..d3caeaba65f74 100644
--- a/docs/features/tool_calling.md
+++ b/docs/features/tool_calling.md
@@ -299,20 +299,17 @@
 Limitations:
 
 Example supported models:
 
-* `meta-llama/Llama-3.2-1B-Instruct`\* (use with )
-* `meta-llama/Llama-3.2-3B-Instruct`\* (use with )
+* `meta-llama/Llama-3.2-1B-Instruct` ⚠️ (use with )
+* `meta-llama/Llama-3.2-3B-Instruct` ⚠️ (use with )
 * `Team-ACE/ToolACE-8B` (use with )
 * `fixie-ai/ultravox-v0_4-ToolACE-8B` (use with )
-* `meta-llama/Llama-4-Scout-17B-16E-Instruct`\* (use with )
-* `meta-llama/Llama-4-Maverick-17B-128E-Instruct`\* (use with )
+* `meta-llama/Llama-4-Scout-17B-16E-Instruct` ⚠️ (use with )
+* `meta-llama/Llama-4-Maverick-17B-128E-Instruct` ⚠️ (use with )
 
 Flags: `--tool-call-parser pythonic --chat-template {see_above}`
 
----
-**WARNING**
-Llama's smaller models frequently fail to emit tool calls in the correct format. Your mileage may vary.
-
----
+!!! warning
+    Llama's smaller models frequently fail to emit tool calls in the correct format. Your mileage may vary.
 
 ## How to write a tool parser plugin
diff --git a/docs/models/supported_models.md b/docs/models/supported_models.md
index e75d656af283d..52c7fa9c0e8fc 100644
--- a/docs/models/supported_models.md
+++ b/docs/models/supported_models.md
@@ -573,7 +573,7 @@ Specified using `--task generate`.
 | `GLM4VForCausalLM`^ | GLM-4V | T + I | `THUDM/glm-4v-9b`, `THUDM/cogagent-9b-20241220`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Glm4vForConditionalGeneration` | GLM-4.1V-Thinking | T + IE+ + VE+ | `THUDM/GLM-4.1V-9B-Thinkg`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `GraniteSpeechForConditionalGeneration` | Granite Speech | T + A | `ibm-granite/granite-speech-3.3-8b` | ✅︎ | ✅︎ | ✅︎ |
-| `H2OVLChatModel` | H2OVL | T + IE+ | `h2oai/h2ovl-mississippi-800m`, `h2oai/h2ovl-mississippi-2b`, etc. | | ✅︎ | ✅︎\* |
+| `H2OVLChatModel` | H2OVL | T + IE+ | `h2oai/h2ovl-mississippi-800m`, `h2oai/h2ovl-mississippi-2b`, etc. | | ✅︎ | ✅︎ |
 | `Idefics3ForConditionalGeneration` | Idefics3 | T + I | `HuggingFaceM4/Idefics3-8B-Llama3`, etc. | ✅︎ | | ✅︎ |
 | `InternVLChatModel` | InternVL 3.0, InternVideo 2.5, InternVL 2.5, Mono-InternVL, InternVL 2.0 | T + IE+ + (VE+) | `OpenGVLab/InternVL3-9B`, `OpenGVLab/InternVideo2_5_Chat_8B`, `OpenGVLab/InternVL2_5-4B`, `OpenGVLab/Mono-InternVL-2B`, `OpenGVLab/InternVL2-4B`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `KeyeForConditionalGeneration` | Keye-VL-8B-Preview | T + IE+ + VE+ | `Kwai-Keye/Keye-VL-8B-Preview` | | | ✅︎ |
@@ -599,7 +599,7 @@ Specified using `--task generate`.
 | `Qwen2AudioForConditionalGeneration` | Qwen2-Audio | T + A+ | `Qwen/Qwen2-Audio-7B-Instruct` | | ✅︎ | ✅︎ |
 | `Qwen2VLForConditionalGeneration` | QVQ, Qwen2-VL | T + IE+ + VE+ | `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Qwen2_5_VLForConditionalGeneration` | Qwen2.5-VL | T + IE+ + VE+ | `Qwen/Qwen2.5-VL-3B-Instruct`, `Qwen/Qwen2.5-VL-72B-Instruct`, etc. | ✅︎ | ✅︎ | ✅︎ |
-| `Qwen2_5OmniThinkerForConditionalGeneration` | Qwen2.5-Omni | T + IE+ + VE+ + A+ | `Qwen/Qwen2.5-Omni-7B` | | ✅︎ | ✅︎\* |
+| `Qwen2_5OmniThinkerForConditionalGeneration` | Qwen2.5-Omni | T + IE+ + VE+ + A+ | `Qwen/Qwen2.5-Omni-7B` | | ✅︎ | ✅︎ |
 | `SkyworkR1VChatModel` | Skywork-R1V-38B | T + I | `Skywork/Skywork-R1V-38B` | | ✅︎ | ✅︎ |
 | `SmolVLMForConditionalGeneration` | SmolVLM2 | T + I | `SmolVLM2-2.2B-Instruct` | ✅︎ | | ✅︎ |
 | `TarsierForConditionalGeneration` | Tarsier | T + IE+ | `omni-search/Tarsier-7b`, `omni-search/Tarsier-34b` | | ✅︎ | ✅︎ |