From 3f04a7fbf21e1b0a899c721ebc6853c3cf030db7 Mon Sep 17 00:00:00 2001
From: Cyrus Leung
Date: Tue, 25 Mar 2025 19:01:58 +0800
Subject: [PATCH] [Doc] Update V1 user guide for multi-modality (#15460)

Signed-off-by: DarkLight1337
---
 docs/source/getting_started/v1_user_guide.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/getting_started/v1_user_guide.md b/docs/source/getting_started/v1_user_guide.md
index 26b28c04fe739..b1c2807657ffa 100644
--- a/docs/source/getting_started/v1_user_guide.md
+++ b/docs/source/getting_started/v1_user_guide.md
@@ -129,6 +129,9 @@ in progress.
 - **Spec Decode**: Currently, only ngram-based spec decode is supported in V1. There
   will be follow-up work to support other types of spec decode (e.g., see [PR #13933](https://github.com/vllm-project/vllm/pull/13933)). We will prioritize the support for Eagle, MTP compared to draft model based spec decode.
 
+- **Multimodal Models**: V1 is almost fully compatible with V0 except that interleaved modality input is not supported yet.
+  See [here](https://github.com/orgs/vllm-project/projects/8) for the status of upcoming features and optimizations.
+
 #### Features to Be Supported
 
 - **FP8 KV Cache**: While vLLM V1 introduces new FP8 kernels for model weight quantization, support for an FP8 key–value cache is not yet available. Users must continue using FP16 (or other supported precisions) for the KV cache.
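
The spec decode bullet in the hunk above is actionable from the `LLM` constructor alone, since ngram speculation proposes draft tokens by matching n-grams already present in the prompt and loads no draft model. A minimal sketch, assuming the `speculative_config` dict form of the knobs (the exact argument names have shifted across vLLM releases, and the model choice is illustrative):

```python
# Sketch: ngram-based speculative decoding, the only spec-decode variant
# the guide lists as supported on V1. All names below are assumptions to
# verify against the docs for your installed vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    speculative_config={
        "method": "ngram",
        "num_speculative_tokens": 5,  # draft tokens proposed per step
        "prompt_lookup_max": 4,       # longest n-gram matched in the prompt
    },
)

out = llm.generate("The capital of France is", SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```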
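
The multimodal bullet the patch adds distinguishes single-modality requests, which V1 already serves, from prompts that interleave several modalities, which it does not yet. A minimal sketch of the supported single-image case, assuming a LLaVA-style model (the model name, image path, and prompt template are illustrative):

```python
# Sketch: one image per request works on V1; interleaving multiple
# images/audio segments with text is the case the guide flags as unsupported.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # illustrative multimodal model

image = Image.open("example.jpg")  # any local RGB image

outputs = llm.generate(
    {
        # "<image>" marks where the vision tokens go in this model's template.
        "prompt": "USER: <image>\nWhat is shown in this image? ASSISTANT:",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```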
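
Similarly, the FP8 KV cache bullet maps to the `kv_cache_dtype` engine argument: on V1 it should stay at its default until FP8 cache support lands. A minimal sketch of both settings (model name illustrative):

```python
# Sketch: KV-cache precision selection. "auto" follows the model dtype
# (e.g. FP16), which is what the guide requires on V1 for now.
from vllm import LLM

llm_v1 = LLM(model="meta-llama/Llama-3.1-8B-Instruct", kv_cache_dtype="auto")

# On V0 (or a later V1 release), an FP8 KV cache roughly halves cache memory:
# llm_v0 = LLM(model="meta-llama/Llama-3.1-8B-Instruct", kv_cache_dtype="fp8")
```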