diff --git a/docs/usage/v1_guide.md b/docs/usage/v1_guide.md
index c89c21d8575ab..73d64419b5011 100644
--- a/docs/usage/v1_guide.md
+++ b/docs/usage/v1_guide.md
@@ -103,7 +103,7 @@ For a complete list of supported models, see the [list of supported models](http
 | **LoRA** | 🚀 Optimized |
 | **Logprobs Calculation** | 🟢 Functional |
 | **FP8 KV Cache** | 🟢 Functional on Hopper devices ([PR #15191](https://github.com/vllm-project/vllm/pull/15191))|
-| **Spec Decode** | 🚧 WIP ([PR #13933](https://github.com/vllm-project/vllm/pull/13933))|
+| **Spec Decode** | 🚀 Optimized |
 | **Prompt Logprobs with Prefix Caching** | 🟡 Planned ([RFC #13414](https://github.com/vllm-project/vllm/issues/13414))|
 | **Structured Output Alternative Backends** | 🟢 Functional |
 | **Request-level Structured Output Backend** | 🔴 Deprecated |
@@ -137,14 +137,6 @@ Support for logprobs with post-sampling adjustments is in progress and will be a
 
 Currently prompt logprobs are only supported when prefix caching is turned off via `--no-enable-prefix-caching`. In a future release, prompt logprobs will be compatible with prefix caching, but a recomputation will be triggered to recover the full prompt logprobs even upon a prefix cache hit. See details in [RFC #13414](https://github.com/vllm-project/vllm/issues/13414).
 
-#### WIP Features
-
-These features are already supported in vLLM V1, but their optimization is still
-in progress.
-
-- **Spec Decode**: Currently, only ngram-based spec decode is supported in V1. There
-  will be follow-up work to support other types of spec decode (e.g., see [PR #13933](https://github.com/vllm-project/vllm/pull/13933)). We will prioritize the support for Eagle, MTP compared to draft model based spec decode.
-
 #### Deprecated Features
 
 As part of the major architectural rework in vLLM V1, several legacy features have been deprecated.
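
For context on the prompt-logprobs limitation mentioned in the unchanged lines above, here is a minimal sketch (not part of the diff) of requesting prompt logprobs with prefix caching disabled via the offline `LLM` API, which is the same constraint the `--no-enable-prefix-caching` server flag addresses. The model name is a placeholder assumption; any V1-supported model should work.

```python
from vllm import LLM, SamplingParams

# Disable prefix caching (the offline equivalent of --no-enable-prefix-caching),
# since prompt logprobs currently require it to be off.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=False)

# Request the top-5 logprobs for each prompt token in addition to generation.
params = SamplingParams(max_tokens=16, prompt_logprobs=5)

outputs = llm.generate(["The capital of France is"], params)
# Per-prompt-token logprob dicts; the first entry is None (no logprob for the first token).
print(outputs[0].prompt_logprobs)
```

Once the work tracked in [RFC #13414](https://github.com/vllm-project/vllm/issues/13414) lands, the same request should work with prefix caching enabled, at the cost of a recomputation on cache hits.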