diff --git a/docs/features/spec_decode.md b/docs/features/spec_decode.md
index be4b91feda7aa..89d5b489e1888 100644
--- a/docs/features/spec_decode.md
+++ b/docs/features/spec_decode.md
@@ -15,6 +15,10 @@ Speculative decoding is a technique which improves inter-token latency in memory
 The following code configures vLLM in an offline mode to use speculative decoding with a draft model, speculating 5 tokens at a time.
 
+!!! warning
+    In vLLM v0.10.0, speculative decoding with a draft model is not supported.
+    If you use the following code, you will get a `NotImplementedError`.
+
 ??? code
 
     ```python