From 49314869887e169be080201ab8bcda14e745c080 Mon Sep 17 00:00:00 2001
From: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Date: Fri, 1 Aug 2025 17:11:56 +0800
Subject: [PATCH] [Doc] Added warning of speculating with draft model (#22047)

Signed-off-by: Dilute-l
Co-authored-by: Dilute-l
---
 docs/features/spec_decode.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/features/spec_decode.md b/docs/features/spec_decode.md
index be4b91feda7aa..89d5b489e1888 100644
--- a/docs/features/spec_decode.md
+++ b/docs/features/spec_decode.md
@@ -15,6 +15,10 @@ Speculative decoding is a technique which improves inter-token latency in memory
 
 The following code configures vLLM in an offline mode to use speculative decoding with a draft model, speculating 5 tokens at a time.
 
+!!! warning
+    In vLLM v0.10.0, speculative decoding with a draft model is not supported.
+    Running the following code will raise a `NotImplementedError`.
+
 ??? code
 
     ```python