[Doc] Added warning of speculating with draft model (#22047)

Signed-off-by: Dilute-l <dilu2333@163.com>
Co-authored-by: Dilute-l <dilu2333@163.com>
WeiQing Chen 2025-08-01 17:11:56 +08:00 committed by GitHub
parent 0f81b310db
commit 4931486988
GPG Key ID: B5690EEEBB952194


@@ -15,6 +15,10 @@ Speculative decoding is a technique which improves inter-token latency in memory
The following code configures vLLM in an offline mode to use speculative decoding with a draft model, speculating 5 tokens at a time.
!!! warning
    In vLLM v0.10.0, speculative decoding with a draft model is not supported.
    Running the following code with that version raises a `NotImplementedError`.
??? code
```python