[Doc] Added warning of speculating with draft model (#22047)

Signed-off-by: Dilute-l <dilu2333@163.com>
Co-authored-by: Dilute-l <dilu2333@163.com>

parent 0f81b310db
commit 4931486988
@@ -15,6 +15,10 @@ Speculative decoding is a technique which improves inter-token latency in memory

 The following code configures vLLM in an offline mode to use speculative decoding with a draft model, speculating 5 tokens at a time.

+!!! warning
+
+    In vLLM v0.10.0, speculative decoding with a draft model is not supported.
+    If you use the following code, you will get a `NotImplementedError`.
+
 ??? code

     ```python
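The ```python block in the hunk is truncated here. As a rough sketch of the offline draft-model configuration the docs describe, the example below builds the speculative-decoding config for a 5-token draft. The model names and the `speculative_config` keyword argument are assumptions based on recent vLLM releases, not taken from this commit; check the API of your installed version.

```python
# Hypothetical sketch: offline speculative decoding with a draft model.
# Model names and the `speculative_config` keyword are assumptions, not
# taken from this commit.

def make_speculative_config(draft_model: str, num_tokens: int) -> dict:
    """Build the config dict describing the draft model and how many
    tokens it should speculate per step."""
    return {
        "model": draft_model,            # small draft model proposing tokens
        "num_speculative_tokens": num_tokens,  # 5 tokens at a time, per the docs
    }

spec_config = make_speculative_config("facebook/opt-125m", 5)

# With vLLM installed and a GPU available, this would be passed to the
# target model roughly as follows (assumed API):
#   from vllm import LLM
#   llm = LLM(model="facebook/opt-6.7b", speculative_config=spec_config)
#   outputs = llm.generate(["The future of AI is"])

print(spec_config)
```

Note that, per the warning added by this commit, running such a configuration on vLLM v0.10.0 raises `NotImplementedError` rather than enabling draft-model speculation.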