Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-05-16 09:49:08 +08:00)
[Doc] Link to RFC for pooling optimizations (#21806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Parent: 7234fe2685
Commit: a2480251ec
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 
 !!! note
-    We currently support pooling models primarily as a matter of convenience.
-    As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-    pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+    We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
+
+    We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!
 
 ## Configuration
 