mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2026-01-24 05:14:28 +08:00
[Doc] Link to RFC for pooling optimizations (#21806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent 7234fe2685
commit a2480251ec
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 
 !!! note
-    We currently support pooling models primarily as a matter of convenience.
-    As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-    pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+    We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
+
+    We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!
 
 ## Configuration
 