From a2480251ec92ba2a849464dde48db8a2b7f6ef81 Mon Sep 17 00:00:00 2001
From: Cyrus Leung
Date: Tue, 29 Jul 2025 14:53:18 +0800
Subject: [PATCH] [Doc] Link to RFC for pooling optimizations (#21806)

Signed-off-by: DarkLight1337
---
 docs/models/pooling_models.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/models/pooling_models.md b/docs/models/pooling_models.md
index a06d86523af1a..f1200103171e9 100644
--- a/docs/models/pooling_models.md
+++ b/docs/models/pooling_models.md
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 
 !!! note
-    We currently support pooling models primarily as a matter of convenience.
-    As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-    pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+    We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
+
+    We are now planning to optimize pooling models in vLLM. Please comment on if you have any suggestions!
 
 ## Configuration
 