[Doc] Link to RFC for pooling optimizations (#21806)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-07-11 21:17:18 +08:00 · 2025-07-29 14:53:18 +08:00 · 2025-07-29 14:53:18 +08:00 · a2480251ec
commit a2480251ec
parent 7234fe2685
1 changed file with 3 additions and 3 deletions
--- a/docs/models/pooling_models.md
+++ b/docs/models/pooling_models.md
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 !!! note
-    We currently support pooling models primarily as a matter of convenience.
-    As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-    pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+    We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
+
+    We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!
 ## Configuration