mirror of https://git.datalinker.icu/vllm-project/vllm.git
synced 2025-12-10 23:25:34 +08:00
[Doc] Show default pooling method in a table (#11904)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent b844b99ad3
commit 3de2b1eafb
@@ -8,14 +8,14 @@ In vLLM, generative models implement the {class}`~vllm.model_executor.models.Vll
 Based on the final hidden states of the input, these models output log probabilities of the tokens to generate,
 which are then passed through {class}`~vllm.model_executor.layers.Sampler` to obtain the final text.
 
+For generative models, the only supported `--task` option is `"generate"`.
+Usually, this is automatically inferred so you don't have to specify it.
+
 ## Offline Inference
 
 The {class}`~vllm.LLM` class provides various methods for offline inference.
 See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
-For generative models, the only supported {code}`task` option is {code}`"generate"`.
-Usually, this is automatically inferred so you don't have to specify it.
-
 ### `LLM.generate`
 
 The {class}`~vllm.LLM.generate` method is available to all generative models in vLLM.
@@ -33,7 +33,7 @@ for output in outputs:
 ```
 
 You can optionally control the language generation by passing {class}`~vllm.SamplingParams`.
-For example, you can use greedy sampling by setting {code}`temperature=0`:
+For example, you can use greedy sampling by setting `temperature=0`:
 
 ```python
 llm = LLM(model="facebook/opt-125m")
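The `temperature=0` line in the hunk above relies on the convention that a temperature of zero means greedy decoding. The following is a minimal pure-Python sketch of that convention, not vLLM's `Sampler` (which supports many more sampling options and works on batches). `sample_token` is a hypothetical helper. It accepts either logits or log-probabilities, because the two differ only by a constant offset that the softmax cancels.

```python
import math
import random


def sample_token(logits, temperature, rng=None):
    """Pick a token index from a list of logits (or log-probabilities).

    Hypothetical helper for illustration only; not vLLM's Sampler.
    A temperature of 0 is treated as greedy decoding (argmax).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy: always return the index of the highest score.
        return max(range(len(logits)), key=lambda i: logits[i])
    if rng is None:
        rng = random.Random()
    # Divide by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw a single index according to the resulting distribution.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.5, 2.0]
print(sample_token(logits, temperature=0))  # 1 (the argmax), on every call
print(sample_token(logits, temperature=0.8, rng=random.Random(0)))
```

Lower positive temperatures sharpen the distribution toward the argmax. The `temperature == 0` branch is the limit of that process, handled explicitly to avoid dividing by zero.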
@@ -14,31 +14,54 @@ As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM feature
 pooling models as they only work on the generation or decode stage, so performance may not improve as much.
 ```
 
+For pooling models, we support the following `--task` options.
+The selected option sets the default pooler used to extract the final hidden states:
+
+```{list-table}
+:widths: 50 25 25 25
+:header-rows: 1
+
+* - Task
+  - Pooling Type
+  - Normalization
+  - Softmax
+* - Embedding (`embed`)
+  - `LAST`
+  - ✅︎
+  - ✗
+* - Classification (`classify`)
+  - `LAST`
+  - ✗
+  - ✅︎
+* - Sentence Pair Scoring (`score`)
+  - \*
+  - \*
+  - \*
+* - Reward Modeling (`reward`)
+  - `ALL`
+  - ✗
+  - ✗
+```
+
+\*The default pooler is always defined by the model.
+
+```{note}
+If the model's implementation in vLLM defines its own pooler, the default pooler is set to that instead of the one specified in this table.
+```
+
+When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
+we attempt to override the default pooler based on its Sentence Transformers configuration file (`modules.json`).
+
+```{tip}
+You can customize the model's pooling method via the `--override-pooler-config` option,
+which takes priority over both the model's and Sentence Transformers's defaults.
+```
+
 ## Offline Inference
 
 The {class}`~vllm.LLM` class provides various methods for offline inference.
 See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
-For pooling models, we support the following {code}`task` options:
-
-- Embedding ({code}`"embed"` / {code}`"embedding"`)
-- Classification ({code}`"classify"`)
-- Sentence Pair Scoring ({code}`"score"`)
-- Reward Modeling ({code}`"reward"`)
-
-The selected task determines the default {class}`~vllm.model_executor.layers.Pooler` that is used:
-
-- Embedding: Extract only the hidden states corresponding to the last token, and apply normalization.
-- Classification: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Sentence Pair Scoring: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Reward Modeling: Extract all of the hidden states and return them directly.
-
-When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
-we attempt to override the default pooler based on its Sentence Transformers configuration file ({code}`modules.json`).
-
-You can customize the model's pooling method via the {code}`override_pooler_config` option,
-which takes priority over both the model's and Sentence Transformers's defaults.
-
 ### `LLM.encode`
 
 The {class}`~vllm.LLM.encode` method is available to all pooling models in vLLM.
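Each row of the default-pooler table in the hunk above describes a small transformation of the per-token hidden states. The following is a minimal pure-Python sketch of those rows. It assumes one prompt's hidden states arrive as a list of per-token vectors. The names `DEFAULT_POOLERS` and `pool`, and the dict keys, are hypothetical. They mirror the table's columns rather than vLLM's `Pooler` class. The `score` row is omitted because its pooler is always defined by the model. The `overrides` argument only loosely imitates the way `--override-pooler-config` takes priority over the defaults.

```python
import math

# Task defaults taken from the table; keys are illustrative, not vLLM's API.
# `score` is left out because its pooler is always defined by the model.
DEFAULT_POOLERS = {
    "embed": {"pooling_type": "LAST", "normalize": True, "softmax": False},
    "classify": {"pooling_type": "LAST", "normalize": False, "softmax": True},
    "reward": {"pooling_type": "ALL", "normalize": False, "softmax": False},
}


def _l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def _softmax(vec):
    peak = max(vec)
    exps = [math.exp(x - peak) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]


def pool(hidden_states, task, overrides=None):
    """Apply the table's default pooler for `task` to one prompt.

    hidden_states: list of per-token vectors (lists of floats).
    overrides: optional dict whose entries win over the task defaults
    (hypothetical shape, loosely mimicking --override-pooler-config).
    """
    config = {**DEFAULT_POOLERS[task], **(overrides or {})}
    if config["pooling_type"] == "LAST":
        vectors = [list(hidden_states[-1])]  # keep only the last token
    elif config["pooling_type"] == "ALL":
        vectors = [list(v) for v in hidden_states]  # keep every token
    else:
        raise ValueError(f"unsupported pooling_type: {config['pooling_type']}")
    if config["normalize"]:
        vectors = [_l2_normalize(v) for v in vectors]
    if config["softmax"]:
        vectors = [_softmax(v) for v in vectors]
    # LAST yields a single vector; ALL yields one vector per token.
    return vectors[0] if config["pooling_type"] == "LAST" else vectors


states = [[1.0, 0.0], [3.0, 4.0]]
print(pool(states, "embed"))     # [0.6, 0.8]
print(pool(states, "classify"))  # softmax of [3.0, 4.0]
print(pool(states, "reward"))    # [[1.0, 0.0], [3.0, 4.0]]
print(pool(states, "embed", overrides={"normalize": False}))  # [3.0, 4.0]
```

Keeping the defaults in a table-shaped dict makes precedence explicit: an override is a plain dict merge applied on top of the task default. Real model-defined or Sentence Transformers poolers would add further layers that this sketch does not model.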