mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-05-21 01:17:02 +08:00)
[Doc] Update Batch-level DP docs (#25757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
parent ced693e845
commit 87ee8535a6
@@ -139,9 +139,9 @@ there is relatively little gain from TP. On the other hand, TP incurs significan
 overhead because of all-reduce being performed after every layer.
 
 Given this, it may be advantageous to instead shard the batched input data using TP, essentially
-performing batch-level DP. This has been shown to improve the throughput by around 10% for
+performing batch-level DP. This has been shown to improve the throughput and TTFT by around 10% for
 `tensor_parallel_size=8`. For vision encoders that use hardware-unoptimized Conv3D operations,
-batch-level DP can provide another 40% increase to throughput compared to regular TP.
+batch-level DP can provide another 40% improvement compared to regular TP.
 
 Nevertheless, since the weights of the multi-modal encoder are replicated across each TP rank,
 there will be a minor increase in memory consumption and may cause OOM if you can barely fit the model already.
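Note on the hunk above: the idea of sharding the batched input across TP ranks, instead of sharding the encoder weights, can be illustrated with a minimal sketch. This is not vLLM's implementation; `shard_batch`, `encode_batch_level_dp`, and the `encoder` callable are hypothetical names, and the "ranks" are simulated sequentially in one process. It only shows that each rank runs a full encoder copy on its own slice and that a single gather at the end replaces the per-layer all-reduce.

```python
from typing import Any, Callable, List, Sequence


def shard_batch(items: Sequence[Any], tp_size: int) -> List[List[Any]]:
    """Split a batch into tp_size contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(items), tp_size)
    shards: List[List[Any]] = []
    start = 0
    for rank in range(tp_size):
        # The first `extra` ranks take one additional item.
        size = base + (1 if rank < extra else 0)
        shards.append(list(items[start:start + size]))
        start += size
    return shards


def encode_batch_level_dp(
    items: Sequence[Any],
    tp_size: int,
    encoder: Callable[[Any], Any],
) -> List[Any]:
    """Simulate batch-level DP: every rank applies the full encoder to its shard only."""
    per_rank_outputs = [
        [encoder(item) for item in shard]
        for shard in shard_batch(items, tp_size)
    ]
    # A single gather at the end restores the original batch order.
    return [out for rank_out in per_rank_outputs for out in rank_out]
```

Because every rank holds the whole encoder in this scheme, the sketch also makes the memory trade-off stated in the hunk visible: weights are replicated rather than split.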
@@ -172,14 +172,15 @@ Batch-level DP needs to be implemented on a per-model basis,
 and enabled by setting `supports_encoder_tp_data = True` in the model class.
 Regardless, you need to set `mm_encoder_tp_mode="data"` in engine arguments to use this feature.
 
-Known supported models:
+Known supported models (with corresponding benchmarks):
 
-- GLM-4.5V GLM-4.1V (<gh-pr:23168>)
+- dots_ocr (<gh-pr:25466>)
+- GLM-4.1V or above (<gh-pr:23168>)
 - InternVL (<gh-pr:23909>)
 - Kimi-VL (<gh-pr:23817>)
 - Llama4 (<gh-pr:18368>)
 - MiniCPM-V-2.5 or above (<gh-pr:23327>, <gh-pr:23948>)
-- Qwen2.5-VL (<gh-pr:22742>)
+- Qwen2-VL or above (<gh-pr:22742>, <gh-pr:24955>, <gh-pr:25445>)
 - Step3 (<gh-pr:22697>)
 
 ## Input Processing
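Note on the hunk above: the engine arguments it mentions could be assembled roughly as follows. This is a sketch under assumptions: `build_engine_kwargs` and `make_llm` are hypothetical helpers, the model name is only an illustrative pick from the supported families listed, and the sketch assumes `vllm.LLM` accepts these engine arguments as keyword arguments. Only `tensor_parallel_size` and `mm_encoder_tp_mode="data"` come from the documentation text itself.

```python
from typing import Any, Dict


def build_engine_kwargs(
    model: str,
    tensor_parallel_size: int,
    batch_level_dp: bool = True,
) -> Dict[str, Any]:
    """Collect engine arguments; batch-level DP is opt-in via mm_encoder_tp_mode."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "tensor_parallel_size": tensor_parallel_size,
    }
    if batch_level_dp:
        # Shard the multi-modal encoder's input batch instead of its weights.
        kwargs["mm_encoder_tp_mode"] = "data"
    return kwargs


def make_llm(kwargs: Dict[str, Any]):
    """Construct the engine (assumes vLLM is installed and enough GPUs are available)."""
    from vllm import LLM  # imported lazily so the helper above works without vLLM

    return LLM(**kwargs)


# Illustrative model name (an assumption, not taken from the commit).
ENGINE_KWARGS = build_engine_kwargs("Qwen/Qwen2.5-VL-72B-Instruct", tensor_parallel_size=8)
```

Calling `make_llm(ENGINE_KWARGS)` would then start the engine. As the documentation notes, the setting only has an effect for models whose class sets `supports_encoder_tp_data = True`.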