[Docs] Add eplb_config param usage docs (#24213)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
parent 55be93baf5
commit c44797a4d6

@@ -123,12 +123,33 @@ When enabled, vLLM collects load statistics with every forward pass and periodic

### EPLB Parameters
Configure EPLB with the `--eplb-config` argument, which accepts a JSON string. The available keys and their descriptions are:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `window_size` | Number of engine steps to track for rebalancing decisions | 1000 |
| `step_interval` | Frequency of rebalancing (every N engine steps) | 3000 |
| `log_balancedness` | Log balancedness metrics (avg tokens per expert ÷ max tokens per expert) | `false` |
| `num_redundant_experts` | Additional global experts per EP rank beyond equal distribution | `0` |

For example:

```bash
vllm serve Qwen/Qwen3-30B-A3B \
  --enable-eplb \
  --eplb-config '{"window_size":1000,"step_interval":3000,"num_redundant_experts":2,"log_balancedness":true}'
```
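
If you generate the serve command from a script, building the `--eplb-config` value with a JSON serializer avoids shell-quoting mistakes. The sketch below is illustrative only (Python and `subprocess` are assumptions, not part of vLLM); it launches the same command as the example above, so `vllm` must be on your `PATH`.

```python
import json
import subprocess

# Values mirror the example above; adjust for your deployment.
eplb_config = {
    "window_size": 1000,          # engine steps tracked for rebalancing decisions
    "step_interval": 3000,        # rebalance every N engine steps
    "num_redundant_experts": 2,   # extra global experts beyond equal distribution
    "log_balancedness": True,     # log avg-tokens-per-expert / max-tokens-per-expert
}

# json.dumps keeps the flag value well-formed regardless of shell quoting rules.
cmd = [
    "vllm", "serve", "Qwen/Qwen3-30B-A3B",
    "--enable-eplb",
    "--eplb-config", json.dumps(eplb_config),
]
subprocess.run(cmd, check=True)
```
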
??? tip "Prefer individual arguments instead of JSON?"

    ```bash
    vllm serve Qwen/Qwen3-30B-A3B \
      --enable-eplb \
      --eplb-config.window_size 1000 \
      --eplb-config.step_interval 3000 \
      --eplb-config.num_redundant_experts 2 \
      --eplb-config.log_balancedness true
    ```
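
As described in the table above, the balancedness metric reported when `log_balancedness` is enabled is the average tokens per expert divided by the maximum tokens per expert: 1.0 means perfectly even routing, while values near 0 mean a few experts receive most of the tokens. Below is a minimal sketch of that ratio, illustrative only and not vLLM's internal implementation:

```python
from statistics import mean

def balancedness(tokens_per_expert: list[int]) -> float:
    """Average tokens per expert divided by max tokens per expert.

    Illustrative restatement of the metric described in the table above;
    the actual logging lives inside vLLM's EPLB code.
    """
    busiest = max(tokens_per_expert)
    if busiest == 0:
        return 1.0  # no MoE tokens routed yet; treat as perfectly balanced
    return mean(tokens_per_expert) / busiest

# Perfectly even routing -> 1.0; one hot expert -> value well below 1.0
print(balancedness([100, 100, 100, 100]))  # 1.0
print(balancedness([400, 10, 5, 5]))       # ~0.26
```
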
### Expert Distribution Formula
@@ -146,12 +167,10 @@ VLLM_ALL2ALL_BACKEND=pplx VLLM_USE_DEEP_GEMM=1 vllm serve deepseek-ai/DeepSeek-V

```bash
  --data-parallel-size 8 \     # Data parallelism
  --enable-expert-parallel \   # Enable EP
  --enable-eplb \              # Enable load balancer
  --eplb-config '{"window_size":1000,"step_interval":3000,"num_redundant_experts":2,"log_balancedness":true}'
```
For multi-node deployment, add these EPLB flags to each node's command. For large-scale use cases we recommend setting `num_redundant_experts` to 32 in `--eplb-config` (for example `--eplb-config '{"num_redundant_experts":32}'`) so the most popular experts are always available.

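Because every node must be launched with the same EPLB settings, it can help to generate the flag value once and reuse it in each node's launch script. The sketch below is illustrative only; apart from `num_redundant_experts: 32` from the recommendation above, the values are assumptions carried over from the earlier example.

```python
import json

# Recommended large-scale setting from the paragraph above: 32 redundant experts.
# The remaining values are assumptions carried over from the earlier example.
eplb_config = {
    "window_size": 1000,
    "step_interval": 3000,
    "num_redundant_experts": 32,
    "log_balancedness": True,
}

# Emit the flag exactly once and paste the same value into every node's command.
print("--eplb-config '{}'".format(json.dumps(eplb_config, separators=(",", ":"))))
```
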
## Disaggregated Serving (Prefill/Decode Split)