From 51d41265ad841a3b6efea665c83cdc5d54eb7c1d Mon Sep 17 00:00:00 2001
From: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Date: Thu, 11 Sep 2025 17:07:23 +0100
Subject: [PATCH] [Docs] Fix typos in EP deployment doc (#24669)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
---
 docs/serving/expert_parallel_deployment.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/serving/expert_parallel_deployment.md b/docs/serving/expert_parallel_deployment.md
index f8701870864dc..494d2ad021e71 100644
--- a/docs/serving/expert_parallel_deployment.md
+++ b/docs/serving/expert_parallel_deployment.md
@@ -158,10 +158,10 @@ vllm serve Qwen/Qwen3-30B-A3B \
 
 ### Memory Footprint Overhead
 
-EPLB uses redundant experts to that need to fit in GPU memory. This means that EPLB may not be a good fit for memory constrained environments or when KV cache space is at a premium.
+EPLB uses redundant experts that need to fit in GPU memory. This means that EPLB may not be a good fit for memory constrained environments or when KV cache space is at a premium.
 
 This overhead equals `NUM_MOE_LAYERS * BYTES_PER_EXPERT * (NUM_TOTAL_EXPERTS + NUM_REDUNDANT_EXPERTS) ÷ NUM_EP_RANKS`.
 
-For DeepSeekV3, this is approximately `2.4 GB` for one redundant expert per rank.
+For DeepSeekV3, this is approximately `2.4 GB` for one redundant expert per EP rank.
 
 ### Example Command
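
The overhead formula in the patched section can be sketched numerically. This is a minimal illustration, not part of the patch: the DeepSeek-V3 figures below (58 MoE layers, FP8 experts built from three 7168 × 2048 projection matrices at one byte per parameter) are assumptions chosen to show how the `2.4 GB` per-rank estimate for one redundant expert could arise.

```python
# Hedged sketch of the EPLB memory-overhead estimate from the doc section above.
# All model-specific numbers here are assumptions for illustration.

NUM_MOE_LAYERS = 58                  # assumed: MoE layers in DeepSeek-V3
BYTES_PER_EXPERT = 3 * 7168 * 2048   # assumed: gate/up/down FP8 matrices, 1 byte/param


def eplb_redundancy_overhead_bytes(num_moe_layers: int,
                                   bytes_per_expert: int,
                                   redundant_experts_per_rank: int = 1) -> int:
    """Extra GPU memory per EP rank contributed by redundant experts."""
    return num_moe_layers * bytes_per_expert * redundant_experts_per_rank


overhead = eplb_redundancy_overhead_bytes(NUM_MOE_LAYERS, BYTES_PER_EXPERT)
print(f"{overhead / 2**30:.2f} GiB per EP rank")  # prints "2.38 GiB per EP rank"
```

Under these assumed values the result lands near the `2.4 GB` quoted in the doc; a different expert dtype or layer count would shift the estimate proportionally.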