mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-09 21:35:01 +08:00

[Doc]: fix typos in various files (#29010 )

Signed-off-by: Didier Durand <durand.didier@gmail.com>

2025-11-19 04:56:21 -08:00

4.8 KiB

Raw Blame History

EPD Correctness Test

This test verifies that EPD (Encoder-Prefill-Decode) disaggregation produces identical outputs to a baseline single instance.

What It Tests

Baseline: Single vLLM instance serving a multimodal model
EPD (1E+1PD): 1 Encoder + 1 Prefill-Decode instance
Baseline (1P+1D): 1 Prefill + 1 Decode instance
EPD (1E+1P+1D): 1 Encoder + 1 Prefill + 1 Decode instance

The test ensures that disaggregated encoding produces identical outputs to the baseline.

Note that currently PD disaggregation set up may give slightly different results from a single instance. Therefore, we need the result from 1P+1D as the baseline for 1E+1P+1D

Please refer to Disaggregated Encoder Feature for the detailed explanation for the EPD features.

Files

run_epd_correctness_test.sh - Main test script (starts all instances and runs tests)
test_epd_correctness.py - Python test script (compares outputs)

Usage

Multimodal Prompts (Default)

cd vllm
./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

This runs the test with actual multimodal (image) prompts.

Text-Only Prompts

cd vllm
USE_MM_PROMPTS=0 ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

This runs a quick test with text-only prompts to verify the setup works.

Custom Configuration

# Use specific GPUs
GPU_E=0 GPU_PD=1 GPU_P=1 GPU_D=2 bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

# Use specific ports
ENDPOINT_PORT=10001 bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

# Use specific model
MODEL="Qwen/Qwen2.5-VL-3B-Instruct" bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

# Use specific storage path
EC_SHARED_STORAGE_PATH="/tmp/my_ec_cache" bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

How It Works

Step 1: Baseline

Start single vLLM instance on GPU
Run test prompts (multimodal or text-only)
Save outputs to .vllm_epd_baseline.txt
Shutdown instance

Step 2: EPD (1E + 1PD)

Clear encoder cache storage
Start instances and proxy
Run same test prompts
Assert outputs match baseline exactly
Shutdown instances

Step 3: EPD (1E + 1P + 1D)

Clear encoder cache storage
Start instances and proxy
Run same test prompts
Assert outputs match baseline exactly
Shutdown instances

Test Scenarios

Multimodal Prompts (--use_mm_prompts)

Tests encoder cache transfer:

Single image query
Multiple images in one request
Mixed image and text
Image with detailed questions

Text-Only Prompts (default)

Quick sanity check:

Simple text queries
Text-only explanations
Verifies proxy routing works

Expected Behavior

✅ Test Passes When

All disagg outputs match baseline outputs exactly
No errors during instance startup
Encoder cache is properly saved and loaded
Proxy correctly routes requests

❌ Test Fails When

Outputs differ between baseline and disagg
Server startup fails
Encoder cache not found (should fall back to local execution)
Proxy routing errors

Notes

The test uses deterministic generation (temperature=0.0, seed=42)
Encoder cache should enable exact output reproduction
Test cleans up all instances and cache files after completion
Safe to run multiple times (idempotent)
We setup the PD disagg part with NixlConnector. Please read details about EPD in examples/online_serving/disaggregated_encoder/README.md

Requirements

Multiple GPUs (3 for 1E+1P+1D, 2 for 1E+1PD, 1 for baseline)
- 1E+1P+1D is runnable with 2 GPU by assign E and P on the same GPU now.
Multimodal model (e.g., Qwen2.5-VL-3B-Instruct)
Internet access (for accessing vllm test images)

Debugging

Check Logs

Logs and baseline output are saved in /tmp/ by default. Can be customized by changing the environment variables.

Check Encoder Cache

# Verify cache files are created
ls -la $EC_SHARED_STORAGE_PATH/

# Should see directories with mm_hash names
# Each containing encoder_cache.safetensors

Manual Testing

Run individual components:

# Baseline only
python test_epd_correctness.py \
    --service_url http://localhost:8000 \
    --model_name Qwen/Qwen2.5-VL-3B-Instruct \
    --mode baseline \
    --baseline_file test_output.txt \
    --use_mm_prompts

# Disagg only (requires baseline output file!)
python test_epd_correctness.py \
    --service_url http://localhost:8000 \
    --model_name Qwen/Qwen2.5-VL-3B-Instruct \
    --mode disagg \
    --baseline_file test_output.txt \
    --use_mm_prompts

4.8 KiB Raw Blame History