Didier Durand 09540cd918
[Doc]: fix typos in various files (#29010)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-19 04:56:21 -08:00

4.8 KiB

EPD Correctness Test

This test verifies that EPD (Encoder-Prefill-Decode) disaggregation produces identical outputs to a baseline single instance.

What It Tests

  • Baseline: Single vLLM instance serving a multimodal model
  • EPD (1E+1PD): 1 Encoder + 1 Prefill-Decode instance
  • Baseline (1P+1D): 1 Prefill + 1 Decode instance
  • EPD (1E+1P+1D): 1 Encoder + 1 Prefill + 1 Decode instance

The test ensures that disaggregated encoding produces identical outputs to the baseline.

Note that currently PD disaggregation set up may give slightly different results from a single instance. Therefore, we need the result from 1P+1D as the baseline for 1E+1P+1D

Please refer to Disaggregated Encoder Feature for the detailed explanation for the EPD features.

Files

  • run_epd_correctness_test.sh - Main test script (starts all instances and runs tests)
  • test_epd_correctness.py - Python test script (compares outputs)

Usage

Multimodal Prompts (Default)

cd vllm
./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

This runs the test with actual multimodal (image) prompts.

Text-Only Prompts

cd vllm
USE_MM_PROMPTS=0 ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

This runs a quick test with text-only prompts to verify the setup works.

Custom Configuration

# Use specific GPUs
GPU_E=0 GPU_PD=1 GPU_P=1 GPU_D=2 bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

# Use specific ports
ENDPOINT_PORT=10001 bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

# Use specific model
MODEL="Qwen/Qwen2.5-VL-3B-Instruct" bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

# Use specific storage path
EC_SHARED_STORAGE_PATH="/tmp/my_ec_cache" bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh

How It Works

Step 1: Baseline

  1. Start single vLLM instance on GPU
  2. Run test prompts (multimodal or text-only)
  3. Save outputs to .vllm_epd_baseline.txt
  4. Shutdown instance

Step 2: EPD (1E + 1PD)

  1. Clear encoder cache storage
  2. Start instances and proxy
  3. Run same test prompts
  4. Assert outputs match baseline exactly
  5. Shutdown instances

Step 3: EPD (1E + 1P + 1D)

  1. Clear encoder cache storage
  2. Start instances and proxy
  3. Run same test prompts
  4. Assert outputs match baseline exactly
  5. Shutdown instances

Test Scenarios

Multimodal Prompts (--use_mm_prompts)

Tests encoder cache transfer:

  • Single image query
  • Multiple images in one request
  • Mixed image and text
  • Image with detailed questions

Text-Only Prompts (default)

Quick sanity check:

  • Simple text queries
  • Text-only explanations
  • Verifies proxy routing works

Expected Behavior

Test Passes When

  • All disagg outputs match baseline outputs exactly
  • No errors during instance startup
  • Encoder cache is properly saved and loaded
  • Proxy correctly routes requests

Test Fails When

  • Outputs differ between baseline and disagg
  • Server startup fails
  • Encoder cache not found (should fall back to local execution)
  • Proxy routing errors

Notes

  • The test uses deterministic generation (temperature=0.0, seed=42)
  • Encoder cache should enable exact output reproduction
  • Test cleans up all instances and cache files after completion
  • Safe to run multiple times (idempotent)
  • We setup the PD disagg part with NixlConnector. Please read details about EPD in examples/online_serving/disaggregated_encoder/README.md

Requirements

  • Multiple GPUs (3 for 1E+1P+1D, 2 for 1E+1PD, 1 for baseline)
    • 1E+1P+1D is runnable with 2 GPU by assign E and P on the same GPU now.
  • Multimodal model (e.g., Qwen2.5-VL-3B-Instruct)
  • Internet access (for accessing vllm test images)

Debugging

Check Logs

Logs and baseline output are saved in /tmp/ by default. Can be customized by changing the environment variables.

Check Encoder Cache

# Verify cache files are created
ls -la $EC_SHARED_STORAGE_PATH/

# Should see directories with mm_hash names
# Each containing encoder_cache.safetensors

Manual Testing

Run individual components:

# Baseline only
python test_epd_correctness.py \
    --service_url http://localhost:8000 \
    --model_name Qwen/Qwen2.5-VL-3B-Instruct \
    --mode baseline \
    --baseline_file test_output.txt \
    --use_mm_prompts

# Disagg only (requires baseline output file!)
python test_epd_correctness.py \
    --service_url http://localhost:8000 \
    --model_name Qwen/Qwen2.5-VL-3B-Instruct \
    --mode disagg \
    --baseline_file test_output.txt \
    --use_mm_prompts