# EPD Correctness Test This test verifies that EPD (Encoder-Prefill-Decode) disaggregation produces identical outputs to a baseline single instance. ## What It Tests - **Baseline**: Single vLLM instance serving a multimodal model - **EPD (1E+1PD)**: 1 Encoder + 1 Prefill-Decode instance - **Baseline (1P+1D)**: 1 Prefill + 1 Decode instance - **EPD (1E+1P+1D)**: 1 Encoder + 1 Prefill + 1 Decode instance The test ensures that disaggregated encoding produces **identical** outputs to the baseline. Note that currently PD disaggregation set up may give slightly different results from a single instance. Therefore, we need the result from 1P+1D as the baseline for 1E+1P+1D Please refer to [Disaggregated Encoder Feature](../../../docs/features/disagg_encoder.md) for the detailed explanation for the EPD features. ## Files - `run_epd_correctness_test.sh` - Main test script (starts all instances and runs tests) - `test_epd_correctness.py` - Python test script (compares outputs) ## Usage ### Multimodal Prompts (Default) ```bash cd vllm ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh ``` This runs the test with actual multimodal (image) prompts. ### Text-Only Prompts ```bash cd vllm USE_MM_PROMPTS=0 ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh ``` This runs a quick test with text-only prompts to verify the setup works. ### Custom Configuration ```bash # Use specific GPUs GPU_E=0 GPU_PD=1 GPU_P=1 GPU_D=2 bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh # Use specific ports ENDPOINT_PORT=10001 bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh # Use specific model MODEL="Qwen/Qwen2.5-VL-3B-Instruct" bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh # Use specific storage path EC_SHARED_STORAGE_PATH="/tmp/my_ec_cache" bash ./tests/v1/ec_connector/integration/run_epd_correctness_test.sh ``` ## How It Works ### Step 1: Baseline 1. Start single vLLM instance on GPU 2. Run test prompts (multimodal or text-only) 3. Save outputs to `.vllm_epd_baseline.txt` 4. Shutdown instance ### Step 2: EPD (1E + 1PD) 1. Clear encoder cache storage 2. Start instances and proxy 3. Run same test prompts 4. Assert outputs match baseline exactly 5. Shutdown instances ### Step 3: EPD (1E + 1P + 1D) 1. Clear encoder cache storage 2. Start instances and proxy 3. Run same test prompts 4. Assert outputs match baseline exactly 5. Shutdown instances ## Test Scenarios ### Multimodal Prompts (--use_mm_prompts) Tests encoder cache transfer: - Single image query - Multiple images in one request - Mixed image and text - Image with detailed questions ### Text-Only Prompts (default) Quick sanity check: - Simple text queries - Text-only explanations - Verifies proxy routing works ## Expected Behavior ### ✅ Test Passes When - All disagg outputs match baseline outputs exactly - No errors during instance startup - Encoder cache is properly saved and loaded - Proxy correctly routes requests ### ❌ Test Fails When - Outputs differ between baseline and disagg - Server startup fails - Encoder cache not found (should fall back to local execution) - Proxy routing errors ## Notes - The test uses deterministic generation (`temperature=0.0`, `seed=42`) - Encoder cache should enable exact output reproduction - Test cleans up all instances and cache files after completion - Safe to run multiple times (idempotent) - We setup the PD disagg part with NixlConnector. Please read details about EPD in `examples/online_serving/disaggregated_encoder/README.md` ## Requirements - Multiple GPUs (3 for 1E+1P+1D, 2 for 1E+1PD, 1 for baseline) - 1E+1P+1D is runnable with 2 GPU by assign E and P on the same GPU now. - Multimodal model (e.g., Qwen2.5-VL-3B-Instruct) - Internet access (for accessing vllm test images) ## Debugging ### Check Logs Logs and baseline output are saved in `/tmp/` by default. Can be customized by changing the environment variables. ### Check Encoder Cache ```bash # Verify cache files are created ls -la $EC_SHARED_STORAGE_PATH/ # Should see directories with mm_hash names # Each containing encoder_cache.safetensors ``` ### Manual Testing Run individual components: ```bash # Baseline only python test_epd_correctness.py \ --service_url http://localhost:8000 \ --model_name Qwen/Qwen2.5-VL-3B-Instruct \ --mode baseline \ --baseline_file test_output.txt \ --use_mm_prompts # Disagg only (requires baseline output file!) python test_epd_correctness.py \ --service_url http://localhost:8000 \ --model_name Qwen/Qwen2.5-VL-3B-Instruct \ --mode disagg \ --baseline_file test_output.txt \ --use_mm_prompts ```