From 359200f6ac9919140b995d5aa906854ba16e4870 Mon Sep 17 00:00:00 2001
From: Reid <61492567+reidliu41@users.noreply.github.com>
Date: Thu, 3 Jul 2025 15:21:57 +0800
Subject: [PATCH] [doc] fix link (#20417)

Signed-off-by: reidliu41
---
 examples/offline_inference/profiling_tpu/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/offline_inference/profiling_tpu/README.md b/examples/offline_inference/profiling_tpu/README.md
index 6595efec4377..e0122c05cff1 100644
--- a/examples/offline_inference/profiling_tpu/README.md
+++ b/examples/offline_inference/profiling_tpu/README.md
@@ -4,7 +4,7 @@ This script is used to profile the TPU performance of vLLM for specific prefill
 
 Note: an actual running server is a mix of both prefill of many shapes and decode of many shapes.
 
-We assume you are on a TPU already (this was tested on TPU v6e) and have installed vLLM according to the [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/ai_accelerator/index.html).
+We assume you are on a TPU already (this was tested on TPU v6e) and have installed vLLM according to the [Google TPU installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/google_tpu.html).
 
 > In all examples below, we run several warmups before (so `--enforce-eager` is okay)
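The README hunk above mentions `--enforce-eager`, which corresponds to the `enforce_eager` option on vLLM's Python `LLM` class: it runs the model eagerly rather than capturing/compiling graphs, which is why the README notes it is acceptable once warmups have been run. Below is a minimal sketch of that setting, assuming a working vLLM install; the model name and prompt are illustrative, not part of this patch.

```python
# Minimal sketch: running vLLM with eager execution enabled,
# mirroring the README's `--enforce-eager` note.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # illustrative; substitute your model
    enforce_eager=True,                  # skip graph capture; fine after warmups
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Profile this prompt."], params)
for out in outputs:
    print(out.outputs[0].text)
```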