[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (#18627)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung 2025-05-24 01:22:40 +08:00 committed by GitHub
parent 15b45ffb9a
commit 371f7e4ca2
6 changed files with 16 additions and 11 deletions

View File

@@ -9,8 +9,13 @@ nav:
 - getting_started/examples/offline_inference
 - getting_started/examples/online_serving
 - getting_started/examples/other
-- Roadmap: https://roadmap.vllm.ai
-- Releases: https://github.com/vllm-project/vllm/releases
+- Quick Links:
+  - User Guide: serving/offline_inference.md
+  - Developer Guide: contributing/overview.md
+  - API Reference: api/README.md
+- Timeline:
+  - Roadmap: https://roadmap.vllm.ai
+  - Releases: https://github.com/vllm-project/vllm/releases
 - User Guide:
   - Inference and Serving:
     - serving/offline_inference.md
@@ -38,7 +43,7 @@ nav:
 - contributing/overview.md
 - glob: contributing/*
   flatten_single_child_sections: true
-- contributing/model
+- Model Implementation: contributing/model
 - Design Documents:
   - V0: design
   - V1: design/v1

View File

@@ -33,14 +33,14 @@ These tests compare the model outputs of vLLM against [HF Transformers](https://
 #### Generative models
-For [generative models][generative-models], there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
+For [generative models](../../models/generative_models.md), there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
 - Exact correctness (`check_outputs_equal`): The text outputted by vLLM should exactly match the text outputted by HF.
 - Logprobs similarity (`check_logprobs_close`): The logprobs outputted by vLLM should be in the top-k logprobs outputted by HF, and vice versa.
 #### Pooling models
-For [pooling models][pooling-models], we simply check the cosine similarity, as defined in <gh-file:tests/models/embedding/utils.py>.
+For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in <gh-file:tests/models/utils.py>.
 [](){ #mm-processing-tests }
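
*(Illustrative note, not part of this commit: the cosine-similarity check described above can be sketched generically as below; the helper name and tolerance are hypothetical and do not reproduce vLLM's actual `tests/models/utils.py` implementation.)*

```python
import numpy as np

def check_embeddings_close(vllm_embedding, hf_embedding, tol=1e-2):
    """Hypothetical sketch of a pooling-model correctness check."""
    v = np.asarray(vllm_embedding, dtype=np.float64)
    h = np.asarray(hf_embedding, dtype=np.float64)
    cosine = float(v @ h / (np.linalg.norm(v) * np.linalg.norm(h)))
    # Pooled outputs from vLLM and HF should point in (nearly) the same direction.
    assert cosine >= 1.0 - tol, f"cosine similarity too low: {cosine:.4f}"
```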

View File

@@ -170,7 +170,7 @@ A variety of speculative models of this type are available on HF hub:
 ## Speculating using EAGLE based draft models
 The following code configures vLLM to use speculative decoding where proposals are generated by
-an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](<gh-file:examples/offline_inference/eagle.py>).
+an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](gh-file:examples/offline_inference/eagle.py).
 ```python
 from vllm import LLM, SamplingParams
View File

@@ -3,7 +3,7 @@ title: Supported Models
 ---
 [](){ #supported-models }
-vLLM supports [generative](generative-models) and [pooling](pooling-models) models across various tasks.
+vLLM supports [generative](./generative_models.md) and [pooling](./pooling_models.md) models across various tasks.
 If a model supports more than one task, you can set the task via the `--task` argument.
 For each task, we list the model architectures that have been implemented in vLLM.
@@ -376,7 +376,7 @@ Specified using `--task generate`.
 ### Pooling Models
-See [this page](pooling-models) for more information on how to use pooling models.
+See [this page](./pooling_models.md) for more information on how to use pooling models.
 !!! warning
     Since some model architectures support both generative and pooling tasks,
@@ -628,7 +628,7 @@ Specified using `--task generate`.
 ### Pooling Models
-See [this page](pooling-models) for more information on how to use pooling models.
+See [this page](./pooling_models.md) for more information on how to use pooling models.
 !!! warning
     Since some model architectures support both generative and pooling tasks,

View File

@@ -5,7 +5,7 @@ title: OpenAI-Compatible Server
 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
-In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
+In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
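# NOTE: the diff context ends here. The request below is an illustrative
# sketch only (not part of this commit); it simply exercises the server
# started above, reusing the model name and API key from that command.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token-abc123" \
  -d '{
        "model": "NousResearch/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```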

View File

@@ -1,4 +1,4 @@
-# Seed Parameter Behavior in vLLM
+# Seed Parameter Behavior
 ## Overview