[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (#18627)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung 2025-05-24 01:22:40 +08:00 committed by GitHub
parent 15b45ffb9a
commit 371f7e4ca2
6 changed files with 16 additions and 11 deletions

View File

@@ -9,8 +9,13 @@ nav:
  - getting_started/examples/offline_inference
  - getting_started/examples/online_serving
  - getting_started/examples/other
- - Roadmap: https://roadmap.vllm.ai
- - Releases: https://github.com/vllm-project/vllm/releases
+ - Quick Links:
+ - User Guide: serving/offline_inference.md
+ - Developer Guide: contributing/overview.md
+ - API Reference: api/README.md
+ - Timeline:
+ - Roadmap: https://roadmap.vllm.ai
+ - Releases: https://github.com/vllm-project/vllm/releases
  - User Guide:
  - Inference and Serving:
  - serving/offline_inference.md
@@ -38,7 +43,7 @@ nav:
  - contributing/overview.md
  - glob: contributing/*
  flatten_single_child_sections: true
- - contributing/model
+ - Model Implementation: contributing/model
  - Design Documents:
  - V0: design
  - V1: design/v1

View File

@@ -33,14 +33,14 @@ These tests compare the model outputs of vLLM against [HF Transformers](https://
  #### Generative models
- For [generative models][generative-models], there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
+ For [generative models](../../models/generative_models.md), there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
  - Exact correctness (`check_outputs_equal`): The text outputted by vLLM should exactly match the text outputted by HF.
  - Logprobs similarity (`check_logprobs_close`): The logprobs outputted by vLLM should be in the top-k logprobs outputted by HF, and vice versa.
  #### Pooling models
- For [pooling models][pooling-models], we simply check the cosine similarity, as defined in <gh-file:tests/models/embedding/utils.py>.
+ For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in <gh-file:tests/models/utils.py>.
  [](){ #mm-processing-tests }

View File

@@ -170,7 +170,7 @@ A variety of speculative models of this type are available on HF hub:
  ## Speculating using EAGLE based draft models
  The following code configures vLLM to use speculative decoding where proposals are generated by
- an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](<gh-file:examples/offline_inference/eagle.py>).
+ an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](gh-file:examples/offline_inference/eagle.py).
  ```python
  from vllm import LLM, SamplingParams

View File

@@ -3,7 +3,7 @@ title: Supported Models
  ---
  [](){ #supported-models }
- vLLM supports [generative](generative-models) and [pooling](pooling-models) models across various tasks.
+ vLLM supports [generative](./generative_models.md) and [pooling](./pooling_models.md) models across various tasks.
  If a model supports more than one task, you can set the task via the `--task` argument.
  For each task, we list the model architectures that have been implemented in vLLM.
@@ -376,7 +376,7 @@ Specified using `--task generate`.
  ### Pooling Models
- See [this page](pooling-models) for more information on how to use pooling models.
+ See [this page](./pooling_models.md) for more information on how to use pooling models.
  !!! warning
  Since some model architectures support both generative and pooling tasks,
@@ -628,7 +628,7 @@ Specified using `--task generate`.
  ### Pooling Models
- See [this page](pooling-models) for more information on how to use pooling models.
+ See [this page](./pooling_models.md) for more information on how to use pooling models.
  !!! warning
  Since some model architectures support both generative and pooling tasks,

View File

@@ -5,7 +5,7 @@ title: OpenAI-Compatible Server
  vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
- In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
+ In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
  ```bash
  vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123

View File

@@ -1,4 +1,4 @@
- # Seed Parameter Behavior in vLLM
+ # Seed Parameter Behavior
  ## Overview