Reid a7bab0c9e5
[Misc] small update (#20462)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-03 20:33:44 -07:00

59 lines
1.4 KiB
Markdown

# Structured Outputs
This script demonstrates various structured output capabilities of vLLM's OpenAI-compatible server.
It can run individual constraint type or all of them.
It supports both streaming responses and concurrent non-streaming requests.
To use this example, you must start an vLLM server with any model of your choice.
```bash
vllm serve Qwen/Qwen2.5-3B-Instruct
```
To serve a reasoning model, you can use the following command:
```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
--reasoning-parser deepseek_r1
```
If you want to run this script standalone with `uv`, you can use the following:
```bash
uvx --from git+https://github.com/vllm-project/vllm#subdirectory=examples/online_serving/structured_outputs \
structured-output
```
See [feature docs](https://docs.vllm.ai/en/latest/features/structured_outputs.html) for more information.
!!! tip
If vLLM is running remotely, then set `OPENAI_BASE_URL=<remote_url>` before running the script.
## Usage
Run all constraints, non-streaming:
```bash
uv run structured_outputs.py
```
Run all constraints, streaming:
```bash
uv run structured_outputs.py --stream
```
Run certain constraints, for example `structural_tag` and `regex`, streaming:
```bash
uv run structured_outputs.py \
--constraint structural_tag regex \
--stream
```
Run all constraints, with reasoning models and streaming:
```bash
uv run structured_outputs.py --reasoning --stream
```