Aaron Pham 29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config (#22772)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 09:20:27 +00:00
..
2025-07-03 20:33:44 -07:00

Structured Outputs

This script demonstrates various structured output capabilities of vLLM's OpenAI-compatible server. It can run individual constraint type or all of them. It supports both streaming responses and concurrent non-streaming requests.

To use this example, you must start an vLLM server with any model of your choice.

vllm serve Qwen/Qwen2.5-3B-Instruct

To serve a reasoning model, you can use the following command:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --reasoning-parser deepseek_r1

If you want to run this script standalone with uv, you can use the following:

uvx --from git+https://github.com/vllm-project/vllm#subdirectory=examples/online_serving/structured_outputs \
    structured-output

See feature docs for more information.

!!! tip If vLLM is running remotely, then set OPENAI_BASE_URL=<remote_url> before running the script.

Usage

Run all constraints, non-streaming:

uv run structured_outputs.py

Run all constraints, streaming:

uv run structured_outputs.py --stream

Run certain constraints, for example structural_tag and regex, streaming:

uv run structured_outputs.py \
    --constraint structural_tag regex \
    --stream

Run all constraints, with reasoning models and streaming:

uv run structured_outputs.py --reasoning --stream