mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2026-05-23 18:44:28 +08:00
[Doc] Update several links in reasoning_outputs.md (#17846)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
This commit is contained in:
parent
3d1e387652
commit
89a0315f4c
@ -17,7 +17,9 @@ vLLM currently supports the following reasoning models:
|
|||||||
| [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` | ❌ | ❌ |
|
| [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` | ❌ | ❌ |
|
||||||
| [Qwen3 series](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `qwen3` | `guided_json`, `guided_regex` | ✅ |
|
| [Qwen3 series](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `qwen3` | `guided_json`, `guided_regex` | ✅ |
|
||||||
|
|
||||||
- IBM Granite 3.2 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`.
|
:::{note}
|
||||||
|
IBM Granite 3.2 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`.
|
||||||
|
:::
|
||||||
|
|
||||||
## Quickstart
|
## Quickstart
|
||||||
|
|
||||||
@ -83,7 +85,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
OpenAI Python client library does not officially support `reasoning_content` attribute for streaming output. But the client support extra attributes in the response. You can use `hasattr` to check if the `reasoning_content` attribute is present in the response. For example:
|
OpenAI Python client library does not officially support `reasoning_content` attribute for streaming output. But the client supports extra attributes in the response. You can use `hasattr` to check if the `reasoning_content` attribute is present in the response. For example:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from openai import OpenAI
|
from openai import OpenAI
|
||||||
@ -221,7 +223,7 @@ print(f"Function called: {tool_call.name}")
|
|||||||
print(f"Arguments: {tool_call.arguments}")
|
print(f"Arguments: {tool_call.arguments}")
|
||||||
```
|
```
|
||||||
|
|
||||||
For more examples, please refer to <gh-file:examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py> .
|
For more examples, please refer to <gh-file:examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py>.
|
||||||
|
|
||||||
## Limitations
|
## Limitations
|
||||||
|
|
||||||
@ -229,7 +231,7 @@ For more examples, please refer to <gh-file:examples/online_serving/openai_chat_
|
|||||||
|
|
||||||
## How to support a new reasoning model
|
## How to support a new reasoning model
|
||||||
|
|
||||||
You can add a new `ReasoningParser` similar to `vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py`.
|
You can add a new `ReasoningParser` similar to <gh-file:vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py>.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# import the required packages
|
# import the required packages
|
||||||
@ -286,7 +288,7 @@ class ExampleParser(ReasoningParser):
|
|||||||
"""
|
"""
|
||||||
```
|
```
|
||||||
|
|
||||||
Additionally, to enable structured output, you'll need to create a new `Reasoner` similar to the one in `vllm/model_executor/guided_decoding/reasoner/deepseek_reasoner.py`.
|
Additionally, to enable structured output, you'll need to create a new `Reasoner` similar to the one in <gh-file:vllm/model_executor/guided_decoding/reasoner/deepseek_reasoner.py>.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
@dataclass
|
@dataclass
|
||||||
@ -312,7 +314,7 @@ class DeepSeekReasoner(Reasoner):
|
|||||||
...
|
...
|
||||||
```
|
```
|
||||||
|
|
||||||
The structured output engine like `xgrammar` will use `end_token_id` to check if the reasoning content is present in the model output and skip the structured output if it is the case.
|
The structured output engine like [xgrammar](https://github.com/mlc-ai/xgrammar) will use `end_token_id` to check if the reasoning content is present in the model output and skip the structured output if it is the case.
|
||||||
|
|
||||||
Finally, you can enable reasoning for the model by using the `--reasoning-parser` flags.
|
Finally, you can enable reasoning for the model by using the `--reasoning-parser` flags.
|
||||||
|
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user