Russell Bryant
1cb6005627
Add filtering for chat template kwargs ( #25794 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Russell Bryant
3e7f33c801
Validate API tokens in constant time ( #25781 )
...
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Russell Bryant
dbb7782d5b
Add option to restrict media domains ( #25783 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:56 -07:00
Xu Wenqing
ced693e845
Support LongCat-Flash-Chat tool call ( #24083 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
22114ffebb
Add backward compatibility for guided_... API ( #25615 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
wang.yuqi
f3d9099b44
[V0 deprecation] Remove unreachable model_config.supported_tasks ( #25642 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Harry Mellor
7319686692
Improve --help for enhanced user experience ( #24903 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Tao Hui
71566e8afc
[Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output ( #25405 )
...
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:55 -07:00
Woosuk Kwon
9914857f2b
[V0 Deprecation] Remove max_seq_len_to_capture ( #25543 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Ben Browning
d7fb5a4ae8
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls ( #25514 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Andrew Xia
eb1f43bc82
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI ( #25428 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Alec S
d8fda7420a
[Bugfix] gpt-oss container tool output bug ( #25485 )
...
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Zhikaiiii
c4a15ee240
[Frontend] Add a new xml-based tool parser for qwen3-coder ( #25028 )
...
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:54 -07:00
Alec S
aac85cc6d6
[Frontend] Responses API MCP tools for built in tools and to pass through headers ( #24628 )
...
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Harry Mellor
44be2b7349
Make mypy behave like a proper pre-commit hook ( #25313 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
WeiQing Chen
1b3aa0f297
[Bugfix] Fix hermes tool parser handling of non-string argument types ( #22002 )
...
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Woosuk Kwon
71f2b5ddea
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor ( #25334 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Cyrus Leung
a31d353b71
[Optimization] Cache chat template result when processor fails to be loaded ( #25341 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Woosuk Kwon
32d43a5a9e
[V0 Deprecation] Remove LLMEngine ( #25033 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Cyrus Leung
e33af1e0c2
[V1] Support LLM.apply_model ( #18465 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Chauncey
239aef5c9f
[Bugfix] fix tool call arguments is empty ( #25223 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: xin.li <xin.li@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Maximilien de Bayser
937ab7e85e
Don't skip special tokens with hermes-style tool calling ( #25281 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Alec S
8da7b98366
[Frontend] Responses API messages out, just harmony for now ( #24985 )
...
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Cyrus Leung
6c117cff7d
[Frontend] Pass API server count to each process ( #23717 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 01:15:19 +08:00
Harry Mellor
058525b997
Move PoolerConfig from config/__init__.py to config/pooler.py ( #25181 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 11:02:55 +00:00
Andrew Xia
6d8246aaff
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming ( #24938 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-18 19:11:59 -07:00
Andrew Sansom
9a4600e4dc
[CORE] Prompt Embeddings Support for v1 Engine ( #24278 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-19 08:03:09 +08:00
Woosuk Kwon
e19bce40a1
[V0 Deprecation] Remove AsyncLLMEngine ( #25025 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 11:07:42 -07:00
Hyogeun Oh (오효근)
b419937c78
[Docs] Fix warnings in mkdocs build (continued) ( #25163 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-18 08:23:26 -07:00
wang.yuqi
5f696c33b1
[New Model] Support BertForTokenClassification / Named Entity Recognition (NER) task ( #24872 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-18 23:22:01 +08:00
dongbo910220
67244c86f0
feat(api): Return 503 on /health when engine is dead ( #24897 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 14:29:40 +00:00
Chauncey
cc935fdd7e
[Frontend] Support setting logprobs to -1 ( #25031 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-18 10:34:42 +00:00
Aaron Pham
29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config ( #22772 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 09:20:27 +00:00
Simon Mo
e111d5b0ae
[CLI] Use streaming in CLI chat and completion commands ( #23769 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-09-17 22:30:26 -07:00
Andrew Sansom
bec060fd99
Mark prompt logprobs as incompatible with prompt embeds at API level ( #25077 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-17 21:25:07 -07:00
Andrew Xia
bff2e5f1d6
[gpt-oss][2] fix types for streaming ( #24556 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-17 22:04:28 +00:00
Shijun Yin
2b85697031
[BugFix] enable DOTALL to match multi-line tool_call parameters in extract_tool_call_required_streaming ( #24668 )
...
Signed-off-by: Shijun Yin <shijun.yin@outlook.com>
2025-09-17 09:21:18 +00:00
Chauncey
544fe76b95
[Frontend] Support returning all prompt logprobs ( #24956 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-17 09:03:52 +00:00
Zhuohan Li
6c47f6bfa4
[Core] Remove tokenizer group in vLLM ( #24078 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-17 08:42:59 +00:00
Woosuk Kwon
5801e49776
[V0 Deprecation] Remove MQLLMEngine ( #25019 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
Prashant Gupta
ea3de5ef0d
[misc] fix typo in value error ( #24995 )
...
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2025-09-16 20:58:38 -07:00
Andrew Xia
86daa875fe
[gpt-oss][1][bugfix] fix streaming final output ( #24466 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 13:56:16 -06:00
Andrew Xia
f4d6eb95cf
[gpt-oss][1b] streaming add item id, content id ( #24788 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 18:41:12 +00:00
Cheng Kuan Yong Jason
68dbde5dbb
[Bugfix] remove duplicate tokens streamed in required tool choice streaming ( #23312 )
...
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-16 15:16:32 +08:00
Andrew Xia
73df49ef3a
[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still ( #24759 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:08:08 -07:00
Andrew Xia
25aba2b6a3
[gpt-oss] Add IncompleteDetails to ResponsesResponse ( #24561 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:07:55 -07:00
Harry Mellor
c4afdb69cc
Move MultiModalConfig from config/__init__.py to config/multimodal.py ( #24659 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 17:43:16 +00:00
Chenheli Hua
7f2ea7074e
[Frontend][Multimodal] Allow skipping media data when UUIDs are provided. ( #23950 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-13 02:16:06 +00:00
Kebe
684b6870e1
[Bugfix][Frontend] Fix --enable-log-outputs does not match the documentation ( #24626 )
...
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-09-12 08:01:24 -07:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files ( #24726 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00