Harry Mellor
44be2b7349
Make mypy behave like a proper pre-commit hook ( #25313 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
WeiQing Chen
1b3aa0f297
[Bugfix] Fix hermes tool parser handling of non-string argument types ( #22002 )
...
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Woosuk Kwon
71f2b5ddea
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor ( #25334 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Cyrus Leung
a31d353b71
[Optimization] Cache chat template result when processor fails to be loaded ( #25341 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Woosuk Kwon
32d43a5a9e
[V0 Deprecation] Remove LLMEngine ( #25033 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Cyrus Leung
e33af1e0c2
[V1] Support LLM.apply_model ( #18465 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Chauncey
239aef5c9f
[Bugfix] fix tool call arguments is empty ( #25223 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: xin.li <xin.li@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Maximilien de Bayser
937ab7e85e
Don't skip special tokens with hermes-style tool calling ( #25281 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Alec S
8da7b98366
[Frontend] Responses API messages out, just harmony for now ( #24985 )
...
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
Cyrus Leung
6c117cff7d
[Frontend] Pass API server count to each process ( #23717 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 01:15:19 +08:00
Harry Mellor
058525b997
Move PoolerConfig from config/__init__.py to config/pooler.py ( #25181 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 11:02:55 +00:00
Andrew Xia
6d8246aaff
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming ( #24938 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-18 19:11:59 -07:00
Andrew Sansom
9a4600e4dc
[CORE] Prompt Embeddings Support for v1 Engine ( #24278 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-19 08:03:09 +08:00
Woosuk Kwon
e19bce40a1
[V0 Deprecation] Remove AsyncLLMEngine ( #25025 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 11:07:42 -07:00
Hyogeun Oh (오효근)
b419937c78
[Docs] Fix warnings in mkdocs build (continued) ( #25163 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-18 08:23:26 -07:00
wang.yuqi
5f696c33b1
[New Model] Support BertForTokenClassification / Named Entity Recognition (NER) task ( #24872 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-18 23:22:01 +08:00
dongbo910220
67244c86f0
feat(api): Return 503 on /health when engine is dead ( #24897 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 14:29:40 +00:00
Chauncey
cc935fdd7e
[Frontend] Support setting logprobs to -1 ( #25031 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-18 10:34:42 +00:00
Aaron Pham
29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config ( #22772 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 09:20:27 +00:00
Simon Mo
e111d5b0ae
[CLI] Use streaming in CLI chat and completion commands ( #23769 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-09-17 22:30:26 -07:00
Andrew Sansom
bec060fd99
Mark prompt logprobs as incompatible with prompt embeds at API level ( #25077 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-17 21:25:07 -07:00
Andrew Xia
bff2e5f1d6
[gpt-oss][2] fix types for streaming ( #24556 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-17 22:04:28 +00:00
Shijun Yin
2b85697031
[BugFix] enable DOTALL to match multi-line tool_call parameters in extract_tool_call_required_streaming ( #24668 )
...
Signed-off-by: Shijun Yin <shijun.yin@outlook.com>
2025-09-17 09:21:18 +00:00
Chauncey
544fe76b95
[Frontend] Support returning all prompt logprobs ( #24956 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-17 09:03:52 +00:00
Zhuohan Li
6c47f6bfa4
[Core] Remove tokenizer group in vLLM ( #24078 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-17 08:42:59 +00:00
Woosuk Kwon
5801e49776
[V0 Deprecation] Remove MQLLMEngine ( #25019 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
Prashant Gupta
ea3de5ef0d
[misc] fix typo in value error ( #24995 )
...
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2025-09-16 20:58:38 -07:00
Andrew Xia
86daa875fe
[gpt-oss][1][bugfix] fix streaming final output ( #24466 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 13:56:16 -06:00
Andrew Xia
f4d6eb95cf
[gpt-oss][1b] streaming add item id, content id ( #24788 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 18:41:12 +00:00
Cheng Kuan Yong Jason
68dbde5dbb
[Bugfix] remove duplicate tokens streamed in required tool choice streaming ( #23312 )
...
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-16 15:16:32 +08:00
Andrew Xia
73df49ef3a
[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still ( #24759 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:08:08 -07:00
Andrew Xia
25aba2b6a3
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse ( #24561 )
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:07:55 -07:00
Harry Mellor
c4afdb69cc
Move MultiModalConfig from config/__init__.py to config/multimodal.py ( #24659 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 17:43:16 +00:00
Chenheli Hua
7f2ea7074e
[Frontend][Multimodal] Allow skipping media data when UUIDs are provided. ( #23950 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-13 02:16:06 +00:00
Kebe
684b6870e1
[Bugfix][Frontend] Fix --enable-log-outputs does not match the documentation ( #24626 )
...
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-09-12 08:01:24 -07:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files ( #24726 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
Maximilien de Bayser
e090b7b45b
Enable conversion of multimodal models to pooling tasks ( #24451 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-09-12 03:30:41 +00:00
Nick Hill
b971f91504
[BugFix] Fix tokenize asyncio task leak ( #24677 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-11 19:44:04 +00:00
Harry Mellor
c1eda615ba
Fix model name included in responses ( #24663 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 10:47:51 -07:00
Boyuan Feng
94e6b2d55f
Allow users to specify kv cache memory size ( #21489 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 13:41:07 +00:00
Didier Durand
e2b1f863aa
[Doc]: fixing doc typos ( #24635 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-10 23:19:28 -07:00
rongfu.leng
09e68bce34
[Misc] update log level debug to warning when process port is used by ( #24226 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-10 11:32:57 -07:00
co63oc
3144d90217
fix some typos ( #24167 )
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-09-10 06:21:23 -07:00
Flora Feng
77f62613f9
Consolidate rendering parameters into RenderConfig dataclass ( #24543 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-10 08:44:47 +00:00
Flora Feng
15cb047e25
Extend renderer with embedding support and integrate completion endpoint ( #24405 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-10 01:46:46 +08:00
Chen Zhang
1116590b16
[gpt-oss] Validate gpt-oss python tool during initialization ( #23856 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-09 08:37:48 +00:00
Harry Mellor
3e0d4a3475
Move KVTransferConfig from config/__init__.py to config/kv_transfer.py ( #24434 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-08 20:30:32 -07:00
cjackal
13b89bd823
[doc] update vllm serve cli args documentation ( #24329 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-09-09 03:07:58 +00:00
zhiweiz
170129eb28
[gpt-oss] Harmony changes with container tool support ( #23386 )
...
Signed-off-by: zhiweiz <zhiweiz@fb.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: zhiweiz <zhiweiz@fb.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-09-08 19:03:50 -07:00
Chenheli Hua
01dfb5e982
[Frontend] User-provided uuids for medias in chat. (RFC #22044 ) ( #23449 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-08 06:42:20 -07:00