Chauncey
c02fccdbd2
[Refactor] Lazy import tool_parser (#27974)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-04 10:10:10 +08:00
Misha Efimov
ba464e6ae2
Add ORCA endpoint load metrics support (#24905)
...
Signed-off-by: Misha Efimov <mef@google.com>
2025-11-03 08:21:31 +00:00
Benjamin Bartels
1e88fb751b
Adds anthropic /v1/messages endpoint to openai api_server (#27882)
...
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
2025-11-01 12:45:42 -07:00
Nick Hill
9e5bd3076e
[Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill (#27826)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-31 10:57:45 -07:00
wang.yuqi
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-30 12:13:05 +00:00
Cyrus Leung
f58d9b6404
[Misc] Separate out utils.counter and move utils.Device to engine (#27588)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-28 12:20:46 +00:00
Cyrus Leung
6ebffafbb6
[Misc] Clean up more utils (#27567)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 15:30:38 +00:00
Cyrus Leung
7c2bdb83dc
[Misc] Clean up utils (#27552)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-27 09:05:40 +00:00
wang.yuqi
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task (#26973)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
2025-10-23 14:46:18 +00:00
dongbo910220
a0003b56b0
[Chore] Separate out system utilities from vllm.utils (#27201)
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 20:25:25 +00:00
RED
c9461e05a4
Support Anthropic API /v1/messages Endpoint (#22627)
...
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-22 09:13:18 -07:00
wang.yuqi
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response (#27066)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-22 18:38:57 +08:00
iAmir97
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils (#27164)
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
2025-10-19 03:06:32 -07:00
Cyrus Leung
b3aba04e5a
[Benchmark] Convenience script for multiple parameter combinations (#27085)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-18 23:57:01 -07:00
wang.yuqi
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370)
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-15 11:14:41 +00:00
Max Wittig
fd85c9f426
[Bugfix][FE]: Always include usage with --enable-force-include-usage (#20983)
...
Signed-off-by: Max Wittig <max.wittig@siemens.com>
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com>
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com>
2025-10-14 09:17:39 +02:00
Lucia Fang
8317f72354
[Misc][DP] support customized aggregated logger for dp (#24354)
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-10-13 17:45:59 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization (#26427)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 02:48:04 -07:00
Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used (#26251)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Isotr0py
a42d2df75f
[Frontend] Cache chat template kwargs resolution (#26227)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-04 15:32:30 +00:00
Russell Bryant
7977e5027c
Add filtering for chat template kwargs (#25794)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-27 10:46:49 +00:00
Russell Bryant
3f5d902d2a
Validate API tokens in constant time (#25781)
...
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
2025-09-27 18:09:26 +08:00
wang.yuqi
7f570f1caa
[V0 deprecation] Remove unreachable model_config.supported_tasks (#25642)
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-25 11:26:31 +00:00
Cyrus Leung
6c117cff7d
[Frontend] Pass API server count to each process (#23717)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 01:15:19 +08:00
Woosuk Kwon
e19bce40a1
[V0 Deprecation] Remove AsyncLLMEngine (#25025)
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-18 11:07:42 -07:00
dongbo910220
67244c86f0
feat(api): Return 503 on /health when engine is dead (#24897)
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-18 14:29:40 +00:00
Aaron Pham
29283e8976
[Chore] Cleanup guided namespace, move to structured outputs config (#22772)
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 09:20:27 +00:00
Andrew Xia
bff2e5f1d6
[gpt-oss][2] fix types for streaming (#24556)
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-17 22:04:28 +00:00
Woosuk Kwon
5801e49776
[V0 Deprecation] Remove MQLLMEngine (#25019)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
Andrew Xia
73df49ef3a
[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (#24759)
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:08:08 -07:00
Chen Zhang
1116590b16
[gpt-oss] Validate gpt-oss python tool during initialization (#23856)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-09 08:37:48 +00:00
wuhang
a38f8bd54c
[Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927)
...
Signed-off-by: wuhang <wuhang6@huawei.com>
2025-09-04 04:05:10 +00:00
Christian Pinto
1cb39dbcdd
[Misc] IO Processor plugins for pooling models (#22820)
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00
wang.yuqi
d9e00dbd1f
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-29 03:12:32 -07:00
Didier Durand
d3da2eea54
[Doc]: fix typos in Python scripts (#23828)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-28 05:37:38 -07:00
Chen Zhang
3210264421
[Frontend] Add --log-error-stack to print stack trace for error response (#22960)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-27 04:58:59 +00:00
Matúš Námešný
384dd1b0a8
[Bugfix] Add missing enable_log_outputs parameter to init_app_state function (#23634)
...
Signed-off-by: Matúš Námešný <matus.namesny@ameria.com>
2025-08-26 12:13:15 +00:00
Chen Zhang
b95697d731
[Frontend] improve error logging of chat completion (#22957)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-20 13:03:37 -07:00
Russell Bryant
f77a0802b7
Limit HTTP header count and size (#23267)
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
2025-08-20 17:57:37 +00:00
22quinn
f7cf5b512e
[Frontend] Add /collective_rpc API endpoint (#23075)
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-19 17:29:32 +00:00
Csrayz
a0632a3e03
[Frontend] Expose do_log_stats interval to env (#22905)
...
Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 13:00:20 +00:00
yyweiss
baece8c3d2
[Frontend] Add unix domain socket support (#18097)
...
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
2025-08-08 16:23:44 -07:00
Chen Zhang
fe6d8257a1
[gpt-oss] Support tool call and implement MCP tool server (#22427)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-08 15:06:37 -07:00
Moritz Sanft
370661856b
[Frontend] Update OpenAI error response to upstream format (#22099)
...
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2025-08-06 23:06:00 -07:00
Lionel Villard
ad6c655dde
preload heavy modules when mp method is forkserver (#22214)
...
Signed-off-by: Lionel Villard <villard@us.ibm.com>
2025-08-06 20:33:24 -07:00
Chen Zhang
19c9365aa4
[gpt-oss] add demo tool server (#22393)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-06 17:47:14 -07:00
Nick Hill
8d524ce79f
[BugFix] Improve internal DP load balancing (#21617)
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-01 19:45:27 -07:00
Harry Mellor
2d7b09b998
Deprecate --disable-log-requests and replace with --enable-log-requests (#21739)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-01 17:16:37 +01:00