Chauncey
df850c4912
[Feature][Responses API] Stream Function Call - harmony (#24317)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-14 08:31:43 -07:00
Chauncey
780eb03d9b
[CI] Fix test_tool_id_kimi_k2 (#26787)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-14 10:27:07 +00:00
Max Wittig
fd85c9f426
[Bugfix][FE]: Always include usage with --enable-force-include-usage (#20983)
Signed-off-by: Max Wittig <max.wittig@siemens.com>
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com>
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com>
2025-10-14 09:17:39 +02:00
Jialin Ouyang
35bc22f23c
[ResponseAPI] Further polish message serialization and unit tests (#26728)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-13 23:31:35 +00:00
wang.yuqi
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). (#26414)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-13 19:06:43 +00:00
Jialin Ouyang
4073c82c4e
[ResponseAPI] Simplify input/output message serialization (#26620)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-13 09:59:15 +00:00
wang.yuqi
767c3ab869
[Model][0/N] Improve all pooling task | clean up (#25817)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-13 16:44:50 +08:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Chauncey
910abdbd08
[Bugfix] fixed top_logprobs: -1 does not appear to work as intended (#26470)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-11 00:41:17 +08:00
Mark McLoughlin
e519281920
[Metrics] Add test for multi-modal cache stats logging (#26588)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-10 16:00:50 +00:00
Chauncey
1e6848a65d
[CI] fix test_run_batch.py::test_completions - AssertionError (#26578)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 22:16:28 +08:00
Chauncey
720d3cd0f0
[CI] fix ruff format (#26579)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-10 03:02:12 -07:00
Ashwin Phadke
ab196edefb
Remove LoRA bias support (#25807)
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-10 09:50:33 +00:00
Luis Tomas Bolivar
3ee202ea1e
[GPT-OSS] Add support for arrays at tool message content (#25593)
Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com>
2025-10-10 09:00:45 +00:00
Cyrus Leung
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-10 01:45:55 -07:00
Ben Browning
da4455609d
[Chore]: One pythonic tool parser test uses the wrong parser (#26515)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-10-10 04:03:55 +00:00
Julien Denize
c6187f55f7
Refactor MistralTokenizer (#26358)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-10-09 22:48:58 +00:00
Cyrus Leung
4bdf7ac593
[Bugfix] Fix SHM cache initialization (#26427)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 02:48:04 -07:00
Cyrus Leung
dc7976dd9f
[Misc] Upgrade more code to Python 3.10 (#26463)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-09 10:43:53 +01:00
Thomas Parnell
31a4b3e6c4
Revert #24446 and #26168 (#26332)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-07 16:38:19 -06:00
Cyrus Leung
1e4ecca1d0
[V0 Deprecation] Remove VLLM_USE_V1 from tests (#26341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-07 15:42:31 +00:00
Andrew Xia
185d8ed44f
[responsesAPI][bugfix] serialize harmony messages (#26185)
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-07 07:07:53 +00:00
Harry Mellor
6c04638214
Fix per file ruff ignores related to line length (#26262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-06 05:12:40 +00:00
wuhang
91ac7f764d
[CI][gpt-oss] Enable python tool tests in CI (#24315)
Signed-off-by: wuhang <wuhang6@huawei.com>
2025-10-06 04:20:06 +00:00
Harry Mellor
1c0c68202c
Fix per file ruff ignores related to typing (#26254)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 16:37:55 +00:00
Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used (#26251)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Cyrus Leung
a964e5e6c3
[Bugfix] Allow --skip-tokenizer-init with echo and return_token_ids (#26238)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-05 05:38:53 +00:00
Cyrus Leung
119f00630b
[Renderer] Clean up renderer code (#26216)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 17:05:29 +00:00
Yannick Schnider
f05fea1f5e
[Core] Enable decode of context length equal to max model length (#26168)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-04 09:59:26 +00:00
Ben Browning
ea25a76c05
[BugFix] Use async Mistral Tokenizer in Chat Completions (#26134)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-04 09:42:08 +08:00
Andrew Xia
831b124151
[responsesAPI] add better error messaging for long prompts (#25724)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-10-03 14:33:13 -07:00
Yang Liu
812b7f54a8
[Renderer] Move Processor out of AsyncLLM (#24138)
Signed-off-by: Yang <lymailforjob@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-03 11:29:45 +00:00
kyt
2ed3f20dba
[openai] Fix missing tool usage check (system message) (#24768)
Signed-off-by: kyt <eluban4532@gmail.com>
2025-10-03 18:55:44 +08:00
HUIJONG JEONG
3e70e3d4d5
add(v1): RequestStatesStats to RequestOutput (#24947)
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
2025-10-03 08:56:25 +00:00
Andrew Xia
e5017cd6d6
[gpt-oss] disable tool server initialization if no tool in request (#25790)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-10-03 05:08:35 +00:00
Andrew Xia
5db1870bb9
[gpt-oss] use vLLM instead of openai types for streaming (#25186)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-09-30 22:47:07 +00:00
Cyrus Leung
2f652e6cdf
[Doc] Improve MM Pooling model documentation (#25966)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-30 18:58:29 +00:00
Andrew Sansom
78a47f87ce
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models (#25717)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-30 08:10:58 +08:00
Russell Bryant
7977e5027c
Add filtering for chat template kwargs (#25794)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-27 10:46:49 +00:00
Russell Bryant
3958b96bf5
Add option to restrict media domains (#25783)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
2025-09-27 01:23:52 +00:00
Matthew Bonanni
3468f17ebe
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-25 17:37:50 +00:00
Cyrus Leung
0bcc3a160d
[CI/Build] Fix flaky entrypoints test (#25663)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-25 12:19:40 +00:00
Ben Browning
5caaeb714c
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls (#25514)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-09-24 03:20:38 +00:00
Andrew Xia
95bc60e4cb
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-23 15:46:46 -07:00
Andreas Hartel
4322c553a6
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com>
2025-09-23 17:56:31 +08:00
Alec S
45d7d852d3
[Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-22 23:38:19 +00:00
WeiQing Chen
0eecb31663
[Bugfix] Fix hermes tool parser handling of non-string argument types (#22002)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-22 11:35:39 +08:00
Yang Liu
04d3752329
[Bugfix][V0 Deprecation][CI] use async mock and await for async method (#25325)
Signed-off-by: Yang <lymailforjob@gmail.com>
2025-09-22 07:06:16 +08:00
Woosuk Kwon
72dd1595b4
[CI] Skip tests failing on main (#25326)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 19:57:46 -07:00