548 Commits

Author SHA1 Message Date
Woosuk Kwon
5c65a72bb1
[V0 Deprecation] Remove more V0 tests (#25117)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 22:05:25 -07:00
Andrew Sansom
bec060fd99
Mark prompt logprobs as incompatible with prompt embeds at API level (#25077)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-17 21:25:07 -07:00
Chauncey
544fe76b95
[Frontend] Support returning all prompt logprobs (#24956)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-17 09:03:52 +00:00
Zhuohan Li
6c47f6bfa4
[Core] Remove tokenizer group in vLLM (#24078)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-17 08:42:59 +00:00
Woosuk Kwon
5801e49776
[V0 Deprecation] Remove MQLLMEngine (#25019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
Andrew Xia
86daa875fe
[gpt-oss][1][bugfix] fix streaming final output (#24466)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 13:56:16 -06:00
Andrew Xia
f4d6eb95cf
[gpt-oss][1b] streaming add item id, content id (#24788)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-16 18:41:12 +00:00
Woosuk Kwon
759ef49b15
Remove V0 Encoder-Decoder Support (#24907)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-15 21:17:14 -07:00
Mark McLoughlin
2942970d44
[Metrics] Hide deprecated metrics with gpu_ prefix (#24245)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-09-15 20:15:57 -06:00
Andrew Xia
25aba2b6a3
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:07:55 -07:00
Harry Mellor
c4afdb69cc
Move MultiModalConfig from config/__init__.py to config/multimodal.py (#24659)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 17:43:16 +00:00
Chenheli Hua
7f2ea7074e
[Frontend][Multimodal] Allow skipping media data when UUIDs are provided. (#23950)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-09-13 02:16:06 +00:00
Wenlong Wang
72fc8aa412
[Multi Modal] Add FA3 in VIT (#24347)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-09-12 21:27:24 +08:00
Harry Mellor
c1eda615ba
Fix model name included in responses (#24663)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 10:47:51 -07:00
wang.yuqi
fd1ce98cdd
[CI] Split mteb test from Language Models Test (#24634)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 06:37:51 -07:00
wang.yuqi
a8b0361c92
[CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 01:53:09 -07:00
Russell Bryant
37e8182bfe
[v1] Add Whisper model support (encoder-decoder) (#21088)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
2025-09-10 13:53:35 -07:00
Ye (Charlotte) Qi
492196ed0e
[CI/Build] split true unit tests to Entrypoints Unit Tests (#24418)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-10 06:16:07 -07:00
lacora
0b9a612fa3
[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat (#24549)
Signed-off-by: lacora2017 <yehu@meta.com>
Co-authored-by: lacora2017 <yehu@meta.com>
2025-09-10 21:14:55 +08:00
Flora Feng
77f62613f9
Consolidate rendering parameters into RenderConfig dataclass (#24543)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-10 08:44:47 +00:00
Flora Feng
15cb047e25
Extend renderer with embedding support and integrate completion endpoint (#24405)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-10 01:46:46 +08:00
Didier Durand
46876dff32
[Doc]: fixing typos to improve docs (#24480)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 23:06:04 -07:00
Chenheli Hua
01dfb5e982
[Frontend] User-provided uuids for medias in chat. (RFC #22044) (#23449)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-08 06:42:20 -07:00
Christian Pinto
9cd76b71ab
[Misc] Terratorch related fixes (#24337)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-08 06:40:26 -07:00
Flora Feng
0661cb9df3
Add renderer-based prompt processing for embedding and classification endpoints (#24356)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-07 08:26:48 +00:00
Ye (Charlotte) Qi
a3645ed94d
[Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-06 13:27:15 -07:00
Aaron Pham
fb691ee4e7
[Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324) 2025-09-06 19:10:32 +00:00
Aaron Pham
c29fb540ff
[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-09-04 20:39:12 -07:00
mgazz
51d5e9be7d
[Core][Model] Terratorch backend integration (#23513)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-04 00:22:41 -07:00
Flora Feng
712b273f65
[Refactor] Introduce basic Renderer for completion-style request (#24010)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-04 05:21:12 +00:00
wuhang
a38f8bd54c
[Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927)
Signed-off-by: wuhang <wuhang6@huawei.com>
2025-09-04 04:05:10 +00:00
wang.yuqi
51383bd472
[CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-03 17:23:56 +08:00
dsinghvi
70549c1245
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-09-03 16:13:11 +08:00
Didier Durand
d7e1e59972
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 21:05:45 -07:00
Mark McLoughlin
2417798471
[Metrics] Deprecate TPOT in favor of ITL (#24110)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-09-02 18:10:10 +00:00
Chenheli Hua
f399182e8c
Run ruff format on a few files. (#24075)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-09-02 17:55:32 +00:00
Aziz
ce30dca5c4
[CI]: reduce HTTP calls inside entrypoints openai tests (#23646)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Aziz <azizbenothman76@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-02 10:49:32 +00:00
Nicolò Lucchesi
d46934b229
[Frontend] Gemma3n audio transcriptions/translations endpoint (#23735)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-01 18:07:46 +08:00
Jee Jee Li
628d00cd7b
[Bugfix] Fix test_lora_resolvers.py (#23984)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-30 11:16:11 +00:00
dubejf
5b31cb1781
[Bugfix] Fix --config arg expansion called from api_server.py (#23944)
Signed-off-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-29 21:36:39 -07:00
22quinn
4d7fe40fc0
[RL][BugFix] Fix missing tokenizer error for token-in-token-out (#23904)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-30 01:09:55 +08:00
wang.yuqi
d9e00dbd1f
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-29 03:12:32 -07:00
Maximilien de Bayser
2554b27baa
[V0 Deprecation] Remove pooling model support in V0 (#23434)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-29 00:04:02 -07:00
Jee Jee Li
b4f9e9631c
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-28 23:28:35 -07:00
Russell Bryant
c8b3b299c9
[tests] Improve speed and reliability of test_transcription_api_correctness (#23854)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-29 04:25:33 +00:00
Chen Zhang
eb1995167e
[gpt-oss] Enable unit test for response API harmony integration (#23533)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-26 18:23:26 -07:00
Guillaume Calmettes
ebd5a77bb5
feat: add usage to TranscriptionResponse (text and json response_format) (#23576)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-08-26 05:26:26 -07:00
Cyrus Leung
ce0e9dbd43
[CI/Build] Fix typo in #23561 (#23616)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-25 23:13:03 -07:00
Cyrus Leung
6fd45e7b8a
[CI/Build] Use vLLM client's user agent to fetch images (#23561)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-25 19:34:12 -07:00
TeeKen Lau
5e021b4981
(Misc): add missing test for zero truncation size. (#23457)
Signed-off-by: teekenl <teekenlau@gmail.com>
2025-08-24 18:12:47 +08:00