Michael Goin
3253ae765e
[Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions ( #23028 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-16 18:33:08 +00:00
Woonggi Min
68373d3126
[Frontend] Added support for HermesToolParser for models without special tokens ( #16890 )
Signed-off-by: minpeter <kali2005611@gmail.com>
2025-08-16 17:38:42 +00:00
Andrew Sansom
78863f8c5c
[BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors ( #22962 )
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-08-16 06:25:10 +00:00
Michael Goin
8a87cd27d9
[CI] Speed up Whisper tests by reusing server ( #22859 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-15 16:56:31 -04:00
Nicolò Lucchesi
540d54ca8d
[CI] Re-enable transcriptions test_long_audio_request ( #22890 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-14 11:34:34 +00:00
Robert Shaw
a353bd083d
[CI] remove flaky v0 test ( #22864 )
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-08-13 21:41:51 -07:00
Will Eaton
b6af24fba7
[CI][Entrypoints]: add filter to generation to filter out invalid tool calls ( #22826 )
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-08-13 20:09:07 -07:00
Kdump
653124bd46
[Frontend] Add chunked processing to handle long inputs in embedding models ( #22280 )
Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: Kdump <rootshellexp@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-13 04:14:24 -07:00
Woosuk Kwon
71683ca6f6
[V0 Deprecation] Remove multi-step scheduling ( #22138 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:18:39 -07:00
Michael Goin
ea1292ad3e
[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py ( #22686 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-11 20:20:42 -07:00
Chen Zhang
1891a265d3
[gpt-oss] Add test for response API + harmony (but skipped) ( #22554 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-11 17:47:24 -07:00
wang.yuqi
84cf78acee
[Model] Pooling models default to using chunked prefill & prefix caching if supported. ( #20930 )
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00
Maximilien de Bayser
39052dbca8
Support token_type_ids in V1 with less code changes ( #21985 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-10 22:54:59 -07:00
22quinn
b799f4b9ea
[CI/Build] Fix tensorizer test for load_format change ( #22583 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-10 19:30:00 -07:00
Russell Bryant
311d875614
Drop flaky test_healthcheck_response_time ( #22539 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-08 16:56:47 -07:00
yyweiss
baece8c3d2
[Frontend] Add unix domain socket support ( #18097 )
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
2025-08-08 16:23:44 -07:00
Moritz Sanft
370661856b
[Frontend] Update OpenAI error response to upstream format ( #22099 )
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2025-08-06 23:06:00 -07:00
wang.yuqi
586f286789
[Model] Pooling model activation supports per request control by PoolingParams ( #20538 )
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-05 00:37:00 -07:00
Roger Wang
067c34a155
docs: remove deprecated disable-log-requests flag ( #22113 )
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-08-02 00:19:48 -07:00
Reza Barazesh
37efc63b64
[V0 deprecation] Guided decoding ( #21347 )
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
Keyang Ru
9ace2eaf35
[Bugfix] Improve JSON extraction in LlamaToolParser ( #19024 )
Signed-off-by: keru <keyang.ru@oracle.com>
Co-authored-by: keru <keyang.ru@oracle.com>
2025-07-28 12:36:58 +00:00
Hongsheng Liu
7656cf4cf3
[Bugfix] [issue-21565] Fix the incompatibility issue with stream and named function calling when Thinking is disabled ( #21573 )
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
2025-07-27 22:43:50 -07:00
Cyrus Leung
86ae693f20
[Deprecation][2/N] Replace --task with --runner and --convert ( #21470 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
mgazz
e189b50f53
Add support for Prithvi in Online serving mode ( #21518 )
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-25 07:01:27 -07:00
Cyrus Leung
34ddcf9ff4
[Frontend] run-batch supports V1 ( #21541 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-24 20:05:55 -07:00
Liangliang Ma
13e4ee1dc3
[XPU][UT] increase intel xpu CI test scope ( #21492 )
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
2025-07-23 20:24:04 -07:00
Michael Goin
82ec66f514
[V0 Deprecation] Remove Prompt Adapters ( #20588 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-23 16:36:48 -07:00
22quinn
b3d82108e7
[Bugfix][Frontend] Fix openai CLI arg middleware ( #21220 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-19 02:40:38 -07:00
Asher
5a7fb3ab9e
[Model] Add ToolParser and MoE Config for Hunyuan A13B ( #20820 )
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
2025-07-17 09:10:09 +00:00
Michael Goin
4e7dfbe7b4
Update PyTorch to torch==2.7.1 for CUDA ( #21011 )
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-17 02:30:44 +00:00
Mac Misiura
18bdcf4113
feat - add a new endpoint get_tokenizer_info to provide tokenizer/chat-template information ( #20575 )
Signed-off-by: m-misiura <mmisiura@redhat.com>
2025-07-16 21:52:14 +08:00
Maximilien de Bayser
6ebf313790
Avoid direct comparison of floating point numbers ( #21002 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-07-15 21:12:14 -07:00
Patrick von Platen
cfbcb9ed87
[Voxtral] Add more tests ( #21010 )
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-15 21:11:49 -07:00
Harry Mellor
1e36c8687e
[Deprecation] Remove nullable_kvs ( #20969 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:21:50 +00:00
Patrick von Platen
e7e3e6d263
Voxtral ( #20970 )
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-15 07:35:30 -07:00
Nicolò Lucchesi
80305c1b24
[CI] Fix flaky test_streaming_response test ( #20913 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:15:15 -07:00
Nicolò Lucchesi
149f2435a5
[Misc] Relax translations tests ( #20856 )
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:08:36 +00:00
Cyrus Leung
cbd14ed561
[Bugfix] Refactor /invocations to be task-agnostic ( #20764 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-11 03:20:54 -07:00
Alex Brooks
41060c6e08
[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] ( #19126 )
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-07-10 21:09:37 +01:00
Chauncey
8f2720def9
[Frontend] Support Tool Calling with both tool_choice='required' and $defs. ( #20629 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-10 13:56:35 +08:00
Chauncey
2155e95ef1
[Bugfix] Fix the issue where reasoning_content is None when Thinking is enabled and tool_choice is set to 'required'. ( #20662 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-09 07:39:58 +00:00
kourosh hakhamaneshi
baed180aa0
[tech debt] Revisit lora request model checker ( #20636 )
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-09 09:42:41 +08:00
Sanger Steel
72d14d0eed
[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load ( #19619 )
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
2025-07-07 22:47:43 -07:00
ztang2370
a37d75bbec
[Front-end] microbatch tokenization ( #19334 )
Signed-off-by: zt2370 <ztang2370@gmail.com>
2025-07-07 17:54:10 +01:00
Woosuk Kwon
462b269280
Implement OpenAI Responses API [1/N] ( #20504 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-06 18:32:13 -07:00
sangbumlikeagod
9e5452ee34
[Bug][Frontend] Fix structure of transcription's decoder_prompt ( #18809 )
Signed-off-by: sangbumlikeagod <oironese@naver.com>
2025-07-04 11:28:07 +00:00
wang.yuqi
6f1229f91d
[Model][2/N] Automatic conversion of CrossEncoding model ( #19978 )
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-03 13:59:23 +00:00
Cyrus Leung
b024a42e93
[Core] Move multimodal placeholder from chat utils to model definition ( #20355 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-03 08:18:30 +00:00
Chenheli Hua
2e7cbf2d7d
[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. ( #20105 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
2025-07-01 23:34:03 -07:00
Yuxuan Zhang
ed70f3c64f
Add GLM4.1V model (Draft) ( #19331 )
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-01 12:48:26 +00:00