Matthew Bonanni
7eb6cb6c18
[Attention] Update tests to remove deprecated env vars (#30563)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Nicolò Lucchesi
9ca8cb38fd
[CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio (#30878)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-17 18:49:56 +01:00
Chauncey
9ad5b21710
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Nicolò Lucchesi
ca702a14dc
[Frontend] Add max-completion-token option to transcription/translation endpoints (#30769)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 19:36:49 +00:00
Andrew Xia
0d0c929f23
[responsesAPI][8] input/output messages for ResponsesParser (#30158)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
penfree
bbd850e597
[Bugfix] fix streaming final output for non harmony (#30237)
...
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
Wenqi Glantz
84e23d103d
additional protection for CVE-2025-62164 (#30649)
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-15 03:07:10 +00:00
Cyrus Leung
dcb31196da
[Chore] Remove redundant RequestPrompt (#30612)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-14 09:22:37 +00:00
Cyrus Leung
64251f48df
[Chore] Adjust tokenizer import to avoid circular imports (#30601)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 04:42:39 -08:00
Cyrus Leung
b09806e28f
[Bugfix] Dictionary MM embeddings for online chat (#30507)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-13 15:48:56 +08:00
Benjamin Bartels
f3237f3f6b
[Frontend] Fixes anthropic streaming message_start usage nesting (#30266)
...
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-12 16:28:54 +00:00
Ben Browning
8f8fda261a
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729)
...
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-12 12:59:53 +08:00
Will Eaton
a9e4106f28
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
...
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Andrew Xia
c3487aca34
[responsesAPI][6] Fix multi turn MCP tokenization (#30230)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-10 10:13:13 +08:00
wang.yuqi
2e660c2434
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 12:01:21 +00:00
daniel-salib
444f0e3f33
[Frontend] Add MCP type support infrastructure to Responses API (#30054)
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-08 10:02:52 +08:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
jeremyteboul
dce6d229f7
Support multiple image/audio embeddings per requests (#29988)
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-07 04:34:24 +00:00
Andrew Xia
421125d03a
[ez] move harmony utils to parser folder (#30117)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-06 17:34:34 -05:00
Viacheslav
21bb323542
Gigachat 3 tool parser and tests (#29905)
...
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
2025-12-06 12:04:14 +00:00
Deboleina
02a4169193
[Tests] Tool call tests for openai/gpt-oss-20b (#26237)
...
Signed-off-by: Debolina Roy <debroy@redhat.com>
2025-12-05 19:03:29 -08:00
Nicolò Lucchesi
e23ca3a0e8
[CI] Re-use whisper_client for all tests (#30148)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-05 19:47:37 +00:00
Andrew Xia
da7bc54ea8
[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-05 11:11:50 -05:00
strinczer
b73b158ab0
[Bugfix] Fix parse_output_message crash on commentary with no recipient (#29972)
...
Signed-off-by: Shai Trinczer <strinczer@icloud.com>
Signed-off-by: strinczer <strinczer@icloud.com>
2025-12-05 10:51:12 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing (#29970)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
Benjamin Bartels
fca3f46658
[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk (#29971)
...
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-04 05:50:27 +00:00
Chauncey
3f42b05fbc
[Refactor] [1/N] to simplify the vLLM serving architecture (#28040)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 01:26:39 -08:00
Andrew Xia
3a7751485b
[responsesAPI] support input output messages for non harmony models (#29549)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 23:59:23 -08:00
Andrew Xia
52cb349fc0
[responsesAPI][3] ResponsesParser to set up non harmony MCP (#29413)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 11:24:45 -05:00
ImaGoodFella
60c3d413af
[Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621)
...
Signed-off-by: Rahul Steiger <rasteiger@ethz.ch>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-02 21:49:02 +08:00
Cyrus Leung
653591d5e7
[Chore] Move tokenizer initialization methods (#29793)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-02 13:33:37 +08:00
Zuyi Zhao
53bf71b0f0
[Misc] Update conftest for entrypoints/sagemaker test folder (#29799)
...
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>
2025-12-01 18:56:39 -09:00
Andrew Xia
fa8804ad9c
[responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 02:11:35 +00:00
sangbumlikeagod
092bb73b8a
[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209)
...
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
2025-12-01 18:19:17 +01:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration (#29767)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-01 11:34:58 +00:00
daniel-salib
014ece97c7
[Frontend] Add tool filtering support to ToolServer (#29224)
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-01 08:03:57 +00:00
wang.yuqi
62de4f4257
[Frontend] Resettle pooling entrypoints (#29634)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-01 15:30:43 +08:00
Jee Jee Li
b9d0504a36
[Bugfix] Revert test_tokenization.py (#29729)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-29 16:35:15 +00:00
Cyrus Leung
34a984274e
[Misc] Refactor tokenizer interface (#29693)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 04:02:21 -08:00
Jee Jee Li
39e63dec7c
[LoRA] Cleanup LoRA unused code (#29611)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-28 22:52:58 -08:00
Tsukasa OI
762a4a6ca9
[Frontend] Perform offline path replacement to tokenizer (#29706)
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-11-28 18:32:08 -08:00
Cyrus Leung
b2c50eda50
[Bugfix] Fix wrong mock attribute (#29704)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-29 10:30:41 +08:00
Cyrus Leung
8d9338fae4
[Chore] Rename Processor to InputProcessor (#29682)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-28 09:35:41 -08:00
Nicolò Lucchesi
e5a621b724
[CI] Add batched audios Whisper test (#29308)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-27 19:31:52 +00:00
Andrew Xia
b07555d26f
[responsesAPI][2] parse ResponseFunctionToolCallOutputItem (#29383)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-11-25 10:27:26 -08:00
wang.yuqi
7a80b01889
[CI] Resettle pooling entrypoints tests. (#29370)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-11-25 10:39:10 +00:00
Mark McLoughlin
9cf4edae6e
[Metrics] Scheduled removal of deprecated metrics (#29330)
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-25 11:15:13 +08:00