389 Commits

Author SHA1 Message Date
zh Wang
d586ddc691
[BugFix] Fix authorization of openai_transcription_client.py (#17321)
Signed-off-by: zh Wang <rekind133@outlook.com>
2025-04-30 09:51:05 -07:00
Alec
0be6d05b5e
[V1][Metrics] add support for kv event publishing (#16750)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-04-30 07:44:45 -07:00
Marco
54072f315f
[MODEL ADDITION] Ovis2 Model Addition (#15826)
Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-04-30 07:33:29 +00:00
Huy Do
2c4f59afc3
Update PyTorch to 2.7.0 (#16859) 2025-04-29 19:08:04 -07:00
Bryan Lu
70788bdbdc
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-04-29 21:10:00 +00:00
Harry Mellor
a6977dbd15
Simplify (and fix) passing of guided decoding backend options (#17008)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 19:02:23 +00:00
Chauncey
96e06e3cb7
[Misc] Add a Jinja template to support Mistral3 function calling (#17195)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-28 19:53:44 -07:00
Alex Brooks
fa93cd9f60
[Model] Add Granite Speech Support (#16246)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-04-28 10:05:00 +00:00
Kuntai Du
9053d0b134
[Doc] Fix wrong github link in LMCache examples (#17274)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-04-28 03:09:11 +00:00
Russell Bryant
f8acd01ff7
[V1] Add structural_tag support using xgrammar (#17085) 2025-04-26 14:06:37 +00:00
Isotr0py
8c1c926d00
[Bugfix] Fix missing int type for -n in multi-image example (#17223) 2025-04-26 08:49:52 +00:00
Yihua Cheng
5e83a7277f
[v1] [P/D] Adding LMCache KV connector for v1 (#16625) 2025-04-26 03:03:38 +00:00
Rui Qiao
c53e0730cb
[Misc] Refine ray_serve_deepseek example (#17204)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-04-25 16:06:59 -07:00
Benjamin Chislett
a0e619e62a
[V1][Spec Decode] EAGLE-3 Support (#16937)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
2025-04-25 15:43:07 -07:00
Rui Qiao
583e900996
[Misc] Add example to run DeepSeek with Ray Serve LLM (#17134)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-04-24 22:25:21 +00:00
Maximilien de Bayser
05e1fbfc52
Add chat template for Llama 4 models (#16428)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-04-24 20:19:36 +00:00
Reid
1bcbcbf574
[Misc] refactor example series - structured outputs (#17040)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-24 07:49:48 -07:00
wang.yuqi
67309a1cb5
[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970) 2025-04-24 07:06:28 -07:00
Reid
db2f8d915c
[V1] Update structured output (#16812)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-23 23:57:17 -07:00
Reid
4b91c927f6
[Misc] refactor example series (#16972)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-22 11:44:21 +00:00
Cyrus Leung
205d84aaa9
[VLM] Clean up models (#16873)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-19 12:13:06 +00:00
Isotr0py
83f3c3bd91
[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-19 02:26:11 -07:00
Nicolò Lucchesi
2ef0dc53b8
[Frontend] Add sampling params to v1/audio/transcriptions endpoint (#16591)
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Jannis Schönleber <joennlae@gmail.com>
2025-04-19 07:03:54 +00:00
Yang Fan
2c1bd848a6
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiong Wang <wangxiongts@163.com>
2025-04-18 23:14:36 -07:00
Reid
5a5e29de88
[Misc] refactor examples series - Chat Completion Client With Tools (#16829)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-18 23:24:42 +00:00
Cyrus Leung
aadb656562
[Misc] Clean up Kimi-VL (#16833)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-18 05:15:09 -07:00
Harry Mellor
e78587a64c
Improve-mm-and-pooler-and-decoding-configs (#16789)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 22:13:32 -07:00
Chauncey
7a4a5de729
[Misc] Update outdated note: LMCache now supports chunked prefill (#16697)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-18 05:12:42 +00:00
Yihua Cheng
3408e47159
[P/D][V1] KV Connector API V1 (#15960)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-17 13:22:40 -07:00
wang.yuqi
11c3b98491
[Doc] Document Matryoshka Representation Learning support (#16770) 2025-04-17 13:37:37 +00:00
Reid
99ed526101
[Misc] refactor examples series - lmcache (#16758)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-17 11:02:35 +00:00
Richard Liaw
8cac35ba43
[Ray] Improve documentation on batch inference (#16609)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2025-04-16 22:19:26 -07:00
Isotr0py
cb072ce93b
[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-17 04:17:39 +00:00
Reid
7168920491
[Misc] refactor examples series (#16708)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-16 10:16:36 +00:00
Reid
6ae996a873
[Misc] refactor argument parsing in examples (#16635)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-15 08:05:30 +00:00
courage17340
b1308b84a3
[Model][VLM] Add Kimi-VL model support (#16387)
Signed-off-by: courage17340 <courage17340@163.com>
2025-04-14 21:41:48 +00:00
Reid
7cbfc10943
[Misc] refactor examples (#16563)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-14 09:59:15 +00:00
Jee Jee Li
3cdc57669f
[Misc] Delete redundant code (#16530)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-04-12 11:21:37 +00:00
Cyrus Leung
d9fc8cd9da
[V1] Enable multi-input by default (#15799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-12 08:52:39 +00:00
Nicolò Lucchesi
f069f3ea74
[Misc] Openai transcription client example use same Whisper model (#16487)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-12 07:27:03 +00:00
wang.yuqi
fbf722c6e6
[Frontend] support matryoshka representation / support embedding API dimensions (#16331) 2025-04-11 23:23:10 -07:00
Ye (Charlotte) Qi
16eda8c43a
[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
2025-04-12 06:26:17 +08:00
Reid
35e076b3a8
[Misc] update api_client example (#16459)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-11 10:05:40 +00:00
Isotr0py
93195146ea
[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-11 04:57:16 +00:00
Lily Liu
e8224f3dca
[V1][Spec Decode] Eagle Model loading (#16035)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-04-10 11:21:48 -07:00
Ye (Charlotte) Qi
61de3ef74b
[Model] Remove image mm limit for LLaMa4 (#16365)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-04-10 09:36:27 +00:00
Reid
1bff42c4b7
[Misc] refactor Structured Outputs example (#16322)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-09 23:32:42 +00:00
zh Wang
a25866ac8d
[Bugfix] Fix profiling.py (#16202)
Signed-off-by: zh Wang <rekind133@outlook.com>
2025-04-09 17:03:34 +00:00
Chauncey
102bf967f0
[Model] Add smolvlm support (#16017)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-08 19:12:17 -07:00
Russell Bryant
2755c34a8f
[V1] Update structured output offline inference example (#15721)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-08 22:34:09 +00:00