Mark McLoughlin
2ad1bc7afe
[V1][Metrics] Add iteration_tokens_total histogram from V0 ( #13288 )
2025-02-15 03:56:19 -08:00
Alexander Matveev
45f90bcbba
[WIP] TPU V1 Support Refactored ( #13049 )
2025-02-14 00:21:53 -08:00
Harry Mellor
f2b20fe491
Consolidate Llama model usage in tests ( #13094 )
2025-02-13 22:18:03 -08:00
Nicolò Lucchesi
d84cef76eb
[Frontend] Add /v1/audio/transcriptions OpenAI API endpoint ( #12909 )
2025-02-13 07:23:45 -08:00
Vaibhav Jain
37dfa60037
[Bugfix] Missing Content Type returns 500 Internal Server Error ( #13193 )
2025-02-13 06:52:22 -08:00
LikeSundayLikeRain
04f50ad9d1
[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case ( #13097 )
2025-02-12 23:11:26 -08:00
Mark McLoughlin
75e6e14516
[V1][Metrics] Add several request timing histograms ( #12644 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-11 10:14:00 -05:00
Cody Yu
41c5dd45b9
[V1][Metrics] Add GPU prefix cache hit rate % gauge ( #12592 )
2025-02-11 08:27:25 +00:00
Ce Gao
fc6485d277
[Bugfix]: Reasoning output bug according to the chat template change ( #13025 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-02-11 15:49:03 +08:00
Farzad Abdolhosseini
08b2d845d6
[Model] Ultravox Model: Support v0.5 Release ( #12912 )
...
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-02-10 22:02:48 +00:00
Cyrus Leung
ce26b16268
[Misc] Remove unnecessary detokenization in multimodal processing ( #12868 )
2025-02-07 06:21:17 -08:00
Maximilien de Bayser
6e1fc61f0f
Prevent unecessary requests to huggingface hub ( #12837 )
2025-02-06 21:37:41 -08:00
Cyrus Leung
75404d041b
[VLM] Update compatibility with transformers 4.49
2025-02-05 19:09:45 -08:00
Mark McLoughlin
233df6f5c4
[V1][Metrics] Add request_success_total counter, labelled with finish reason ( #12579 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-04 19:46:54 -05:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
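The SPDX commit above ( #12628 ) describes two steps: adding a license header to each Python source file, and checking for that header with pre-commit. The sketch below illustrates the idea only and is not the script from that PR. The function names, the five-line search window, and the `Apache-2.0` default identifier are assumptions made for this example.

```python
# Hypothetical helpers illustrating an SPDX header check; not the actual
# pre-commit hook from the commit above.
SPDX_PREFIX = "# SPDX-License-Identifier:"


def has_spdx_header(source: str, max_lines: int = 5) -> bool:
    """Return True if an SPDX identifier comment is among the first lines.

    The max_lines window is an assumption chosen for this sketch.
    """
    for line in source.splitlines()[:max_lines]:
        if line.strip().startswith(SPDX_PREFIX):
            return True
    return False


def add_spdx_header(source: str, license_id: str = "Apache-2.0") -> str:
    """Prepend an SPDX header if missing, keeping any shebang on line one.

    The default license_id is an assumption for illustration.
    """
    if has_spdx_header(source):
        return source
    header = f"{SPDX_PREFIX} {license_id}\n"
    if source.startswith("#!"):
        # Keep the interpreter line first, then insert the header.
        first, _, rest = source.partition("\n")
        return first + "\n" + header + rest
    return header + source
```

A checker built this way could be called once per staged file and report failure for any file where `has_spdx_header` returns False, which is the general shape of a pre-commit check.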
Mark McLoughlin
f17f1d4608
[V1][Metrics] Add GPU cache usage % gauge ( #12561 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-29 18:31:01 -08:00
Mark McLoughlin
46fb056749
[V1][Metrics] Add TTFT and TPOT histograms ( #12530 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-29 04:11:16 +00:00
Ce Gao
a7e3eba66f
[Frontend] Support reasoning content for deepseek r1 ( #12473 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
2025-01-29 11:38:08 +08:00
Mark McLoughlin
c386c43ca3
[V1][Metrics] Add per-request prompt/generation_tokens histograms ( #12516 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-28 22:07:22 +00:00
Mark McLoughlin
3fd1fb63ef
[V1][Metrics] Hook up IterationStats for Prometheus metrics ( #12478 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-28 16:38:38 +00:00
Mark McLoughlin
01ba927040
[V1][Metrics] Add initial Prometheus logger ( #12416 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-27 12:26:28 -05:00
Pooya Davoodi
0cc6b383d7
[Frontend] Support scores endpoint in run_batch ( #12430 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-01-27 04:30:17 +00:00
Kyle Mistele
0034b09ceb
[Frontend] Rerank API (Jina- and Cohere-compatible API) ( #12376 )
...
Signed-off-by: Kyle Mistele <kyle@mistele.com>
2025-01-26 19:58:45 -07:00
Matthew Hendrey
9ddc35220b
[Frontend] generation_config.json for maximum tokens ( #12242 )
...
Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-26 19:59:25 +08:00
Wallas Henrique
58fd57ff1d
[Bugfix] Fix score api for missing max_model_len validation ( #12119 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2025-01-17 16:24:22 +00:00
youkaichao
87a0c076af
[core] allow callable in collective_rpc ( #12151 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-17 20:47:01 +08:00
Jee Jee Li
07934cc237
[Misc][LoRA] Improve the readability of LoRA error messages ( #12102 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-17 19:32:28 +08:00
Isotr0py
d75ab55f10
[Misc] Add deepseek_vl2 chat template ( #12143 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-17 06:34:48 +00:00
Joe Runde
edce722eaa
[Bugfix] use right truncation for non-generative tasks ( #12050 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-16 00:31:01 +08:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server ( #11727 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
Cyrus Leung
9a228348d2
[Misc] Provide correct Pixtral-HF chat template ( #11891 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 10:19:37 -07:00
Maximilien de Bayser
1fe554bac3
treat do_lower_case in the same way as the sentence-transformers library ( #11815 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-01-09 11:05:43 +08:00
Joe Runde
4db72e57f6
[Bugfix][Refactor] Unify model management in frontend ( #11660 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-01 02:21:51 +00:00
Michael Goin
74fa1d123c
[Bugfix] Fix OpenAI parallel sampling when using xgrammar ( #11637 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-31 03:43:54 +00:00
Cyrus Leung
101418096f
[VLM] Support caching in merged multi-modal processor ( #11396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 17:22:48 +00:00
Cyrus Leung
7af553ea30
[Misc] Abstract the logic for reading and writing media content ( #11527 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 19:21:23 +08:00
Cyrus Leung
9edca6bf8f
[Frontend] Online Pooling API ( #11457 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00
Michael Goin
63afbe9215
[CI] Expand OpenAI test_chat.py guided decoding tests ( #11048 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 18:35:38 +00:00
Michael Goin
5bfb30a529
[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF ( #11389 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 23:06:20 +08:00
Roger Wang
29c748930e
[CI] Fix flaky entrypoint tests ( #11403 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-21 21:08:44 -08:00
Yanyi Liu
5aef49806d
[Feature] Add load generation config from model ( #11164 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-19 10:50:38 +00:00
Michael Goin
a30482f054
[CI] Expand test_guided_generate to test all backends ( #11313 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-19 04:00:38 +00:00
Michael Goin
c77eb8a33c
[Bugfix] Set temperature=0.7 in test_guided_choice_chat ( #11264 )
2024-12-17 16:34:06 -08:00
Joe Runde
2d1b9baa8f
[Bugfix] Fix request cancellation without polling ( #11190 )
2024-12-17 12:26:32 -08:00
kYLe
66d4b16724
[Frontend] Add OpenAI API support for input_audio ( #11027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-16 22:09:58 -08:00
Michael Goin
0064f697d3
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse ( #10935 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-17 11:39:58 +08:00
youkaichao
551603feff
[core] overhaul memory profiling and fix backward compatibility ( #10511 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-16 13:32:25 -08:00
Isotr0py
d927dbcd88
[Model] Refactor Ultravox to use merged input processor ( #11198 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-16 10:09:53 +00:00
Brad Hilton
9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument ( #11150 )
...
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
2024-12-14 16:46:42 +00:00
Cyrus Leung
0920ab9131
[Doc] Reorganize online pooling APIs ( #11172 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00