wang.yuqi | d9e00dbd1f | [Performance] V1 Classify Models E2E Performance Optimization (#23541) | Signed-off-by: wang.yuqi <noooop@126.com> | 2025-08-29 03:12:32 -07:00
wang.yuqi | 84cf78acee | [Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930) | Signed-off-by: wang.yuqi <noooop@126.com> | 2025-08-11 09:41:37 -07:00
Moritz Sanft | 370661856b | [Frontend] Update OpenAI error response to upstream format (#22099) | Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com> | 2025-08-06 23:06:00 -07:00
wang.yuqi | 586f286789 | [Model] Pooling model activation supports per request control by PoolingParams (#20538) | Signed-off-by: wang.yuqi <noooop@126.com> | 2025-08-05 00:37:00 -07:00
Maximilien de Bayser | 6ebf313790 | Avoid direct comparison of floating point numbers (#21002) | Signed-off-by: Max de Bayser <mbayser@br.ibm.com> | 2025-07-15 21:12:14 -07:00
Cyrus Leung | cbd14ed561 | [Bugfix] Refactor /invocations to be task-agnostic (#20764) | Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2025-07-11 03:20:54 -07:00
Simon Mo | 02f0c7b220 | [Misc] Add SPDX-FileCopyrightText (#19100) | Signed-off-by: simon-mo <simon.mo@hey.com> | 2025-06-03 11:20:17 -07:00
Frieda Huang | 9cea90eab4 | [Frontend] Add /classify endpoint (#17032) | Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com> | 2025-05-11 07:57:07 +00:00