Matthew Hendrey | 9ddc35220b | 2025-01-26 19:59:25 +08:00
[Frontend] generation_config.json for maximum tokens(#12242)
Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

youkaichao | 3f50c148fd | 2025-01-24 02:00:50 +08:00
[core] add wake_up doc and some sanity check (#12361)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Nick Hill | aea94362c9 | 2025-01-22 22:22:12 +00:00
[Frontend][V1] Online serving performance improvements (#12287)

Cody Yu | 7206ce4ce1 | 2025-01-22 18:52:27 +00:00
[Core] Support reset_prefix_cache (#12284)

Robin | fc66dee76d | 2025-01-22 16:48:41 +00:00
[Misc] Fix the error in the tip for the --lora-modules parameter (#12319)
Signed-off-by: wangerxiao <863579016@qq.com>

youkaichao | 68ad4e3a8d | 2025-01-22 14:39:32 +08:00
[Core] Support fully transparent sleep mode (#11743)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Cyrus Leung | 59a0192fb9 | 2025-01-20 15:00:59 +08:00
[Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Wallas Henrique | 58fd57ff1d | 2025-01-17 16:24:22 +00:00
[Bugfix] Fix score api for missing max_model_len validation (#12119)
Signed-off-by: Wallas Santos <wallashss@ibm.com>

youkaichao | 87a0c076af | 2025-01-17 20:47:01 +08:00
[core] allow callable in collective_rpc (#12151)
Signed-off-by: youkaichao <youkaichao@gmail.com>

Jee Jee Li | 07934cc237 | 2025-01-17 19:32:28 +08:00
[Misc][LoRA] Improve the readability of LoRA error messages (#12102)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

youkaichao | 92e793d91a | 2025-01-16 20:19:52 +08:00
[core] LLM.collective_rpc interface and RLHF example (#12084)
Signed-off-by: youkaichao <youkaichao@gmail.com>

maang-h | 57e729e874 | 2025-01-15 16:07:45 +00:00
[Doc]: Update OpenAI-Compatible Server documents (#12082)

Isotr0py | f967e51f38 | 2025-01-12 00:17:24 -08:00
[Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Fred Reiss | c9f09a4fe8 | 2025-01-11 01:04:58 +00:00
[mypy] Fix mypy warnings in api_server.py (#11941)
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>

Joe Runde | ac2f3f7fee | 2025-01-10 15:56:36 +08:00
[Bugfix] Validate lora adapters to avoid crashing server (#11727)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Ye (Charlotte) Qi | 1d967acb45 | 2025-01-09 17:36:39 +08:00
[Bugfix] fix beam search input errors and latency benchmark script (#11875)
Signed-off-by: Ye Qi <yeq@meta.com>
Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>

Cyrus Leung | d848800e88 | 2025-01-09 12:48:12 +08:00
[Misc] Move print_*_once from utils to logger (#11298)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>

Maximilien de Bayser | 1fe554bac3 | 2025-01-09 11:05:43 +08:00
treat do_lower_case in the same way as the sentence-transformers library (#11815)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Wallas Henrique | cfd3219f58 | 2025-01-08 16:35:49 +08:00
[Hardware][Apple] Native support for macOS Apple Silicon (#11696)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

Rui Qiao | f8fcca100b | 2025-01-06 07:12:38 +00:00
[Misc] Fix typo for valid_tool_parses (#11753)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

Robert Shaw | 33fc1e2e86 | 2025-01-05 16:35:01 -05:00
[Frontend] Improve StreamingResponse Exception Handling (#11752)

Robert Shaw | ad0d567e1c | 2025-01-03 23:25:02 +00:00
[V1] Chore: cruft removal (#11724)

Robert Shaw | 80c751e7f6 | 2025-01-03 17:25:38 +00:00
[V1] Simplify Shutdown (#11659)

Nathan Azrak | 68d37809b9 | 2025-01-02 15:59:25 -08:00
[Misc] Minimum requirements for SageMaker compatibility (#11576)

Joe Runde | 4db72e57f6 | 2025-01-01 02:21:51 +00:00
[Bugfix][Refactor] Unify model management in frontend (#11660)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

Robert Shaw | 5886aa496e | 2024-12-30 15:51:02 +00:00
[V1] [6/N] API Server: Better Shutdown (#11586)

Robert Shaw | df04dffade | 2024-12-28 01:45:08 +00:00
[V1] [4/N] API Server: ZMQ/MP Utilities (#11541)

Cyrus Leung | 7af553ea30 | 2024-12-27 19:21:23 +08:00
[Misc] Abstract the logic for reading and writing media content (#11527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Robert Shaw | 55fb97f7bd | 2024-12-26 23:43:05 +00:00
[2/N] API Server: Avoid ulimit footgun (#11530)

Robert Shaw | 720b10fdc6 | 2024-12-26 23:03:43 +00:00
[1/N] API Server (Remove Proxy) (#11529)

Cyrus Leung | 9edca6bf8f | 2024-12-24 17:54:30 +08:00
[Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

dpxa | 4f074fbf53 | 2024-12-24 08:43:39 +00:00
[Misc]Suppress irrelevant exception stack trace information when CUDA… (#11438)
Co-authored-by: shiquan <shiquan>

Rafael Vasquez | 32aa2059ad | 2024-12-23 22:35:38 +00:00
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

Ricky Xu | 584f0ae40d | 2024-12-21 15:14:08 +08:00
[V1] Make AsyncLLMEngine v1-v0 opaque (#11383)
Signed-off-by: Ricky Xu <xuchen727@hotmail.com>

Michael Goin | d573aeadcc | 2024-12-20 19:03:50 +00:00
[Bugfix] Don't log OpenAI field aliases as ignored (#11378)
Signed-off-by: mgoin <michael@neuralmagic.com>

Yanyi Liu | 5aef49806d | 2024-12-19 10:50:38 +00:00
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Travis Johnson | 17ca964273 | 2024-12-19 11:27:24 +08:00
[Model] IBM Granite 3.1 (#11307)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

Joe Runde | 2d1b9baa8f | 2024-12-17 12:26:32 -08:00
[Bugfix] Fix request cancellation without polling (#11190)

kYLe | 66d4b16724 | 2024-12-16 22:09:58 -08:00
[Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

Michael Goin | 0064f697d3 | 2024-12-17 11:39:58 +08:00
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935)
Signed-off-by: mgoin <michael@neuralmagic.com>

Isotr0py | d927dbcd88 | 2024-12-16 10:09:53 +00:00
[Model] Refactor Ultravox to use merged input processor (#11198)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

yansh97 | 17138af7c4 | 2024-12-16 00:15:40 -08:00
[Bugfix] Fix the default value for temperature in ChatCompletionRequest (#11219)

Brad Hilton | 9c3dadd1c9 | 2024-12-14 16:46:42 +00:00
[Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>

Russell Bryant | 4863e5fba5 | 2024-12-13 16:27:32 -08:00
[Core] V1: Use multiprocessing by default (#11074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Cyrus Leung | 0920ab9131 | 2024-12-14 00:22:22 +08:00
[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

zhangjf | 5b0ed8391d | 2024-12-13 15:56:19 +00:00
[Bugfix] using len(tokenizer) instead of tokenizer.vocab_size in AllowedTokenIdsLogitsProcessor (#11156)

Cyrus Leung | eeec9e3390 | 2024-12-13 10:40:07 +00:00
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Jiaxin Shan | 85362f028c | 2024-12-12 09:25:16 +00:00
[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Russell Bryant | ccede2b264 | 2024-12-11 19:12:24 -08:00
[Core] cleanup zmq ipc sockets on exit (#11115)
Signed-off-by: Russell Bryant <rbryant@redhat.com>

Clayton | 7439a8b5fc | 2024-12-12 01:10:12 +00:00
[Bugfix] Multiple fixes to tool streaming with hermes and mistral (#10979)
Signed-off-by: cedonley <clayton@donley.io>