Robin
fc66dee76d
[Misc] Fix the error in the tip for the --lora-modules parameter ( #12319 )
Signed-off-by: wangerxiao <863579016@qq.com>
2025-01-22 16:48:41 +00:00
Wallas Henrique
58fd57ff1d
[Bugfix] Fix score api for missing max_model_len validation ( #12119 )
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2025-01-17 16:24:22 +00:00
Jee Jee Li
07934cc237
[Misc][LoRA] Improve the readability of LoRA error messages ( #12102 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-17 19:32:28 +08:00
maang-h
57e729e874
[Doc]: Update OpenAI-Compatible Server documents ( #12082 )
2025-01-15 16:07:45 +00:00
Fred Reiss
c9f09a4fe8
[mypy] Fix mypy warnings in api_server.py ( #11941 )
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
2025-01-11 01:04:58 +00:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server ( #11727 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
Maximilien de Bayser
1fe554bac3
treat do_lower_case in the same way as the sentence-transformers library ( #11815 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-01-09 11:05:43 +08:00
Wallas Henrique
cfd3219f58
[Hardware][Apple] Native support for macOS Apple Silicon ( #11696 )
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-08 16:35:49 +08:00
Rui Qiao
f8fcca100b
[Misc] Fix typo for valid_tool_parses ( #11753 )
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-01-06 07:12:38 +00:00
Robert Shaw
33fc1e2e86
[Frontend] Improve StreamingResponse Exception Handling ( #11752 )
2025-01-05 16:35:01 -05:00
Nathan Azrak
68d37809b9
[Misc] Minimum requirements for SageMaker compatibility ( #11576 )
2025-01-02 15:59:25 -08:00
Joe Runde
4db72e57f6
[Bugfix][Refactor] Unify model management in frontend ( #11660 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-01 02:21:51 +00:00
Robert Shaw
5886aa496e
[V1] [6/N] API Server: Better Shutdown ( #11586 )
2024-12-30 15:51:02 +00:00
Robert Shaw
df04dffade
[V1] [4/N] API Server: ZMQ/MP Utilities ( #11541 )
2024-12-28 01:45:08 +00:00
Robert Shaw
55fb97f7bd
[2/N] API Server: Avoid ulimit footgun ( #11530 )
2024-12-26 23:43:05 +00:00
Robert Shaw
720b10fdc6
[1/N] API Server (Remove Proxy) ( #11529 )
2024-12-26 23:03:43 +00:00
Cyrus Leung
9edca6bf8f
[Frontend] Online Pooling API ( #11457 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00
Ricky Xu
584f0ae40d
[V1] Make AsyncLLMEngine v1-v0 opaque ( #11383 )
Signed-off-by: Ricky Xu <xuchen727@hotmail.com>
2024-12-21 15:14:08 +08:00
Michael Goin
d573aeadcc
[Bugfix] Don't log OpenAI field aliases as ignored ( #11378 )
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-20 19:03:50 +00:00
Yanyi Liu
5aef49806d
[Feature] Add load generation config from model ( #11164 )
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-19 10:50:38 +00:00
Travis Johnson
17ca964273
[Model] IBM Granite 3.1 ( #11307 )
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-12-19 11:27:24 +08:00
Joe Runde
2d1b9baa8f
[Bugfix] Fix request cancellation without polling ( #11190 )
2024-12-17 12:26:32 -08:00
Michael Goin
0064f697d3
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse ( #10935 )
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-17 11:39:58 +08:00
yansh97
17138af7c4
[Bugfix] Fix the default value for temperature in ChatCompletionRequest ( #11219 )
2024-12-16 00:15:40 -08:00
Brad Hilton
9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument ( #11150 )
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
2024-12-14 16:46:42 +00:00
Cyrus Leung
0920ab9131
[Doc] Reorganize online pooling APIs ( #11172 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00
zhangjf
5b0ed8391d
[Bugfix] using len(tokenizer) instead of tokenizer.vocab_size in AllowedTokenIdsLogitsProcessor ( #11156 )
2024-12-13 15:56:19 +00:00
Cyrus Leung
eeec9e3390
[Frontend] Separate pooling APIs in offline inference ( #11129 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-13 10:40:07 +00:00
Jiaxin Shan
85362f028c
[Misc][LoRA] Ensure Lora Adapter requests return adapter name ( #11094 )
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-12 09:25:16 +00:00
Russell Bryant
ccede2b264
[Core] cleanup zmq ipc sockets on exit ( #11115 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-12-11 19:12:24 -08:00
Clayton
7439a8b5fc
[Bugfix] Multiple fixes to tool streaming with hermes and mistral ( #10979 )
Signed-off-by: cedonley <clayton@donley.io>
2024-12-12 01:10:12 +00:00
Cyrus Leung
cad5c0a6ed
[Doc] Update docs to refer to pooling models ( #11093 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 13:36:27 +00:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks ( #10820 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
Rafael Vasquez
40766ca1b8
[Bugfix]: Clamp -inf logprob values in prompt_logprobs ( #11073 )
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-12-11 01:27:39 -08:00
Maximilien de Bayser
e39400a4b6
Fix streaming for granite tool call when <|tool_call|> is present ( #11069 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-12-11 04:51:40 +00:00
Travis Johnson
beb16b2c81
[Bugfix] Handle <|tool_call|> token in granite tool parser ( #11039 )
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-12-10 10:27:11 +00:00
Joe Runde
980ad394a8
[Frontend] Use request id from header ( #10968 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-12-10 13:46:29 +08:00
Russell Bryant
69d357ba12
[Core] Cleanup startup logging a bit ( #10961 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-12-07 02:30:23 +00:00
Cyrus Leung
d2f058e76c
[Misc] Rename embedding classes to pooling ( #10801 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 14:36:51 +08:00
tomeras91
395b1c7454
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server ( #10635 )
Signed-off-by: Tomer Asida <tomera@ai21.com>
2024-11-27 13:21:10 -08:00
Ricky Xu
519e8e4182
[v1] EngineArgs for better config handling for v1 ( #10382 )
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-25 21:09:43 -08:00
Chauncey
d04b13a380
[Bug]: Authorization ignored when root_path is set ( #10606 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-25 16:21:41 +00:00
Maximilien de Bayser
214efc2c3c
Support Cross encoder models ( #10400 )
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-24 18:56:20 -08:00
Varun Vinayak Shenoy
7d8ffb344f
[Bugfix] Internal Server Error when tool_choice is incorrect. ( #10567 )
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
2024-11-22 21:13:29 -08:00
Noam Gat
11fcf0e066
Remove token-adding chat embedding params ( #10551 )
Signed-off-by: Noam Gat <noamgat@gmail.com>
2024-11-21 23:59:47 -08:00
Chauncey
da7e702c6f
[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored ( #10180 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-21 16:24:32 +00:00
Zhong Qishuai
f0e0238016
[Doc] fix a small typo in docstring of llama_tool_parser ( #10513 )
2024-11-21 09:05:23 +00:00
Guillaume Calmettes
c68f7ede6a
[Bugfix]: allow extra fields in requests to openai compatible server ( #10463 )
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-20 16:42:21 -05:00
COSMOPlat
f028dff33d
[BugFix] Fix hermes tool parser output error stream arguments in some cases ( #10395 ) ( #10398 )
Signed-off-by: xiyuan lee <lixiyuan@haier.com>
2024-11-19 13:42:50 +00:00
Cyrus Leung
32e46e000f
[Frontend] Automatic detection of chat content format from AST ( #9919 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-16 13:35:40 +08:00