xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config ( #5852 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
sasha0552
c54269d967
[Frontend] Add tokenize/detokenize endpoints ( #5054 )
2024-06-26 16:54:22 +00:00
Cyrus Leung
03dccc886e
[Misc] Add vLLM version getter to utils ( #5098 )
2024-06-13 11:21:39 -07:00
Roger Wang
68bc81703e
[Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server ( #5374 )
2024-06-10 09:13:39 +00:00
Nadav Shmayovits
37464a0f74
[Bugfix] Fix call to init_logger in openai server ( #4765 )
2024-06-01 17:18:50 +00:00
Pierre Dulac
9216b9cc38
[Bugfix] Bypass authorization API token for preflight requests ( #4862 )
2024-05-16 09:42:21 -07:00
Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API ( #3734 )
2024-05-11 11:30:37 -07:00
Cyrus Leung
f12b20decc
[Frontend] Move async logic outside of constructor ( #4674 )
2024-05-08 22:48:33 -07:00
Cyrus Leung
323f27b904
[Bugfix] Fix asyncio.Task not being subscriptable ( #4623 )
2024-05-06 09:31:05 -07:00
Yang, Bo
808632d3b4
[BugFix] Prevent the task of _force_log from being garbage collected ( #4567 )
2024-05-03 01:35:18 +00:00
youkaichao
5b8a7c1cb0
[Misc] centralize all usage of environment variables ( #4548 )
2024-05-02 11:13:25 -07:00
Robert Shaw
4dc8026d86
[Bugfix] Fix 307 Redirect for /metrics ( #4523 )
2024-05-01 09:14:13 -07:00
SangBin Cho
a88081bf76
[CI] Disable non-lazy string formatting in logging ( #4326 )
...
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
SangBin Cho
0ae11f78ab
[Mypy] Part 3: fix typing for nested directories across most of the codebase ( #4161 )
2024-04-22 21:32:44 -07:00
Harry Mellor
66ded03067
Allow model to be served under multiple names ( #2894 )
...
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>
2024-04-18 00:16:26 -07:00
A-Mahla
0739b1947f
[Frontend][Bugfix] allow using the default middleware with a root path ( #3788 )
...
Co-authored-by: A-Mahla <>
2024-04-02 01:20:28 -07:00
yhu422
d8658c8cc1
Usage Stats Collection ( #2852 )
2024-03-28 22:16:12 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Simon Mo
ef65dcfa6f
[Doc] Add docs about OpenAI compatible server ( #3288 )
2024-03-18 22:05:34 -07:00
Dan Clark
03d37f2441
[Fix] Add args for mTLS support ( #3430 )
...
Co-authored-by: declark1 <daniel.clark@ibm.com>
2024-03-15 09:56:13 -07:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
Nick Hill
d2339d6840
Connect engine healthcheck to openai server ( #3260 )
2024-03-07 16:38:12 -08:00
Jason Cox
d65fac2738
Add vLLM version info to logs and openai API server ( #3161 )
2024-03-02 21:00:29 -08:00
Allen.Dou
29e70e3e88
Allow user to choose log level via --log-level instead of fixed 'info'. ( #3109 )
...
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-01 23:28:41 +00:00
Harry Mellor
ef978fe411
Port metrics from aioprometheus to prometheus_client ( #2730 )
2024-02-25 11:54:00 -08:00
jvmncs
8f36444c4f
multi-LoRA as extra models in OpenAI server ( #2775 )
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list 3 separate model entries if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in EngineArgs.
No work has been done here to scope client permissions to specific models.
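A quick way to verify, as a sketch: assuming the server is running on its default localhost:8000 address, query the OpenAI-compatible model list route:
```terminal
$ # List the served models: the base model plus each LoRA module
$ # (sql-lora, sql-lora2) should appear as its own entry.
$ curl -s http://localhost:8000/v1/models
```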
2024-02-17 12:00:48 -08:00
Erfan Al-Hossami
9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares ( #1106 )
2024-01-23 15:13:00 -08:00
Jannis Schönleber
71d63ed72e
migrate pydantic from v1 to v2 ( #2531 )
2024-01-21 16:05:56 -08:00
FlorianJoncour
14cc317ba4
OpenAI Server refactoring ( #2360 )
2024-01-16 21:33:14 -08:00
Chirag Jain
ce036244c9
Allow setting fastapi root_path argument ( #2341 )
2024-01-12 10:59:59 -08:00
Iskren Ivov Chernev
d0215a58e7
Ensure metrics are logged regardless of requests ( #2347 )
2024-01-05 05:24:42 -08:00
Harry Mellor
08133c4d1a
Add SSL arguments to API servers ( #2109 )
2023-12-18 10:56:23 +08:00
Simon Mo
2e8fc0d4c3
Fix completion API echo and logprob combo ( #1992 )
2023-12-10 13:20:30 -08:00
Jin Shang
1aa1361510
Fix OpenAI server completion_tokens referenced before assignment ( #1996 )
2023-12-09 21:01:21 -08:00
Roy
60dc62dc9e
add custom server params ( #1868 )
2023-12-03 12:59:18 -08:00
Simon Mo
5313c2cb8b
Add Production Metrics in Prometheus format ( #1890 )
2023-12-02 16:37:44 -08:00
Adam Brusselback
66785cc05c
Support chat template and echo for chat API ( #1756 )
2023-11-30 16:43:13 -08:00
Michael McCulloch
c782195662
Make --disable-log-requests actually disable logging of requests. ( #1779 )
...
Co-authored-by: Michael McCulloch <mjm.gitlab@fastmail.com>
2023-11-29 21:50:02 -08:00
Yunmo Chen
665cbcec4b
Added echo function to OpenAI API server. ( #1504 )
2023-11-26 21:29:17 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff ( #1665 )
2023-11-20 11:58:01 -08:00
liuyhwangyh
edb305584b
Support downloading models from www.modelscope.cn ( #1588 )
2023-11-17 20:38:31 -08:00
Iskren Ivov Chernev
686f5e3210
Return usage for openai streaming requests ( #1663 )
2023-11-16 15:28:36 -08:00
Fluder-Paradyne
7e90a2d117
Add /health Endpoint for both Servers ( #1540 )
2023-11-01 10:29:44 -07:00
Dan Lord
7013a80170
Add support for spaces_between_special_tokens
2023-10-30 16:52:56 -07:00
Yunfeng Bai
09ff7f106a
API server support ipv4 / ipv6 dualstack ( #1288 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-07 15:15:54 -07:00
Antoni Baum
acbed3ef40
Use monotonic time where appropriate ( #1249 )
2023-10-02 19:22:05 -07:00
Woosuk Kwon
f936657eb6
Provide default max model length ( #1224 )
2023-09-28 14:44:02 -07:00
Dan Lord
20f7cc4cde
Add skip_special_tokens sampling params ( #1186 )
2023-09-27 19:21:42 -07:00
Wen Sun
bbbf86565f
Align max_tokens behavior with OpenAI ( #852 )
2023-09-23 18:10:13 -07:00
Ricardo Lu
f98b745a81
feat: support stop_token_ids parameter. ( #1097 )
2023-09-21 15:34:02 -07:00