xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config ( #5852 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
sasha0552
c54269d967
[Frontend] Add tokenize/detokenize endpoints ( #5054 )
2024-06-26 16:54:22 +00:00
Cyrus Leung
03dccc886e
[Misc] Add vLLM version getter to utils ( #5098 )
2024-06-13 11:21:39 -07:00
Roger Wang
68bc81703e
[Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server ( #5374 )
2024-06-10 09:13:39 +00:00
Nadav Shmayovits
37464a0f74
[Bugfix] Fix call to init_logger in openai server ( #4765 )
2024-06-01 17:18:50 +00:00
Pierre Dulac
9216b9cc38
[Bugfix] Bypass authorization API token for preflight requests ( #4862 )
2024-05-16 09:42:21 -07:00
Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API ( #3734 )
2024-05-11 11:30:37 -07:00
Cyrus Leung
f12b20decc
[Frontend] Move async logic outside of constructor ( #4674 )
2024-05-08 22:48:33 -07:00
Cyrus Leung
323f27b904
[Bugfix] Fix asyncio.Task not being subscriptable ( #4623 )
2024-05-06 09:31:05 -07:00
Yang, Bo
808632d3b4
[BugFix] Prevent the task of _force_log from being garbage collected ( #4567 )
2024-05-03 01:35:18 +00:00
youkaichao
5b8a7c1cb0
[Misc] centralize all usage of environment variables ( #4548 )
2024-05-02 11:13:25 -07:00
Robert Shaw
4dc8026d86
[Bugfix] Fix 307 Redirect for /metrics ( #4523 )
2024-05-01 09:14:13 -07:00
SangBin Cho
a88081bf76
[CI] Disable non-lazy string formatting in logging ( #4326 )
...
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
SangBin Cho
0ae11f78ab
[Mypy] Part 3: fix typing for nested directories across most of the codebase ( #4161 )
2024-04-22 21:32:44 -07:00
Harry Mellor
66ded03067
Allow model to be served under multiple names ( #2894 )
...
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>
2024-04-18 00:16:26 -07:00
A-Mahla
0739b1947f
[Frontend][Bugfix] allow using the default middleware with a root path ( #3788 )
...
Co-authored-by: A-Mahla <>
2024-04-02 01:20:28 -07:00
yhu422
d8658c8cc1
Usage Stats Collection ( #2852 )
2024-03-28 22:16:12 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Simon Mo
ef65dcfa6f
[Doc] Add docs about OpenAI compatible server ( #3288 )
2024-03-18 22:05:34 -07:00
Dan Clark
03d37f2441
[Fix] Add args for mTLS support ( #3430 )
...
Co-authored-by: declark1 <daniel.clark@ibm.com>
2024-03-15 09:56:13 -07:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
Nick Hill
d2339d6840
Connect engine healthcheck to openai server ( #3260 )
2024-03-07 16:38:12 -08:00
Jason Cox
d65fac2738
Add vLLM version info to logs and openai API server ( #3161 )
2024-03-02 21:00:29 -08:00
Allen.Dou
29e70e3e88
Allow user to choose log level via --log-level instead of fixed 'info'. ( #3109 )
...
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-01 23:28:41 +00:00
Harry Mellor
ef978fe411
Port metrics from aioprometheus to prometheus_client ( #2730 )
2024-02-25 11:54:00 -08:00
jvmncs
8f36444c4f
multi-LoRA as extra models in OpenAI server ( #2775 )
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list 3 separate model entries if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in EngineArgs.
No work has been done here to scope client permissions to specific models.
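A quick way to verify, as a sketch: assuming the server is running on its default localhost:8000 address, query the OpenAI-compatible model list route:
```terminal
$ # List the served models: the base model plus each LoRA module
$ # (sql-lora, sql-lora2) should appear as its own entry.
$ curl -s http://localhost:8000/v1/models
```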
2024-02-17 12:00:48 -08:00
Erfan Al-Hossami
9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares ( #1106 )
2024-01-23 15:13:00 -08:00
Jannis Schönleber
71d63ed72e
migrate pydantic from v1 to v2 ( #2531 )
2024-01-21 16:05:56 -08:00
FlorianJoncour
14cc317ba4
OpenAI Server refactoring ( #2360 )
2024-01-16 21:33:14 -08:00
Chirag Jain
ce036244c9
Allow setting fastapi root_path argument ( #2341 )
2024-01-12 10:59:59 -08:00
Iskren Ivov Chernev
d0215a58e7
Ensure metrics are logged regardless of requests ( #2347 )
2024-01-05 05:24:42 -08:00
Harry Mellor
08133c4d1a
Add SSL arguments to API servers ( #2109 )
2023-12-18 10:56:23 +08:00
Simon Mo
2e8fc0d4c3
Fix completion API echo and logprob combo ( #1992 )
2023-12-10 13:20:30 -08:00
Jin Shang
1aa1361510
Fix OpenAI server completion_tokens referenced before assignment ( #1996 )
2023-12-09 21:01:21 -08:00
Roy
60dc62dc9e
add custom server params ( #1868 )
2023-12-03 12:59:18 -08:00
Simon Mo
5313c2cb8b
Add Production Metrics in Prometheus format ( #1890 )
2023-12-02 16:37:44 -08:00
Adam Brusselback
66785cc05c
Support chat template and echo for chat API ( #1756 )
2023-11-30 16:43:13 -08:00
Michael McCulloch
c782195662
Make --disable-log-requests actually disable logging of requests. ( #1779 )
...
Co-authored-by: Michael McCulloch <mjm.gitlab@fastmail.com>
2023-11-29 21:50:02 -08:00
Yunmo Chen
665cbcec4b
Added echo function to OpenAI API server. ( #1504 )
2023-11-26 21:29:17 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff ( #1665 )
2023-11-20 11:58:01 -08:00
liuyhwangyh
edb305584b
Support downloading models from www.modelscope.cn ( #1588 )
2023-11-17 20:38:31 -08:00
Iskren Ivov Chernev
686f5e3210
Return usage for openai streaming requests ( #1663 )
2023-11-16 15:28:36 -08:00
Fluder-Paradyne
7e90a2d117
Add /health Endpoint for both Servers ( #1540 )
2023-11-01 10:29:44 -07:00
Dan Lord
7013a80170
Add support for spaces_between_special_tokens
2023-10-30 16:52:56 -07:00
Yunfeng Bai
09ff7f106a
API server support ipv4 / ipv6 dualstack ( #1288 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-07 15:15:54 -07:00
Antoni Baum
acbed3ef40
Use monotonic time where appropriate ( #1249 )
2023-10-02 19:22:05 -07:00
Woosuk Kwon
f936657eb6
Provide default max model length ( #1224 )
2023-09-28 14:44:02 -07:00
Dan Lord
20f7cc4cde
Add skip_special_tokens sampling params ( #1186 )
2023-09-27 19:21:42 -07:00
Wen Sun
bbbf86565f
Align max_tokens behavior with OpenAI ( #852 )
2023-09-23 18:10:13 -07:00
Ricardo Lu
f98b745a81
feat: support stop_token_ids parameter. ( #1097 )
2023-09-21 15:34:02 -07:00