22 Commits

Author SHA1 Message Date
Norman Mu
2f30e7c72f
[Frontend] Add --log-level option to api server (#4377) 2024-04-26 05:36:01 +00:00
SangBin Cho
09473ee41c
[mypy] Add mypy type annotation part 1 (#4006) 2024-04-12 14:35:50 -07:00
yhu422
d8658c8cc1
Usage Stats Collection (#2852) 2024-03-28 22:16:12 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
Dan Clark
03d37f2441
[Fix] Add args for mTLS support (#3430)
Co-authored-by: declark1 <daniel.clark@ibm.com>
2024-03-15 09:56:13 -07:00
Dan Clark
c17ca8ef18
Add args for mTLS support (#3410)
Co-authored-by: Daniel Clark <daniel.clark@ibm.com>
2024-03-14 13:11:45 -07:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit (#3305) 2024-03-10 19:49:14 -07:00
Sage Moore
ce4f5a29fb
Add Automatic Prefix Caching (#2762)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
Simon Mo
86fd8bb0ac
Add warning to prevent changes to benchmark api server (#2858) 2024-02-18 21:36:19 -08:00
shiyi.c_98
d10f8e1d43
[Experimental] Prefix Caching Support (#1669)
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-17 16:32:10 -08:00
Chirag Jain
ce036244c9
Allow setting fastapi root_path argument (#2341) 2024-01-12 10:59:59 -08:00
Ronen Schaffer
74d8d77626
Remove unused const TIMEOUT_TO_PREVENT_DEADLOCK (#2321) 2024-01-03 15:49:07 -08:00
Harry Mellor
08133c4d1a
Add SSL arguments to API servers (#2109) 2023-12-18 10:56:23 +08:00
Fluder-Paradyne
7e90a2d117
Add /health Endpoint for both Servers (#1540) 2023-11-01 10:29:44 -07:00
Yunfeng Bai
09ff7f106a
API server support ipv4 / ipv6 dualstack (#1288)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-07 15:15:54 -07:00
Roy
2d1e86f1b1
clean api code, remove redundant background task. (#1102) 2023-09-21 13:25:05 -07:00
Antoni Baum
080438477f
Start background task in AsyncLLMEngine.generate (#988)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-08 00:03:39 -07:00
Antoni Baum
c07ece5ca4
Make AsyncLLMEngine more robust & fix batched abort (#969)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
2023-09-07 13:43:45 -07:00
Antoni Baum
1696725879
Initialize AsyncLLMEngine bg loop correctly (#943) 2023-09-04 17:41:22 -07:00
Nicolas Frenay
be54f8e5c4
[Fix] Change /generate response-type to json for non-streaming (#374) 2023-07-06 18:15:17 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00