31 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Pastel! | 2874bac618 | [Bugfix] Config got an unexpected keyword argument 'engine' (#8556) | 2024-09-20 14:00:45 -07:00 |
| Cyrus Leung | baaedfdb2d | [mypy] Enable following imports for entrypoints (#7248); Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>; Co-authored-by: Fei <dfdfcai4@gmail.com> | 2024-08-20 23:28:21 -07:00 |
| Joe Runde | 21b9c49aa3 | [Frontend] Kill the server on engine death (#6594); Signed-off-by: Joe Runde <joe@joerun.de>; Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> | 2024-08-08 09:47:48 -07:00 |
| Nick Hill | 9a3f49ae07 | [BugFix] Overhaul async request cancellation (#7111) | 2024-08-07 13:21:41 +08:00 |
| Cyrus Leung | cc08fc7225 | [Frontend] Reapply "Factor out code for running uvicorn" (#7095) | 2024-08-04 20:40:51 -07:00 |
| Simon Mo | 7eb0cb4a14 | Revert "[Frontend] Factor out code for running uvicorn" (#7012); Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> | 2024-07-31 16:34:26 -07:00 |
| Cyrus Leung | 981b0d5673 | [Frontend] Factor out code for running uvicorn (#6828) | 2024-07-27 09:58:25 +08:00 |
| youkaichao | 3b08fe2b13 | [misc][frontend] log all available endpoints (#6195); Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-07-07 15:11:12 -07:00 |
| Michael Goin | 8065a7e220 | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| Norman Mu | 2f30e7c72f | [Frontend] Add --log-level option to api server (#4377) | 2024-04-26 05:36:01 +00:00 |
| SangBin Cho | 09473ee41c | [mypy] Add mypy type annotation part 1 (#4006) | 2024-04-12 14:35:50 -07:00 |
| yhu422 | d8658c8cc1 | Usage Stats Collection (#2852) | 2024-03-28 22:16:12 -07:00 |
| SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| Dan Clark | 03d37f2441 | [Fix] Add args for mTLS support (#3430); Co-authored-by: declark1 <daniel.clark@ibm.com> | 2024-03-15 09:56:13 -07:00 |
| Dan Clark | c17ca8ef18 | Add args for mTLS support (#3410); Co-authored-by: Daniel Clark <daniel.clark@ibm.com> | 2024-03-14 13:11:45 -07:00 |
| Zhuohan Li | 2f8844ba08 | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00 |
| Sage Moore | ce4f5a29fb | Add Automatic Prefix Caching (#2762); Co-authored-by: ElizaWszola <eliza@neuralmagic.com>; Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-03-02 00:50:01 -08:00 |
| Simon Mo | 86fd8bb0ac | Add warning to prevent changes to benchmark api server (#2858) | 2024-02-18 21:36:19 -08:00 |
| shiyi.c_98 | d10f8e1d43 | [Experimental] Prefix Caching Support (#1669); Co-authored-by: DouHappy <2278958187@qq.com>; Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2024-01-17 16:32:10 -08:00 |
| Chirag Jain | ce036244c9 | Allow setting fastapi root_path argument (#2341) | 2024-01-12 10:59:59 -08:00 |
| Ronen Schaffer | 74d8d77626 | Remove unused const TIMEOUT_TO_PREVENT_DEADLOCK (#2321) | 2024-01-03 15:49:07 -08:00 |
| Harry Mellor | 08133c4d1a | Add SSL arguments to API servers (#2109) | 2023-12-18 10:56:23 +08:00 |
| Fluder-Paradyne | 7e90a2d117 | Add /health Endpoint for both Servers (#1540) | 2023-11-01 10:29:44 -07:00 |
| Yunfeng Bai | 09ff7f106a | API server support ipv4 / ipv6 dualstack (#1288); Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2023-10-07 15:15:54 -07:00 |
| Roy | 2d1e86f1b1 | clean api code, remove redundant background task. (#1102) | 2023-09-21 13:25:05 -07:00 |
| Antoni Baum | 080438477f | Start background task in AsyncLLMEngine.generate (#988); Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2023-09-08 00:03:39 -07:00 |
| Antoni Baum | c07ece5ca4 | Make AsyncLLMEngine more robust & fix batched abort (#969); Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>; Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com> | 2023-09-07 13:43:45 -07:00 |
| Antoni Baum | 1696725879 | Initialize AsyncLLMEngine bg loop correctly (#943) | 2023-09-04 17:41:22 -07:00 |
| Nicolas Frenay | be54f8e5c4 | [Fix] Change /generate response-type to json for non-streaming (#374) | 2023-07-06 18:15:17 -07:00 |
| Zhuohan Li | d6fa1be3a8 | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |