Cade Daniel | 14ccd94c89 | [Core][Bugfix]Refactor block manager for better testability (#3492) | 2024-03-27 23:59:28 -07:00
xwjiang2010 | 64172a976c | [Feature] Add vision language model support. (#3042) | 2024-03-25 14:16:30 -07:00
SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00
Thomas Parnell | cf2f084d56 | Dynamic scheduler delay to improve ITL performance (#3279); Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com> | 2024-03-22 12:28:14 -07:00
SangBin Cho | 6e435de766 | [1/n][Chunked Prefill] Refactor input query shapes (#3236) | 2024-03-20 14:46:05 -07:00
Tao He | 14b8ae02e7 | Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220); Signed-off-by: Tao He <sighingnow@gmail.com>; Co-authored-by: simon-mo <simon.mo@hey.com> | 2024-03-15 18:25:43 +00:00
Zhuohan Li | 2f8844ba08 | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
Nick Hill | 8999ec3c16 | Store eos_token_id in Sequence for easy access (#3166) | 2024-03-05 15:35:43 -08:00
Sage Moore | ce4f5a29fb | Add Automatic Prefix Caching (#2762); Co-authored-by: ElizaWszola <eliza@neuralmagic.com>; Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-03-02 00:50:01 -08:00
Massimiliano Pronesti | 93dc5a2870 | chore(vllm): codespell for spell checking (#2820) | 2024-02-21 18:56:01 -08:00
Nick Hill | 7d2dcce175 | Support per-request seed (#2514) | 2024-02-21 11:47:00 -08:00
Antoni Baum | 017d9f1515 | Add metrics to RequestOutput (#2876) | 2024-02-20 21:55:57 -08:00
Antoni Baum | 9b945daaf1 | [Experimental] Add multi-LoRA support (#1804); Co-authored-by: Chen Shen <scv119@gmail.com>; Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>; Co-authored-by: Avnish Narayan <avnish@anyscale.com> | 2024-01-23 15:26:37 -08:00
Nick Hill | d75c40734a | [Fix] Keep scheduler.running as deque (#2523) | 2024-01-20 22:36:09 -08:00
ljss | d2a68364c4 | [BugFix] Fix abort_seq_group (#2463) | 2024-01-18 15:10:42 -08:00
shiyi.c_98 | d10f8e1d43 | [Experimental] Prefix Caching Support (#1669); Co-authored-by: DouHappy <2278958187@qq.com>; Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2024-01-17 16:32:10 -08:00
陈序 | 48cf1e413c | fix: deque mutated during iteration in abort_seq_group (#2371) | 2024-01-12 17:44:18 +01:00
Jiaxiang | 6549aef245 | [DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) | 2024-01-11 19:26:49 -08:00
Nadav Shmayovits | 05921a9a7a | Changed scheduler to use deques instead of lists (#2290); Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2024-01-07 09:48:07 -08:00
Woosuk Kwon | a1b9cb2a34 | [BugFix] Fix recovery logic for sequence group (#2186) | 2023-12-20 21:52:37 -08:00
Zhuofan | 19849db573 | [Fix] Fix bugs in scheduler (#1727) | 2023-11-20 16:10:50 -08:00
陈序 | 3d4ceb292c | Fix hanging in the scheduler caused by long prompts (#1534) | 2023-11-20 16:06:49 -08:00
Simon Mo | 5ffc0d13a2 | Migrate linter from pylint to ruff (#1665) | 2023-11-20 11:58:01 -08:00
Light Lin | f61dc8072f | Fix type hints (#1427) | 2023-10-20 08:50:47 -07:00
Woosuk Kwon | c1376e0f82 | Change scheduler & input tensor shape (#1381) | 2023-10-16 17:48:42 -07:00
Antoni Baum | acbed3ef40 | Use monotonic time where appropriate (#1249) | 2023-10-02 19:22:05 -07:00
Chris Bamford | bb1ba58f06 | [Mistral] Mistral-7B-v0.1 support (#1196); Co-authored-by: timlacroix <t@mistral.ai> | 2023-09-28 10:41:03 -07:00
陈序 | e21d7687a9 | Fix hanging when prompt exceeds limit (#1029) | 2023-09-17 01:48:56 -07:00
Antoni Baum | c07ece5ca4 | Make AsyncLLMEngine more robust & fix batched abort (#969); Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>; Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com> | 2023-09-07 13:43:45 -07:00
Zhuohan Li | 002800f081 | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00
Antoni Baum | ce741ba3e4 | Refactor AsyncLLMEngine (#880) | 2023-09-03 21:43:43 -07:00
Zhuohan Li | d2b2eed67c | [Fix] Fix a condition for ignored sequences (#867) | 2023-08-27 23:00:56 -07:00
wenjun93 | 75c0ca9d43 | Clean up code (#844) | 2023-08-23 16:44:15 -07:00
Woosuk Kwon | 55fe8a81ec | Refactor scheduler (#658) | 2023-08-02 16:42:01 -07:00
Lily Liu | 20044cab7a | Fix log message in scheduler (#652) | 2023-08-02 13:35:10 -07:00
MoeedDar | 328d231c17 | Fixed old name reference for max_seq_len | 2023-07-18 16:47:59 +01:00
Lily Liu | b4b195b360 | fix max seq len (#489) | 2023-07-17 23:20:20 -07:00
Zhuohan Li | 2bdea7ac11 | [Fix] Fix the condition of max_seq_len (#477) | 2023-07-17 00:33:48 -04:00
Zhuohan Li | d6fa1be3a8 | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00
Lily Liu | dafd924c1f | Raise error for long prompt (#273) | 2023-06-30 18:48:49 -07:00
Woosuk Kwon | 526df28fb2 | [BugFix] Fix a bug in counting running sequences (#266) | 2023-06-26 13:09:02 -07:00
Woosuk Kwon | 3f92038b99 | Add comments on swap space (#154) | 2023-06-18 11:39:35 -07:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00