Antoni Baum | ff36139ffc | Remove AsyncLLMEngine busy loop, shield background task (#1059) | 2023-09-17 00:29:08 -07:00
Woosuk Kwon | e3e79e9e8a | Implement AWQ quantization support for LLaMA (#1032) | 2023-09-16 00:03:37 -07:00
    Co-authored-by: Robert Irvine <robert@seamlessml.com>
    Co-authored-by: root <rirv938@gmail.com>
    Co-authored-by: Casper <casperbh.96@gmail.com>
    Co-authored-by: julian-q <julianhquevedo@gmail.com>
Jerry Yang | b9fe4616f9 | Abort when coroutine is cancelled (#1020) | 2023-09-14 17:40:18 -07:00
Jasmond L | ab019eea75 | Add Model Revision Support (#1014) | 2023-09-13 15:20:02 -07:00
    Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Antoni Baum | 9841d48a10 | Use TGI-like incremental detokenization (#984) | 2023-09-13 13:38:01 -07:00
Antoni Baum | 0bb1e885a0 | Make max_model_len configurable (#972) | 2023-09-12 16:29:19 -07:00
leiwen83 | d6545ad22e | add option to shorten prompt print in log (#991) | 2023-09-12 15:10:14 -07:00
    Signed-off-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Jingru | 4042d192f5 | fix "tansformers_module" ModuleNotFoundError when load model with trust_remote_code=True (#871) | 2023-09-08 17:21:30 -07:00
Antoni Baum | 080438477f | Start background task in AsyncLLMEngine.generate (#988) | 2023-09-08 00:03:39 -07:00
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Zhuohan Li | c957c741d9 | Enable safetensors loading for all models (#974) | 2023-09-07 15:49:52 -07:00
Antoni Baum | c07ece5ca4 | Make AsyncLLMEngine more robust & fix batched abort (#969) | 2023-09-07 13:43:45 -07:00
    Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
    Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
Antoni Baum | c9927c1a6a | Use queue for finished requests (#957) | 2023-09-05 19:27:23 -07:00
Wen Sun | 22379d5513 | fix: typo (#948) | 2023-09-04 23:22:30 -07:00
Antoni Baum | 1696725879 | Initialize AsyncLLMEngine bg loop correctly (#943) | 2023-09-04 17:41:22 -07:00
Zhuohan Li | 002800f081 | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00
Antoni Baum | ce741ba3e4 | Refactor AsyncLLMEngine (#880) | 2023-09-03 21:43:43 -07:00
Woosuk Kwon | 55fe8a81ec | Refactor scheduler (#658) | 2023-08-02 16:42:01 -07:00
Chaofan Lin | aa39e42c5a | fix doc (#622) | 2023-07-31 13:11:57 -07:00
Fang li | 953f28cf9a | fix ModuleNotFoundError (#599) | 2023-07-29 20:52:41 -07:00
    Co-authored-by: fangli <fangli@tencent.com>
Xudong Zhang | c0d00f5be6 | [Fix] fix import error of RayWorker (#604) (#605) | 2023-07-27 23:37:40 -07:00
Zhuohan Li | 58a072be15 | [Fix] Add model sequence length into model config (#575) | 2023-07-25 23:46:30 -07:00
Antoni Baum | c487a221ee | Fix bad assert in initialize_cluster if PG already exists (#526) | 2023-07-19 23:17:12 -07:00
Antoni Baum | 9925c17940 | Ray placement group support (#397) | 2023-07-19 22:49:31 -07:00
Massimiliano Pronesti | 16c3e295a8 | fix(ray_utils): ignore re-init error (#465) | 2023-07-19 17:01:19 -07:00
Lily Liu | b4b195b360 | fix max seq len (#489) | 2023-07-17 23:20:20 -07:00
Zhuohan Li | 2bdea7ac11 | [Fix] Fix the condition of max_seq_len (#477) | 2023-07-17 00:33:48 -04:00
Zhangir Azerbayev | 6d7d95a70a | Offload port selection to OS (#467) | 2023-07-15 23:11:02 -07:00
xcnick | c6dfc3cdbe | Fix handling of special tokens in decoding. (#418) | 2023-07-12 11:14:56 -04:00
codethazine | a945fcc2ae | Add trust-remote-code flag to handle remote tokenizers (#364) | 2023-07-07 11:04:58 -07:00
coolcloudcol | 7717d0838b | Fix an endless loop issue when engine_step throws a RuntimeError (#339) | 2023-07-03 15:22:28 -07:00
Zhuohan Li | 42e0c1df78 | [Quality] Add CI for formatting (#343) | 2023-07-03 14:50:56 -07:00
Zhuohan Li | d6fa1be3a8 | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00
Lily Liu | dafd924c1f | Raise error for long prompt (#273) | 2023-06-30 18:48:49 -07:00
Woosuk Kwon | 998d9d1509 | [Tokenizer] Add tokenizer mode (#298) | 2023-06-28 14:19:22 -07:00
Woosuk Kwon | 4338cc4750 | [Tokenizer] Add an option to specify tokenizer (#284) | 2023-06-28 09:46:58 -07:00
Zhuohan Li | 0b7db411b5 | [Bug] Fix the OOM condition for CPU cache (#260) | 2023-06-26 11:16:13 -07:00
metacryptom | 0603379863 | fix wrong using getattr to get dict value (#232) | 2023-06-24 22:00:24 -07:00
Zhuohan Li | 1d24ccb96c | [Fix] Better error message when there is OOM during cache initialization (#203) | 2023-06-22 15:30:06 +08:00
Woosuk Kwon | 14f0b39cda | [Bugfix] Fix a bug in RequestOutput.finished (#202) | 2023-06-22 00:17:24 -07:00
Zhuohan Li | 2e0d314384 | fix-ray (#193) | 2023-06-22 00:21:41 +08:00
Woosuk Kwon | 67d96c29fb | Use slow tokenizer for open llama models (#168) | 2023-06-20 14:19:47 +08:00
Zhuohan Li | bf5f121c02 | Reduce GPU memory utilization to make sure OOM doesn't happen (#153) | 2023-06-18 17:33:50 +08:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00