Zhuohan Li | c1026311b5 | [Community] Add vLLM Discord server (#1086) | 2023-09-18 12:23:35 -07:00
Woosuk Kwon | 2b1c116b5a | Add minimum capability requirement for AWQ (#1064) | 2023-09-18 12:02:01 -07:00
Woosuk Kwon | cc796b1358 | Convert before transpose (#1073) | 2023-09-18 11:51:48 -07:00
Zhuohan Li | f029ef94d7 | Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068) | 2023-09-18 11:49:40 -07:00
Roy | 95592fa00a | align llm_engine and async_engine. (#1081) | 2023-09-18 11:49:10 -07:00
orellavie1212 | fbe66e1d0b | added support for quantize on LLM module (#1080) | 2023-09-18 11:04:21 -07:00
Zhuohan Li | 90979c38f8 | [FIX] Don't initialize parameter by default (#1067) | 2023-09-17 17:15:38 -07:00
陈序 | e21d7687a9 | Fix hanging when prompt exceeds limit (#1029) | 2023-09-17 01:48:56 -07:00
Antoni Baum | ff36139ffc | Remove AsyncLLMEngine busy loop, shield background task (#1059) | 2023-09-17 00:29:08 -07:00
Woosuk Kwon | e3e79e9e8a | Implement AWQ quantization support for LLaMA (#1032) | 2023-09-16 00:03:37 -07:00
    Co-authored-by: Robert Irvine <robert@seamlessml.com>
    Co-authored-by: root <rirv938@gmail.com>
    Co-authored-by: Casper <casperbh.96@gmail.com>
    Co-authored-by: julian-q <julianhquevedo@gmail.com>
Jerry Yang | b9fe4616f9 | Abort when coroutine is cancelled (#1020) | 2023-09-14 17:40:18 -07:00
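The "Abort when coroutine is cancelled" fix above follows a general asyncio pattern: when the caller's coroutine is cancelled (for example, an HTTP client disconnects), the engine should stop working on that request before the cancellation propagates. A minimal sketch of the pattern, using a hypothetical `FakeEngine` stand-in rather than vLLM's actual API:

```python
import asyncio

class FakeEngine:
    """Stand-in for an inference engine; not vLLM's actual API."""
    def __init__(self):
        self.aborted = []

    async def stream(self, request_id):
        await asyncio.sleep(10)  # simulate a long-running generation

    def abort(self, request_id):
        self.aborted.append(request_id)

async def generate(engine, request_id):
    # If the caller is cancelled, tell the engine to drop the request,
    # then re-raise so cancellation still propagates normally.
    try:
        return await engine.stream(request_id)
    except asyncio.CancelledError:
        engine.abort(request_id)
        raise

async def main():
    engine = FakeEngine()
    task = asyncio.create_task(generate(engine, "req-0"))
    await asyncio.sleep(0)  # let the task start and block in stream()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return engine.aborted

aborted = asyncio.run(main())
```

Re-raising after the cleanup matters: swallowing `CancelledError` would leave the task reporting success when it was in fact cancelled.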
Woosuk Kwon | 64ca424e75 | Fix warning message on LLaMA FastTokenizer (#1037) | 2023-09-14 17:33:32 -07:00
Lukas Kreussel | b5f93d0631 | Only fail if logit_bias has actual values (#1045) | 2023-09-14 17:33:01 -07:00
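The logit_bias change above reflects a small but common validation subtlety: an unsupported optional field should only be rejected when it actually carries values, since clients routinely send an empty mapping as a no-op. A hypothetical sketch of the idea (this is not vLLM's actual validation code; the helper name is invented):

```python
def validate_logit_bias(logit_bias):
    # Reject only a non-empty mapping: None or {} applies no bias,
    # so treating it as an error would break well-behaved clients.
    if logit_bias:  # truthy only for a mapping with actual entries
        raise ValueError("logit_bias is not currently supported")

validate_logit_bias(None)  # accepted: nothing to apply
validate_logit_bias({})    # accepted: empty mapping is a no-op
try:
    validate_logit_bias({"50256": -100.0})
    rejected = False
except ValueError:
    rejected = True
```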
Woosuk Kwon | a58936966f | Add pandas to requirements.txt (#1047) | 2023-09-14 17:31:38 -07:00
Antoni Baum | dd54a4b026 | Fix detokenization leaving special tokens (#1044) | 2023-09-14 16:37:03 -07:00
    Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Woosuk Kwon | eda1a7cad3 | Announce paper release (#1036) | 2023-09-13 17:38:13 -07:00
Zhuohan Li | f04908cae7 | [FIX] Minor bug fixes (#1035) | 2023-09-13 16:38:12 -07:00
Jasmond L | ab019eea75 | Add Model Revision Support (#1014) | 2023-09-13 15:20:02 -07:00
    Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Antoni Baum | 9841d48a10 | Use TGI-like incremental detokenization (#984) | 2023-09-13 13:38:01 -07:00
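The incremental-detokenization commit above addresses a streaming pitfall: decoding each new token in isolation and concatenating the pieces can produce wrong text, because many tokenizers only render pieces correctly in context (word continuations, merged spaces, multi-byte characters). The TGI-style fix is to re-decode the growing prefix and emit only the new suffix. A toy sketch under invented names, with a BERT-style `##` continuation vocabulary standing in for a real tokenizer:

```python
def decode(token_ids, vocab):
    # Toy detokenizer: "##" pieces attach to the previous piece, so
    # decoding such a piece on its own would lose the join.
    text = ""
    for tid in token_ids:
        piece = vocab[tid]
        if piece.startswith("##"):
            text += piece[2:]
        else:
            text += (" " if text else "") + piece
    return text

def stream_deltas(token_ids, vocab):
    # Incremental detokenization: decode the full prefix each step
    # and yield only the newly produced text.
    emitted = ""
    for i in range(1, len(token_ids) + 1):
        full = decode(token_ids[:i], vocab)
        delta, emitted = full[len(emitted):], full
        yield delta

vocab = {0: "token", 1: "##izer", 2: "works"}
deltas = list(stream_deltas([0, 1, 2], vocab))
```

Real implementations avoid re-decoding from position zero by tracking a prefix offset, but the delta-emission idea is the same: the concatenation of the deltas always equals the full decode.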
Ikko Eltociear Ashimine | 3272d7a0b7 | Fix typo in README.md (#1033) | 2023-09-13 12:55:23 -07:00
Antoni Baum | 0bb1e885a0 | Make max_model_len configurable (#972) | 2023-09-12 16:29:19 -07:00
leiwen83 | d6545ad22e | add option to shorten prompt print in log (#991) | 2023-09-12 15:10:14 -07:00
    Signed-off-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Woosuk Kwon | 90eb3f43ca | Bump up the version to v0.1.7 (#1013) | 2023-09-11 00:54:30 -07:00
    (tag: v0.1.7)
Woosuk Kwon | e67b4f2c2a | Use FP32 in RoPE initialization (#1004) | 2023-09-11 00:26:35 -07:00
    Co-authored-by: One <imone@tuta.io>
Woosuk Kwon | d6770d1f23 | Update setup.py (#1006) | 2023-09-10 23:42:45 -07:00
Woosuk Kwon | b9cecc2635 | [Docs] Update installation page (#1005) | 2023-09-10 14:23:31 -07:00
Kyujin Cho | 898285c9bf | fix: CUDA error when inferencing with Falcon-40B base model (#992) | 2023-09-10 01:39:02 -07:00
Antoni Baum | a62de9ecfd | Fix wrong dtype in PagedAttentionWithALiBi bias (#996) | 2023-09-09 14:58:35 -07:00
    Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Jingru | 4042d192f5 | fix "tansformers_module" ModuleNotFoundError when load model with trust_remote_code=True (#871) | 2023-09-08 17:21:30 -07:00
Zhuohan Li | 1117aa1411 | Bump up the version to v0.1.6 (#989) | 2023-09-08 00:07:46 -07:00
    (tag: v0.1.6)
Antoni Baum | 080438477f | Start background task in AsyncLLMEngine.generate (#988) | 2023-09-08 00:03:39 -07:00
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Robert Irvine | 4b5bcf8906 | faster startup of vLLM (#982) | 2023-09-08 14:48:54 +09:00
    Co-authored-by: Robert Irvine <robert@seamlessml.com>
Woosuk Kwon | 852ef5b4f5 | Bump up the version to v0.1.5 (#944) | 2023-09-07 16:15:31 -07:00
    (tag: v0.1.5)
Zhuohan Li | db09d4ad83 | [FIX] Fix Alibi implementation in PagedAttention kernel (#945) | 2023-09-07 15:53:14 -07:00
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Co-authored-by: Oliver-ss <yuansongwx@outlook.com>
Zhuohan Li | c957c741d9 | Enable safetensors loading for all models (#974) | 2023-09-07 15:49:52 -07:00
Antoni Baum | c07ece5ca4 | Make AsyncLLMEngine more robust & fix batched abort (#969) | 2023-09-07 13:43:45 -07:00
    Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
    Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
Woosuk Kwon | 7a9c20c715 | Bum up transformers version (#976) | 2023-09-07 13:15:53 -07:00
Antoni Baum | 005ba458b5 | Set torch default dtype in a context manager (#971) | 2023-09-07 15:39:37 +09:00
    Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
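The "Set torch default dtype in a context manager" commit above uses a classic save/override/restore pattern: a process-wide default is changed only for the duration of a block, and restored even if the block raises, so the override cannot leak into unrelated code. A minimal sketch of the pattern with a plain module-level variable standing in for torch's global default (the names here are illustrative, not vLLM's):

```python
from contextlib import contextmanager

_default_dtype = "float32"  # stand-in for a process-wide default

@contextmanager
def default_dtype(dtype):
    # Override the global default for the enclosed block only;
    # the finally clause restores it even on an exception.
    global _default_dtype
    saved = _default_dtype
    _default_dtype = dtype
    try:
        yield
    finally:
        _default_dtype = saved

with default_dtype("float16"):
    inside = _default_dtype  # "float16" while the block runs
after = _default_dtype       # restored to "float32" afterwards
```

With torch itself, the same shape would wrap `torch.set_default_dtype` so model construction happens under the desired dtype without mutating global state permanently.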
Woosuk Kwon | 320a622ec4 | [BugFix] Implement RoPE for GPT-J (#941) | 2023-09-06 11:54:33 +09:00
Antoni Baum | c9927c1a6a | Use queue for finished requests (#957) | 2023-09-05 19:27:23 -07:00
Woosuk Kwon | fbd80ad409 | Clean up kernel unit tests (#938) | 2023-09-05 16:57:38 -07:00
Wen Sun | 22379d5513 | fix: typo (#948) | 2023-09-04 23:22:30 -07:00
Antoni Baum | 1696725879 | Initialize AsyncLLMEngine bg loop correctly (#943) | 2023-09-04 17:41:22 -07:00
Zhuohan Li | 002800f081 | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00
Nelson Liu | e15932bb60 | Only emit warning about internal tokenizer if it isn't being used (#939) | 2023-09-05 00:50:55 +09:00
Antoni Baum | ce741ba3e4 | Refactor AsyncLLMEngine (#880) | 2023-09-03 21:43:43 -07:00
Woosuk Kwon | bf87484efa | [BugFix] Fix NaN errors in paged attention kernel (#936) | 2023-09-04 09:20:06 +09:00
Woosuk Kwon | 8ce9c50d40 | Avoid compiling kernels for double data type (#933) | 2023-09-02 14:59:47 +09:00
Woosuk Kwon | 32b6816e55 | Add tests for models (#922) | 2023-09-01 11:19:43 +09:00
Zhuohan Li | c128d69856 | Fix README.md Link (#927) | 2023-08-31 17:18:34 -07:00