362 Commits

Author SHA1 Message Date
Woosuk Kwon
8d926e91f1
Announce the First vLLM Meetup (#1148) 2023-09-22 11:37:14 -07:00
Nick Perez
4ee52bb169
Docs: Fix broken link to openai example (#1145)
Link to `openai_client.py` is no longer valid - updated to `openai_completion_client.py`
2023-09-22 11:36:09 -07:00
Woosuk Kwon
7d7e3b78a3
Use --ipc=host in docker run for distributed inference (#1125) 2023-09-21 18:26:47 -07:00
Ricardo Lu
f98b745a81
feat: support stop_token_ids parameter. (#1097) 2023-09-21 15:34:02 -07:00
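A minimal usage sketch for the new parameter (model name and the stop id are illustrative, not taken from the PR):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # illustrative model
# Generation stops as soon as any of the listed token ids is produced.
params = SamplingParams(max_tokens=64, stop_token_ids=[2])  # 2: hypothetical EOS id
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```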
Roy
2d1e86f1b1
clean api code, remove redundant background task. (#1102) 2023-09-21 13:25:05 -07:00
Woosuk Kwon
1ac4ccf73c
Add float16 and float32 (#1115) 2023-09-21 00:52:47 -07:00
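If, as the title suggests, #1115 adds `float16`/`float32` as accepted spellings of the `dtype` argument (alongside the short forms `half`/`float`), usage would look like this sketch (model name is illustrative):

```python
from vllm import LLM

# Request an explicit weight/activation precision by name.
llm = LLM(model="facebook/opt-125m", dtype="float16")
```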
Woosuk Kwon
2ac4d5e2bf
Replace DtypeTensor (#1123) 2023-09-21 00:51:47 -07:00
Antoni Baum
3302f0aef3
Read rope_theta and max_position_embeddings from config (#1096)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: wnma3mz <wnma3mz@gmail.com>
2023-09-20 13:35:11 -07:00
Tanmay Verma
6f2dd6c37e
Add documentation to Triton server tutorial (#983) 2023-09-20 10:32:40 -07:00
Woosuk Kwon
bc0644574c
Add gpu_memory_utilization and swap_space to LLM (#1090) 2023-09-19 22:16:04 -07:00
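A short sketch of the two new `LLM` knobs (model name and values are illustrative):

```python
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",    # illustrative model
    gpu_memory_utilization=0.90,  # fraction of GPU memory vLLM may reserve
    swap_space=4,                 # CPU swap space for preempted sequences, in GiB
)
```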
Woosuk Kwon
400b8289f7
Add pyarrow to dependencies & Print warning on Ray import error (#1094) 2023-09-18 22:36:17 -07:00
Zhuohan Li
c1026311b5
[Community] Add vLLM Discord server (#1086) 2023-09-18 12:23:35 -07:00
Woosuk Kwon
2b1c116b5a
Add minimum capability requirement for AWQ (#1064) 2023-09-18 12:02:01 -07:00
Woosuk Kwon
cc796b1358
Convert before transpose (#1073) 2023-09-18 11:51:48 -07:00
Zhuohan Li
f029ef94d7
Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068) 2023-09-18 11:49:40 -07:00
Roy
95592fa00a
align llm_engine and async_engine. (#1081) 2023-09-18 11:49:10 -07:00
orellavie1212
fbe66e1d0b
add support for quantization on the LLM module (#1080) 2023-09-18 11:04:21 -07:00
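Together with the AWQ kernels from #1032 below, this exposes quantized loading on the offline API. A hedged sketch (the model name is illustrative; it must point at AWQ-quantized weights, and the GPU must pass the capability check from #1064):

```python
from vllm import LLM

llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")
```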
Zhuohan Li
90979c38f8
[FIX] Don't initialize parameter by default (#1067) 2023-09-17 17:15:38 -07:00
陈序
e21d7687a9
Fix hanging when prompt exceeds limit (#1029) 2023-09-17 01:48:56 -07:00
Antoni Baum
ff36139ffc
Remove AsyncLLMEngine busy loop, shield background task (#1059) 2023-09-17 00:29:08 -07:00
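A rough usage sketch of the async engine this series of fixes hardens (model name and request id are illustrative; assumes the `AsyncLLMEngine` API of this era):

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

async def main():
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m"))
    final = None
    # generate() yields partial RequestOutputs; cancelling the surrounding
    # coroutine should abort the request in the engine rather than leak it.
    async for output in engine.generate(
            "Hello, my name is", SamplingParams(max_tokens=16), "req-0"):
        final = output
    if final is not None:
        print(final.outputs[0].text)

asyncio.run(main())
```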
Woosuk Kwon
e3e79e9e8a
Implement AWQ quantization support for LLaMA (#1032)
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
2023-09-16 00:03:37 -07:00
Jerry Yang
b9fe4616f9
Abort when coroutine is cancelled (#1020) 2023-09-14 17:40:18 -07:00
Woosuk Kwon
64ca424e75
Fix warning message on LLaMA FastTokenizer (#1037) 2023-09-14 17:33:32 -07:00
Lukas Kreussel
b5f93d0631
Only fail if logit_bias has actual values (#1045) 2023-09-14 17:33:01 -07:00
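The fix concerns the OpenAI-compatible server, which previously rejected any request carrying `logit_bias`, even an empty one. A sketch of a request that should now succeed (server command, port, and model are illustrative):

```python
import requests

# Assumes a local server started with something like:
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "Hello",
        "max_tokens": 8,
        "logit_bias": {},  # empty mapping: accepted after this fix
    },
)
print(resp.json())
```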
Woosuk Kwon
a58936966f
Add pandas to requirements.txt (#1047)
2023-09-14 17:31:38 -07:00
Antoni Baum
dd54a4b026
Fix detokenization leaving special tokens (#1044)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2023-09-14 16:37:03 -07:00
Woosuk Kwon
eda1a7cad3
Announce paper release (#1036) 2023-09-13 17:38:13 -07:00
Zhuohan Li
f04908cae7
[FIX] Minor bug fixes (#1035)
2023-09-13 16:38:12 -07:00
Jasmond L
ab019eea75
Add Model Revision Support (#1014)
Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-13 15:20:02 -07:00
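Usage sketch for the new argument (model name and revision are illustrative):

```python
from vllm import LLM

# Pin the exact weights pulled from the Hugging Face Hub instead of
# whatever the default branch currently points at.
llm = LLM(model="facebook/opt-125m", revision="main")
```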
Antoni Baum
9841d48a10
Use TGI-like incremental detokenization (#984) 2023-09-13 13:38:01 -07:00
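The idea behind #984, in rough outline: rather than re-decoding the whole sequence after every generated token, decode only the tail and emit the new suffix, holding back output while a multi-byte character is incomplete. A simplified sketch, not the actual vLLM helper:

```python
def detokenize_incrementally(tokenizer, token_ids, prefix_offset, read_offset):
    """Return newly produced text plus updated offsets (simplified).

    Only the tail starting at prefix_offset is decoded; the text already
    emitted (up to read_offset) is used to strip the overlap.
    """
    prefix_text = tokenizer.decode(token_ids[prefix_offset:read_offset])
    new_text = tokenizer.decode(token_ids[prefix_offset:])
    if new_text.endswith("\ufffd"):
        # Last token is part of an incomplete multi-byte character:
        # emit nothing and keep the offsets where they are.
        return "", prefix_offset, read_offset
    # Emit the suffix and slide both offsets forward.
    return new_text[len(prefix_text):], read_offset, len(token_ids)
```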
Ikko Eltociear Ashimine
3272d7a0b7
Fix typo in README.md (#1033) 2023-09-13 12:55:23 -07:00
Antoni Baum
0bb1e885a0
Make max_model_len configurable (#972) 2023-09-12 16:29:19 -07:00
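Usage sketch (model name and length are illustrative):

```python
from vllm import LLM

# Cap the context window independently of the model's configured default,
# which also bounds the KV-cache memory the engine must plan for.
llm = LLM(model="facebook/opt-125m", max_model_len=2048)
```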
leiwen83
d6545ad22e
add option to shorten the prompt printed in logs (#991)
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-12 15:10:14 -07:00
Woosuk Kwon
90eb3f43ca
Bump up the version to v0.1.7 (#1013) v0.1.7 2023-09-11 00:54:30 -07:00
Woosuk Kwon
e67b4f2c2a
Use FP32 in RoPE initialization (#1004)
Co-authored-by: One <imone@tuta.io>
2023-09-11 00:26:35 -07:00
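Why #1004 matters, in sketch form: building the RoPE frequency table in half precision loses resolution, so the table is computed in float32 and cast afterwards. Illustrative only, not the vLLM kernel code:

```python
import torch

def rope_inv_freq(rotary_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Compute inverse frequencies 1 / base^(2i/dim) in float32;
    # doing this math directly in float16 degrades large-position accuracy.
    exponents = torch.arange(0, rotary_dim, 2, dtype=torch.float32) / rotary_dim
    return 1.0 / (base ** exponents)
```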
Woosuk Kwon
d6770d1f23
Update setup.py (#1006) 2023-09-10 23:42:45 -07:00
Woosuk Kwon
b9cecc2635
[Docs] Update installation page (#1005) 2023-09-10 14:23:31 -07:00
Kyujin Cho
898285c9bf
fix: CUDA error when inferencing with Falcon-40B base model (#992) 2023-09-10 01:39:02 -07:00
Antoni Baum
a62de9ecfd
Fix wrong dtype in PagedAttentionWithALiBi bias (#996)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2023-09-09 14:58:35 -07:00
Jingru
4042d192f5
fix "tansformers_module" ModuleNotFoundError when load model with trust_remote_code=True (#871) 2023-09-08 17:21:30 -07:00
Zhuohan Li
1117aa1411
Bump up the version to v0.1.6 (#989) v0.1.6 2023-09-08 00:07:46 -07:00
Antoni Baum
080438477f
Start background task in AsyncLLMEngine.generate (#988)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-08 00:03:39 -07:00
Robert Irvine
4b5bcf8906
faster startup of vLLM (#982)
Co-authored-by: Robert Irvine <robert@seamlessml.com>
2023-09-08 14:48:54 +09:00
Woosuk Kwon
852ef5b4f5
Bump up the version to v0.1.5 (#944) v0.1.5 2023-09-07 16:15:31 -07:00
Zhuohan Li
db09d4ad83
[FIX] Fix Alibi implementation in PagedAttention kernel (#945)
* Fix test_attention
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Oliver-ss <yuansongwx@outlook.com>
2023-09-07 15:53:14 -07:00
Zhuohan Li
c957c741d9
Enable safetensors loading for all models (#974) 2023-09-07 15:49:52 -07:00
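A hedged sketch of forcing the safetensors path; the `load_format` argument and its `"safetensors"` value are assumptions about the engine arguments of this era, and the model name is illustrative:

```python
from vllm import LLM

llm = LLM(model="facebook/opt-125m", load_format="safetensors")
```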
Antoni Baum
c07ece5ca4
Make AsyncLLMEngine more robust & fix batched abort (#969)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
2023-09-07 13:43:45 -07:00
Woosuk Kwon
7a9c20c715
Bump up transformers version (#976) 2023-09-07 13:15:53 -07:00
Antoni Baum
005ba458b5
Set torch default dtype in a context manager (#971)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2023-09-07 15:39:37 +09:00
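The pattern behind #971, as a general sketch rather than the exact vLLM helper: weights should be created in the target dtype, but mutating the process-wide default permanently would leak into unrelated code, so the default is swapped inside a context manager:

```python
import contextlib

import torch

@contextlib.contextmanager
def default_torch_dtype(dtype: torch.dtype):
    """Temporarily change torch's default dtype, restoring it on exit."""
    old = torch.get_default_dtype()
    torch.set_default_dtype(dtype)
    try:
        yield
    finally:
        torch.set_default_dtype(old)

# Parameters created inside the block default to float16;
# the global default is restored afterwards.
with default_torch_dtype(torch.float16):
    layer = torch.nn.Linear(16, 16)
```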
Woosuk Kwon
320a622ec4
[BugFix] Implement RoPE for GPT-J (#941) 2023-09-06 11:54:33 +09:00