| Woosuk Kwon | 2e8e49fce3 | [Fix] Remove false assertion (#1222) | 2023-09-28 10:52:38 -07:00 |
| Woosuk Kwon | a8e98aee0c | Fix Mistral model (#1220) | 2023-09-28 10:44:05 -07:00 |
| Chris Bamford | bb1ba58f06 | [Mistral] Mistral-7B-v0.1 support (#1196)<br>Co-authored-by: timlacroix <t@mistral.ai> | 2023-09-28 10:41:03 -07:00 |
| Qing | 7bedab5748 | Add rope_scaling to Qwen (#1210) | 2023-09-28 00:49:23 -07:00 |
| Dan Lord | 20f7cc4cde | Add skip_special_tokens sampling params (#1186) | 2023-09-27 19:21:42 -07:00 |
| Danilo Peixoto | 649aa730c5 | Use standard extras for uvicorn (#1166) | 2023-09-27 17:41:36 -07:00 |
| Woosuk Kwon | a19bc5c628 | Automatically configure max_num_batched_tokens (#1198) | 2023-09-27 16:34:00 -07:00 |
| Qing | 28e616c4e3 | fix qwen-14b model (#1173) | 2023-09-27 16:33:16 -07:00 |
| Wang Ran (汪然) | 30e775281d | fix typo (#1184)<br>Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2023-09-27 16:22:45 -07:00 |
| Lily Liu | 21877b0d75 | Support Longchat and RoPE scaling (#555)<br>Co-authored-by: Wing Lian <wing.lian@gmail.com><br>Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2023-09-27 03:36:02 -07:00 |
| Antoni Baum | cf5cb1e33e | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
| Woosuk Kwon | 03ffd0a022 | Add comments on RoPE initialization (#1176) | 2023-09-26 10:48:33 -07:00 |
| Woosuk Kwon | a425bd9a9a | [Setup] Enable TORCH_CUDA_ARCH_LIST for selecting target GPUs (#1074) | 2023-09-26 10:21:08 -07:00 |
| Wen Sun | bbbf86565f | Align max_tokens behavior with openai (#852) | 2023-09-23 18:10:13 -07:00 |
| Woosuk Kwon | 9f6be8692e | Fix config for Falcon (#1164) | 2023-09-23 17:38:43 -07:00 |
| Zhuohan Li | f187877945 | [FIX] Simplify sampler logic (#1156) | 2023-09-23 17:21:56 -07:00 |
| Zhuohan Li | 947b794146 | [Sampler] Vectorized sampling (simplified) (#1048)<br>Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> | 2023-09-22 17:48:04 -07:00 |
| Woosuk Kwon | 8d926e91f1 | Announce the First vLLM Meetup (#1148) | 2023-09-22 11:37:14 -07:00 |
| Nick Perez | 4ee52bb169 | Docs: Fix broken link to openai example (#1145)<br>Link to `openai_client.py` is no longer valid - updated to `openai_completion_client.py` | 2023-09-22 11:36:09 -07:00 |
| Woosuk Kwon | 7d7e3b78a3 | Use --ipc=host in docker run for distributed inference (#1125) | 2023-09-21 18:26:47 -07:00 |
| Ricardo Lu | f98b745a81 | feat: support stop_token_ids parameter. (#1097) | 2023-09-21 15:34:02 -07:00 |
| Roy | 2d1e86f1b1 | clean api code, remove redundant background task. (#1102) | 2023-09-21 13:25:05 -07:00 |
| Woosuk Kwon | 1ac4ccf73c | Add float16 and float32 (#1115) | 2023-09-21 00:52:47 -07:00 |
| Woosuk Kwon | 2ac4d5e2bf | Replace DtypeTensor (#1123) | 2023-09-21 00:51:47 -07:00 |
| Antoni Baum | 3302f0aef3 | rope_theta and max_position_embeddings from config (#1096)<br>Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu><br>Co-authored-by: wnma3mz <wnma3mz@gmail.com> | 2023-09-20 13:35:11 -07:00 |
| Tanmay Verma | 6f2dd6c37e | Add documentation to Triton server tutorial (#983) | 2023-09-20 10:32:40 -07:00 |
| Woosuk Kwon | bc0644574c | Add gpu_memory_utilization and swap_space to LLM (#1090) | 2023-09-19 22:16:04 -07:00 |
| Woosuk Kwon | 400b8289f7 | Add pyarrow to dependencies & Print warning on Ray import error (#1094) | 2023-09-18 22:36:17 -07:00 |
| Zhuohan Li | c1026311b5 | [Community] Add vLLM Discord server (#1086) | 2023-09-18 12:23:35 -07:00 |
| Woosuk Kwon | 2b1c116b5a | Add minimum capability requirement for AWQ (#1064) | 2023-09-18 12:02:01 -07:00 |
| Woosuk Kwon | cc796b1358 | Convert before transpose (#1073) | 2023-09-18 11:51:48 -07:00 |
| Zhuohan Li | f029ef94d7 | Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068) | 2023-09-18 11:49:40 -07:00 |
| Roy | 95592fa00a | align llm_engine and async_engine. (#1081) | 2023-09-18 11:49:10 -07:00 |
| orellavie1212 | fbe66e1d0b | added support for quantize on LLM module (#1080) | 2023-09-18 11:04:21 -07:00 |
| Zhuohan Li | 90979c38f8 | [FIX] Don't initialize parameter by default (#1067) | 2023-09-17 17:15:38 -07:00 |
| 陈序 | e21d7687a9 | Fix hanging when prompt exceeds limit (#1029) | 2023-09-17 01:48:56 -07:00 |
| Antoni Baum | ff36139ffc | Remove AsyncLLMEngine busy loop, shield background task (#1059) | 2023-09-17 00:29:08 -07:00 |
| Woosuk Kwon | e3e79e9e8a | Implement AWQ quantization support for LLaMA (#1032)<br>Co-authored-by: Robert Irvine <robert@seamlessml.com><br>Co-authored-by: root <rirv938@gmail.com><br>Co-authored-by: Casper <casperbh.96@gmail.com><br>Co-authored-by: julian-q <julianhquevedo@gmail.com> | 2023-09-16 00:03:37 -07:00 |
| Jerry Yang | b9fe4616f9 | Abort when coroutine is cancelled (#1020) | 2023-09-14 17:40:18 -07:00 |
| Woosuk Kwon | 64ca424e75 | Fix warning message on LLaMA FastTokenizer (#1037) | 2023-09-14 17:33:32 -07:00 |
| Lukas Kreussel | b5f93d0631 | Only fail if logit_bias has actual values (#1045) | 2023-09-14 17:33:01 -07:00 |
| Woosuk Kwon | a58936966f | Add pandas to requirements.txt (#1047)<br>* Add pandas to requirements.txt<br>* Minor | 2023-09-14 17:31:38 -07:00 |
| Antoni Baum | dd54a4b026 | Fix detokenization leaving special tokens (#1044)<br>Signed-off-by: Antoni Baum <antoni.baum@protonmail.com> | 2023-09-14 16:37:03 -07:00 |
| Woosuk Kwon | eda1a7cad3 | Announce paper release (#1036) | 2023-09-13 17:38:13 -07:00 |
| Zhuohan Li | f04908cae7 | [FIX] Minor bug fixes (#1035)<br>* [FIX] Minor bug fixes<br>* Address review comments | 2023-09-13 16:38:12 -07:00 |
| Jasmond L | ab019eea75 | Add Model Revision Support (#1014)<br>Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com><br>Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2023-09-13 15:20:02 -07:00 |
| Antoni Baum | 9841d48a10 | Use TGI-like incremental detokenization (#984) | 2023-09-13 13:38:01 -07:00 |
| Ikko Eltociear Ashimine | 3272d7a0b7 | Fix typo in README.md (#1033) | 2023-09-13 12:55:23 -07:00 |
| Antoni Baum | 0bb1e885a0 | Make max_model_len configurable (#972) | 2023-09-12 16:29:19 -07:00 |
| leiwen83 | d6545ad22e | add option to shorten prompt print in log (#991)<br>Signed-off-by: Lei Wen <wenlei03@qiyi.com><br>Co-authored-by: Lei Wen <wenlei03@qiyi.com><br>Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2023-09-12 15:10:14 -07:00 |