Hash       | Date                       | Author                  | Subject
-----------|----------------------------|-------------------------|--------------------------------------------------------------
c3372e87be | 2023-12-17 01:49:07 -08:00 | Woosuk Kwon             | Remove dependency on CuPy (#2152)
b0a1d667b0 | 2023-12-17 01:46:54 -08:00 | Woosuk Kwon             | Pin PyTorch & xformers versions (#2155)
e1d5402238 | 2023-12-17 01:44:45 -08:00 | Woosuk Kwon             | Fix all-reduce memory usage (#2151)
3d1cfbfc74 | 2023-12-16 22:05:18 -08:00 | Woosuk Kwon             | [Minor] Delete Llama tokenizer warnings (#2146)
37ca558103 | 2023-12-16 21:12:08 -08:00 | Woosuk Kwon             | Optimize model execution with CUDA graph (#1926)
    Co-authored-by: Chen Shen <scv119@gmail.com>
    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
eed74a558f | 2023-12-16 12:41:23 -08:00 | Roy                     | Simplify weight loading logic (#2133)
2acd76f346 | 2023-12-15 17:13:58 -08:00 | Woosuk Kwon             | [ROCm] Temporarily remove GPTQ ROCm support (#2138)
b81a6a6bb3 | 2023-12-15 13:29:22 -08:00 | Woosuk Kwon             | [Docs] Add supported quantization methods to docs (#2135)
0fbfc4b81b | 2023-12-15 03:04:22 -08:00 | CHU Tianxiang           | Add GPTQ support (#916)
c06170cc8e | 2023-12-15 00:45:58 -08:00 | Yunfeng Bai             | Add a flag to include stop string in output text (#1976)
614856da25 | 2023-12-14 09:35:58 -08:00 | Mingcan Xiang           | Avoid multiple redefinition (#1817)
05bdf4eaf3 | 2023-12-14 00:45:58 -08:00 | TJian                   | Fix Dockerfile.rocm (#2101)
    Co-authored-by: miloice <jeffaw99@hotmail.com>
6774bd50b0 | 2023-12-14 00:19:41 -08:00 | mezuzza                 | Fix typing in AsyncLLMEngine & add toml to requirements-dev (#2100)
31c1f3255e | 2023-12-13 23:56:15 -08:00 | Woosuk Kwon             | Bump up to v0.2.5 (#2095)  (tag: v0.2.5)
21d93c140d | 2023-12-13 23:55:07 -08:00 | Antoni Baum             | Optimize Mixtral with expert parallelism (#2090)
f1c8520146 | 2023-12-13 12:28:13 -08:00 | Woosuk Kwon             | [BugFix] Fix input positions for long context with sliding window (#2088)
096827c284 | 2023-12-13 09:45:34 -08:00 | Woosuk Kwon             | [Docs] Add notes on ROCm-supported models (#2087)
6565d9e33e | 2023-12-13 09:25:59 -08:00 | Woosuk Kwon             | Update installation instruction for vLLM + CUDA 11.8 (#2086)
f375ec8440 | 2023-12-13 00:56:05 -08:00 | TJian                   | [ROCm] Upgrade xformers version for ROCm & update doc (#2079)
    Co-authored-by: miloice <jeffaw99@hotmail.com>
518369d78c | 2023-12-12 22:21:45 -08:00 | Woosuk Kwon             | Implement lazy model loader (#2044)
30bad5c492 | 2023-12-12 22:01:53 -08:00 | Woosuk Kwon             | Fix peak memory profiling (#2031)
3fefe271ec | 2023-12-12 17:34:17 -08:00 | Simon Mo                | Update Dockerfile to build Megablocks (#2042)
6428f1d051 | 2023-12-12 10:16:05 -08:00 | Megha Agarwal           | Support MPT with GQA (#1938)
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
7e1b21daac | 2023-12-12 09:34:09 -08:00 | Woosuk Kwon             | Remove einops from requirements (#2049)
cb3f30c600 | 2023-12-11 18:39:14 -08:00 | Woosuk Kwon             | Upgrade transformers version to 4.36.0 (#2046)
f3e024bece | 2023-12-11 17:48:11 -08:00 | Woosuk Kwon             | [CI/CD] Upgrade PyTorch version to v2.1.1 (#2045)
31d2ab4aff | 2023-12-11 12:26:42 -08:00 | Woosuk Kwon             | Remove python 3.10 requirement (#2040)
eb17212858 | 2023-12-11 11:59:08 -08:00 | Simon Mo                | Update Dockerfile to support Mixtral (#2027)
4dd4b5c538 | 2023-12-11 11:49:39 -08:00 | Woosuk Kwon             | Bump up to v0.2.4 (#2034)  (tag: v0.2.4)
6120e5aaea | 2023-12-11 11:40:56 -08:00 | Woosuk Kwon             | Fix import error msg for megablocks (#2038)
2eaa81b236 | 2023-12-11 11:37:34 -08:00 | Ram                     | Update README.md to add megablocks requirement for mixtral (#2033)
81ce2a4b26 | 2023-12-11 11:32:39 -08:00 | Woosuk Kwon             | [Minor] Fix type annotation in Mixtral (#2036)
5dd80d3777 | 2023-12-11 11:19:08 -08:00 | Woosuk Kwon             | Fix latency benchmark script (#2035)
beeee69bc9 | 2023-12-11 10:49:00 -08:00 | Woosuk Kwon             | Revert adding Megablocks (#2030)
9bf28d0b69 | 2023-12-11 10:39:29 -08:00 | Ram                     | Update requirements.txt for mixtral (#2029)
c0ce15dfb2 | 2023-12-11 10:32:58 -08:00 | Ikko Eltociear Ashimine | Update run_on_sky.rst (#2025)
    sharable -> shareable
b9bcdc7158 | 2023-12-11 10:32:17 -08:00 | Woosuk Kwon             | Change the load format to pt for Mixtral (#2028)
4ff0203987 | 2023-12-11 09:16:15 -08:00 | Woosuk Kwon             | Minor fixes for Mixtral (#2015)
b5f882cc98 | 2023-12-11 01:09:15 -08:00 | Pierre Stock            | Mixtral 8x7B support (#2011)
    Co-authored-by: Pierre Stock <p@mistral.ai>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2e8fc0d4c3 | 2023-12-10 13:20:30 -08:00 | Simon Mo                | Fix completion API echo and logprob combo (#1992)
dacaf5a400 | 2023-12-10 10:12:53 -08:00 | wbn                     | Replace head_mapping params with num_kv_heads to attention kernel. (#1997)
    Co-authored-by: wangguoya <wangguoya@baidu.com>
    Co-authored-by: Yang Zhao <zhaoyangstar@foxmail.com>
24cde76a15 | 2023-12-10 10:04:12 -08:00 | Woosuk Kwon             | [Minor] Add comment on skipping rope caches (#2004)
1aa1361510 | 2023-12-09 21:01:21 -08:00 | Jin Shang               | Fix OpenAI server completion_tokens referenced before assignment (#1996)
fe470ae5ad | 2023-12-09 19:24:29 -08:00 | Woosuk Kwon             | [Minor] Fix code style for baichuan (#2003)
3a8c2381f7 | 2023-12-09 15:59:57 -08:00 | Jun Gao                 | Fix for KeyError on Loading LLaMA (#1978)
c85b80c2b6 | 2023-12-08 09:53:47 -08:00 | Simon Mo                | [Docker] Add cuda arch list as build option (#1950)
2b981012a6 | 2023-12-08 09:38:36 -08:00 | firebook                | Fix Baichuan2-7B-Chat (#1987)
6ccc0bfffb | 2023-12-07 23:16:52 -08:00 | TJian                   | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
    Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
    Co-authored-by: Amir Balwel <amoooori04@gmail.com>
    Co-authored-by: root <kuanfu.liu@akirakan.com>
    Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
    Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
    Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
c8e7eb1eb3 | 2023-12-07 16:04:41 -08:00 | Daya Khudia             | fix typo in getenv call (#1972)
24f60a54f4 | 2023-12-07 11:00:32 -08:00 | AguirreNicolas          | [Docker] Adding number of nvcc_threads during build as envar (#1893)