14 Commits

Author SHA1 Message Date
Zhuohan Li
db09d4ad83
[FIX] Fix Alibi implementation in PagedAttention kernel (#945)
* [FIX] Fix Alibi implementation in PagedAttention kernel

* Fix test_attention

* Fix

---------

Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Oliver-ss <yuansongwx@outlook.com>
2023-09-07 15:53:14 -07:00
Woosuk Kwon
bf87484efa
[BugFix] Fix NaN errors in paged attention kernel (#936) 2023-09-04 09:20:06 +09:00
Dean Leitersdorf
79af7e96a0
[OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel (#420) 2023-08-04 10:57:29 -07:00
Zhuohan Li
96853af5a8
Optimize MQA Kernel (#452) 2023-07-14 20:06:40 -04:00
Andre Slavescu
c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: woWoosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon
404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon
e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon
e38074b1e6
Support FP32 (#141) 2023-06-07 00:40:21 -07:00
Woosuk Kwon
d721168449
Improve setup script & Add a guard for bfloat16 kernels (#130) 2023-05-27 00:59:32 -07:00
Woosuk Kwon
667ba3995c
Add copyright headers to source files adapted from FT (#104) 2023-05-14 22:19:19 -07:00
Woosuk Kwon
130d5fd8c7
Fix a bug in attention kernel (#68) 2023-05-04 02:56:09 -07:00
Woosuk Kwon
e070829ae8
Support bfloat16 data type (#54) 2023-05-03 14:09:44 -07:00
Woosuk Kwon
436e523bf1
Refactor attention kernels (#53) 2023-05-03 13:40:13 -07:00