14 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Luka Govedič | 71c60491f2 | [Kernel] Build flash-attn from source (#8245) | 2024-09-20 23:27:10 -07:00 |
| tomeras91 | 386087970a | [CI/Build] build on empty device for better dev experience (#4773) | 2024-08-11 13:09:44 -07:00 |
| Woosuk Kwon | 805a8a75f2 | [Misc] Support attention logits soft-capping with flash-attn (#7022) | 2024-08-01 13:14:37 -07:00 |
| Sage Moore | 7e0861bd0b | [CI/Build] Update PyTorch to 2.4.0 (#6951) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-08-01 11:11:24 -07:00 |
| Cody Yu | aa48e502fb | [MISC] Upgrade dependency to PyTorch 2.3.1 (#5327) | 2024-07-12 12:04:26 -07:00 |
| Isotr0py | edd5fe5fa2 | [Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement (#5772) | 2024-06-24 12:11:53 +08:00 |
| Antoni Baum | 0ab278ca31 | [Core] Remove unnecessary copies in flash attn backend (#5138) | 2024-06-03 09:39:31 -07:00 |
| youkaichao | 5bd3c65072 | [Core][Optimization] remove vllm-nccl (#5091) | 2024-05-29 05:13:52 +00:00 |
| Woosuk Kwon | b57e6c5949 | [Kernel] Add flash-attn back (#4907) | 2024-05-19 18:11:30 -07:00 |
| Woosuk Kwon | 89579a201f | [Misc] Use vllm-flash-attn instead of flash-attn (#4686) | 2024-05-08 13:15:34 -07:00 |
| Michael Goin | d627a3d837 | [Misc] Upgrade to torch==2.3.0 (#4454) | 2024-04-29 20:05:47 -04:00 |
| youkaichao | e4bf860a54 | [CI][Build] change pynvml to nvidia-ml-py (#4302) | 2024-04-23 18:33:12 -07:00 |
| Roy | 8db1bf32f8 | [Misc] Upgrade triton to 2.2.0 (#4061) | 2024-04-14 17:43:54 -07:00 |
| Woosuk Kwon | cfaf49a167 | [Misc] Define common requirements (#3841) | 2024-04-05 00:39:17 -07:00 |