Update readme.md

Jukka Seppänen 2024-10-24 02:55:09 +03:00 committed by GitHub
parent 00a550e81c
commit cafd0741f5


@@ -6,18 +6,7 @@
 https://github.com/user-attachments/assets/a714b70f-dcdb-4f91-8a3d-8da679a28d6e
-## Requires flash_attn !
-Not sure if this can be worked around, I compiled a wheel for my Windows setup (Python 3.12, torch 2.5.0+cu124) that worked for me:
-https://huggingface.co/Kijai/Mochi_preview_comfy/blob/main/flash_attn-2.6.3-cp312-cp312-win_amd64.whl
-Python 3.10 / CUDA 12.4 / Torch 2.4.1:
-https://huggingface.co/Kijai/Mochi_preview_comfy/blob/main/flash_attn-2.6.3-cp310-cp310-win_amd64.whl
-Other sources for pre-compiled wheels:
-https://github.com/oobabooga/flash-attention/releases
+## Can use flash_attn, pytorch attention (sdpa) or [sage attention](https://github.com/thu-ml/SageAttention), sage being fastest.
 Depending on frame count can fit under 20GB, VAE decoding is heavy and there is experimental tiled decoder (taken from CogVideoX -diffusers code) which allows higher frame counts, so far highest I've done is 97 with the default tile size 2x2 grid.
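The updated README line lists three interchangeable attention backends, preferring sage attention when available. A minimal sketch of how such runtime selection could look — the function name and fallback order here are illustrative assumptions, not code from the repo; only the package names (`sageattention`, `flash_attn`) and PyTorch's built-in sdpa are taken from the README:

```python
def pick_attention_backend() -> str:
    """Return the fastest attention backend installed in this environment.

    Preference order follows the README: sage attention (fastest),
    then flash_attn, then PyTorch's scaled_dot_product_attention (sdpa),
    which is always available in torch >= 2.0.
    """
    try:
        import sageattention  # noqa: F401
        return "sage"
    except ImportError:
        pass
    try:
        import flash_attn  # noqa: F401
        return "flash_attn"
    except ImportError:
        pass
    return "sdpa"
```

Because sdpa ships with PyTorch itself, this fallback removes the hard Windows wheel-compilation requirement that the old README section documented.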