kijai 2024-10-24 02:59:58 +03:00
commit 9bb3a79275


@@ -6,18 +6,7 @@
https://github.com/user-attachments/assets/a714b70f-dcdb-4f91-8a3d-8da679a28d6e
-## Requires flash_attn !
+Can use flash_attn, pytorch attention (sdpa) or [sage attention](https://github.com/thu-ml/SageAttention), sage being fastest.
-Not sure if this can be worked around, I compiled a wheel for my Windows setup (Python 3.12, torch 2.5.0+cu124) that worked for me:
-https://huggingface.co/Kijai/Mochi_preview_comfy/blob/main/flash_attn-2.6.3-cp312-cp312-win_amd64.whl
-Python 3.10 / CUDA 12.4 / Torch 2.4.1:
-https://huggingface.co/Kijai/Mochi_preview_comfy/blob/main/flash_attn-2.6.3-cp310-cp310-win_amd64.whl
-Other sources for pre-compiled wheels:
-https://github.com/oobabooga/flash-attention/releases
Depending on frame count this can fit under 20GB. VAE decoding is heavy, and there is an experimental tiled decoder (taken from the CogVideoX diffusers code) which allows higher frame counts; so far the highest I've done is 97 frames with the default 2x2 tile grid.
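
The line added in this commit lists three interchangeable attention backends. Below is a minimal sketch of how such a dispatch can look, assuming (batch, heads, seq_len, head_dim) inputs; the `attention` helper and `backend` argument are illustrative assumptions, not the wrapper's actual API:

```python
import torch.nn.functional as F

def attention(q, k, v, backend="sdpa"):
    # q, k, v: (batch, heads, seq_len, head_dim) fp16/bf16 CUDA tensors (layout assumed).
    # Hypothetical dispatcher over the three backends named in the README.
    if backend == "flash_attn":
        from flash_attn import flash_attn_func
        # flash_attn expects (batch, seq_len, heads, head_dim), so transpose in and out.
        return flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        ).transpose(1, 2)
    if backend == "sage":
        from sageattention import sageattn  # https://github.com/thu-ml/SageAttention
        # Default layout is (batch, heads, seq_len, head_dim); may differ across versions.
        return sageattn(q, k, v)
    # Fallback: PyTorch's built-in scaled dot product attention (sdpa).
    return F.scaled_dot_product_attention(q, k, v)
```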
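
The tiled decoder mentioned above trades one large VAE decode for several smaller ones. A rough sketch of the 2x2 spatial-grid idea, assuming a `vae.decode` that maps a latent tile directly to a pixel tensor; the actual wrapper adapts the CogVideoX diffusers code, which also overlaps tiles and blends the seams:

```python
import torch

@torch.no_grad()
def tiled_decode(vae, latents, grid=2):
    # latents: (batch, channels, frames, latent_h, latent_w) -- layout assumed.
    b, c, t, h, w = latents.shape
    tile_h, tile_w = h // grid, w // grid
    rows = []
    for i in range(grid):
        cols = []
        for j in range(grid):
            tile = latents[..., i * tile_h:(i + 1) * tile_h,
                                j * tile_w:(j + 1) * tile_w]
            # Each tile is decoded on its own, so peak VRAM scales with the
            # tile area instead of the full frame.
            cols.append(vae.decode(tile))
        rows.append(torch.cat(cols, dim=-1))  # stitch columns horizontally
    return torch.cat(rows, dim=-2)            # stitch rows vertically
```

Overlapping the tiles and blending the borders is what removes visible seams in the real implementation; a plain non-overlapping grid like this sketch can leave faint tile edges.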