49 Commits

Author SHA1 Message Date
kijai
24a7edfca6 update, works 2024-11-05 08:41:50 +02:00
kijai
3535a846a8 update from main 2024-11-05 07:08:14 +02:00
Yoshimasa Niwa
5ca4bbf319 Workaround pad problem on mps
When `torch.nn.functional.pad` is used on a tensor whose size is
larger than 2^16 (65536), the output tensor is broken.

This patch moves the tensor to the CPU to work around the problem.
It doesn't have much impact on the speed of the VAE on mps.
2024-11-05 12:46:13 +09:00
kijai
56b5dbbf82 Add different RMSNorm functions for testing
My initial testing shows that the RMSNorm from flash_attn.ops.triton.layer_norm is ~8-10% faster; apex is untested as I don't currently have it installed.
2024-11-04 14:11:58 +02:00
kijai
4a7458ffd6 restore MochiEdit compatibility
temporary
2024-11-03 19:18:29 +02:00
kijai
0dc011d1b6 cleanup and align more to comfy code, switch to using cpu seed as well 2024-11-03 18:36:23 +02:00
kijai
3cf9289e08 cleanup code 2024-11-03 01:47:34 +02:00
kijai
a6e545531c test 2024-11-03 00:32:51 +02:00
kijai
85c996d7b8 Add MochiSigmaSchedule node, better denoise formula 2024-11-01 19:36:40 +02:00
kijai
a5b06b02ad tiled encoding 2024-11-01 18:03:47 +02:00
kijai
69ab797b8c Add sampler preview 2024-11-01 06:18:41 +02:00
kijai
ac5de728ad Add VAE encoder 2024-11-01 05:22:49 +02:00
kijai
d971a19410 spatial VAE decoder fixes 2024-10-30 22:06:55 +02:00
kijai
f0f939b20b cleanup, fix untiled spatial vae decode 2024-10-30 21:12:34 +02:00
kijai
3395aa8ca0 cleanup, sampler output name fix 2024-10-27 20:02:44 +02:00
kijai
195da244df make cfg 1.0 not do uncond, set steps by sigma schedule 2024-10-27 19:52:16 +02:00
kijai
3613700752 possible sdpa kernel fixes and add optional cfg scheduling 2024-10-27 12:23:01 +02:00
kijai
e20eb66f93 cleanup 2024-10-27 03:00:52 +02:00
kijai
c5c136cb11 fix 2024-10-26 17:57:02 +03:00
kijai
ddfb3a6bf2 backends 2024-10-26 17:49:15 +03:00
kijai
f29f739707 support cublas_ops with GGUF
pretty big speed boost on 4090 at least, needs this installed:
https://github.com/aredden/torch-cublas-hgemm
2024-10-26 16:42:25 +03:00
kijai
0d15c0bd69 torch compile for vae loader 2024-10-26 03:24:25 +03:00
kijai
b932036af3 Update asymm_models_joint.py 2024-10-25 22:37:08 +03:00
kijai
e66735527c tweak 2024-10-25 20:04:01 +03:00
kijai
2e22529c99 clean 2024-10-25 19:51:15 +03:00
kijai
bd844331b2 Update t2v_synth_mochi.py 2024-10-25 19:41:50 +03:00
kijai
aa30132268 temporary monkey patch for torch compile Windows bug 2024-10-25 19:41:10 +03:00
kijai
25eeab3c4c torch.compile support
works in Windows with torch 2.5.0 and Triton from https://github.com/woct0rdho/triton-windows
2024-10-25 18:15:30 +03:00
kijai
36a4275b3b Add alternative VAE decoding node
This was actually unused code in the VAE model. It only does spatial tiling, but the seams look better.
2024-10-25 15:30:20 +03:00
kijai
813bbe8f4b Add model and vae loader nodes 2024-10-24 21:38:06 +03:00
kijai
2c67025577 remove prints 2024-10-24 17:16:26 +03:00
kijai
f4c13b1ef4 Add first GGUF test version 2024-10-24 17:05:50 +03:00
kijai
d699fae213 cleanup, possibly support older GPUs 2024-10-24 14:27:11 +03:00
kijai
257c526125 Cleanup, fix seed gen, better warnings for decoder 2024-10-24 12:45:01 +03:00
kijai
c673508188 tweaks 2024-10-24 02:59:52 +03:00
Jukka Seppänen
00a550e81c Should work without flash_attn (thanks @logtd), add sage_attn
tested to work in Linux at least
2024-10-24 02:25:57 +03:00
Jukka Seppänen
1ba3ac8e25 Revert "works without flash_attn (thanks @juxtapoz!)"
This reverts commit a1b1f86aa3b6780a4981157b8e2e37b0a1017568.
2024-10-24 02:23:46 +03:00
Jukka Seppänen
a1b1f86aa3 works without flash_attn (thanks @juxtapoz!)
at least on Linux, also sage_attn
2024-10-24 02:19:00 +03:00
kijai
83097a6b63 Update t2v_synth_mochi.py 2024-10-24 00:40:52 +03:00
kijai
bd954ec132 Update t2v_synth_mochi.py 2024-10-24 00:36:38 +03:00
kijai
508eaa22df Update t2v_synth_mochi.py 2024-10-24 00:34:50 +03:00
kijai
1cd5409295 Add bf16 model 2024-10-24 00:00:38 +03:00
kijai
a32064eefb fix non-accelerate model loading 2024-10-23 20:40:42 +03:00
kijai
57640ab0f8 Update t2v_synth_mochi.py 2024-10-23 19:38:01 +03:00
kijai
fb880273a0 update 2024-10-23 17:04:50 +03:00
kijai
db87f8e608 Add accelerate 2024-10-23 16:05:19 +03:00
kijai
34e029bacc fix dtype selection 2024-10-23 15:49:37 +03:00
kijai
4efb7c85df cleanup 2024-10-23 15:45:20 +03:00
kijai
b80cb4a691 initial commit 2024-10-23 15:34:22 +03:00