52 Commits

Author SHA1 Message Date
contentis
4cd881866b
Use single apply_rope function across models (#10547) 2025-11-04 20:10:11 -05:00
comfyanonymous
27d1bd8829
Fix rope scaling. (#10560) 2025-10-30 22:51:58 -04:00
comfyanonymous
614cf9805e
Add a ScaleROPE node. Currently only works on WAN models. (#10559) 2025-10-30 22:11:38 -04:00
rattus128
95ca2e56c8
WAN2.2: Fix cache VRAM leak on error (#10308)
Same change pattern as 7e8dd275c243ad460ed5015d2e13611d81d2a569
applied to WAN2.2

If this suffers an exception (such as a VRAM oom) it will leave the
encode() and decode() methods which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
this object which is in turn reffing large tensors from the failed
execution.

The feature cache is currently setup at a class variable on the
encoder/decoder however, the encode and decode functions always clear
it on both entry and exit of normal execution.

Its likely the design intent is this is usable as a streaming encoder
where the input comes in batches, however the functions as they are
today don't support that.

So simplify by bringing the cache back to local variable, so that if
it does VRAM OOM the cache itself is properly garbage when the
encode()/decode() functions dissappear from the stack.
2025-10-13 15:23:11 -04:00
Finn-Hecker
93d859cfaa
Fix type annotation syntax in MotionEncoder_tc __init__ (#10186)
## Summary
Fixed incorrect type hint syntax in `MotionEncoder_tc.__init__()` parameter list.

## Changes
- Line 647: Changed `num_heads=int` to `num_heads: int` 
- This corrects the parameter annotation from a default value assignment to proper type hint syntax

## Details
The parameter was using assignment syntax (`=`) instead of type annotation syntax (`:`), which would incorrectly set the default value to the `int` class itself rather than annotating the expected type.
2025-10-03 14:32:19 -07:00
rattus128
4965c0e2ac
WAN: Fix cache VRAM leak on error (#10141)
If this suffers an exception (such as a VRAM oom) it will leave the
encode() and decode() methods which skips the cleanup of the WAN
feature cache. The comfy node cache then ultimately keeps a reference
this object which is in turn reffing large tensors from the failed
execution.

The feature cache is currently setup at a class variable on the
encoder/decoder however, the encode and decode functions always clear
it on both entry and exit of normal execution.

Its likely the design intent is this is usable as a streaming encoder
where the input comes in batches, however the functions as they are
today don't support that.

So simplify by bringing the cache back to local variable, so that if
it does VRAM OOM the cache itself is properly garbage when the
encode()/decode() functions dissappear from the stack.
2025-10-01 18:42:16 -04:00
rattus128
653ceab414
Reduce Peak WAN inference VRAM usage - part II (#10062)
* flux: math: Use _addcmul to avoid expensive VRAM intermediate

The rope process can be the VRAM peak and this intermediate
for the addition result before releasing the original can OOM.
addcmul_ it.

* wan: Delete the self attention before cross attention

This saves VRAM when the cross attention and FFN are in play as the
VRAM peak.
2025-09-27 18:14:16 -04:00
comfyanonymous
fccab99ec0
Fix issue with .view() in HuMo. (#10014) 2025-09-24 20:09:42 -04:00
comfyanonymous
e8df53b764
Update WanAnimateToVideo to more easily extend videos. (#9959) 2025-09-19 18:48:56 -04:00
comfyanonymous
dc95b6acc0
Basic WIP support for the wan animate model. (#9939) 2025-09-19 03:07:17 -04:00
comfyanonymous
24b0fce099
Do padding of audio embed in model for humo for more flexibility. (#9935) 2025-09-18 19:54:16 -04:00
comfyanonymous
dd611a7700
Support the HuMo 17B model. (#9912) 2025-09-17 18:39:24 -04:00
comfyanonymous
9288c78fc5
Support the HuMo model. (#9903) 2025-09-17 00:12:48 -04:00
rattus128
e42682b24e
Reduce Peak WAN inference VRAM usage (#9898)
* flux: Do the xq and xk ropes one at a time

This was doing independendent interleaved tensor math on the q and k
tensors, leading to the holding of more than the minimum intermediates
in VRAM. On a bad day, it would VRAM OOM on xk intermediates.

Do everything q and then everything k, so torch can garbage collect
all of qs intermediates before k allocates its intermediates.

This reduces peak VRAM usage for some WAN2.2 inferences (at least).

* wan: Optimize qkv intermediates on attention

As commented. The former logic computed independent pieces of QKV in
parallel which help more inference intermediates in VRAM spiking
VRAM usage. Fully roping Q and garbage collecting the intermediates
before touching K reduces the peak inference VRAM usage.
2025-09-16 19:21:14 -04:00
Jedrzej Kosinski
d7f40442f9
Enable Runtime Selection of Attention Functions (#9639)
* Looking into a @wrap_attn decorator to look for 'optimized_attention_override' entry in transformer_options

* Created logging code for this branch so that it can be used to track down all the code paths where transformer_options would need to be added

* Fix memory usage issue with inspect

* Made WAN attention receive transformer_options, test node added to wan to test out attention override later

* Added **kwargs to all attention functions so transformer_options could potentially be passed through

* Make sure wrap_attn doesn't make itself recurse infinitely, attempt to load SageAttention and FlashAttention if not enabled so that they can be marked as available or not, create registry for available attention

* Turn off attention logging for now, make AttentionOverrideTestNode have a dropdown with available attention (this is a test node only)

* Make flux work with optimized_attention_override

* Add logs to verify optimized_attention_override is passed all the way into attention function

* Make Qwen work with optimized_attention_override

* Made hidream work with optimized_attention_override

* Made wan patches_replace work with optimized_attention_override

* Made SD3 work with optimized_attention_override

* Made HunyuanVideo work with optimized_attention_override

* Made Mochi work with optimized_attention_override

* Made LTX work with optimized_attention_override

* Made StableAudio work with optimized_attention_override

* Made optimized_attention_override work with ACE Step

* Made Hunyuan3D work with optimized_attention_override

* Make CosmosPredict2 work with optimized_attention_override

* Made CosmosVideo work with optimized_attention_override

* Made Omnigen 2 work with optimized_attention_override

* Made StableCascade work with optimized_attention_override

* Made AuraFlow work with optimized_attention_override

* Made Lumina work with optimized_attention_override

* Made Chroma work with optimized_attention_override

* Made SVD work with optimized_attention_override

* Fix WanI2VCrossAttention so that it expects to receive transformer_options

* Fixed Wan2.1 Fun Camera transformer_options passthrough

* Fixed WAN 2.1 VACE transformer_options passthrough

* Add optimized to get_attention_function

* Disable attention logs for now

* Remove attention logging code

* Remove _register_core_attention_functions, as we wouldn't want someone to call that, just in case

* Satisfy ruff

* Remove AttentionOverrideTest node, that's something to cook up for later
2025-09-12 18:07:38 -04:00
comfyanonymous
491755325c
Better s2v memory estimation. (#9584) 2025-08-27 19:02:42 -04:00
comfyanonymous
496888fd68
Improve s2v performance when generating videos longer than 120 frames. (#9582) 2025-08-27 16:06:40 -04:00
comfyanonymous
88aee596a3
WIP Wan 2.2 S2V model. (#9568) 2025-08-27 01:10:34 -04:00
Jedrzej Kosinski
fc247150fe
Implement EasyCache and Invent LazyCache (#9496)
* Attempting a universal implementation of EasyCache, starting with flux as test; I screwed up the math a bit, but when I set it just right it works.

* Fixed math to make threshold work as expected, refactored code to use EasyCacheHolder instead of a dict wrapped by object

* Use sigmas from transformer_options instead of timesteps to be compatible with a greater amount of models, make end_percent work

* Make log statement when not skipping useful, preparing for per-cond caching

* Added DIFFUSION_MODEL wrapper around forward function for wan model

* Add subsampling for heuristic inputs

* Add subsampling to output_prev (output_prev_subsampled now)

* Properly consider conds in EasyCache logic

* Created SuperEasyCache to test what happens if caching and reuse is moved outside the scope of conds, added PREDICT_NOISE wrapper to facilitate this test

* Change max reuse_threshold to 3.0

* Mark EasyCache/SuperEasyCache as experimental (beta)

* Make Lumina2 compatible with EasyCache

* Add EasyCache support for Qwen Image

* Fix missing comma, curse you Cursor

* Add EasyCache support to AceStep

* Add EasyCache support to Chroma

* Added EasyCache support to Cosmos Predict t2i

* Make EasyCache not crash with Cosmos Predict ImagToVideo latents, but does not work well at all

* Add EasyCache support to hidream

* Added EasyCache support to hunyuan video

* Added EasyCache support to hunyuan3d

* Added EasyCache support to LTXV (not very good, but does not crash)

* Implemented EasyCache for aura_flow

* Renamed SuperEasyCache to LazyCache, hardcoded subsample_factor to 8 on nodes

* Eatra logging when verbose is true for EasyCache
2025-08-22 22:41:08 -04:00
contentis
fe31ad0276
Add elementwise fusions (#9495)
* Add elementwise fusions

* Add addcmul pattern to Qwen
2025-08-22 19:39:15 -04:00
comfyanonymous
1702e6df16
Implement wan2.2 camera model. (#9357)
Use the old WanCameraImageToVideo node.
2025-08-15 17:29:58 -04:00
comfyanonymous
560d38f34c
Wan2.2 fun control support. (#9292) 2025-08-12 23:26:33 -04:00
comfyanonymous
1e638a140b
Tiny wan vae optimizations. (#9136) 2025-08-01 05:25:38 -04:00
comfyanonymous
da9dab7edd
Small wan camera memory optimization. (#9111) 2025-07-30 05:55:26 -04:00
comfyanonymous
dca6bdd4fa
Make wan2.2 5B i2v take a lot less memory. (#9102) 2025-07-29 19:44:18 -04:00
comfyanonymous
c60dc4177c
Remove unecessary clones in the wan2.2 VAE. (#9083) 2025-07-28 14:48:19 -04:00
comfyanonymous
a88788dce6
Wan 2.2 support. (#9080) 2025-07-28 08:00:23 -04:00
comfyanonymous
0621d73a9c
Remove useless code. (#9059) 2025-07-26 04:44:19 -04:00
comfyanonymous
e6e5d33b35
Remove useless code. (#9041)
This is only needed on old pytorch 2.0 and older.
2025-07-25 04:58:28 -04:00
comfyanonymous
5e5e46d40c
Not really tested WAN Phantom Support. (#8321) 2025-05-28 23:46:15 -04:00
comfyanonymous
f85c08df06
Make VACE conditionings stackable. (#8240) 2025-05-22 19:22:26 -04:00
George0726
c820ef950d
Add Wan-FUN Camera Control models and Add WanCameraImageToVideo node (#8013)
* support wan camera models

* fix by ruff check

* change camera_condition type; make camera_condition optional

* support camera trajectory nodes

* fix camera direction

---------

Co-authored-by: Qirui Sun <sunqr0667@126.com>
2025-05-15 19:00:43 -04:00
comfyanonymous
3041e5c354
Switch mochi and wan modes to use pytorch RMSNorm. (#7925)
* Switch genmo model to native RMSNorm.

* Switch WAN to native RMSNorm.
2025-05-03 19:07:55 -04:00
comfyanonymous
dbc726f80c
Better vace memory estimation. (#7875) 2025-04-29 20:42:00 -04:00
comfyanonymous
3ab231f01f
Fix issue with WAN VACE implementation. (#7724) 2025-04-21 23:36:12 -04:00
comfyanonymous
5d0d4ee98a
Add strength control for vace. (#7717) 2025-04-21 19:36:20 -04:00
comfyanonymous
ce22f687cc
Support for WAN VACE preview model. (#7711)
* Support for WAN VACE preview model.

* Remove print.
2025-04-21 14:40:29 -04:00
comfyanonymous
dbcfd092a2 Set default context_img_len to 257 2025-04-17 12:42:34 -04:00
comfyanonymous
c14429940f Support loading WAN FLF model. 2025-04-17 12:04:48 -04:00
comfyanonymous
0d720e4367 Don't hardcode length of context_img in wan code. 2025-04-17 06:25:39 -04:00
comfyanonymous
a2448fc527 Remove useless code. 2025-03-14 18:10:37 -04:00
comfyanonymous
6a0daa79b6 Make the SkipLayerGuidanceDIT node work on WAN. 2025-03-14 10:55:19 -04:00
comfyanonymous
f4dac8ab6f Wan code small cleanup. 2025-02-27 07:22:42 -05:00
comfyanonymous
3ea3bc8546 Fix wan issues when prompt length is long. 2025-02-26 20:34:02 -05:00
comfyanonymous
0270a0b41c Reduce artifacts on Wan by doing the patch embedding in fp32. 2025-02-26 16:59:26 -05:00
comfyanonymous
4bca7367f3 Don't try to use clip_fea on t2v model. 2025-02-26 08:38:09 -05:00
comfyanonymous
fa62287f1f More code reuse in wan.
Fix bug when changing the compute dtype on wan.
2025-02-26 05:22:29 -05:00
comfyanonymous
4ced06b879 WIP support for Wan I2V model. 2025-02-26 01:49:43 -05:00
comfyanonymous
9a66bb972d Make wan work with all latent resolutions.
Cleanup some code.
2025-02-25 19:56:04 -05:00
comfyanonymous
ea0f939df3 Fix issue with wan and other attention implementations. 2025-02-25 19:13:39 -05:00