ComfyUI

mirror of https://git.datalinker.icu/comfyanonymous/ComfyUI synced 2025-12-09 22:14:34 +08:00

Author	SHA1	Message	Date
rattus128	653ceab414	Reduce Peak WAN inference VRAM usage - part II (#10062) * flux: math: Use _addcmul to avoid expensive VRAM intermediate The rope process can be the VRAM peak and this intermediate for the addition result before releasing the original can OOM. addcmul_ it. * wan: Delete the self attention before cross attention This saves VRAM when the cross attention and FFN are in play as the VRAM peak.	2025-09-27 18:14:16 -04:00
comfyanonymous	fccab99ec0	Fix issue with .view() in HuMo. (#10014)	2025-09-24 20:09:42 -04:00
comfyanonymous	24b0fce099	Do padding of audio embed in model for humo for more flexibility. (#9935)	2025-09-18 19:54:16 -04:00
comfyanonymous	dd611a7700	Support the HuMo 17B model. (#9912)	2025-09-17 18:39:24 -04:00
comfyanonymous	9288c78fc5	Support the HuMo model. (#9903)	2025-09-17 00:12:48 -04:00
rattus128	e42682b24e	Reduce Peak WAN inference VRAM usage (#9898) * flux: Do the xq and xk ropes one at a time This was doing independent interleaved tensor math on the q and k tensors, leading to the holding of more than the minimum intermediates in VRAM. On a bad day, it would VRAM OOM on xk intermediates. Do everything q and then everything k, so torch can garbage collect all of q's intermediates before k allocates its intermediates. This reduces peak VRAM usage for some WAN2.2 inferences (at least). * wan: Optimize qkv intermediates on attention As commented. The former logic computed independent pieces of QKV in parallel which held more inference intermediates in VRAM, spiking VRAM usage. Fully roping Q and garbage collecting the intermediates before touching K reduces the peak inference VRAM usage.	2025-09-16 19:21:14 -04:00
Jedrzej Kosinski	d7f40442f9	Enable Runtime Selection of Attention Functions (#9639) * Looking into a @wrap_attn decorator to look for 'optimized_attention_override' entry in transformer_options * Created logging code for this branch so that it can be used to track down all the code paths where transformer_options would need to be added * Fix memory usage issue with inspect * Made WAN attention receive transformer_options, test node added to wan to test out attention override later * Added *kwargs to all attention functions so transformer_options could potentially be passed through Make sure wrap_attn doesn't make itself recurse infinitely, attempt to load SageAttention and FlashAttention if not enabled so that they can be marked as available or not, create registry for available attention * Turn off attention logging for now, make AttentionOverrideTestNode have a dropdown with available attention (this is a test node only) * Make flux work with optimized_attention_override * Add logs to verify optimized_attention_override is passed all the way into attention function * Make Qwen work with optimized_attention_override * Made hidream work with optimized_attention_override * Made wan patches_replace work with optimized_attention_override * Made SD3 work with optimized_attention_override * Made HunyuanVideo work with optimized_attention_override * Made Mochi work with optimized_attention_override * Made LTX work with optimized_attention_override * Made StableAudio work with optimized_attention_override * Made optimized_attention_override work with ACE Step * Made Hunyuan3D work with optimized_attention_override * Make CosmosPredict2 work with optimized_attention_override * Made CosmosVideo work with optimized_attention_override * Made Omnigen 2 work with optimized_attention_override * Made StableCascade work with optimized_attention_override * Made AuraFlow work with optimized_attention_override * Made Lumina work with optimized_attention_override * Made Chroma work with optimized_attention_override * Made SVD work with optimized_attention_override * Fix WanI2VCrossAttention so that it expects to receive transformer_options * Fixed Wan2.1 Fun Camera transformer_options passthrough * Fixed WAN 2.1 VACE transformer_options passthrough * Add optimized to get_attention_function * Disable attention logs for now * Remove attention logging code * Remove _register_core_attention_functions, as we wouldn't want someone to call that, just in case * Satisfy ruff * Remove AttentionOverrideTest node, that's something to cook up for later	2025-09-12 18:07:38 -04:00
comfyanonymous	491755325c	Better s2v memory estimation. (#9584)	2025-08-27 19:02:42 -04:00
comfyanonymous	496888fd68	Improve s2v performance when generating videos longer than 120 frames. (#9582)	2025-08-27 16:06:40 -04:00
comfyanonymous	88aee596a3	WIP Wan 2.2 S2V model. (#9568)	2025-08-27 01:10:34 -04:00
Jedrzej Kosinski	fc247150fe	Implement EasyCache and Invent LazyCache (#9496) * Attempting a universal implementation of EasyCache, starting with flux as test; I screwed up the math a bit, but when I set it just right it works. * Fixed math to make threshold work as expected, refactored code to use EasyCacheHolder instead of a dict wrapped by object * Use sigmas from transformer_options instead of timesteps to be compatible with a greater amount of models, make end_percent work * Make log statement when not skipping useful, preparing for per-cond caching * Added DIFFUSION_MODEL wrapper around forward function for wan model * Add subsampling for heuristic inputs * Add subsampling to output_prev (output_prev_subsampled now) * Properly consider conds in EasyCache logic * Created SuperEasyCache to test what happens if caching and reuse is moved outside the scope of conds, added PREDICT_NOISE wrapper to facilitate this test * Change max reuse_threshold to 3.0 * Mark EasyCache/SuperEasyCache as experimental (beta) * Make Lumina2 compatible with EasyCache * Add EasyCache support for Qwen Image * Fix missing comma, curse you Cursor * Add EasyCache support to AceStep * Add EasyCache support to Chroma * Added EasyCache support to Cosmos Predict t2i * Make EasyCache not crash with Cosmos Predict ImagToVideo latents, but does not work well at all * Add EasyCache support to hidream * Added EasyCache support to hunyuan video * Added EasyCache support to hunyuan3d * Added EasyCache support to LTXV (not very good, but does not crash) * Implemented EasyCache for aura_flow * Renamed SuperEasyCache to LazyCache, hardcoded subsample_factor to 8 on nodes * Extra logging when verbose is true for EasyCache	2025-08-22 22:41:08 -04:00
contentis	fe31ad0276	Add elementwise fusions (#9495) * Add elementwise fusions * Add addcmul pattern to Qwen	2025-08-22 19:39:15 -04:00
comfyanonymous	1702e6df16	Implement wan2.2 camera model. (#9357) Use the old WanCameraImageToVideo node.	2025-08-15 17:29:58 -04:00
comfyanonymous	560d38f34c	Wan2.2 fun control support. (#9292)	2025-08-12 23:26:33 -04:00
comfyanonymous	da9dab7edd	Small wan camera memory optimization. (#9111)	2025-07-30 05:55:26 -04:00
comfyanonymous	dca6bdd4fa	Make wan2.2 5B i2v take a lot less memory. (#9102)	2025-07-29 19:44:18 -04:00
comfyanonymous	a88788dce6	Wan 2.2 support. (#9080)	2025-07-28 08:00:23 -04:00
comfyanonymous	5e5e46d40c	Not really tested WAN Phantom Support. (#8321)	2025-05-28 23:46:15 -04:00
comfyanonymous	f85c08df06	Make VACE conditionings stackable. (#8240)	2025-05-22 19:22:26 -04:00
George0726	c820ef950d	Add Wan-FUN Camera Control models and Add WanCameraImageToVideo node (#8013) * support wan camera models * fix by ruff check * change camera_condition type; make camera_condition optional * support camera trajectory nodes * fix camera direction --------- Co-authored-by: Qirui Sun <sunqr0667@126.com>	2025-05-15 19:00:43 -04:00
comfyanonymous	3041e5c354	Switch mochi and wan modes to use pytorch RMSNorm. (#7925) * Switch genmo model to native RMSNorm. * Switch WAN to native RMSNorm.	2025-05-03 19:07:55 -04:00
comfyanonymous	dbc726f80c	Better vace memory estimation. (#7875)	2025-04-29 20:42:00 -04:00
comfyanonymous	3ab231f01f	Fix issue with WAN VACE implementation. (#7724)	2025-04-21 23:36:12 -04:00
comfyanonymous	5d0d4ee98a	Add strength control for vace. (#7717)	2025-04-21 19:36:20 -04:00
comfyanonymous	ce22f687cc	Support for WAN VACE preview model. (#7711) * Support for WAN VACE preview model. * Remove print.	2025-04-21 14:40:29 -04:00
comfyanonymous	dbcfd092a2	Set default context_img_len to 257	2025-04-17 12:42:34 -04:00
comfyanonymous	c14429940f	Support loading WAN FLF model.	2025-04-17 12:04:48 -04:00
comfyanonymous	0d720e4367	Don't hardcode length of context_img in wan code.	2025-04-17 06:25:39 -04:00
comfyanonymous	a2448fc527	Remove useless code.	2025-03-14 18:10:37 -04:00
comfyanonymous	6a0daa79b6	Make the SkipLayerGuidanceDIT node work on WAN.	2025-03-14 10:55:19 -04:00
comfyanonymous	f4dac8ab6f	Wan code small cleanup.	2025-02-27 07:22:42 -05:00
comfyanonymous	3ea3bc8546	Fix wan issues when prompt length is long.	2025-02-26 20:34:02 -05:00
comfyanonymous	0270a0b41c	Reduce artifacts on Wan by doing the patch embedding in fp32.	2025-02-26 16:59:26 -05:00
comfyanonymous	4bca7367f3	Don't try to use clip_fea on t2v model.	2025-02-26 08:38:09 -05:00
comfyanonymous	fa62287f1f	More code reuse in wan. Fix bug when changing the compute dtype on wan.	2025-02-26 05:22:29 -05:00
comfyanonymous	4ced06b879	WIP support for Wan I2V model.	2025-02-26 01:49:43 -05:00
comfyanonymous	9a66bb972d	Make wan work with all latent resolutions. Cleanup some code.	2025-02-25 19:56:04 -05:00
comfyanonymous	ea0f939df3	Fix issue with wan and other attention implementations.	2025-02-25 19:13:39 -05:00
comfyanonymous	f37551c1d2	Change wan rope implementation to the flux one. Should be more compatible.	2025-02-25 19:11:14 -05:00
comfyanonymous	63023011b9	WIP support for Wan t2v model.	2025-02-25 17:20:35 -05:00

40 Commits