ComfyUI

mirror of https://git.datalinker.icu/comfyanonymous/ComfyUI synced 2025-12-09 22:14:34 +08:00

Author	SHA1	Message	Date
rattus	4086acf3c2	Fix on-load VRAM OOM (#11144 ) slow down the CPU on model load to not run ahead. This fixes a VRAM on flux 2 load. I went to try and debug this with the memory trace pickles, which needs --disable-cuda-malloc which made the bug go away. So I tried this synchronize and it worked. The has some very complex interactions with the cuda malloc async and I dont have solid theory on this one yet. Still debugging but this gets us over the OOM for the moment.	2025-12-06 18:42:09 -05:00
comfyanonymous	50ca97e776	Speed up lora compute and lower memory usage by doing it in fp16. (#11161 )	2025-12-06 18:36:20 -05:00
comfyanonymous	43071e3de3	Make old scaled fp8 format use the new mixed quant ops system. (#11000 )	2025-12-05 14:35:42 -05:00
rattus	6be85c7920	mp: use look-ahead actuals for stream offload VRAM calculation (#11096 ) TIL that the WAN TE has a 2GB weight followed by 16MB as the next size down. This means that team 8GB VRAM would fully offload the TE in async offload mode as it just multiplied this giant size my the num streams. Do the more complex logic of summing up the upcoming to-load weight sizes to avoid triple counting this massive weight. partial unload does the converse of recording the NS most recent unloads as they go.	2025-12-03 23:28:44 -05:00
rattus	519c941165	Prs/lora reservations (reduce massive Lora reservations especially on Flux2) (#11069 ) * mp: only count the offload cost of math once This was previously bundling the combined weight storage and computation cost * ops: put all post async transfer compute on the main stream Some models have massive weights that need either complex dequantization or lora patching. Don't do these patchings on the offload stream, instead do them on the main stream to syncrhonize the potentially large vram spikes for these compute processes. This avoids having to assume a worst case scenario of multiple offload streams all spiking VRAM is parallel with whatever the main stream is doing.	2025-12-03 02:28:45 -05:00
rattus	f17251bec6	Account for the VRAM cost of weight offloading (#10733 ) * mm: default to 0 for NUM_STREAMS Dont count the compute stream as an offload stream. This makes async offload accounting easier. * mm: remove 128MB minimum This is from a previous offloading system requirement. Remove it to make behaviour of the loader and partial unloader consistent. * mp: order the module list by offload expense Calculate an approximate offloading temporary VRAM cost to offload a weight and primary order the module load list by that. In the simple case this is just the same as the module weight, but with Loras, a weight with a lora consumes considerably more VRAM to do the Lora application on-the-fly. This will slightly prioritize lora weights, but is really for proper VRAM offload accounting. * mp: Account for the VRAM cost of weight offloading when checking the VRAM headroom, assume that the weight needs to be offloaded, and only load if it has space for both the load and offload * the number of streams. As the weights are ordered from largest to smallest by offload cost this is guaranteed to fit in VRAM (tm), as all weights that follow will be smaller. Make the partial unload aware of this system as well by saving the budget for offload VRAM to the model state and accounting accordingly. Its possible that partial unload increases the size of the largest offloaded weights, and thus needs to unload a little bit more than asked to accomodate the bigger temp buffers. Honor the existing codes floor on model weight loading of 128MB by having the patcher honor this separately withough regard to offloading. Otherwise when MM specifies its 128MB minimum, MP will see the biggest weights, and budget that 128MB to only offload buffer and load nothing which isnt the intent of these minimums. The same clamp applies in case of partial offload of the currently loading model.	2025-11-27 01:03:03 -05:00
comfyanonymous	bdb10a583f	Fix loras not working on mixed fp8. (#10899 )	2025-11-26 00:07:58 -05:00
comfyanonymous	25022e0b09	Cleanup and fix issues with text encoder quants. (#10872 )	2025-11-25 01:48:53 -05:00
rattus	18e7d6dba5	mm/mp: always unload re-used but modified models (#10724 ) The partial unloader path in model re-use flow skips straight to the actual unload without any check of the patching UUID. This means that if you do an upscale flow with a model patch on an existing model, it will not apply your patchings. Fix by delaying the partial_unload until after the uuid checks. This is done by making partial_unload a model of partial_load where extra_mem is -ve.	2025-11-12 16:19:53 -05:00
comfyanonymous	dea899f221	Unload weights if vram usage goes up between runs. (#10690 )	2025-11-09 18:51:33 -05:00
comfyanonymous	e632e5de28	Add logging for model unloading. (#10692 )	2025-11-09 18:06:39 -05:00
comfyanonymous	44869ff786	Fix issue with pinned memory. (#10597 )	2025-11-01 17:25:59 -04:00
comfyanonymous	614cf9805e	Add a ScaleROPE node. Currently only works on WAN models. (#10559 )	2025-10-30 22:11:38 -04:00
rattus	513b0c46fb	Add RAM Pressure cache mode (#10454 ) * execution: Roll the UI cache into the outputs Currently the UI cache is parallel to the output cache with expectations of being a content superset of the output cache. At the same time the UI and output cache are maintained completely seperately, making it awkward to free the output cache content without changing the behaviour of the UI cache. There are two actual users (getters) of the UI cache. The first is the case of a direct content hit on the output cache when executing a node. This case is very naturally handled by merging the UI and outputs cache. The second case is the history JSON generation at the end of the prompt. This currently works by asking the cache for all_node_ids and then pulling the cache contents for those nodes. all_node_ids is the nodes of the dynamic prompt. So fold the UI cache into the output cache. The current UI cache setter now writes to a prompt-scope dict. When the output cache is set, just get this value from the dict and tuple up with the outputs. When generating the history, simply iterate prompt-scope dict. This prepares support for more complex caching strategies (like RAM pressure caching) where less than 1 workflow will be cached and it will be desirable to keep the UI cache and output cache in sync. * sd: Implement RAM getter for VAE * model_patcher: Implement RAM getter for ModelPatcher * sd: Implement RAM getter for CLIP * Implement RAM Pressure cache Implement a cache sensitive to RAM pressure. When RAM headroom drops down below a certain threshold, evict RAM-expensive nodes from the cache. Models and tensors are measured directly for RAM usage. An OOM score is then computed based on the RAM usage of the node. Note the due to indirection through shared objects (like a model patcher), multiple nodes can account the same RAM as their individual usage. The intent is this will free chains of nodes particularly model loaders and associate loras as they all score similar and are sorted in close to each other. Has a bias towards unloading model nodes mid flow while being able to keep results like text encodings and VAE. * execution: Convert the cache entry to NamedTuple As commented in review. Convert this to a named tuple and abstract away the tuple type completely from graph.py.	2025-10-30 17:39:02 -04:00
Jedrzej Kosinski	998bf60beb	Add units/info for the numbers displayed on 'load completely' and 'load partially' log messages (#10538 )	2025-10-29 19:37:06 -04:00
comfyanonymous	25de7b1bfa	Try to fix slow load issue on low ram hardware with pinned mem. (#10536 )	2025-10-29 17:20:27 -04:00
comfyanonymous	ec4fc2a09a	Fix case of weights not being unpinned. (#10533 )	2025-10-29 15:48:06 -04:00
comfyanonymous	3fa7a5c04a	Speed up offloading using pinned memory. (#10526 ) To enable this feature use: --fast pinned_memory	2025-10-29 00:21:01 -04:00
comfyanonymous	f1dd6e50f8	Fix bug with applying loras on fp8 scaled without fp8 ops. (#10279 )	2025-10-09 19:02:40 -04:00
comfyanonymous	139addd53c	More surgical fix for #10267 (#10276 )	2025-10-09 16:37:35 -04:00
comfyanonymous	3412d53b1d	USO style reference. (#9677 ) Load the projector.safetensors file with the ModelPatchLoader node and use the siglip_vision_patch14_384.safetensors "clip vision" model and the USOStyleReferenceNode.	2025-09-02 15:36:22 -04:00
comfyanonymous	0963493a9c	Support for Qwen Diffsynth Controlnets canny and depth. (#9465 ) These are not real controlnets but actually a patch on the model so they will be treated as such. Put them in the models/model_patches/ folder. Use the new ModelPatchLoader and QwenImageDiffsynthControlnet nodes.	2025-08-20 22:26:37 -04:00
City	d9277301d2	Initial code for new SLG node (#8759 )	2025-07-02 20:13:43 -04:00
Kohaku-Blueleaf	520eb77b72	LoRA Trainer: LoRA training node in weight adapter scheme (#8446 )	2025-06-13 19:25:59 -04:00
Jedrzej Kosinski	2e24a15905	Call unpatch_hooks at the start of ModelPatcher.partially_unload (#7253 ) * Call unpatch_hooks at the start of ModelPatcher.partially_unload * Only call unpatch_hooks in partially_unload if lowvram is possible	2025-03-16 06:02:45 -04:00
Jedrzej Kosinski	528d1b3563	When cached_hook_patches contain weights for hooks, only use hook_backup for unused keys (#7067 )	2025-03-09 04:26:31 -04:00
comfyanonymous	fa62287f1f	More code reuse in wan. Fix bug when changing the compute dtype on wan.	2025-02-26 05:22:29 -05:00
comfyanonymous	019c7029ea	Add a way to set a different compute dtype for the model at runtime. Currently only works for diffusion models.	2025-02-13 20:34:03 -05:00
comfyanonymous	ab888e1e0b	Add add_weight_wrapper function to model patcher. Functions can now easily be added to wrap/modify model weights.	2025-02-12 05:55:35 -05:00
Jedrzej Kosinski	6c9bd11fa3	Hooks Part 2 - TransformerOptionsHook and AdditionalModelsHook (#6377 ) * Add 'sigmas' to transformer_options so that downstream code can know about the full scope of current sampling run, fix Hook Keyframes' guarantee_steps=1 inconsistent behavior with sampling split across different Sampling nodes/sampling runs by referencing 'sigmas' * Cleaned up hooks.py, refactored Hook.should_register and add_hook_patches to use target_dict instead of target so that more information can be provided about the current execution environment if needed * Refactor WrapperHook into TransformerOptionsHook, as there is no need to separate out Wrappers/Callbacks/Patches into different hook types (all affect transformer_options) * Refactored HookGroup to also store a dictionary of hooks separated by hook_type, modified necessary code to no longer need to manually separate out hooks by hook_type * In inner_sample, change "sigmas" to "sampler_sigmas" in transformer_options to not conflict with the "sigmas" that will overwrite "sigmas" in _calc_cond_batch * Refactored 'registered' to be HookGroup instead of a list of Hooks, made AddModelsHook operational and compliant with should_register result, moved TransformerOptionsHook handling out of ModelPatcher.register_all_hook_patches, support patches in TransformerOptionsHook properly by casting any patches/wrappers/hooks to proper device at sample time * Made hook clone code sane, made clear ObjectPatchHook and SetInjectionsHook are not yet operational * Fix performance of hooks when hooks are appended via Cond Pair Set Props nodes by properly caching between positive and negative conds, make hook_patches_backup behave as intended (in the case that something pre-registers WeightHooks on the ModelPatcher instead of registering it at sample time) * Filter only registered hooks on self.conds in CFGGuider.sample * Make hook_scope functional for TransformerOptionsHook * removed 4 whitespace lines to satisfy Ruff, * Add a get_injections function to ModelPatcher * Made TransformerOptionsHook contribute to registered hooks properly, added some doc strings and removed a so-far unused variable * Rename AddModelsHooks to AdditionalModelsHook, rename SetInjectionsHook to InjectionsHook (not yet implemented, but at least getting the naming figured out) * Clean up a typehint	2025-01-11 12:20:23 -05:00
Chenlei Hu	d055325783	Document get_attr and get_model_object (#6357 ) * Document get_attr and get_model_object * Update model_patcher.py * Update model_patcher.py * Update model_patcher.py	2025-01-06 20:12:22 -05:00
Jedrzej Kosinski	3507870535	Add 'sigmas' to transformer_options so that downstream code can know about the full scope of current sampling run, fix Hook Keyframes' guarantee_steps=1 inconsistent behavior with sampling split across different Sampling nodes/sampling runs by referencing 'sigmas' (#6273 )	2024-12-30 03:42:49 -05:00
comfyanonymous	b504bd606d	Add ruff rule for empty line with trailing whitespace.	2024-12-28 05:23:08 -05:00
Chenlei Hu	d7969cb070	Replace print with logging (#6138 ) * Replace print with logging * nit * nit * nit * nit * nit * nit	2024-12-20 16:24:55 -05:00
comfyanonymous	452179fe4f	Make ModelPatcher class clone function work with inheritance.	2024-12-03 13:57:57 -05:00
Jedrzej Kosinski	0ee322ec5f	ModelPatcher Overhaul and Hook Support (#5583 ) * Added hook_patches to ModelPatcher for weights (model) * Initial changes to calc_cond_batch to eventually support hook_patches * Added current_patcher property to BaseModel * Consolidated add_hook_patches_as_diffs into add_hook_patches func, fixed fp8 support for model-as-lora feature * Added call to initialize_timesteps on hooks in process_conds func, and added call prepare current keyframe on hooks in calc_cond_batch * Added default_conds support in calc_cond_batch func * Added initial set of hook-related nodes, added code to register hooks for loras/model-as-loras, small renaming/refactoring * Made CLIP work with hook patches * Added initial hook scheduling nodes, small renaming/refactoring * Fixed MaxSpeed and default conds implementations * Added support for adding weight hooks that aren't registered on the ModelPatcher at sampling time * Made Set Clip Hooks node work with hooks from Create Hook nodes, began work on better Create Hook Model As LoRA node * Initial work on adding 'model_as_lora' lora type to calculate_weight * Continued work on simpler Create Hook Model As LoRA node, started to implement ModelPatcher callbacks, attachments, and additional_models * Fix incorrect ref to create_hook_patches_clone after moving function * Added injections support to ModelPatcher + necessary bookkeeping, added additional_models support in ModelPatcher, conds, and hooks * Added wrappers to ModelPatcher to facilitate standardized function wrapping * Started scaffolding for other hook types, refactored get_hooks_from_cond to organize hooks by type * Fix skip_until_exit logic bug breaking injection after first run of model * Updated clone_has_same_weights function to account for new ModelPatcher properties, improved AutoPatcherEjector usage in partially_load * Added WrapperExecutor for non-classbound functions, added calc_cond_batch wrappers * Refactored callbacks+wrappers to allow storing lists by id * Added forward_timestep_embed_patch type, added helper functions on ModelPatcher for emb_patch and forward_timestep_embed_patch, added helper functions for removing callbacks/wrappers/additional_models by key, added custom_should_register prop to hooks * Added get_attachment func on ModelPatcher * Implement basic MemoryCounter system for determing with cached weights due to hooks should be offloaded in hooks_backup * Modified ControlNet/T2IAdapter get_control function to receive transformer_options as additional parameter, made the model_options stored in extra_args in inner_sample be a clone of the original model_options instead of same ref * Added create_model_options_clone func, modified type annotations to use __future__ so that I can use the better type annotations * Refactored WrapperExecutor code to remove need for WrapperClassExecutor (now gone), added sampler.sample wrapper (pending review, will likely keep but will see what hacks this could currently let me get rid of in ACN/ADE) * Added Combine versions of Cond/Cond Pair Set Props nodes, renamed Pair Cond to Cond Pair, fixed default conds never applying hooks (due to hooks key typo) * Renamed Create Hook Model As LoRA nodes to make the test node the main one (more changes pending) * Added uuid to conds in CFGGuider and uuids to transformer_options to allow uniquely identifying conds in batches during sampling * Fixed models not being unloaded properly due to current_patcher reference; the current ComfyUI model cleanup code requires that nothing else has a reference to the ModelPatcher instances * Fixed default conds not respecting hook keyframes, made keyframes not reset cache when strength is unchanged, fixed Cond Set Default Combine throwing error, fixed model-as-lora throwing error during calculate_weight after a recent ComfyUI update, small refactoring/scaffolding changes for hooks * Changed CreateHookModelAsLoraTest to be the new CreateHookModelAsLora, rename old ones as 'direct' and will be removed prior to merge * Added initial support within CLIP Text Encode (Prompt) node for scheduling weight hook CLIP strength via clip_start_percent/clip_end_percent on conds, added schedule_clip toggle to Set CLIP Hooks node, small cleanup/fixes * Fix range check in get_hooks_for_clip_schedule so that proper keyframes get assigned to corresponding ranges * Optimized CLIP hook scheduling to treat same strength as same keyframe * Less fragile memory management. * Make encode_from_tokens_scheduled call cleaner, rollback change in model_patcher.py for hook_patches_backup dict * Fix issue. * Remove useless function. * Prevent and detect some types of memory leaks. * Run garbage collector when switching workflow if needed. * Moved WrappersMP/CallbacksMP/WrapperExecutor to patcher_extension.py * Refactored code to store wrappers and callbacks in transformer_options, added apply_model and diffusion_model.forward wrappers * Fix issue. * Refactored hooks in calc_cond_batch to be part of get_area_and_mult tuple, added extra_hooks to ControlBase to allow custom controlnets w/ hooks, small cleanup and renaming * Fixed inconsistency of results when schedule_clip is set to False, small renaming/typo fixing, added initial support for ControlNet extra_hooks to work in tandem with normal cond hooks, initial work on calc_cond_batch merging all subdicts in returned transformer_options * Modified callbacks and wrappers so that unregistered types can be used, allowing custom_nodes to have their own unique callbacks/wrappers if desired * Updated different hook types to reflect actual progress of implementation, initial scaffolding for working WrapperHook functionality * Fixed existing weight hook_patches (pre-registered) not working properly for CLIP * Removed Register/Direct hook nodes since they were present only for testing, removed diff-related weight hook calculation as improved_memory removes unload_model_clones and using sample time registered hooks is less hacky * Added clip scheduling support to all other native ComfyUI text encoding nodes (sdxl, flux, hunyuan, sd3) * Made WrapperHook functional, added another wrapper/callback getter, added ON_DETACH callback to ModelPatcher * Made opt_hooks append by default instead of replace, renamed comfy.hooks set functions to be more accurate * Added apply_to_conds to Set CLIP Hooks, modified relevant code to allow text encoding to automatically apply hooks to output conds when apply_to_conds is set to True * Fix cached_hook_patches not respecting target_device/memory_counter results * Fixed issue with setting weights from hooks instead of copying them, added additional memory_counter check when caching hook patches * Remove unnecessary torch.no_grad calls for hook patches * Increased MemoryCounter minimum memory to leave free by 2 until a better way to get inference memory estimate of currently loaded models exists For encode_from_tokens_scheduled, allow start_percent and end_percent in add_dict to limit which scheduled conds get encoded for optimization purposes * Removed a .to call on results of calculate_weight in patch_hook_weight_to_device that was screwing up the intermediate results for fp8 prior to being passed into stochastic_rounding call * Made encode_from_tokens_scheduled work when no hooks are set on patcher * Small cleanup of comments * Turn off hook patch caching when only 1 hook present in sampling, replace some current_hook = None with calls to self.patch_hooks(None) instead to avoid a potential edge case * On Cond/Cond Pair nodes, removed opt_ prefix from optional inputs * Allow both FLOATS and FLOAT for floats_strength input * Revert change, does not work * Made patch_hook_weight_to_device respect set_func and convert_func * Make discard_model_sampling True by default * Add changes manually from 'master' so merge conflict resolution goes more smoothly * Cleaned up text encode nodes with just a single clip.encode_from_tokens_scheduled call * Make sure encode_from_tokens_scheduled will respect use_clip_schedule on clip * Made nodes in nodes_hooks be marked as experimental (beta) * Add get_nested_additional_models for cases where additional_models could have their own additional_models, and add robustness for circular additional_models references * Made finalize_default_conds area math consistent with other sampling code * Changed 'opt_hooks' input of Cond/Cond Pair Set Default Combine nodes to 'hooks' * Remove a couple old TODO's and a no longer necessary workaround	2024-12-02 14:51:02 -05:00
comfyanonymous	79d5ceae6e	Improved memory management. (#5450 ) * Less fragile memory management. * Fix issue. * Remove useless function. * Prevent and detect some types of memory leaks. * Run garbage collector when switching workflow if needed. * Fix issue.	2024-12-02 14:39:34 -05:00
comfyanonymous	839ed3368e	Some improvements to the lowvram unloading.	2024-11-22 20:59:15 -05:00
comfyanonymous	bc6be6c11e	Some fixes to the lowvram system.	2024-11-22 16:40:04 -05:00
comfyanonymous	41444b5236	Add some new weight patching functionality. Add a way to reshape lora weights. Allow weight patches to all weight not just .weight and .bias Add a way for a lora to set a weight to a specific value.	2024-11-21 07:19:17 -05:00
comfyanonymous	f9f9faface	Fixed model merging issue with scaled fp8.	2024-10-20 06:24:31 -04:00
comfyanonymous	a68bbafddb	Support diffusion models with scaled fp8 weights.	2024-10-19 23:47:42 -04:00
comfyanonymous	3bb4dec720	Fix issue with loras, lowvram and --fast fp8.	2024-09-28 14:42:32 -04:00
comfyanonymous	0849c80e2a	get_key_patches now works without unloading the model.	2024-09-17 01:57:59 -04:00
comfyanonymous	9d720187f1	types -> comfy_types to fix import issue.	2024-09-12 03:57:46 -04:00
comfyanonymous	2ca8f6e23d	Make the stochastic fp8 rounding reproducible.	2024-08-26 15:12:06 -04:00
comfyanonymous	c6812947e9	Fix potential memory leak.	2024-08-26 02:07:32 -04:00
comfyanonymous	7df42b9a23	Fix dora.	2024-08-23 04:58:59 -04:00
comfyanonymous	c26ca27207	Move calculate function to comfy.lora	2024-08-22 17:12:00 -04:00
comfyanonymous	7c6bb84016	Code cleanups.	2024-08-22 17:05:12 -04:00

1 2 3

115 Commits