* mp: only count the offload cost of math once
This was previously bundling the combined weight storage and computation
cost
* ops: put all post async transfer compute on the main stream
Some models have massive weights that need either complex
dequantization or lora patching. Don't do these patchings on the offload
stream, instead do them on the main stream to syncrhonize the
potentially large vram spikes for these compute processes. This avoids
having to assume a worst case scenario of multiple offload streams
all spiking VRAM is parallel with whatever the main stream is doing.
This should fix the problem with the portable updater not working with portables created from a separate branch on the repo.
This does not affect any current portables who were all created on the master branch.
* Added output_matchtypes to generated json for v3, initial backend support for MatchType, created nodes_logic.py and added SwitchNode
* Fixed providing list of allowed_types
* Add workaround in validation.py for V3 Combo outputs not working as Combo inputs
* Make match type receive_type pass validation
* Also add MatchType check to input_type in validation - will likely trigger when connecting to non-lazy stuff
* Make sure this PR only has MatchType stuff
* Initial work on DynamicCombo
* Add get_dynamic function, not yet filled out correctly
* Mark Switch node as Beta
* Make sure other unfinished dynamic types are not accidentally used
* Send DynamicCombo.Option inputs in the same format as normal v1 inputs
* add dynamic combo test node
* Support validation of inputs and outputs
* Add missing input params to DynamicCombo.Input
* Add get_all function to inputs for id validation purposes
* Fix imports for v3 returning everything when doing io/ui/IO/UI instead of what is in __all__ of _io.py and _ui.py
* Modifying behavior of get_dynamic in V3 + serialization so can be used in execution code
* Fix v3 schema validation code after changes
* Refactor hidden_values for v3 in execution.py to be more general v3_data, add helper functions for dynamic behavior, preparing for restructuring dynamic type into object (not finished yet)
* Add nesting of inputs on DynamicCombo during execution
* Work with latest frontend commits
* Fix cringe arrows
* frontend will no longer namespace dynamic inputs widgets so reflect that in code, refactor build_nested_inputs
* Prepare Autogrow support for the love of the game
* satisfy ruff
* Create test nodes for Autogrow to collab with frontend development
* Add nested combo to DCTestNode
* Remove array support from build_nested_inputs, properly handle missing expected values
* Make execution.validate_inputs properly validate required dynamic inputs, renamed dynamic_data to dynamic_paths for clarity
* MatchType does not need any DynamicInput/Output features on backend; will increase compatibility with dynamic types
* Probably need this for ruff check
* Change MatchType to have template be the first and only required param; output id's do nothing right now, so no need
* Fix merge regression with LatentUpscaleModel type not being put in __all__ for _io.py, fix invalid type hint for validate_inputs
* Make Switch node inputs optional, disallow both inputs from being missing, and still work properly with lazy; when one input is missing, use the other no matter what the switch is set to
* Satisfy ruff
* Move MatchType code above the types that inherit from DynamicInput
* Add DynamicSlot type, awaiting frontend support
* Make curr_prefix creation happen in Autogrow, move curr_prefix in DynamicCombo to only be created if input exists in live_inputs
* I was confused, fixing accidentally redundant curr_prefix addition in Autogrow
* Make sure Autogrow inputs are force_input = True when WidgetInput, fix runtime validation by removing original input from expected inputs, fix min/max bounds, change test nodes slightly
* Remove unnecessary id usage in Autogrow test node outputs
* Commented out Switch node + test nodes
* Remove commented out code from Autogrow
* Make TemplatePrefix max more clear, allow max == 1
* Replace all dict[str] with dict[str, Any]
* Renamed add_to_dict_live_inputs to expand_schema_for_dynamic
* Fixed typo in DynamicSlot input code
* note about live_inputs not being present soon in get_v1_info (internal function anyway)
* For now, hide DynamicCombo and Autogrow from public interface
* Removed comment
* hunyuan upsampler: rework imports
Remove the transitive import of VideoConv3d and Resnet and takes these
from actual implementation source.
* model: remove unused give_pre_end
According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).
This semantic is difficult to implement temporal roll VAE for (and would
defeat the purpose). Rather than implement the complex if, just delete
the unused feature.
(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe33 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end:
(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe33 Initial commit.
HEAD is now at 9d8a8179 Enable async offloading by default on Nvidia. (#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end:
* move refiner VAE temporal roller to core
Move the carrying conv op to the common VAE code and give it a better
name. Roll the carry implementation logic for Resnet into the base
class and scrap the Hunyuan specific subclass.
* model: Add temporal roll to main VAE decoder
If there are no attention layers, its a standard resnet and VideoConv3d
is asked for, substitute in the temporal rolloing VAE algorithm. This
reduces VAE usage by the temporal dimension (can be huge VRAM savings).
* model: Add temporal roll to main VAE encoder
If there are no attention layers, its a standard resnet and VideoConv3d
is asked for, substitute in the temporal rolling VAE algorithm. This
reduces VAE usage by the temporal dimension (can be huge VRAM savings).
Added PATCH http method to access-control-allow-header-methods header because there are now PATCH endpoints exposed in the API.
See 277237ccc1/api_server/routes/internal/internal_routes.py (L34) for an example of an API endpoint that uses the PATCH method.
These are not actual controlnets so put it in the models/model_patches
folder and use the ModelPatchLoader + QwenImageDiffsynthControlnet node to
use it.
* feat(security): add System User protection with `__` prefix
Add protected namespace for custom nodes to store sensitive data
(API keys, licenses) that cannot be accessed via HTTP endpoints.
Key changes:
- New API: get_system_user_directory() for internal access
- New API: get_public_user_directory() with structural blocking
- 3-layer defense: header validation, path blocking, creation prevention
- 54 tests covering security, edge cases, and backward compatibility
System Users use `__` prefix (e.g., __system, __cache) following
Python's private member convention. They exist in user_directory/
but are completely blocked from /userdata HTTP endpoints.
* style: remove unused imports
* Support video tiny VAEs
* lighttaew scaling fix
* Also support video taes in previews
Only first frame for now as live preview playback is currently only available through VHS custom nodes.
* Support Wan 2.1 lightVAE
* Relocate elif block and set Wan VAE dim directly without using pruning rate for lightvae
* mm: default to 0 for NUM_STREAMS
Dont count the compute stream as an offload stream. This makes async
offload accounting easier.
* mm: remove 128MB minimum
This is from a previous offloading system requirement. Remove it to
make behaviour of the loader and partial unloader consistent.
* mp: order the module list by offload expense
Calculate an approximate offloading temporary VRAM cost to offload a
weight and primary order the module load list by that. In the simple
case this is just the same as the module weight, but with Loras, a
weight with a lora consumes considerably more VRAM to do the Lora
application on-the-fly.
This will slightly prioritize lora weights, but is really for
proper VRAM offload accounting.
* mp: Account for the VRAM cost of weight offloading
when checking the VRAM headroom, assume that the weight needs to be
offloaded, and only load if it has space for both the load and offload
* the number of streams.
As the weights are ordered from largest to smallest by offload cost
this is guaranteed to fit in VRAM (tm), as all weights that follow
will be smaller.
Make the partial unload aware of this system as well by saving the
budget for offload VRAM to the model state and accounting accordingly.
Its possible that partial unload increases the size of the largest
offloaded weights, and thus needs to unload a little bit more than
asked to accomodate the bigger temp buffers.
Honor the existing codes floor on model weight loading of 128MB by
having the patcher honor this separately withough regard to offloading.
Otherwise when MM specifies its 128MB minimum, MP will see the biggest
weights, and budget that 128MB to only offload buffer and load nothing
which isnt the intent of these minimums. The same clamp applies in
case of partial offload of the currently loading model.
* Create nodes_dataset.py
* Add encoded dataset caching mechanism
* make training node to work with our dataset system
* allow trainer node to get different resolution dataset
* move all dataset related implementation to nodes_dataset
* Rewrite dataset system with new io schema
* Rewrite training system with new io schema
* add ui pbar
* Add outputs' id/name
* Fix bad id/naming
* use single process instead of input list when no need
* fix wrong output_list flag
* use torch.load/save and fix bad behaviors