1838 Commits

Author SHA1 Message Date
comfyanonymous
cdb2597000
Fix regression. 2025-12-05 23:00:58 -05:00
Jukka Seppänen
fd109325db
Kandinsky5 model support (#10988)
* Add Kandinsky5 model support

Lite and Pro T2V tested to work

* Update kandinsky5.py

* Fix fp8

* Fix fp8_scaled text encoder

* Add transformer_options for attention

* Code cleanup, optimizations, use fp32 for all layers originally at fp32

* Add ImageToVideo node

* Fix I2V, add necessary latent post process nodes

* Support text to image model

* Support block replace patches (SLG mostly)

* Support official LoRAs

* Don't scale RoPE for the lite model as that just doesn't work...

* Update supported_models.py

* Revert RoPE scaling to a simpler one

* Fix typo

* Handle latent dim difference for image model in the VAE instead

* Add node to use different prompts for clip_l and qwen25_7b

* Reduce peak VRAM usage a bit

* Further reduce peak VRAM consumption by chunking the FFN (see the sketch after this commit)

* Update chunking

* Update memory_usage_factor

* Code cleanup, don't force the fp32 layers as it has minimal effect

* Allow for stronger changes with first frames normalization

Default values are too weak for any meaningful changes; these should probably be exposed as advanced node options when that's available.

* Add image model's own chat template, remove unused image2video template

* Remove hard error in the ReplaceVideoLatentFrames node

* Update kandinsky5.py

* Update supported_models.py

* Fix typos in prompt template

They have now been fixed in the original repository as well

* Update ReplaceVideoLatentFrames

Add tooltips
Make source optional
Better handle negative index

* Rename NormalizeVideoLatentFrames node

For a bit more clarity about what it does

* Fix NormalizeVideoLatentStart node output on no-op
2025-12-05 22:20:22 -05:00
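A minimal sketch of the FFN chunking referenced in the commit above (chunked_ffn is a hypothetical helper, not the actual kandinsky5.py code): the feed-forward runs over sequence slices, so peak activation VRAM scales with the chunk size instead of the full sequence length.

import torch

def chunked_ffn(x: torch.Tensor, ffn: torch.nn.Module, chunk: int = 4096) -> torch.Tensor:
    # x: (batch, seq_len, dim); only `chunk` tokens are expanded to the
    # FFN hidden dim at any one time
    return torch.cat([ffn(s) for s in x.split(chunk, dim=1)], dim=1)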
comfyanonymous
092ee8a500
Fix some custom nodes. (#11134) 2025-12-05 18:25:31 -05:00
Jukka Seppänen
79d17ba233
Context windows fixes and features (#10975)
* Apply cond slice fix

* Add FreeNoise

* Update context_windows.py

* Add option to retain condition by indexes for each window

This allows, for example, Wan/HunyuanVideo image-to-video to "work" by using the initial start frame for each window; otherwise windows beyond the first will be pure T2V generations.

* Update context_windows.py

* Allow splitting multiple conds into different windows

* Add handling for audio_embed

* whitespace

* Allow FreeNoise to work on other dims, handle 4D batch timestep

Refactor the FreeNoise function and fix batch handling, as timesteps now seem to be expanded to batch size (see the FreeNoise sketch after this commit).

* Disable experimental options for now

So that the FreeNoise changes and bugfixes can be merged first

---------

Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
Co-authored-by: ozbayb <17261091+ozbayb@users.noreply.github.com>
2025-12-05 12:42:46 -08:00
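A minimal sketch of the FreeNoise idea used in the commit above (freenoise_like is a hypothetical helper, not the actual context_windows.py code): later windows are filled with shuffled copies of the first window's noise along the temporal dim, so all windows share noise statistics without repeating content exactly.

import torch

def freenoise_like(noise: torch.Tensor, window: int, dim: int = 2) -> torch.Tensor:
    out = noise.clone()
    length = noise.shape[dim]
    for start in range(window, length, window):
        n = min(window, length - start)
        # shuffled frame indices drawn from the first window
        idx = torch.randperm(window, device=noise.device)[:n]
        out.narrow(dim, start, n).copy_(noise.index_select(dim, idx))
    return out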
comfyanonymous
6fd463aec9
Fix regression when text encoder loaded directly on GPU. (#11129) 2025-12-05 15:33:16 -05:00
comfyanonymous
43071e3de3
Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
Jedrzej Kosinski
0ec05b1481
Remove line made unnecessary (and wrong) after transformer_options was added to NextDiT's _forward definition (#11118) 2025-12-05 14:05:38 -05:00
rattus
9bc893c5bb
sd: bump HY1.5 VAE estimate (#11107)
I'm able to push VRAM above the estimate on partial unload. Bump the estimate. This was determined experimentally with a 720p and a 480p datapoint, calibrating for 24GB of total VRAM.
2025-12-04 09:50:36 -08:00
rattus
f4bdf5f830
sd: revise hy VAE VRAM (#11105)
This was recently collapsed down to rolling the VAE through the temporal dimension. Clamp the time dimension.
2025-12-04 09:50:04 -08:00
rattus
6be85c7920
mp: use look-ahead actuals for stream offload VRAM calculation (#11096)
TIL that the WAN TE has a 2GB weight followed by 16MB as the next size down. This means that users with 8GB of VRAM would fully offload the TE in async offload mode, as the code just multiplied this giant size by the number of streams.

Do the more complex logic of summing up the upcoming to-load weight sizes to avoid triple-counting this massive weight.

Partial unload does the converse, recording the NS most recent unloads as it goes.
2025-12-03 23:28:44 -05:00
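A minimal sketch of the accounting change (hypothetical names, not the actual model management code): sum the sizes of the weights that will actually be in flight instead of reserving largest-weight-times-streams.

def offload_reserve(upcoming_sizes: list, num_streams: int) -> int:
    # upcoming_sizes: the to-load weight sizes, in load order.
    # The old, pessimistic reserve was max(upcoming_sizes) * num_streams,
    # which triple-counts a single giant weight like the WAN TE's 2GB tensor.
    return sum(upcoming_sizes[:num_streams])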
comfyanonymous
ea17add3c6
Fix case where text encoders were running on the CPU instead of the GPU. (#11095) 2025-12-03 23:15:15 -05:00
rattus
519c941165
Prs/lora reservations (reduce massive Lora reservations especially on Flux2) (#11069)
* mp: only count the offload cost of math once

This was previously bundling the combined weight storage and computation
cost

* ops: put all post async transfer compute on the main stream

Some models have massive weights that need either complex dequantization or LoRA patching. Don't do this patching on the offload stream; instead do it on the main stream to synchronize the potentially large VRAM spikes from these compute processes. This avoids having to assume a worst-case scenario of multiple offload streams all spiking VRAM in parallel with whatever the main stream is doing.
2025-12-03 02:28:45 -05:00
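A minimal sketch of the stream split described above (assumed structure, not the actual ops.py code): the raw copy rides the offload stream, while the VRAM-spiking dequant/LoRA work runs on the main stream so at most one such spike exists at a time.

import torch

def load_weight(w_cpu: torch.Tensor, offload_stream: torch.cuda.Stream) -> torch.Tensor:
    with torch.cuda.stream(offload_stream):
        w = w_cpu.to("cuda", non_blocking=True)        # async H2D transfer
    torch.cuda.current_stream().wait_stream(offload_stream)
    # dequant_and_patch is a hypothetical stand-in for the heavy
    # dequantization / on-the-fly LoRA application done on the main stream
    return dequant_and_patch(w)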
rattus
73f5649196
Implement temporal rolling VAE (Major VRAM reductions in Hunyuan and Kandinsky) (#10995)
* hunyuan upsampler: rework imports

Remove the transitive import of VideoConv3d and Resnet and takes these
from actual implementation source.

* model: remove unused give_pre_end

According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).

This semantic is difficult to implement temporal rolling VAE for (and would defeat the purpose). Rather than implement the complex if, just delete the unused feature.

(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe33 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe33 Initial commit.
HEAD is now at 9d8a8179 Enable async offloading by default on Nvidia. (#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

* move refiner VAE temporal roller to core

Move the carrying conv op to the common VAE code and give it a better
name. Roll the carry implementation logic for Resnet into the base
class and scrap the Hunyuan specific subclass.

* model: Add temporal roll to main VAE decoder

If there are no attention layers, it's a standard resnet and VideoConv3d is asked for; substitute in the temporal rolling VAE algorithm. This reduces VAE usage by the temporal dimension (can be huge VRAM savings).

* model: Add temporal roll to main VAE encoder

If there are no attention layers, it's a standard resnet and VideoConv3d is asked for; substitute in the temporal rolling VAE algorithm. This reduces VAE usage by the temporal dimension (can be huge VRAM savings).
2025-12-02 22:49:29 -05:00
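A minimal sketch of rolling a convolution through the temporal dimension (illustrative only; the real logic lives in the VideoConv3d path): carry the trailing kernel_t - 1 input frames between chunks so chunked decoding matches one full-length pass. Assumes the conv was built with zero temporal padding.

import torch

class RollingConv3d(torch.nn.Conv3d):
    # processes (B, C, T, H, W) chunks sequentially along T
    def forward_chunk(self, x, carry=None):
        pad_t = self.kernel_size[0] - 1
        if carry is None and pad_t:
            carry = x[:, :, :1].repeat(1, 1, pad_t, 1, 1)  # replicate-pad the front
        if carry is not None:
            x = torch.cat([carry, x], dim=2)
        new_carry = x[:, :, -pad_t:] if pad_t else None    # FIFO seam for the next chunk
        return super().forward(x), new_carry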
comfyanonymous
b94d394a64
Support Z Image alibaba pai fun controlnets. (#11062)
These are not actual controlnets, so they go in the models/model_patches folder and are used with the ModelPatchLoader + QwenImageDiffsynthControlnet nodes.
2025-12-02 21:38:31 -05:00
rattus
277237ccc1
attention: use flag based OOM fallback (#11038)
Exceptions hold references to all local variables for the lifetime of the exception context. Just set a flag, then check it to dump the exception before falling back. 2025-12-02 17:24:19 -05:00
comfyanonymous
daaceac769
Hack to make zimage work in fp16. (#11057) 2025-12-02 17:11:58 -05:00
Dr.Lt.Data
b4a20acc54
feat: Support ComfyUI-Manager for pip version (#7555) 2025-12-01 22:32:52 -05:00
comfyanonymous
878db3a727
Implement the Ovis image model. (#11030) 2025-12-01 20:56:17 -05:00
comfyanonymous
2640acb31c
Update qwen tokenizer to add qwen 3 tokens. (#11029)
Doesn't actually change anything for current workflows because none of the
current models have a template with the think tokens.
2025-12-01 17:13:48 -05:00
comfyanonymous
0a6746898d
Make the ScaleRope node work on Z Image and Lumina. (#10994) 2025-11-29 18:00:55 -05:00
comfyanonymous
5151cff293
Add some missing z image lora layers. (#10980) 2025-11-28 23:55:00 -05:00
comfyanonymous
52a32e2b32
Support some z image lora formats. (#10978) 2025-11-28 21:12:42 -05:00
Jukka Seppänen
b907085709
Support video tiny VAEs (#10884)
* Support video tiny VAEs

* lighttaew scaling fix

* Also support video taes in previews

Only the first frame for now, as live preview playback is currently only available through VHS custom nodes.

* Support Wan 2.1 lightVAE

* Relocate elif block and set Wan VAE dim directly without using pruning rate for lightvae
2025-11-28 19:40:19 -05:00
rattus
0ff0457892
mm: wrap the raw stream in context manager (#10958)
The documentation suggests that using torch.foo.Stream directly as a with-context starts at version 2.7. Use the old API for backwards compatibility.
2025-11-28 16:38:12 -05:00
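A minimal sketch of the compatibility fix: wrap the stream in the long-standing torch.cuda.stream() helper instead of using the stream object itself as a context manager.

import torch

x = torch.randn(1024).pin_memory()
s = torch.cuda.Stream()
with torch.cuda.stream(s):                 # works on old torch; `with s:` needs >= 2.7
    y = x.to("cuda", non_blocking=True)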
Urle Sistiana
6484ac89dc
fix QuantizedTensor.is_contiguous (#10956) (#10959) 2025-11-28 16:33:07 -05:00
comfyanonymous
f55c98a89f
Disable offload stream when torch compile. (#10961) 2025-11-28 16:16:46 -05:00
comfyanonymous
9d8a817985
Enable async offloading by default on Nvidia. (#10953)
Add --disable-async-offload to disable it.

If this causes OOMs that go away when you use --disable-async-offload, please report it.
2025-11-27 17:46:12 -05:00
rattus
3f382a4f98
quant ops: Dequantize weight in-place (#10935)
In Flux2 these weights are huge (200MB). As plain_tensor is a throw-away deep copy, do this multiplication in-place to save VRAM.
2025-11-27 08:06:30 -08:00
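A minimal sketch of the in-place change (hypothetical signature, not the actual quant ops code): the upcast already materializes a throw-away copy, so the scale multiply can reuse that buffer instead of allocating a second full-size tensor.

import torch

def dequantize(qweight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    w = qweight.to(torch.bfloat16)   # throw-away deep copy (the "plain_tensor")
    return w.mul_(scale)             # in-place multiply: no second ~200MB buffer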
rattus
f17251bec6
Account for the VRAM cost of weight offloading (#10733)
* mm: default to 0 for NUM_STREAMS

Don't count the compute stream as an offload stream. This makes async offload accounting easier.

* mm: remove 128MB minimum

This is from a previous offloading system requirement. Remove it to
make behaviour of the loader and partial unloader consistent.

* mp: order the module list by offload expense

Calculate an approximate temporary VRAM cost of offloading a weight and primarily order the module load list by that. In the simple case this is just the module weight size, but with LoRAs, a weight with a LoRA consumes considerably more VRAM to do the LoRA application on-the-fly.

This will slightly prioritize LoRA weights, but is really for proper VRAM offload accounting.

* mp: Account for the VRAM cost of weight offloading

When checking the VRAM headroom, assume that the weight needs to be offloaded, and only load if there is space for both the load and the offload times the number of streams.

As the weights are ordered from largest to smallest by offload cost
this is guaranteed to fit in VRAM (tm), as all weights that follow
will be smaller.

Make the partial unload aware of this system as well by saving the budget for offload VRAM to the model state and accounting accordingly. It's possible that partial unload increases the size of the largest offloaded weights, and thus needs to unload a little bit more than asked to accommodate the bigger temp buffers.

Honor the existing code's 128MB floor on model weight loading by having the patcher honor it separately, without regard to offloading. Otherwise, when MM specifies its 128MB minimum, MP will see the biggest weights and budget that 128MB to only the offload buffer, loading nothing, which isn't the intent of these minimums. The same clamp applies in case of partial offload of the currently loading model.
2025-11-27 01:03:03 -05:00
Haoming
c38e7d6599
block info (#10841) 2025-11-26 20:28:44 -08:00
comfyanonymous
eaf68c9b5b
Make lora training work on Z Image and remove some redundant nodes. (#10927) 2025-11-26 19:25:32 -05:00
comfyanonymous
f16219e3aa
Add cheap latent preview for flux 2. (#10907)
Thank you to the person who calculated them. You saved me a percent of my
time.
2025-11-26 04:00:43 -05:00
comfyanonymous
58b8574661
Fix Flux2 reference image mem estimation. (#10905) 2025-11-26 02:36:19 -05:00
comfyanonymous
bdb10a583f
Fix loras not working on mixed fp8. (#10899) 2025-11-26 00:07:58 -05:00
comfyanonymous
0e24dbb19f
Adjustments to Z Image. (#10893) 2025-11-25 19:02:51 -05:00
comfyanonymous
e9aae31fa2
Z Image model. (#10892) 2025-11-25 18:41:45 -05:00
comfyanonymous
d196a905bb
Lower vram usage for flux 2 text encoder. (#10887) 2025-11-25 14:58:39 -05:00
comfyanonymous
dff996ca39
Fix crash. (#10885) 2025-11-25 14:30:24 -05:00
comfyanonymous
6b573ae0cb
Flux 2 (#10879) 2025-11-25 10:50:19 -05:00
comfyanonymous
015a0599d0
I found a case where this is needed (#10875) 2025-11-25 03:23:19 -05:00
comfyanonymous
acfaa5c4a1
Don't try fp8 matrix mult in quantized ops if not supported by hardware. (#10874) 2025-11-25 02:55:49 -05:00
comfyanonymous
b6805429b9
Allow pinning quantized tensors. (#10873) 2025-11-25 02:48:20 -05:00
comfyanonymous
25022e0b09
Cleanup and fix issues with text encoder quants. (#10872) 2025-11-25 01:48:53 -05:00
Haoming
b2ef58e2b1
block info (#10844) 2025-11-24 10:40:09 -08:00
Haoming
6a6d456c88
block info (#10842) 2025-11-24 10:38:38 -08:00
Haoming
3d1fdaf9f4
block info (#10843) 2025-11-24 10:30:40 -08:00
comfyanonymous
cbd68e3d58
Add better error message for common error. (#10846) 2025-11-23 04:55:22 -05:00
comfyanonymous
532938b16b
--disable-api-nodes now sets CSP header to force frontend offline. (#10829) 2025-11-21 17:51:55 -05:00
comfyanonymous
943b3b615d
HunyuanVideo 1.5 (#10819)
* init

* update

* Update model.py

* Update model.py

* remove print

* Fix text encoding

* Prevent empty negative prompt

Really doesn't work otherwise

* fp16 works

* I2V

* Update model_base.py

* Update nodes_hunyuan.py

* Better latent rgb factors

* Use the correct sigclip output...

* Support HunyuanVideo1.5 SR model

* whitespaces...

* Proper latent channel count

* SR model fixes

This also still needs timestep scheduling based on the noise scale; it can already be used with two samplers.

* vae_refiner: roll the convolution through temporal

Work in progress.

Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams.

* Support HunyuanVideo15 latent resampler

* fix

* Some cleanup

Co-Authored-By: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com>

* Proper hyvid15 I2V channels

Co-Authored-By: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com>

* Fix TokenRefiner for fp16

Otherwise x.sum has infs. Just in case, only casting when the input is fp16; I don't know if it's necessary (see the sketch after this commit).

* Bugfix for the HunyuanVideo15 SR model

* vae_refiner: roll the convolution through temporal II

Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams.

Added support for the encoder, lowered to 1 latent frame to save more VRAM, and made it work for Hunyuan Image 3.0 (as the code is shared).

Fixed names, cleaned up code.

* Allow any number of input frames in VAE.

* Better VAE encode mem estimation.

* Lowvram fix.

* Fix hunyuan image 2.1 refiner.

* Fix mistake.

* Name changes.

* Rename.

* Whitespace.

* Fix.

* Fix.

---------

Co-authored-by: kijai <40791699+kijai@users.noreply.github.com>
Co-authored-by: Rattus <rattus128@gmail.com>
2025-11-20 22:44:43 -05:00
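A minimal sketch of the fp16 overflow guard from the TokenRefiner fix above (the masked-mean shape is assumed for illustration): do the reduction in fp32 when the input is fp16, since a long fp16 sum can overflow to inf.

import torch

def masked_mean(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # x: (B, T, D), possibly fp16; mask: (B, T)
    dtype = torch.float32 if x.dtype == torch.float16 else x.dtype
    s = (x.to(dtype) * mask.unsqueeze(-1).to(dtype)).sum(dim=1)
    return (s / mask.sum(dim=1, keepdim=True).to(dtype)).to(x.dtype)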
comfyanonymous
cb96d4d18c
Disable workaround on newer cudnn. (#10807) 2025-11-19 23:56:23 -05:00