1875 Commits

rattus
ab7ab5be23
Fix race condition in --async-offload that can cause corruption (#10501)
* mm: factor out the current stream getter

Make this a reusable function.
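
A plausible shape for such a helper (hypothetical code, not the actual patch):

    import torch

    def get_current_stream(device):
        # Hypothetical helper: return the stream the given device is
        # currently executing on, or None for non-CUDA devices.
        if device is not None and device.type == "cuda":
            return torch.cuda.current_stream(device)
        return None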

* ops: sync the offload stream with the consumption of w&b

This sync is necessary because PyTorch queues async CUDA frees on the
same stream that created the tensor. With async offload, that is the
offload stream.

Weights and biases can go out of scope in Python, which triggers the
PyTorch garbage collector to queue the free operation on the offload
stream, possibly before the compute stream has used the weight. This
causes a use-after-free on the weight data, leading to total corruption
of some workflows.

So sync the offload stream with the compute stream after the weight has
been used, forcing the free to wait until the weight has been consumed.

cast_bias_weight is extended in a backwards-compatible way, with the
new behaviour opt-in via a defaulted parameter. This covers custom node
packs that call cast_bias_weight directly and disables async offload
for them (as they do not handle the race).

The pattern is now:

cast_bias_weight(..., offloadable=True)  # this might be offloaded
thing(weight, bias, ...)
uncast_bias_weight(...)
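
A minimal sketch of the synchronization this pattern implies, using
PyTorch stream primitives (function signatures here are illustrative,
not the actual ComfyUI API):

    import torch

    def cast_bias_weight(module, offload_stream, offloadable=True):
        # Load the weight on the offload stream so compute keeps running.
        with torch.cuda.stream(offload_stream):
            weight = module.weight.to("cuda", non_blocking=True)
        # Compute must wait for the copy before consuming the weight.
        torch.cuda.current_stream().wait_stream(offload_stream)
        return weight

    def uncast_bias_weight(offload_stream):
        # Make the offload stream wait until compute has consumed the
        # weight; any async free PyTorch queues on the offload stream
        # (e.g. when the Python reference dies) now runs after that use.
        offload_stream.wait_stream(torch.cuda.current_stream())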

* controlnet: adopt new cast_bias_weight synchronization scheme

This is necessary for safe async weight offloading.

* mm: sync the last stream in the queue, not the next

Currently this peeks ahead to sync the next stream in the queue of
streams with the compute stream. That doesn't allow much
parallelization: the end result is that you can only get one weight
load ahead, regardless of how many streams you have.

Rotate the loop logic here to synchronize the end of the queue before
returning the next stream, as sketched below. This allows weights to be
loaded ahead of the compute stream's position.
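
One way to picture the rotated logic (a hypothetical round-robin pool,
not the actual model-management code):

    import torch
    from collections import deque

    streams = deque(torch.cuda.Stream() for _ in range(4))

    def next_offload_stream():
        # Sync the stream at the end of the queue (the one about to be
        # reused) with the compute stream, instead of the next one to be
        # handed out; loads queued on the other streams keep running
        # ahead of the compute stream's position.
        torch.cuda.current_stream().wait_stream(streams[-1])
        streams.rotate(1)  # the synced stream moves to the front
        return streams[0]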
2025-10-29 17:17:46 -04:00
comfyanonymous
ec4fc2a09a
Fix case of weights not being unpinned. (#10533) 2025-10-29 15:48:06 -04:00
comfyanonymous
1a58087ac2
Reduce memory usage for fp8 scaled op. (#10531) 2025-10-29 15:43:51 -04:00
comfyanonymous
e525673f72
Fix issue. (#10527) 2025-10-29 00:37:00 -04:00
comfyanonymous
3fa7a5c04a
Speed up offloading using pinned memory. (#10526)
To enable this feature use: --fast pinned_memory
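
Pinned (page-locked) host memory lets CUDA overlap host-device copies
with compute, which is why it speeds up offloading. Roughly (a generic
sketch of the technique, not this flag's implementation):

    import torch

    # Pageable -> pinned host tensor; copies from pinned memory can be
    # truly asynchronous with respect to the GPU.
    cpu_weight = torch.randn(4096, 4096).pin_memory()

    copy_stream = torch.cuda.Stream()
    with torch.cuda.stream(copy_stream):
        gpu_weight = cpu_weight.to("cuda", non_blocking=True)
    # Compute waits only when it actually needs the weight.
    torch.cuda.current_stream().wait_stream(copy_stream)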
2025-10-29 00:21:01 -04:00
contentis
8817f8fc14
Mixed Precision Quantization System (#10498)
* Implement mixed precision operations with a registry design and metadata for the quant spec in the checkpoint (see the sketch after this list).

* Updated design using Tensor Subclasses

* Fix FP8 MM

* An actually functional POC

* Remove CK reference and ensure correct compute dtype

* Update unit tests

* ruff lint

* Fix missing keys

* Rename quant dtype parameter

* Fix unittests for CPU build
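
As referenced above, a minimal sketch of the registry idea (names and
spec format are hypothetical, not the merged design):

    import torch

    # Each quant format registers how to run a linear layer; the format
    # name would be read from the checkpoint's quant metadata.
    QUANT_OPS = {}

    def register(quant_dtype):
        def deco(fn):
            QUANT_OPS[quant_dtype] = fn
            return fn
        return deco

    @register("float8_e4m3fn")
    def linear_fp8(x, weight, bias, scale):
        # Dequantize for clarity; a real kernel would use scaled FP8 matmul.
        w = weight.to(x.dtype) * scale
        return torch.nn.functional.linear(x, w, bias)

    def quant_linear(x, weight, bias, spec):
        # spec comes from checkpoint metadata, e.g.
        # {"quant_dtype": "float8_e4m3fn", "scale": 0.02}
        return QUANT_OPS[spec["quant_dtype"]](x, weight, bias, spec["scale"])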
2025-10-28 16:20:53 -04:00
comfyanonymous
f6bbc1ac84
Fix mistake. (#10484) 2025-10-25 23:07:29 -04:00
comfyanonymous
098a352f13
Add warning for torch-directml usage (#10482)
Added a warning message about the state of torch-directml.
2025-10-25 20:05:22 -04:00
comfyanonymous
426cde37f1
Remove useless function (#10472) 2025-10-24 19:56:51 -04:00
comfyanonymous
1bcda6df98
WIP way to support multi-dimensional latents. (#10456) 2025-10-23 21:21:14 -04:00
strint
dc7c77e78c better partial unload 2025-10-23 18:09:47 +08:00
strint
c312733b8c refine log 2025-10-23 15:53:35 +08:00
strint
58d28edade no limit for offload size 2025-10-23 15:50:57 +08:00
strint
aab0e244f7 fix MMAP_MEM_THRESHOLD_GB default 2025-10-23 14:44:51 +08:00
strint
f3c673d086 Merge branch 'master' of https://github.com/siliconflow/ComfyUI into refine_offload 2025-10-22 21:15:28 +08:00
comfyanonymous
9cdc64998f
Only disable cudnn on newer AMD GPUs. (#10437) 2025-10-21 19:15:23 -04:00
strint
98ba311511 add env 2025-10-21 19:06:34 +08:00
strint
80383932ec lazy rm file 2025-10-21 18:00:31 +08:00
strint
08e094ed81 use native mmap 2025-10-21 17:00:56 +08:00
strint
fff56de63c fix format 2025-10-21 11:59:59 +08:00
strint
2d010f545c refine code 2025-10-21 11:54:56 +08:00
strint
2f0d56656e refine code 2025-10-21 11:38:17 +08:00
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. (#10418) 2025-10-20 15:43:24 -04:00
strint
05c2518c6d refactor mmap 2025-10-21 02:59:51 +08:00
strint
8aeebbf7ef fix to 2025-10-21 02:27:40 +08:00
strint
49561788cf fix log 2025-10-21 02:03:38 +08:00
strint
e9e1d2f0e8 add mmap tensor 2025-10-21 00:40:14 +08:00
strint
4ac827d564 unload partial 2025-10-20 18:27:38 +08:00
strint
21ebcada1d debug free mem 2025-10-20 16:22:50 +08:00
comfyanonymous
b4f30bd408
Pytorch is stupid. (#10398) 2025-10-19 01:25:35 -04:00
comfyanonymous
dad076aee6
Speed up chroma radiance. (#10395) 2025-10-18 23:19:52 -04:00
comfyanonymous
0cf33953a7
Fix batch size above 1 giving bad output in chroma radiance. (#10394) 2025-10-18 23:15:34 -04:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. (#10393) 2025-10-18 22:35:46 -04:00
comfyanonymous
9da397ea2f
Disable torch compiler for cast_bias_weight function (#10384)
* Disable torch compiler for cast_bias_weight function

* Fix torch compile.
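
The gist of the change, as a sketch (torch.compiler.disable is the real
torch API; the function body here is illustrative):

    import torch

    @torch.compiler.disable
    def cast_bias_weight(s, input):
        # Kept in eager mode even inside a torch.compile'd model: the
        # device/stream juggling in here is opaque to the compiler and
        # breaks tracing.
        weight = s.weight.to(device=input.device, dtype=input.dtype)
        bias = None
        if s.bias is not None:
            bias = s.bias.to(device=input.device, dtype=input.dtype)
        return weight, bias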
2025-10-17 20:03:28 -04:00
strint
49597bfa3e load remains mmap 2025-10-17 21:43:49 +08:00
strint
6583cc0142 debug load mem 2025-10-17 18:28:25 +08:00
strint
5c3c6c02b2 add debug log of cpu load 2025-10-17 16:33:14 +08:00
comfyanonymous
b1293d50ef
workaround also works on cudnn 91200 (#10375) 2025-10-16 19:59:56 -04:00
comfyanonymous
19b466160c
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.9 (#10373) 2025-10-16 18:16:03 -04:00
strint
e5ff6a1b53 refine log 2025-10-16 22:47:03 +08:00
strint
9352987e9b add log 2025-10-16 22:25:17 +08:00
strint
c1eac555c0 add debug log 2025-10-16 21:42:48 +08:00
strint
2b222962c3 add debug log 2025-10-16 21:42:02 +08:00
strint
fa19dd4620 debug offload 2025-10-16 17:00:47 +08:00
strint
6e33ee391a debug error 2025-10-16 16:45:08 +08:00
Faych
afa8a24fe1
refactor: Replace manual patches merging with merge_nested_dicts (#10360) 2025-10-15 17:16:09 -07:00
Jedrzej Kosinski
493b81e48f
Fix order of inputs nested merge_nested_dicts (#10362) 2025-10-15 16:47:26 -07:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. (#10348) 2025-10-15 00:21:11 -04:00
comfyanonymous
3374e900d0
Faster workflow cancelling. (#10301) 2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. (#10334) 2025-10-13 22:37:19 -04:00