1874 Commits

Author SHA1 Message Date
comfyanonymous
ec4fc2a09a
Fix case of weights not being unpinned. (#10533) 2025-10-29 15:48:06 -04:00
comfyanonymous
1a58087ac2
Reduce memory usage for fp8 scaled op. (#10531) 2025-10-29 15:43:51 -04:00
comfyanonymous
e525673f72
Fix issue. (#10527) 2025-10-29 00:37:00 -04:00
comfyanonymous
3fa7a5c04a
Speed up offloading using pinned memory. (#10526)
To enable this feature use: --fast pinned_memory
2025-10-29 00:21:01 -04:00
contentis
8817f8fc14
Mixed Precision Quantization System (#10498)
* Implement mixed precision operations with a registry design and metadata for quant spec in checkpoint.

* Updated design using Tensor Subclasses

* Fix FP8 MM

* An actually functional POC

* Remove CK reference and ensure correct compute dtype

* Update unit tests

* ruff lint

* Fix missing keys

* Rename quant dtype parameter

* Fix unittests for CPU build
2025-10-28 16:20:53 -04:00
comfyanonymous
f6bbc1ac84
Fix mistake. (#10484) 2025-10-25 23:07:29 -04:00
comfyanonymous
098a352f13
Add warning for torch-directml usage (#10482)
Added a warning message about the state of torch-directml.
2025-10-25 20:05:22 -04:00
comfyanonymous
426cde37f1
Remove useless function (#10472) 2025-10-24 19:56:51 -04:00
comfyanonymous
1bcda6df98
WIP way to support multiple multi-dimensional latents. (#10456) 2025-10-23 21:21:14 -04:00
strint
dc7c77e78c better partial unload 2025-10-23 18:09:47 +08:00
strint
c312733b8c refine log 2025-10-23 15:53:35 +08:00
strint
58d28edade no limit for offload size 2025-10-23 15:50:57 +08:00
strint
aab0e244f7 fix MMAP_MEM_THRESHOLD_GB default 2025-10-23 14:44:51 +08:00
strint
f3c673d086 Merge branch 'master' of https://github.com/siliconflow/ComfyUI into refine_offload 2025-10-22 21:15:28 +08:00
comfyanonymous
9cdc64998f
Only disable cudnn on newer AMD GPUs. (#10437) 2025-10-21 19:15:23 -04:00
strint
98ba311511 add env 2025-10-21 19:06:34 +08:00
strint
80383932ec lazy rm file 2025-10-21 18:00:31 +08:00
strint
08e094ed81 use native mmap 2025-10-21 17:00:56 +08:00
strint
fff56de63c fix format 2025-10-21 11:59:59 +08:00
strint
2d010f545c refine code 2025-10-21 11:54:56 +08:00
strint
2f0d56656e refine code 2025-10-21 11:38:17 +08:00
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. (#10418) 2025-10-20 15:43:24 -04:00
strint
05c2518c6d refact mmap 2025-10-21 02:59:51 +08:00
strint
8aeebbf7ef fix to 2025-10-21 02:27:40 +08:00
strint
49561788cf fix log 2025-10-21 02:03:38 +08:00
strint
e9e1d2f0e8 add mmap tensor 2025-10-21 00:40:14 +08:00
strint
4ac827d564 unload partial 2025-10-20 18:27:38 +08:00
strint
21ebcada1d debug free mem 2025-10-20 16:22:50 +08:00
comfyanonymous
b4f30bd408
Pytorch is stupid. (#10398) 2025-10-19 01:25:35 -04:00
comfyanonymous
dad076aee6
Speed up chroma radiance. (#10395) 2025-10-18 23:19:52 -04:00
comfyanonymous
0cf33953a7
Fix batch size above 1 giving bad output in chroma radiance. (#10394) 2025-10-18 23:15:34 -04:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. (#10393) 2025-10-18 22:35:46 -04:00
comfyanonymous
9da397ea2f
Disable torch compiler for cast_bias_weight function (#10384)
* Disable torch compiler for cast_bias_weight function

* Fix torch compile.
2025-10-17 20:03:28 -04:00
strint
49597bfa3e load remains mmap 2025-10-17 21:43:49 +08:00
strint
6583cc0142 debug load mem 2025-10-17 18:28:25 +08:00
strint
5c3c6c02b2 add debug log of cpu load 2025-10-17 16:33:14 +08:00
comfyanonymous
b1293d50ef
workaround also works on cudnn 91200 (#10375) 2025-10-16 19:59:56 -04:00
comfyanonymous
19b466160c
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.9 (#10373) 2025-10-16 18:16:03 -04:00
strint
e5ff6a1b53 refine log 2025-10-16 22:47:03 +08:00
strint
9352987e9b add log 2025-10-16 22:25:17 +08:00
strint
c1eac555c0 add debug log 2025-10-16 21:42:48 +08:00
strint
2b222962c3 add debug log 2025-10-16 21:42:02 +08:00
strint
fa19dd4620 debug offload 2025-10-16 17:00:47 +08:00
strint
6e33ee391a debug error 2025-10-16 16:45:08 +08:00
Faych
afa8a24fe1
refactor: Replace manual patches merging with merge_nested_dicts (#10360) 2025-10-15 17:16:09 -07:00
Jedrzej Kosinski
493b81e48f
Fix order of inputs nested merge_nested_dicts (#10362) 2025-10-15 16:47:26 -07:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. (#10348) 2025-10-15 00:21:11 -04:00
comfyanonymous
3374e900d0
Faster workflow cancelling. (#10301) 2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. (#10334) 2025-10-13 22:37:19 -04:00
comfyanonymous
e4ea393666
Fix loading old stable diffusion ckpt files on newer numpy. (#10333) 2025-10-13 22:18:58 -04:00