comfyanonymous
ec4fc2a09a
Fix case of weights not being unpinned. (#10533)
2025-10-29 15:48:06 -04:00
comfyanonymous
1a58087ac2
Reduce memory usage for fp8 scaled op. (#10531)
2025-10-29 15:43:51 -04:00
comfyanonymous
e525673f72
Fix issue. (#10527)
2025-10-29 00:37:00 -04:00
comfyanonymous
3fa7a5c04a
Speed up offloading using pinned memory. (#10526)
To enable this feature use: --fast pinned_memory
2025-10-29 00:21:01 -04:00
contentis
8817f8fc14
Mixed Precision Quantization System (#10498)
* Implement mixed precision operations with a registry design and metadata for quant spec in checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Fix unittests for CPU build
2025-10-28 16:20:53 -04:00
comfyanonymous
f6bbc1ac84
Fix mistake. (#10484)
2025-10-25 23:07:29 -04:00
comfyanonymous
098a352f13
Add warning for torch-directml usage (#10482)
Added a warning message about the state of torch-directml.
2025-10-25 20:05:22 -04:00
comfyanonymous
426cde37f1
Remove useless function (#10472)
2025-10-24 19:56:51 -04:00
comfyanonymous
1bcda6df98
WIP way to support multi multi dimensional latents. (#10456)
2025-10-23 21:21:14 -04:00
strint
dc7c77e78c
better partial unload
2025-10-23 18:09:47 +08:00
strint
c312733b8c
refine log
2025-10-23 15:53:35 +08:00
strint
58d28edade
no limit for offload size
2025-10-23 15:50:57 +08:00
strint
aab0e244f7
fix MMAP_MEM_THRESHOLD_GB default
2025-10-23 14:44:51 +08:00
strint
f3c673d086
Merge branch 'master' of https://github.com/siliconflow/ComfyUI into refine_offload
2025-10-22 21:15:28 +08:00
comfyanonymous
9cdc64998f
Only disable cudnn on newer AMD GPUs. (#10437)
2025-10-21 19:15:23 -04:00
strint
98ba311511
add env
2025-10-21 19:06:34 +08:00
strint
80383932ec
lazy rm file
2025-10-21 18:00:31 +08:00
strint
08e094ed81
use native mmap
2025-10-21 17:00:56 +08:00
strint
fff56de63c
fix format
2025-10-21 11:59:59 +08:00
strint
2d010f545c
refine code
2025-10-21 11:54:56 +08:00
strint
2f0d56656e
refine code
2025-10-21 11:38:17 +08:00
comfyanonymous
2c2aa409b0
Log message for cudnn disable on AMD. (#10418)
2025-10-20 15:43:24 -04:00
strint
05c2518c6d
refact mmap
2025-10-21 02:59:51 +08:00
strint
8aeebbf7ef
fix to
2025-10-21 02:27:40 +08:00
strint
49561788cf
fix log
2025-10-21 02:03:38 +08:00
strint
e9e1d2f0e8
add mmap tensor
2025-10-21 00:40:14 +08:00
strint
4ac827d564
unload partial
2025-10-20 18:27:38 +08:00
strint
21ebcada1d
debug free mem
2025-10-20 16:22:50 +08:00
comfyanonymous
b4f30bd408
Pytorch is stupid. (#10398)
2025-10-19 01:25:35 -04:00
comfyanonymous
dad076aee6
Speed up chroma radiance. (#10395)
2025-10-18 23:19:52 -04:00
comfyanonymous
0cf33953a7
Fix batch size above 1 giving bad output in chroma radiance. (#10394)
2025-10-18 23:15:34 -04:00
comfyanonymous
5b80addafd
Turn off cuda malloc by default when --fast autotune is turned on. (#10393)
2025-10-18 22:35:46 -04:00
comfyanonymous
9da397ea2f
Disable torch compiler for cast_bias_weight function (#10384)
* Disable torch compiler for cast_bias_weight function
* Fix torch compile.
2025-10-17 20:03:28 -04:00
strint
49597bfa3e
load remains mmap
2025-10-17 21:43:49 +08:00
strint
6583cc0142
debug load mem
2025-10-17 18:28:25 +08:00
strint
5c3c6c02b2
add debug log of cpu load
2025-10-17 16:33:14 +08:00
comfyanonymous
b1293d50ef
workaround also works on cudnn 91200 (#10375)
2025-10-16 19:59:56 -04:00
comfyanonymous
19b466160c
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.9 (#10373)
2025-10-16 18:16:03 -04:00
strint
e5ff6a1b53
refine log
2025-10-16 22:47:03 +08:00
strint
9352987e9b
add log
2025-10-16 22:25:17 +08:00
strint
c1eac555c0
add debug log
2025-10-16 21:42:48 +08:00
strint
2b222962c3
add debug log
2025-10-16 21:42:02 +08:00
strint
fa19dd4620
debug offload
2025-10-16 17:00:47 +08:00
strint
6e33ee391a
debug error
2025-10-16 16:45:08 +08:00
Faych
afa8a24fe1
refactor: Replace manual patches merging with merge_nested_dicts (#10360)
2025-10-15 17:16:09 -07:00
Jedrzej Kosinski
493b81e48f
Fix order of inputs nested merge_nested_dicts (#10362)
2025-10-15 16:47:26 -07:00
comfyanonymous
1c10b33f9b
gfx942 doesn't support fp8 operations. (#10348)
2025-10-15 00:21:11 -04:00
comfyanonymous
3374e900d0
Faster workflow cancelling. (#10301)
2025-10-13 23:43:53 -04:00
comfyanonymous
dfff7e5332
Better memory estimation for the SD/Flux VAE on AMD. (#10334)
2025-10-13 22:37:19 -04:00
comfyanonymous
e4ea393666
Fix loading old stable diffusion ckpt files on newer numpy. (#10333)
2025-10-13 22:18:58 -04:00