4198 Commits

Author SHA1 Message Date
contentis
3b3ef9a77a
Quantized Ops fixes (#10715)
* offload support, bug fixes, remove mixins

* add readme
2025-11-12 18:26:52 -05:00
comfyanonymous
8b0b93df51
Update Python 3.14 compatibility notes in README (#10730) 2025-11-12 17:04:41 -05:00
rattus
1c7eaeca10
qwen: reduce VRAM usage (#10725)
Clean up a bunch of stacked and no-longer-needed tensors on the QWEN
VRAM peak (currently FFN).

With this I go from OOMing at B=37x1328x1328 to being able to
succesfully run B=47 (RTX5090).
2025-11-12 16:20:53 -05:00
rattus
18e7d6dba5
mm/mp: always unload re-used but modified models (#10724)
The partial unloader path in model re-use flow skips straight to the
actual unload without any check of the patching UUID. This means that
if you do an upscale flow with a model patch on an existing model, it
will not apply your patchings.

Fix by delaying the partial_unload until after the uuid checks. This
is done by making partial_unload a model of partial_load where extra_mem
is -ve.
2025-11-12 16:19:53 -05:00
Qiacheng Li
e1d85e7577
Update README.md for Intel Arc GPU installation, remove IPEX (#10729)
IPEX is no longer needed for Intel Arc GPUs.  Removing instruction to setup ipex.
2025-11-12 15:21:05 -05:00
comfyanonymous
1199411747
Don't pin tensor if not a torch.nn.parameter.Parameter (#10718) 2025-11-11 19:33:30 -05:00
comfyanonymous
5ebcab3c7d
Update CI workflow to remove dead macOS runner. (#10704)
* Update CI workflow to remove dead macOS runner.

* revert

* revert
2025-11-10 15:35:29 -05:00
rattus
c350009236
ops: Put weight cast on the offload stream (#10697)
This needs to be on the offload stream. This reproduced a black screen
with low resolution images on a slow bus when using FP8.
2025-11-09 22:52:11 -05:00
comfyanonymous
dea899f221
Unload weights if vram usage goes up between runs. (#10690) 2025-11-09 18:51:33 -05:00
comfyanonymous
e632e5de28
Add logging for model unloading. (#10692) 2025-11-09 18:06:39 -05:00
comfyanonymous
2abd2b5c20
Make ScaleROPE node work on Flux. (#10686) 2025-11-08 15:52:02 -05:00
comfyanonymous
a1a70362ca
Only unpin tensor if it was pinned by ComfyUI (#10677) 2025-11-07 11:15:05 -05:00
rattus
cf97b033ee
mm: guard against double pin and unpin explicitly (#10672)
As commented, if you let cuda be the one to detect double pin/unpinning
it actually creates an asyc GPU error.
2025-11-06 21:20:48 -05:00
comfyanonymous
eb1c42f649
Tell users they need to upload their logs in bug reports. (#10671) 2025-11-06 20:24:28 -05:00
comfyanonymous
e05c907126
Clarify release cycle. (#10667) 2025-11-06 04:11:30 -05:00
comfyanonymous
09dc24c8a9
Pinned mem also seems to work on AMD. (#10658) 2025-11-05 19:11:15 -05:00
comfyanonymous
1d69245981
Enable pinned memory by default on Nvidia. (#10656)
Removed the --fast pinned_memory flag.

You can use --disable-pinned-memory to disable it. Please report if it
causes any issues.
2025-11-05 18:08:13 -05:00
comfyanonymous
97f198e421
Fix qwen controlnet regression. (#10657) 2025-11-05 18:07:35 -05:00
Alexander Piskun
bda0eb2448
feat(API-nodes): move Rodin3D nodes to new client; removed old api client.py (#10645) 2025-11-05 02:16:00 -08:00
comfyanonymous
c4a6b389de
Lower ltxv mem usage to what it was before previous pr. (#10643)
Bring back qwen behavior to what it was before previous pr.
2025-11-04 22:47:35 -05:00
contentis
4cd881866b
Use single apply_rope function across models (#10547) 2025-11-04 20:10:11 -05:00
comfyanonymous
265adad858 ComfyUI version v0.3.68 v0.3.68 2025-11-04 19:42:23 -05:00
comfyanonymous
7f3e4d486c
Limit amount of pinned memory on windows to prevent issues. (#10638) 2025-11-04 17:37:50 -05:00
rattus
a389ee01bb
caching: Handle None outputs tuple case (#10637) 2025-11-04 14:14:10 -08:00
ComfyUI Wiki
9c71a66790
chore: update workflow templates to v0.2.11 (#10634) 2025-11-04 10:51:53 -08:00
comfyanonymous
af4b7b5edb
More fp8 torch.compile regressions fixed. (#10625) 2025-11-03 22:14:20 -05:00
comfyanonymous
0f4ef3afa0
This seems to slow things down slightly on Linux. (#10624) 2025-11-03 21:47:14 -05:00
comfyanonymous
6b88478f9f
Bring back fp8 torch compile performance to what it should be. (#10622) 2025-11-03 19:22:10 -05:00
comfyanonymous
e199c8cc67
Fixes (#10621) 2025-11-03 17:58:24 -05:00
comfyanonymous
0652cb8e2d
Speed up torch.compile (#10620) 2025-11-03 17:37:12 -05:00
comfyanonymous
958a17199a
People should update their pytorch versions. (#10618) 2025-11-03 17:08:30 -05:00
ComfyUI Wiki
e974e554ca
chore: update embedded docs to v0.3.1 (#10614) 2025-11-03 10:59:44 -08:00
Alexander Piskun
4e2110c794
feat(Pika-API-nodes): use new API client (#10608) 2025-11-03 00:29:08 -08:00
Alexander Piskun
e617cddf24
convert nodes_openai.py to V3 schema (#10604) 2025-11-03 00:28:13 -08:00
Alexander Piskun
1f3f7a2823
convert nodes_hypernetwork.py to V3 schema (#10583) 2025-11-03 00:21:47 -08:00
EverNebula
88df172790
fix(caching): treat bytes as hashable (#10567) 2025-11-03 00:16:40 -08:00
Alexander Piskun
6d6a18b0b7
fix(api-nodes-cloud): stop using sub-folder and absolute path for output of Rodin3D nodes (#10556) 2025-11-03 00:04:56 -08:00
comfyanonymous
97ff9fae7e
Clarify help text for --fast argument (#10609)
Updated help text for the --fast argument to clarify potential risks.
2025-11-02 13:14:04 -05:00
rattus
135fa49ec2
Small speed improvements to --async-offload (#10593)
* ops: dont take an offload stream if you dont need one

* ops: prioritize mem transfer

The async offload streams reason for existence is to transfer from
RAM to GPU. The post processing compute steps are a bonus on the side
stream, but if the compute stream is running a long kernel, it can
stall the side stream, as it wait to type-cast the bias before
transferring the weight. So do a pure xfer of the weight straight up,
then do everything bias, then go back to fix the weight type and do
weight patches.
2025-11-01 18:48:53 -04:00
comfyanonymous
44869ff786
Fix issue with pinned memory. (#10597) 2025-11-01 17:25:59 -04:00
Alexander Piskun
20182a393f
convert StabilityAI to use new API client (#10582) 2025-11-01 12:14:06 -07:00
Alexander Piskun
5f109fe6a0
added 12s-20s as available output durations for the LTXV API nodes (#10570) 2025-11-01 12:13:39 -07:00
comfyanonymous
c58c13b2ba
Fix torch compile regression on fp8 ops. (#10580) 2025-11-01 00:25:17 -04:00
comfyanonymous
7f374e42c8
ScaleROPE now works on Lumina models. (#10578) 2025-10-31 15:41:40 -04:00
comfyanonymous
27d1bd8829
Fix rope scaling. (#10560) 2025-10-30 22:51:58 -04:00
comfyanonymous
614cf9805e
Add a ScaleROPE node. Currently only works on WAN models. (#10559) 2025-10-30 22:11:38 -04:00
rattus
513b0c46fb
Add RAM Pressure cache mode (#10454)
* execution: Roll the UI cache into the outputs

Currently the UI cache is parallel to the output cache with
expectations of being a content superset of the output cache.
At the same time the UI and output cache are maintained completely
seperately, making it awkward to free the output cache content without
changing the behaviour of the UI cache.

There are two actual users (getters) of the UI cache. The first is
the case of a direct content hit on the output cache when executing a
node. This case is very naturally handled by merging the UI and outputs
cache.

The second case is the history JSON generation at the end of the prompt.
This currently works by asking the cache for all_node_ids and then
pulling the cache contents for those nodes. all_node_ids is the nodes
of the dynamic prompt.

So fold the UI cache into the output cache. The current UI cache setter
now writes to a prompt-scope dict. When the output cache is set, just
get this value from the dict and tuple up with the outputs.

When generating the history, simply iterate prompt-scope dict.

This prepares support for more complex caching strategies (like RAM
pressure caching) where less than 1 workflow will be cached and it
will be desirable to keep the UI cache and output cache in sync.

* sd: Implement RAM getter for VAE

* model_patcher: Implement RAM getter for ModelPatcher

* sd: Implement RAM getter for CLIP

* Implement RAM Pressure cache

Implement a cache sensitive to RAM pressure. When RAM headroom drops
down below a certain threshold, evict RAM-expensive nodes from the
cache.

Models and tensors are measured directly for RAM usage. An OOM score
is then computed based on the RAM usage of the node.

Note the due to indirection through shared objects (like a model
patcher), multiple nodes can account the same RAM as their individual
usage. The intent is this will free chains of nodes particularly
model loaders and associate loras as they all score similar and are
sorted in close to each other.

Has a bias towards unloading model nodes mid flow while being able
to keep results like text encodings and VAE.

* execution: Convert the cache entry to NamedTuple

As commented in review.

Convert this to a named tuple and abstract away the tuple type
completely from graph.py.
2025-10-30 17:39:02 -04:00
Alexander Piskun
dfac94695b
fix img2img operation in Dall2 node (#10552) 2025-10-30 10:22:35 -07:00
Alexander Piskun
163b629c70
use new API client in Pixverse and Ideogram nodes (#10543) 2025-10-29 23:49:03 -07:00
Jedrzej Kosinski
998bf60beb
Add units/info for the numbers displayed on 'load completely' and 'load partially' log messages (#10538) 2025-10-29 19:37:06 -04:00