Add a system where an input is marked as a cache barrier, deferring its
evaluation. Once the node is executed, the barrier is released and
everything behind the barrier is executed at increased priority.
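A minimal sketch of those mechanics, with invented names throughout (BarrierQueue and its methods are not from the actual scheduler):

```python
import heapq

NORMAL_PRIORITY = 0
BOOSTED_PRIORITY = -1  # min-heap: lower value pops first

class BarrierQueue:
    def __init__(self):
        self._heap = []
        self._counter = 0    # tie-breaker for FIFO ordering
        self._deferred = {}  # barrier id -> nodes waiting behind it

    def submit(self, node, barrier_id=None, priority=NORMAL_PRIORITY):
        if barrier_id is not None:
            # Input marked as a cache barrier: defer evaluation until
            # the barrier node itself has executed.
            self._deferred.setdefault(barrier_id, []).append(node)
            return
        self._counter += 1
        heapq.heappush(self._heap, (priority, self._counter, node))

    def release_barrier(self, barrier_id):
        # Barrier node finished: everything queued behind it now runs
        # ahead of normal-priority work.
        for node in self._deferred.pop(barrier_id, []):
            self.submit(node, priority=BOOSTED_PRIORITY)

    def pop_next(self):
        return heapq.heappop(self._heap)[2]
```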
RAMPressure caching may need to purge the same model that you are
currently trying to offload for VRAM freeing. In this case, the RAMPressure
cache takes priority and needs to be able to pull the trigger on dumping
the whole model and freeing the ModelPatcher in question. To do this,
defer the actual transfer of model weights from GPU to RAM into
model_management state rather than doing it as part of ModelPatcher. This
is done as a list of weakrefs.
If the RAM cache decides to free the model you are currently unloading,
the ModelPatcher and its refs simply disappear in the middle of the
unloading process, and both RAM and VRAM will be freed.
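A sketch of that scheme with invented names (the real model_management state is more involved):

```python
import weakref

# Module-level stand-in for model_management state: pending GPU->RAM
# transfers are tracked as weakrefs, never strong refs.
_pending_offloads = []

def queue_offload(module):
    _pending_offloads.append(weakref.ref(module))

def run_pending_offloads():
    while _pending_offloads:
        module = _pending_offloads.pop(0)()
        if module is None:
            # The RAM cache freed the owning ModelPatcher mid-unload:
            # the weakref went dead and RAM + VRAM were both reclaimed.
            continue
        module.to("cpu")  # the actual weight transfer
```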
The unpatcher now queues the individual leaf modules to be offloaded
one-by-one so that RAM levels can be monitored.
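Roughly what the per-leaf loop can look like; the psutil check, threshold, and free_ram() hook are illustrative stand-ins rather than the real code:

```python
import psutil

def offload_leaves(leaf_modules, ram_cache, min_headroom=2 * 1024**3):
    # Move one leaf module at a time so RAM levels can be checked
    # between transfers instead of after one monolithic copy.
    for module in leaf_modules:
        if psutil.virtual_memory().available < min_headroom:
            # Give the RAM cache a chance to purge (possibly even the
            # model being unloaded) before moving more weights.
            ram_cache.free_ram(min_headroom)
        module.to("cpu")
```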
Note that the UnloadPartially that is potentially done as part of a
load will not be freeable this way; however, it shouldn't be anyway, as
that is the currently active model, and the RAM cache cannot save you if
you can't even fit the one model you are currently trying to use.
This is currently put together as a list of indexes, assuming
current_loaded_models doesn't change. However, we might need to purge a
model as part of the offload process, which means this list can change in
the middle of the freeing process. Handle this by taking independent refs
to the LoadedModel objects and doing safe by-value deletion from
current_loaded_models.
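In sketch form (method names approximate):

```python
def unload_models(to_unload, current_loaded_models):
    # to_unload: our own strong refs to the LoadedModel objects,
    # captured up front, so a purge that mutates current_loaded_models
    # mid-process cannot shift indexes underneath us.
    for loaded in to_unload:
        loaded.model_unload()
        try:
            current_loaded_models.remove(loaded)  # by value, not index
        except ValueError:
            pass  # already purged by the RAM cache; nothing left to do
```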
Currently this hard-assumes that the caller of model_unload will keep
current_loaded_models in sync. With RAMPressureCache it's possible for
the garbage collector to run in the middle of the model free process,
which can split these two steps.
Move the headroom logic into the RAM cache to make it a little easier to
call as "free me some RAM".
Rename the API to free_ram().
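A plausible shape for the renamed entry point, with the headroom logic internal to the cache (the class layout and attribute names are assumptions, not the actual implementation):

```python
import psutil

class RAMPressureCache:
    def __init__(self, headroom_bytes=4 * 1024**3):
        self._headroom_bytes = headroom_bytes
        self._entries = []  # cached items, oldest first

    def free_ram(self, extra_bytes=0):
        # Headroom logic lives inside the cache: evict until available
        # RAM covers the configured headroom plus the caller's request.
        target = self._headroom_bytes + extra_bytes
        while self._entries and psutil.virtual_memory().available < target:
            self._entries.pop(0)  # drop oldest; GC reclaims its weights
```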
Split off the clean_list creation into a completely separate function to
avoid any stray strong reference to the content-to-be-freed lingering on
the stack.
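Schematically, it ends up looking something like this (the helper name and cache interface are invented):

```python
import weakref

def _build_clean_list(cache_entries):
    # Lives in its own function so that any strong references created
    # while scanning die with this stack frame; the caller is left
    # holding only weakrefs to the content-to-be-freed.
    return [weakref.ref(entry) for entry in cache_entries]
```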
* feat: create a /jobs api to return queue and history jobs
* update unused vars
* include priority
* create jobs helper file
* fix ruff
* update how we set error message
* include execution error in both responses
* rename error -> failed, fix output shape
* re-use queue and history functions
* set workflow id
* allow sort by exec duration (see the request sketch after this list)
* fix tests
* send priority and remove error msg
* use ws messages to get start and end times
* revert main.py fully
* refactor: move all /jobs business logic to jobs.py
* fix failing test
* remove some tests
* fix non-dict nodes
* address comments
* filter by workflow id and remove null fields
* add clearer typing - remove get("..") or ..
* refactor query params to top get_job(s) doc, add remove_sensitive_from_queue
* add brief comment explaining why we skip animated
* comment that format field is for frontend backward compatibility
* fix whitespace
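For reference, a hypothetical client call exercising the filtering and sorting mentioned above (the parameter names are guesses, not the documented API):

```python
import requests

resp = requests.get(
    "http://127.0.0.1:8188/jobs",
    params={"workflow_id": "my-workflow", "sort": "execution_duration"},
)
for job in resp.json():
    print(job)
```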
---------
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
Co-authored-by: guill <jacob.e.segal@gmail.com>
index_timestep_zero can now be selected in the
FluxKontextMultiReferenceLatentMethod node, whose display name has been
set to the more generic "Edit Model Reference Method".
The inpaint part is currently missing and will be implemented later.
I think they messed up this model pretty badly. They added some
control_noise_refiner blocks but don't actually use them. There is a typo
in their code, so instead of running control_noise_refiner -> control_layers
it runs the whole control_layers twice.
Unfortunately they trained with this typo, so the model works but is kind
of slow and would probably perform a lot better if they corrected their
code and trained it again.
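Schematically, with the blocks stubbed as identity functions (an illustration of the description above, not their actual code):

```python
def control_noise_refiner(x):
    return x  # present in the model, but never actually invoked

def control_layers(x):
    return x

def forward_intended(x):
    # What the code presumably meant to do:
    return control_layers(control_noise_refiner(x))

def forward_as_trained(x):
    # What the typo does instead: control_layers runs twice.
    return control_layers(control_layers(x))
```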