When the VAE catches this VRAM OOM, it launches the fallback logic
straight from the exception context.
Python however refs the entire call stack that caused the exception
including any local variables for the sake of exception report and
debugging. In the case of tensors, this can hold on the references
to GBs of VRAM and inhibit the VRAM allocated from freeing them.
So dump the except context completely before going back to the VAE
via the tiler by getting out of the except block with nothing but
a flag.
The greately increases the reliability of the tiler fallback,
especially on low VRAM cards, as with the bug, if the leak randomly
leaked more than the headroom needed for a single tile, the tiler
would fallback would OOM and fail the flow.
* feature: Set the Ascend NPU to use a single one
* Enable the `--cuda-device` parameter to support both CUDA and Ascend NPUs simultaneously.
* Make the code just set the ASCENT_RT_VISIBLE_DEVICES environment variable without any other edits to master branch
---------
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
* flux: math: Use _addcmul to avoid expensive VRAM intermediate
The rope process can be the VRAM peak and this intermediate
for the addition result before releasing the original can OOM.
addcmul_ it.
* wan: Delete the self attention before cross attention
This saves VRAM when the cross attention and FFN are in play as the
VRAM peak.
Adds installed and required workflow templates version information to the
/system_stats endpoint, allowing the frontend to detect and notify users
when their templates package is outdated.
- Add get_installed_templates_version() and get_required_templates_version()
methods to FrontendManager
- Include templates version info in system_stats response
- Add comprehensive unit tests for the new functionality