rattus 60ee574748
retune lowVramPatch VRAM accounting (#11173)
In the lowvram case, patching now does its math in the model dtype, in
the post-dequantization domain, so account for that in the VRAM
estimate. The patching was also moved back onto the compute stream,
which takes it off-peak, so relax the MATH_FACTOR to only x2 and drop
the worst-case assumption of everything peaking at once.
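
As a rough illustration of the accounting described above, here is a minimal sketch, not the repository's actual code. The function name `estimate_patch_vram`, the `DTYPE_BYTES` table, and the example tensor size are all hypothetical. The only details taken from the commit message are that the estimate is based on the model dtype (after de-quantization) rather than the storage dtype, and that the multiplier is x2.

```python
# Hypothetical sketch of the lowvram patch VRAM estimate; names are
# illustrative assumptions, not the project's real API.

# Bytes per element for a few common dtypes.
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "float8": 1}

# The commit message relaxes the factor to x2; the constant name is assumed.
MATH_FACTOR = 2


def estimate_patch_vram(numel: int, model_dtype: str,
                        math_factor: int = MATH_FACTOR) -> int:
    """Estimate the bytes needed to patch one weight.

    The math runs in the model dtype after de-quantization, so the
    element size comes from model_dtype, not from the storage dtype.
    """
    return numel * DTYPE_BYTES[model_dtype] * math_factor


# Example: a 1Mi-element weight stored as 8-bit but computed in bfloat16.
numel = 1024 * 1024
by_storage = estimate_patch_vram(numel, "float8")    # would under-count
by_model = estimate_patch_vram(numel, "bfloat16")    # post de-quantization
print(by_storage, by_model)
```

Under these assumptions, sizing by the storage dtype would reserve half of what the patch math needs once the weight is de-quantized to a 16-bit model dtype.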
2025-12-08 15:18:06 -05:00