rattus 135fa49ec2
Small speed improvements to --async-offload (#10593)
* ops: dont take an offload stream if you dont need one

* ops: prioritize mem transfer

The async offload streams reason for existence is to transfer from
RAM to GPU. The post processing compute steps are a bonus on the side
stream, but if the compute stream is running a long kernel, it can
stall the side stream, as it wait to type-cast the bias before
transferring the weight. So do a pure xfer of the weight straight up,
then do everything bias, then go back to fix the weight type and do
weight patches.
2025-11-01 18:48:53 -04:00
..
2024-12-20 16:24:55 -05:00
2024-06-27 18:43:11 -04:00
2025-09-02 15:36:22 -04:00
2025-01-24 06:15:54 -05:00
2025-07-06 07:07:39 -04:00
2025-09-15 18:10:55 -04:00
2025-10-25 23:07:29 -04:00
2025-10-30 17:39:02 -04:00