[Core] Run garbage collector after CUDA graph capture to fix throughput regression (#24128)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Micah Williamson 2025-09-09 09:38:10 -05:00 committed by GitHub
parent 922d3b401b
commit 1c63a16b65

@@ -2885,6 +2885,7 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
             finally:
                 if should_freeze:
                     gc.unfreeze()
+                    gc.collect()

         # Trigger CUDA graph capture for specific shapes.
         # Capture the large shapes first so that the smaller shapes
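
The hunk above adds a `gc.collect()` right after `gc.unfreeze()`. A minimal standalone sketch of why that call matters is given below, using only the standard-library `gc` module. The `freeze_gc` helper, its `should_freeze` parameter, the `Node` class, and `demo` are illustrative assumptions, not the project's actual code. The sketch relies on the documented behaviour that frozen objects are ignored by collections until they are unfrozen, so a reference cycle that becomes unreachable while frozen is only reclaimed by a later full collection.

```python
import gc
import weakref
from contextlib import contextmanager


@contextmanager
def freeze_gc(should_freeze: bool = True):
    """Hypothetical helper: freeze tracked objects for the duration of a block."""
    gc.collect()  # clean up first so less garbage gets frozen
    if should_freeze:
        gc.freeze()  # move all tracked objects to the permanent generation
    try:
        yield
    finally:
        if should_freeze:
            gc.unfreeze()  # return frozen objects to the oldest generation
            gc.collect()   # reclaim cycles that died while they were frozen


class Node:
    """Illustrative object holding a reference to itself (a cycle)."""

    def __init__(self):
        self.ref = self


def demo():
    node = Node()
    probe = weakref.ref(node)  # lets us observe when the object is freed
    with freeze_gc():
        frozen = gc.get_freeze_count()
        del node      # the cycle is now unreachable, but it is frozen
        gc.collect()  # frozen objects are skipped by this collection
        alive_inside = probe() is not None
    alive_after = probe() is not None
    return frozen, alive_inside, alive_after


if __name__ == "__main__":
    print(demo())
```

In this sketch the cycle survives the collection inside the block and is freed only by the final `gc.collect()`. Without that call, reclamation would wait for a later collection of the oldest generation.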