[Core] Run garbage collector after CUDA graph capture to fix throughput regression (#24128)

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-01-25 13:04:30 +08:00 · 2025-09-09 09:38:10 -05:00 · 1c63a16b65
commit 1c63a16b65
parent 922d3b401b
1 changed file with 1 addition and 0 deletions
--- a/vllm/v1/worker/gpu_model_runner.py
+++ b/vllm/v1/worker/gpu_model_runner.py
@@ -2885,6 +2885,7 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
            finally:
                if should_freeze:
                    gc.unfreeze()
+                    gc.collect()

        # Trigger CUDA graph capture for specific shapes.
        # Capture the large shapes first so that the smaller shapes
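
Note: the hunk above shows only the `finally:` clause, so the pattern it belongs to is sketched below as a standalone context manager. This is a minimal illustration, not the code in `gpu_model_runner.py`: the name `frozen_gc` and its `should_freeze` parameter are hypothetical, and it assumes the capture code calls `gc.freeze()` before entering the `try` block, which the diff does not show. `gc.freeze()` moves all currently tracked objects into a permanent generation that collections ignore, and `gc.unfreeze()` moves them back into the oldest generation. The sketch shows the explicit `gc.collect()` reclaiming garbage immediately after unfreezing instead of waiting for the next automatic full collection.

```python
import gc
from contextlib import contextmanager


@contextmanager
def frozen_gc(should_freeze: bool = True):
    """Hypothetical helper mirroring the freeze/unfreeze/collect pattern.

    While the body runs, objects that existed at entry are excluded from
    garbage collection. On exit they are unfrozen and a full collection
    runs right away.
    """
    if should_freeze:
        # Move every currently tracked object to the permanent generation.
        gc.freeze()
    try:
        yield
    finally:
        if should_freeze:
            # Return frozen objects to the oldest generation...
            gc.unfreeze()
            # ...and reclaim anything that became garbage in the meantime
            # (this is the line the commit adds).
            gc.collect()
```

A caller would wrap the capture step, for example `with frozen_gc(): capture()`, where `capture` stands in for whatever performs the CUDA graph capture. Whether the extra collection helps in a given workload is something to measure; the commit title attributes a throughput regression to its absence.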