From 3ec8c25cd07c4a3d747b846ece8e305a7fb44349 Mon Sep 17 00:00:00 2001
From: Suhong Moon <46987248+SuhongMoon@users.noreply.github.com>
Date: Sun, 17 Dec 2023 13:51:57 -0500
Subject: [PATCH] [Docs] Update documentation for gpu-memory-utilization option
 (#2162)

---
 docs/source/models/engine_args.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/source/models/engine_args.rst b/docs/source/models/engine_args.rst
index a70c22e9af11a..d89b795149501 100644
--- a/docs/source/models/engine_args.rst
+++ b/docs/source/models/engine_args.rst
@@ -89,9 +89,11 @@ Below, you can find an explanation of every engine argument for vLLM:
 
     CPU swap space size (GiB) per GPU.
 
-.. option:: --gpu-memory-utilization
+.. option:: --gpu-memory-utilization
 
-    The percentage of GPU memory to be used for the model executor.
+    The fraction of GPU memory to be used for the model executor, which can range from 0 to 1.
+    For example, a value of 0.5 would imply 50% GPU memory utilization.
+    If unspecified, will use the default value of 0.9.
 
 .. option:: --max-num-batched-tokens
 
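The patched wording describes `--gpu-memory-utilization` as a fraction in [0, 1] with a default of 0.9. The sketch below illustrates that semantics only: the helper name `gpu_memory_budget` and its validation logic are hypothetical, not vLLM's actual implementation.

```python
# Illustrative sketch of validating a --gpu-memory-utilization fraction and
# converting it into a byte budget. The helper name is hypothetical; this is
# not vLLM's actual code.

def gpu_memory_budget(total_gpu_bytes: int, gpu_memory_utilization: float = 0.9) -> int:
    """Return the number of bytes the model executor may use.

    gpu_memory_utilization is a fraction in (0, 1], defaulting to 0.9 as the
    patched documentation states.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError(
            f"gpu-memory-utilization must be in (0, 1], got {gpu_memory_utilization}"
        )
    return int(total_gpu_bytes * gpu_memory_utilization)


if __name__ == "__main__":
    total = 16 * 1024**3  # a hypothetical 16 GiB GPU
    # 0.5 -> 50% of GPU memory, per the example in the patch.
    print(gpu_memory_budget(total, 0.5))  # 8589934592 bytes (8 GiB)
    print(gpu_memory_budget(total))       # default 0.9
```

As a usage note, the flag itself would be passed on the command line, e.g. `--gpu-memory-utilization 0.5` for 50% utilization.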