[Misc][Doc] Add missing comment for LLM (#20285)

Signed-off-by: Lifan Shen <lifans@meta.com>
Lifans 2025-07-01 19:04:24 -07:00 committed by GitHub
parent 9dae7d46bf
commit 9ec1e3065a
GPG Key ID: B5690EEEBB952194


@@ -132,6 +132,14 @@ class LLM:
hf_overrides: If a dictionary, contains arguments to be forwarded to the
HuggingFace config. If a callable, it is called to update the
HuggingFace config.
mm_processor_kwargs: Arguments to be forwarded to the model's processor
for multi-modal data, e.g., the image processor. Overrides for the
multi-modal processor obtained from `AutoProcessor.from_pretrained`.
The available overrides depend on the model that is being run.
For example, for Phi-3-Vision: `{"num_crops": 4}`.
override_pooler_config: Initialize a non-default pooling config or
override the default pooling config for the pooling model.
For example: `PoolerConfig(pooling_type="mean", normalize=False)`.
compilation_config: Either an integer or a dictionary. If it is an
integer, it is used as the level of compilation optimization. If it
is a dictionary, it can specify the full compilation configuration.
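The dict-or-callable contract of `hf_overrides` and the int-or-dict contract of `compilation_config` can be illustrated with a small, self-contained sketch. This is not vLLM's code: `apply_hf_overrides` and `normalize_compilation_config` are hypothetical helpers, plain dicts stand in for the HuggingFace config and for the compilation configuration, the `"level"` key is an assumed name, and the callable is assumed to receive the config and return the updated one.

```python
from typing import Any, Callable, Optional, Union

# Hypothetical aliases for the documented contracts (plain dicts stand in for
# the HuggingFace config and the compilation configuration).
Config = dict[str, Any]
HfOverrides = Union[Config, Callable[[Config], Config]]


def apply_hf_overrides(config: Config, overrides: Optional[HfOverrides]) -> Config:
    """Return a copy of `config` with `overrides` applied; `config` is not mutated."""
    if overrides is None:
        return dict(config)
    if callable(overrides):
        # Callable form: assumed to take the config and return the updated one.
        return overrides(dict(config))
    # Dict form: forward each key/value pair onto the config.
    updated = dict(config)
    updated.update(overrides)
    return updated


def normalize_compilation_config(value: Union[int, Config]) -> Config:
    """An int is read as the optimization level; a dict is the full config."""
    # bool is a subclass of int, so reject it before the int check.
    if isinstance(value, bool):
        raise TypeError("compilation_config must be an int or a dict")
    if isinstance(value, int):
        return {"level": value}  # hypothetical key name
    if isinstance(value, dict):
        return dict(value)
    raise TypeError("compilation_config must be an int or a dict")


if __name__ == "__main__":
    base = {"max_position_embeddings": 4096}
    print(apply_hf_overrides(base, {"max_position_embeddings": 8192}))
    print(normalize_compilation_config(3))
```

Copying the config before applying either form keeps the caller's original untouched, which makes it easy to compare the config before and after an override.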
@@ -1347,16 +1355,16 @@ class LLM:
during the sleep period, before `wake_up` is called.
Args:
level: The sleep level. Level 1 sleep will offload the model
weights and discard the KV cache. The contents of the KV cache
are forgotten. Level 1 sleep is good for sleeping and waking
up the engine to run the same model again. The model weights
are backed up in CPU memory, so make sure there is enough
CPU memory to store them. Level 2 sleep will
discard both the model weights and the KV cache. The contents
of both the model weights and the KV cache are forgotten. Level 2
sleep is good for sleeping and waking up the engine to run a
different model or update the model, where the previous model
weights are not needed. Because nothing is backed up, it reduces
CPU memory pressure.
"""
self.reset_prefix_cache()
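The difference between the two sleep levels can be modeled with a toy engine. `ToyEngine`, its fields, and `load_weights` are hypothetical: Python lists stand in for the GPU weights, the KV cache, and the CPU backup. The sketch only mirrors what the docstring states (level 1 backs the weights up in CPU memory, level 2 keeps nothing, the KV cache is forgotten in both) and does not reflect how vLLM actually manages memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEngine:
    """Toy model of the two documented sleep levels (hypothetical, not vLLM)."""

    weights: Optional[list[float]] = None     # stands in for GPU-resident weights
    kv_cache: Optional[list[int]] = None      # stands in for the GPU KV cache
    cpu_backup: Optional[list[float]] = None  # host-memory copy of the weights
    asleep: bool = False

    def sleep(self, level: int = 1) -> None:
        if level not in (1, 2):
            raise ValueError("level must be 1 or 2")
        if level == 1:
            # Level 1: back the weights up in CPU memory so they can be restored.
            self.cpu_backup = None if self.weights is None else list(self.weights)
        else:
            # Level 2: keep no copy at all, which spares CPU memory.
            self.cpu_backup = None
        # Both levels release the GPU copies; KV-cache contents are lost either way.
        self.weights = None
        self.kv_cache = None
        self.asleep = True

    def wake_up(self) -> None:
        # Weights come back only if a CPU backup exists (i.e. after level 1).
        self.weights = None if self.cpu_backup is None else list(self.cpu_backup)
        self.cpu_backup = None
        self.kv_cache = []  # a fresh, empty KV cache
        self.asleep = False

    def load_weights(self, new_weights: list[float]) -> None:
        # After level 2 the caller supplies new or updated weights.
        self.weights = list(new_weights)


if __name__ == "__main__":
    engine = ToyEngine(weights=[0.1, 0.2], kv_cache=[1, 2, 3])
    engine.sleep(level=1)
    engine.wake_up()
    print(engine.weights)  # [0.1, 0.2]
```

After level 2, the toy `wake_up` leaves `weights` empty, reflecting that the caller is expected to load a different or updated model before running the engine again.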
@@ -1366,12 +1374,12 @@ class LLM:
""" """
Wake up the engine from sleep mode. See the [sleep][] method Wake up the engine from sleep mode. See the [sleep][] method
for more details. for more details.
Args: Args:
tags: An optional list of tags to reallocate the engine memory tags: An optional list of tags to reallocate the engine memory
for specific memory allocations. Values must be in for specific memory allocations. Values must be in
`("weights", "kv_cache")`. If None, all memory is reallocated. `("weights", "kv_cache")`. If None, all memory is reallocated.
wake_up should be called with all tags (or None) before the wake_up should be called with all tags (or None) before the
engine is used again. engine is used again.
""" """
self.llm_engine.wake_up(tags) self.llm_engine.wake_up(tags)
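The `tags` rules (values restricted to `("weights", "kv_cache")`, None meaning everything, and every tag required before the engine is used) can be expressed as a small state tracker. `ToySleepingEngine`, `VALID_TAGS`, and the `generate` placeholder are hypothetical and only illustrate the documented contract, not vLLM's implementation.

```python
from typing import Iterable, Optional

VALID_TAGS = ("weights", "kv_cache")  # the values the docstring allows


class ToySleepingEngine:
    """Toy tracker for tag-based wake-up (hypothetical, not vLLM's code)."""

    def __init__(self) -> None:
        # Start asleep: neither allocation is present.
        self.allocated: set[str] = set()

    def wake_up(self, tags: Optional[Iterable[str]] = None) -> None:
        # None means "reallocate everything".
        requested = set(VALID_TAGS) if tags is None else set(tags)
        unknown = requested - set(VALID_TAGS)
        if unknown:
            raise ValueError(f"unknown tags {sorted(unknown)}; expected a subset of {VALID_TAGS}")
        self.allocated |= requested

    @property
    def ready(self) -> bool:
        # The engine may only be used once every allocation is back.
        return self.allocated == set(VALID_TAGS)

    def generate(self, prompt: str) -> str:
        if not self.ready:
            missing = sorted(set(VALID_TAGS) - self.allocated)
            raise RuntimeError(f"engine not fully awake; missing {missing}")
        return prompt.upper()  # placeholder for real generation


if __name__ == "__main__":
    engine = ToySleepingEngine()
    engine.wake_up(tags=["weights"])   # bring the weights back first
    print(engine.ready)
    engine.wake_up(tags=["kv_cache"])  # then reallocate the KV cache
    print(engine.ready)
```

Waking up in stages lets a caller restore (or update) the weights before the KV cache claims its memory; presumably this is one reason the tags exist, although the docstring itself does not say so.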