mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2025-12-18 05:55:02 +08:00)
[Misc][Doc] Add missing comment for LLM (#20285)
Signed-off-by: Lifan Shen <lifans@meta.com>
parent 9dae7d46bf
commit 9ec1e3065a
@@ -132,6 +132,14 @@ class LLM:
         hf_overrides: If a dictionary, contains arguments to be forwarded to the
             HuggingFace config. If a callable, it is called to update the
             HuggingFace config.
+        mm_processor_kwargs: Arguments to be forwarded to the model's processor
+            for multi-modal data, e.g., image processor. Overrides for the
+            multi-modal processor obtained from `AutoProcessor.from_pretrained`.
+            The available overrides depend on the model that is being run.
+            For example, for Phi-3-Vision: `{"num_crops": 4}`.
+        override_pooler_config: Initialize non-default pooling config or
+            override default pooling config for the pooling model.
+            e.g. `PoolerConfig(pooling_type="mean", normalize=False)`.
         compilation_config: Either an integer or a dictionary. If it is an
             integer, it is used as the level of compilation optimization. If it
             is a dictionary, it can specify the full compilation configuration.
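The argument shapes documented in this hunk can be illustrated with a small sketch. It deliberately does not import vLLM: `build_llm_kwargs` is a hypothetical helper (not part of vLLM), the model name is an assumed example, and the integer `3` is only a placeholder compilation level. It shows a dictionary for `mm_processor_kwargs` and an integer-or-dictionary value for `compilation_config`.

```python
from typing import Any, Optional, Union


def build_llm_kwargs(
    model: str,
    mm_processor_kwargs: Optional[dict[str, Any]] = None,
    compilation_config: Optional[Union[int, dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Collect keyword arguments in the shapes the docstring describes.

    Hypothetical helper for illustration only; it is not part of vLLM.
    """
    kwargs: dict[str, Any] = {"model": model}

    if mm_processor_kwargs is not None:
        # Documented as arguments forwarded to the model's processor.
        if not isinstance(mm_processor_kwargs, dict):
            raise TypeError("mm_processor_kwargs must be a dict")
        kwargs["mm_processor_kwargs"] = dict(mm_processor_kwargs)

    if compilation_config is not None:
        # Documented as "either an integer or a dictionary".
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(compilation_config, bool) or not isinstance(
            compilation_config, (int, dict)
        ):
            raise TypeError("compilation_config must be an int or a dict")
        kwargs["compilation_config"] = compilation_config

    return kwargs


kwargs = build_llm_kwargs(
    "microsoft/Phi-3-vision-128k-instruct",  # assumed example model name
    mm_processor_kwargs={"num_crops": 4},    # example from the docstring
    compilation_config=3,                    # placeholder integer level
)

# With vLLM installed, these would presumably be passed on as:
#     from vllm import LLM
#     llm = LLM(**kwargs)
```

Which keys `mm_processor_kwargs` accepts depends on the model being run, as the docstring notes, so the `num_crops` entry applies only to processors that understand it.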
@@ -1347,16 +1355,16 @@ class LLM:
         during the sleep period, before `wake_up` is called.

         Args:
             level: The sleep level. Level 1 sleep will offload the model
                 weights and discard the kv cache. The content of kv cache
                 is forgotten. Level 1 sleep is good for sleeping and waking
                 up the engine to run the same model again. The model weights
                 are backed up in CPU memory. Please make sure there's enough
                 CPU memory to store the model weights. Level 2 sleep will
                 discard both the model weights and the kv cache. The content
                 of both the model weights and kv cache is forgotten. Level 2
                 sleep is good for sleeping and waking up the engine to run a
                 different model or update the model, where previous model
                 weights are not needed. It reduces CPU memory pressure.
         """
         self.reset_prefix_cache()
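The difference between the two sleep levels can be summarized with a toy state model. This is only a sketch of the semantics the docstring states, not vLLM's implementation: `ToyEngineState`, `WeightLocation`, and `toy_sleep` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class WeightLocation(Enum):
    GPU = "gpu"                # weights loaded and ready
    CPU_BACKUP = "cpu_backup"  # weights offloaded to CPU memory
    DISCARDED = "discarded"    # weights dropped entirely


@dataclass(frozen=True)
class ToyEngineState:
    """Hypothetical stand-in for the memory the docstring talks about."""
    weights: WeightLocation = WeightLocation.GPU
    kv_cache_allocated: bool = True


def toy_sleep(state: ToyEngineState, level: int = 1) -> ToyEngineState:
    """Return the state after sleeping at the given level."""
    if level == 1:
        # Level 1: weights are backed up in CPU memory, KV cache is discarded.
        return ToyEngineState(WeightLocation.CPU_BACKUP, False)
    if level == 2:
        # Level 2: both weights and KV cache are discarded.
        return ToyEngineState(WeightLocation.DISCARDED, False)
    raise ValueError(f"unsupported sleep level: {level}")


awake = ToyEngineState()
after_level_1 = toy_sleep(awake, level=1)
after_level_2 = toy_sleep(awake, level=2)
```

In this model, level 1 costs CPU memory for the backup but allows the same model to resume, while level 2 keeps nothing and so suits the case where the weights will be replaced anyway.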
@@ -1366,12 +1374,12 @@ class LLM:
         """
         Wake up the engine from sleep mode. See the [sleep][] method
         for more details.

         Args:
             tags: An optional list of tags to reallocate the engine memory
                 for specific memory allocations. Values must be in
                 `("weights", "kv_cache")`. If None, all memory is reallocated.
                 wake_up should be called with all tags (or None) before the
                 engine is used again.
         """
         self.llm_engine.wake_up(tags)
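The tag rule in this docstring (every tag, or `None`, must be woken before the engine is used again) can be sketched as simple bookkeeping. `toy_wake_up` and `VALID_TAGS` below are hypothetical illustrations and do not call vLLM; the real method simply forwards `tags` to `self.llm_engine.wake_up`.

```python
from typing import Optional

# The two tag values named in the docstring.
VALID_TAGS = ("weights", "kv_cache")


def toy_wake_up(asleep: set[str], tags: Optional[list[str]] = None) -> set[str]:
    """Return the tags that are still asleep after waking `tags`.

    Hypothetical bookkeeping helper; it only models the documented rule.
    """
    if tags is None:
        # None means all memory is reallocated.
        return set()
    unknown = [tag for tag in tags if tag not in VALID_TAGS]
    if unknown:
        raise ValueError(f"unknown tags: {unknown}")
    return asleep - set(tags)


# Staged wake-up: weights first, then the KV cache.
asleep = set(VALID_TAGS)
asleep = toy_wake_up(asleep, ["weights"])
ready_after_first = not asleep   # KV cache is still asleep

asleep = toy_wake_up(asleep, ["kv_cache"])
ready_after_second = not asleep  # every tag has been woken
```

Against a real engine, the corresponding calls would presumably be `llm.wake_up(tags=["weights"])` followed by `llm.wake_up(tags=["kv_cache"])`, with the engine used only after both.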