[Doc]Add asynchronous engine arguments to documentation. (#3810)

Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Sean Gallen 2024-04-04 23:52:01 -05:00 committed by GitHub
parent c391e4b68e
commit 78107fa091


@@ -118,3 +118,19 @@ Below, you can find an explanation of every engine argument for vLLM:
.. option:: --quantization (-q) {awq,squeezellm,None}
Method used to quantize the weights.
Async Engine Arguments
----------------------
Below are the additional arguments related to the asynchronous engine:
.. option:: --engine-use-ray
Use Ray to start the LLM engine in a process separate from the server process.
.. option:: --disable-log-requests
Disable request logging.
.. option:: --max-log-len
Maximum number of prompt characters or prompt token IDs printed in the log. Defaults to unlimited.
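
The three flags above can be illustrated with a small standalone sketch. It is not vLLM's actual parser or logging code: ``build_parser`` and ``truncate_for_log`` are hypothetical names, and the truncation rule (cut both the prompt text and the token ID list to the same length) is an assumption based only on the ``--max-log-len`` description.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser exposing the three async engine flags."""
    parser = argparse.ArgumentParser(description="async engine flags (sketch)")
    # Boolean switches: absent means False, present means True.
    parser.add_argument("--engine-use-ray", action="store_true",
                        help="start the engine in a separate, Ray-managed process")
    parser.add_argument("--disable-log-requests", action="store_true",
                        help="turn off request logging")
    # None stands for "unlimited", matching the documented default.
    parser.add_argument("--max-log-len", type=int, default=None,
                        help="cap on logged prompt characters or token IDs")
    return parser


def truncate_for_log(prompt: str, token_ids: List[int],
                     max_log_len: Optional[int]) -> Tuple[str, List[int]]:
    """Assumed behaviour: keep at most max_log_len characters and token IDs."""
    if max_log_len is None:
        return prompt, token_ids
    return prompt[:max_log_len], token_ids[:max_log_len]


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--max-log-len", "5"])
shown_prompt, shown_ids = truncate_for_log(
    "Hello, world", [1, 2, 3, 4, 5, 6, 7], args.max_log_len)

if not args.disable_log_requests:
    print(f"prompt={shown_prompt!r} token_ids={shown_ids}")
```

With ``--max-log-len`` omitted, ``args.max_log_len`` stays ``None`` and the sketch logs the full prompt, which mirrors the "unlimited" default described above.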