mirror of https://git.datalinker.icu/vllm-project/vllm.git
synced 2025-12-11 23:25:32 +08:00

docs: add instruction for langchain (#1162)

This commit is contained in:
parent d27f4bae39
commit 05a38612b0
docs/source/index.rst

@@ -66,6 +66,7 @@ Documentation

    serving/run_on_sky
    serving/deploying_with_triton
    serving/deploying_with_docker
+   serving/serving_with_langchain

 .. toctree::
    :maxdepth: 1
31 docs/source/serving/serving_with_langchain.rst Normal file

@@ -0,0 +1,31 @@
.. _run_on_langchain:

Serving with Langchain
============================

vLLM is also available via `Langchain <https://github.com/langchain-ai/langchain>`_ .

To install Langchain, run

.. code-block:: console

    $ pip install langchain -q

To run inference on a single GPU or on multiple GPUs, use the ``VLLM`` class from ``langchain``.

.. code-block:: python

    from langchain.llms import VLLM

    llm = VLLM(model="mosaicml/mpt-7b",
               trust_remote_code=True,  # mandatory for Hugging Face models
               max_new_tokens=128,
               top_k=10,
               top_p=0.95,
               temperature=0.8,
               # tensor_parallel_size=...  # for distributed inference
    )

    print(llm("What is the capital of France?"))

Please refer to this `Tutorial <https://github.com/langchain-ai/langchain/blob/master/docs/extras/integrations/llms/vllm.ipynb>`_ for more details.
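As a side note on the sampling arguments passed above: ``top_k`` and ``top_p`` restrict which tokens the sampler may pick at each step. The following is a minimal, dependency-free sketch of that filtering, purely for intuition; it is not vLLM's actual implementation, and the helper name ``filter_probs`` is our own.

```python
def filter_probs(probs, top_k=10, top_p=0.95):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p; renormalize the survivors."""
    # Rank token probabilities in descending order, remembering token indices.
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    # Top-k cut: keep at most top_k candidates.
    ranked = ranked[:top_k]
    # Top-p (nucleus) cut: keep tokens until cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {idx: p / total for idx, p in kept}

# Toy 5-token distribution: top_k=3 drops tokens 3 and 4; the remaining
# mass is 0.85 < 0.9, so top_p=0.9 keeps all three survivors.
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
filtered = filter_probs(probs, top_k=3, top_p=0.9)
print(filtered)
```

Lower ``top_k``/``top_p`` values make generation more deterministic; ``temperature`` then reshapes the probabilities that survive this filter.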