docs: add instruction for langchain (#1162)
commit 05a38612b0 (parent d27f4bae39)

@@ -66,6 +66,7 @@ Documentation
    serving/run_on_sky
    serving/deploying_with_triton
    serving/deploying_with_docker
+   serving/serving_with_langchain
 
 .. toctree::
    :maxdepth: 1

docs/source/serving/serving_with_langchain.rst (new file, 31 lines)
@@ -0,0 +1,31 @@
.. _run_on_langchain:

Serving with Langchain
============================

vLLM is also available via `Langchain <https://github.com/langchain-ai/langchain>`_.

To install langchain, run

.. code-block:: console

    $ pip install langchain -q
To run inference on a single or multiple GPUs, use the ``VLLM`` class from ``langchain``.

.. code-block:: python

    from langchain.llms import VLLM

    llm = VLLM(model="mosaicml/mpt-7b",
               trust_remote_code=True,  # mandatory for hf models
               max_new_tokens=128,
               top_k=10,
               top_p=0.95,
               temperature=0.8,
               # tensor_parallel_size=... # for distributed inference
    )

    print(llm("What is the capital of France ?"))
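
To shard the model across several GPUs, set the ``tensor_parallel_size`` argument mentioned in the comment above. The snippet below is a minimal sketch; the value ``4`` is only illustrative, use the number of GPUs actually available on your machine.

.. code-block:: python

    from langchain.llms import VLLM

    # Tensor-parallel inference across 4 GPUs (illustrative value).
    llm = VLLM(model="mosaicml/mpt-7b",
               trust_remote_code=True,  # mandatory for hf models
               tensor_parallel_size=4,
    )

    print(llm("What is the capital of France ?"))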

Please refer to this `Tutorial <https://github.com/langchain-ai/langchain/blob/master/docs/extras/integrations/llms/vllm.ipynb>`_ for more details.
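
Because the ``VLLM`` object behaves like any other langchain LLM, it can also be composed with standard langchain components. The snippet below is a minimal sketch, assuming langchain's ``PromptTemplate`` and ``LLMChain`` interfaces:

.. code-block:: python

    from langchain.chains import LLMChain
    from langchain.llms import VLLM
    from langchain.prompts import PromptTemplate

    # The same vLLM-backed model as above.
    llm = VLLM(model="mosaicml/mpt-7b", trust_remote_code=True)

    # A simple prompt with a single input variable.
    prompt = PromptTemplate(
        template="Question: {question}\nAnswer:",
        input_variables=["question"],
    )

    # Chain the prompt and the model together, then run a query.
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    print(llm_chain.run("What is the capital of France ?"))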