diff --git a/docs/source/index.rst b/docs/source/index.rst
index caa1935cbfe4..0231ce670db1 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -66,6 +66,7 @@ Documentation
    serving/run_on_sky
    serving/deploying_with_triton
    serving/deploying_with_docker
+   serving/serving_with_langchain
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/source/serving/serving_with_langchain.rst b/docs/source/serving/serving_with_langchain.rst
new file mode 100644
index 000000000000..8ae75d7a80d2
--- /dev/null
+++ b/docs/source/serving/serving_with_langchain.rst
@@ -0,0 +1,31 @@
+.. _run_on_langchain:
+
+Serving with LangChain
+============================
+
+vLLM is also available via `LangChain <https://python.langchain.com>`_.
+
+To install LangChain, run
+
+.. code-block:: console
+
+    $ pip install langchain -q
+
+To run inference on a single GPU or multiple GPUs, use the ``VLLM`` class from ``langchain``.
+
+.. code-block:: python
+
+    from langchain.llms import VLLM
+
+    llm = VLLM(model="mosaicml/mpt-7b",
+               trust_remote_code=True,  # mandatory for Hugging Face models
+               max_new_tokens=128,
+               top_k=10,
+               top_p=0.95,
+               temperature=0.8,
+               # tensor_parallel_size=...  # for distributed inference
+    )
+
+    print(llm("What is the capital of France?"))
+
+Please refer to this `Tutorial <https://python.langchain.com/docs/integrations/llms/vllm>`_ for more details.
\ No newline at end of file