diff --git a/docs/source/deployment/frameworks/index.md b/docs/source/deployment/frameworks/index.md
index 9744f5f4d3626..3408c6c10edef 100644
--- a/docs/source/deployment/frameworks/index.md
+++ b/docs/source/deployment/frameworks/index.md
@@ -10,6 +10,7 @@ chatbox
 dify
 dstack
 helm
+litellm
 lobe-chat
 lws
 modal
diff --git a/docs/source/deployment/frameworks/litellm.md b/docs/source/deployment/frameworks/litellm.md
new file mode 100644
index 0000000000000..6dd3607ca5e37
--- /dev/null
+++ b/docs/source/deployment/frameworks/litellm.md
@@ -0,0 +1,75 @@
+(deployment-litellm)=
+
+# LiteLLM
+
+[LiteLLM](https://github.com/BerriAI/litellm) lets you call all LLM APIs using the OpenAI format (Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq, etc.).
+
+LiteLLM manages:
+
+- Translating inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
+- [Consistent output](https://docs.litellm.ai/docs/completion/output): text responses are always available at `['choices'][0]['message']['content']`
+- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) via the [Router](https://docs.litellm.ai/docs/routing)
+- Budgets and rate limits per project, API key, and model via the [LiteLLM Proxy Server (LLM Gateway)](https://docs.litellm.ai/docs/simple_proxy)
+
+LiteLLM supports all models served by vLLM.
+
+## Prerequisites
+
+- Set up the vLLM and LiteLLM environment:
+
+```console
+pip install vllm litellm
+```
+
+## Deploy
+
+### Chat completion
+
+- Start the vLLM server with a supported chat completion model, e.g.
+
+```console
+vllm serve qwen/Qwen1.5-0.5B-Chat
+```
+
+- Call it with LiteLLM:
+
+```python
+import litellm
+
+messages = [{"content": "Hello, how are you?", "role": "user"}]
+
+# The "hosted_vllm/" prefix is required; it routes the request to your self-hosted vLLM server
+response = litellm.completion(
+    model="hosted_vllm/qwen/Qwen1.5-0.5B-Chat",  # pass the model name you served with vLLM
+    messages=messages,
+    api_base="http://{your-vllm-server-host}:{your-vllm-server-port}/v1",
+    temperature=0.2,
+    max_tokens=80)
+
+print(response)
+```
+
+### Embeddings
+
+- Start the vLLM server with a supported embedding model, e.g.
+
+```console
+vllm serve BAAI/bge-base-en-v1.5
+```
+
+- Call it with LiteLLM:
+
+```python
+import os
+
+from litellm import embedding
+
+os.environ["HOSTED_VLLM_API_BASE"] = "http://{your-vllm-server-host}:{your-vllm-server-port}/v1"
+
+# The "hosted_vllm/" prefix is required; pass the model name you served with vLLM
+response = embedding(model="hosted_vllm/BAAI/bge-base-en-v1.5", input=["Hello world"])
+
+print(response)
+```
+
+For details, see the tutorial [Using vLLM in LiteLLM](https://docs.litellm.ai/docs/providers/vllm).
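+
+## Accessing the response content
+
+As the [consistent output](https://docs.litellm.ai/docs/completion/output) point above notes, LiteLLM normalizes every backend's response to the OpenAI format, so the generated text is always available at `['choices'][0]['message']['content']`. The snippet below is a minimal sketch reusing the chat completion setup from this page; the model name and the server host/port placeholders are the same assumptions as above.
+
+```python
+import litellm
+
+messages = [{"content": "Hello, how are you?", "role": "user"}]
+
+response = litellm.completion(
+    model="hosted_vllm/qwen/Qwen1.5-0.5B-Chat",
+    messages=messages,
+    api_base="http://{your-vllm-server-host}:{your-vllm-server-port}/v1",
+    max_tokens=80)
+
+# The text lives at the path documented by LiteLLM, regardless of the backend
+print(response["choices"][0]["message"]["content"])
+```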
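+
+## Load balancing and fallbacks across vLLM servers
+
+The [Router](https://docs.litellm.ai/docs/routing) mentioned above can spread traffic over several deployments and retry on failures. The sketch below is an illustration only: the alias `my-chat-model` and the two `api_base` hosts are made-up placeholders, and the deployments are assumed to be two vLLM servers serving the same model as in the chat completion example.
+
+```python
+from litellm import Router
+
+# Two deployments registered under the same alias; the Router load balances
+# between them and can retry or fall back if one fails.
+model_list = [
+    {
+        "model_name": "my-chat-model",
+        "litellm_params": {
+            "model": "hosted_vllm/qwen/Qwen1.5-0.5B-Chat",
+            "api_base": "http://vllm-host-1:8000/v1",
+        },
+    },
+    {
+        "model_name": "my-chat-model",
+        "litellm_params": {
+            "model": "hosted_vllm/qwen/Qwen1.5-0.5B-Chat",
+            "api_base": "http://vllm-host-2:8000/v1",
+        },
+    },
+]
+
+router = Router(model_list=model_list)
+
+response = router.completion(
+    model="my-chat-model",
+    messages=[{"content": "Hello, how are you?", "role": "user"}],
+)
+
+print(response["choices"][0]["message"]["content"])
+```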