[Docs] Use 1-2-3 list for deploy steps in deployment/frameworks/ (#24633)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
This commit is contained in: parent ba6011027d, commit d14c4ebf08
@@ -4,9 +4,7 @@

## Prerequisites

Set up the vLLM and [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment:

```bash
pip install vllm
```

@@ -18,14 +16,14 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"

## Deploy

1. Start the vLLM server with a supported chat completion model, e.g.

    ```bash
    python -m vllm.entrypoints.openai.api_server \
        --model mistralai/Mistral-7B-Instruct-v0.2
    ```

1. Call it with AutoGen (a minimal client sketch follows this excerpt):

    ??? code
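The `??? code` admonition above is collapsed in this diff, so the AutoGen client itself is not shown. As a rough sketch only, assuming the `autogen-ext[openai]` package and a server on `localhost:8000` (the endpoint, placeholder API key, and `model_info` values below are assumptions, not taken from the diff), the call might look like this:

```python
# Hypothetical sketch: query the vLLM OpenAI-compatible server from AutoGen.
# Assumes the server started above is listening on http://localhost:8000/v1.
import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    client = OpenAIChatCompletionClient(
        model="mistralai/Mistral-7B-Instruct-v0.2",
        base_url="http://localhost:8000/v1",
        api_key="EMPTY",  # vLLM does not validate the key by default
        model_info={  # capabilities must be declared manually for non-OpenAI model names
            "vision": False,
            "function_calling": False,
            "json_output": False,
            "family": "mistral",
        },
    )
    result = await client.create([UserMessage(content="Hello!", source="user")])
    print(result.content)


asyncio.run(main())
```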
@@ -6,27 +6,31 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

## Prerequisites

Set up the vLLM environment:

```bash
pip install vllm
```

## Deploy

1. Start the vLLM server with a supported chat completion model, e.g.

    ```bash
    vllm serve qwen/Qwen1.5-0.5B-Chat
    ```

1. Download and install [Chatbox desktop](https://chatboxai.app/en#download).

1. In the bottom left of the settings, add a custom provider (a quick endpoint check is sketched after this excerpt):

    - API Mode: `OpenAI API Compatible`
    - Name: vllm
    - API Host: `http://{vllm server host}:{vllm server port}/v1`
    - API Path: `/chat/completions`
    - Model: `qwen/Qwen1.5-0.5B-Chat`

    

1. Go to `Just chat` and start chatting:

    
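Before wiring up Chatbox, it can help to confirm that the API Host and API Path above compose into a working endpoint. A minimal, hedged check using the `openai` Python client (the host, port, and placeholder key below are assumptions):

```python
# Hypothetical sanity check: call the same endpoint Chatbox will use,
# i.e. API Host http://localhost:8000/v1 plus API Path /chat/completions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="qwen/Qwen1.5-0.5B-Chat",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```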
@@ -8,44 +8,50 @@ This guide walks you through deploying Dify using a vLLM backend.

## Prerequisites

Set up the vLLM environment:

```bash
pip install vllm
```

And install [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/).

## Deploy

1. Start the vLLM server with a supported chat completion model, e.g.

    ```bash
    vllm serve Qwen/Qwen1.5-7B-Chat
    ```

1. Start the Dify server with Docker Compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):

    ```bash
    git clone https://github.com/langgenius/dify.git
    cd dify
    cd docker
    cp .env.example .env
    docker compose up -d
    ```

1. Open the browser to access `http://localhost/install`, configure the basic login information, and log in.

1. In the top-right user menu (under the profile icon), go to Settings, then click `Model Provider`, and locate the `vLLM` provider to install it.

1. Fill in the model provider details as follows (a quick check of the endpoint is sketched after this excerpt):

    - **Model Type**: `LLM`
    - **Model Name**: `Qwen/Qwen1.5-7B-Chat`
    - **API Endpoint URL**: `http://{vllm_server_host}:{vllm_server_port}/v1`
    - **Model Name for API Endpoint**: `Qwen/Qwen1.5-7B-Chat`
    - **Completion Mode**: `Completion`

    

1. To create a test chatbot, go to `Studio → Chatbot → Create from Blank`, then select Chatbot as the type:

    

1. Click the chatbot you just created to open the chat interface and start interacting with the model:

    
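Before filling in the Dify provider form, it can help to confirm that the vLLM server exposes the exact model name Dify expects at the API Endpoint URL. A minimal, hedged check with the `openai` Python client (host and port are placeholders):

```python
# Hypothetical check: list the models served at the endpoint Dify will call.
# The "Model Name for API Endpoint" above must match one of the ids printed here.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

for model in client.models.list():
    print(model.id)  # expect "Qwen/Qwen1.5-7B-Chat"
```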
@@ -6,7 +6,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

## Prerequisites

Set up the vLLM and Haystack environment:

```bash
pip install vllm haystack-ai
```

@@ -14,13 +14,13 @@ pip install vllm haystack-ai

## Deploy

1. Start the vLLM server with a supported chat completion model, e.g.

    ```bash
    vllm serve mistralai/Mistral-7B-Instruct-v0.1
    ```

1. Use the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack to query the vLLM server (a minimal sketch follows this excerpt).

    ??? code
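The Haystack snippet itself is collapsed in this diff. As a hedged sketch of what such a call might look like with `haystack-ai` (the endpoint URL and placeholder key are assumptions):

```python
# Hypothetical sketch: point Haystack's OpenAIGenerator at the vLLM server.
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

generator = OpenAIGenerator(
    api_key=Secret.from_token("EMPTY"),  # vLLM does not validate the key by default
    model="mistralai/Mistral-7B-Instruct-v0.1",
    api_base_url="http://localhost:8000/v1",
)

result = generator.run(prompt="Briefly explain what vLLM is.")
print(result["replies"][0])
```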
@@ -13,7 +13,7 @@ And LiteLLM supports all models on VLLM.

## Prerequisites

Set up the vLLM and litellm environment:

```bash
pip install vllm litellm
```

@@ -23,13 +23,13 @@ pip install vllm litellm

### Chat completion

1. Start the vLLM server with a supported chat completion model, e.g.

    ```bash
    vllm serve qwen/Qwen1.5-0.5B-Chat
    ```

1. Call it with litellm (a minimal sketch follows this excerpt):

    ??? code
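The litellm snippet is collapsed in this diff. A hedged sketch of what the call might look like, assuming litellm's `hosted_vllm/` provider prefix and a server on `localhost:8000`:

```python
# Hypothetical sketch: call the vLLM server through litellm.
from litellm import completion

response = completion(
    # hosted_vllm/ routes the request to an OpenAI-compatible vLLM server
    model="hosted_vllm/qwen/Qwen1.5-0.5B-Chat",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    api_base="http://localhost:8000/v1",
)
print(response.choices[0].message.content)
```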
@@ -51,13 +51,13 @@ vllm serve qwen/Qwen1.5-0.5B-Chat

### Embeddings

1. Start the vLLM server with a supported embedding model, e.g.

    ```bash
    vllm serve BAAI/bge-base-en-v1.5
    ```

1. Call it with litellm (see the sketch below):

    ```python
    from litellm import embedding
    ```
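The embedding snippet above is cut off at the hunk boundary. A hedged guess at how such a call might be completed, again assuming litellm's `hosted_vllm/` prefix (not taken from the diff):

```python
# Hypothetical completion of the truncated snippet above.
from litellm import embedding

response = embedding(
    model="hosted_vllm/BAAI/bge-base-en-v1.5",
    input=["Hello world"],
    api_base="http://localhost:8000/v1",
)
print(len(response.data[0]["embedding"]))  # embedding dimension
```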
@@ -11,7 +11,7 @@ Here are the integrations:

### Prerequisites

Set up the vLLM and langchain environment:

```bash
pip install -U vllm \
```

@@ -22,33 +22,33 @@ pip install -U vllm \

### Deploy

1. Start the vLLM server with a supported embedding model, e.g.

    ```bash
    # Start embedding service (port 8000)
    vllm serve ssmits/Qwen2-7B-Instruct-embed-base
    ```

1. Start the vLLM server with a supported chat completion model, e.g.

    ```bash
    # Start chat service (port 8001)
    vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
    ```

1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_langchain.py> (a rough sketch of the client setup follows this excerpt)

1. Run the script:

    ```bash
    python retrieval_augmented_generation_with_langchain.py
    ```
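The referenced script is not part of this diff. As a rough, hedged sketch of the kind of client setup it implies (class and parameter names assume the `langchain-openai` package; ports follow the commands above):

```python
# Hypothetical sketch: point LangChain at the two vLLM services started above.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Embedding service on port 8000
embeddings = OpenAIEmbeddings(
    model="ssmits/Qwen2-7B-Instruct-embed-base",
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    check_embedding_ctx_length=False,  # skip tiktoken-based length checks for non-OpenAI models
)

# Chat service on port 8001
llm = ChatOpenAI(
    model="qwen/Qwen1.5-0.5B-Chat",
    base_url="http://localhost:8001/v1",
    api_key="EMPTY",
)

print(len(embeddings.embed_query("What is vLLM?")))  # embedding dimension
print(llm.invoke("What is vLLM?").content)
```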
## vLLM + llamaindex

### Prerequisites

Set up the vLLM and llamaindex environment:

```bash
pip install vllm \
```

@@ -60,24 +60,24 @@ pip install vllm \

### Deploy

1. Start the vLLM server with a supported embedding model, e.g.

    ```bash
    # Start embedding service (port 8000)
    vllm serve ssmits/Qwen2-7B-Instruct-embed-base
    ```

1. Start the vLLM server with a supported chat completion model, e.g.

    ```bash
    # Start chat service (port 8001)
    vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
    ```

1. Use the script: <gh-file:examples/online_serving/retrieval_augmented_generation_with_llamaindex.py> (a rough sketch of the client setup follows this excerpt)

1. Run the script:

    ```bash
    python retrieval_augmented_generation_with_llamaindex.py
    ```
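The llamaindex script is likewise not part of this diff. A hedged sketch of the kind of LLM client setup it implies, assuming the `llama-index-llms-openai-like` integration (endpoint, key, and port below are assumptions based on the commands above):

```python
# Hypothetical sketch: point LlamaIndex at the vLLM chat service started above.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="qwen/Qwen1.5-0.5B-Chat",
    api_base="http://localhost:8001/v1",
    api_key="EMPTY",
    is_chat_model=True,  # route requests to the chat completions endpoint
)

print(llm.complete("What is vLLM?").text)
```

The embedding service on port 8000 can be wired up similarly through an OpenAI-compatible embedding client.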