[doc] recommend pip instead of conda (#8446)

Commit cab69a15e4 by youkaichao, 2024-09-12 23:52:41 -07:00, committed by GitHub (parent 9b4a3b235e).


@@ -26,6 +26,10 @@ You can install vLLM using pip:
     $ # Install vLLM with CUDA 12.1.
     $ pip install vllm
+.. note::
+
+    Although we recommend using ``conda`` to create and manage Python environments, it is highly recommended to use ``pip`` to install vLLM. This is because ``pip`` can install ``torch`` with separate library packages like ``NCCL``, while ``conda`` installs ``torch`` with statically linked ``NCCL``. This can cause issues when vLLM tries to use ``NCCL``. See `this issue <https://github.com/vllm-project/vllm/issues/8420>`_ for more details.
+
 .. note::
     As of now, vLLM's binaries are compiled with CUDA 12.1 and public PyTorch release versions by default.
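In practice, the note added here boils down to the following pattern (a minimal sketch, not part of the commit; the environment name ``myenv`` and the Python version are illustrative assumptions):

.. code-block:: console

    $ # Use conda only to create and manage the environment...
    $ conda create -n myenv python=3.10 -y
    $ conda activate myenv
    $ # ...but install vLLM with pip, so torch comes with separately packaged NCCL.
    $ pip install vllm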
@@ -34,7 +38,7 @@ You can install vLLM using pip:
 .. code-block:: console
     $ # Install vLLM with CUDA 11.8.
-    $ export VLLM_VERSION=0.4.0
+    $ export VLLM_VERSION=0.6.1.post1
     $ export PYTHON_VERSION=310
     $ pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
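After installing a pinned wheel like the one above, a quick sanity check can confirm which build actually landed (a sketch, not part of the commit):

.. code-block:: console

    $ python -c "import vllm; print(vllm.__version__)"
    $ python -c "import torch; print(torch.version.cuda)"  # should report 11.8 for the cu118 wheel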
@@ -48,7 +52,7 @@ You can install vLLM using pip:
 .. code-block:: console
-    $ export VLLM_VERSION=0.5.4 # vLLM's main branch version is currently set to latest released tag
+    $ export VLLM_VERSION=0.6.1.post1 # vLLM's main branch version is currently set to latest released tag
     $ pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
     $ # You can also access a specific commit
     $ # export VLLM_COMMIT=...
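The context above ends at the commented-out ``VLLM_COMMIT`` line. Assuming the wheel bucket keys builds by full commit hash (an assumption about the S3 layout, not shown in this diff), a commit-pinned install would follow the same URL pattern:

.. code-block:: console

    $ # Hypothetical: use a full commit hash from vLLM's main branch.
    $ export VLLM_COMMIT=<full commit hash>
    $ pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl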
@@ -80,11 +84,11 @@ You can also build and install vLLM from source:
 .. tip::
-    Building from source requires quite a lot compilation. If you are building from source for multiple times, it is beneficial to cache the compilation results. For example, you can install `ccache <https://github.com/ccache/ccache>`_ via either `conda install ccache` or `apt install ccache` . As long as `which ccache` command can find the `ccache` binary, it will be used automatically by the build system. After the first build, the subsequent builds will be much faster.
+    Building from source requires quite a lot compilation. If you are building from source for multiple times, it is beneficial to cache the compilation results. For example, you can install `ccache <https://github.com/ccache/ccache>`_ via either ``conda install ccache`` or ``apt install ccache`` . As long as ``which ccache`` command can find the ``ccache`` binary, it will be used automatically by the build system. After the first build, the subsequent builds will be much faster.
 .. tip::
     To avoid your system being overloaded, you can limit the number of compilation jobs
-    to be run simultaneously, via the environment variable `MAX_JOBS`. For example:
+    to be run simultaneously, via the environment variable ``MAX_JOBS``. For example:
 .. code-block:: console
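The code block that follows in the file falls outside this diff's context. Combining the two tips above, a source build would look roughly like this (a sketch; ``MAX_JOBS=6`` is an illustrative value, and a checkout of the vLLM repository is assumed):

.. code-block:: console

    $ # Cache compilation results so rebuilds are much faster.
    $ conda install ccache
    $ # Limit parallel compile jobs to avoid overloading the machine.
    $ export MAX_JOBS=6
    $ pip install -e .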
@@ -99,7 +103,7 @@ You can also build and install vLLM from source:
     $ # Use `--ipc=host` to make sure the shared memory is large enough.
     $ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
-If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website <https://developer.nvidia.com/cuda-toolkit-archive>`_. After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:
+If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from `the official website <https://developer.nvidia.com/cuda-toolkit-archive>`_. After installation, set the environment variable ``CUDA_HOME`` to the installation path of CUDA Toolkit, and make sure that the ``nvcc`` compiler is in your ``PATH``, e.g.:
 .. code-block:: console
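The diff context also ends at this code block. The setup the sentence describes presumably looks like the following (a sketch; ``/usr/local/cuda`` is the conventional Toolkit install path, an assumption rather than something shown here):

.. code-block:: console

    $ export CUDA_HOME=/usr/local/cuda
    $ export PATH="${CUDA_HOME}/bin:${PATH}"
    $ # Verify that nvcc resolves to the Toolkit under CUDA_HOME.
    $ which nvcc
    $ nvcc --version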