Add logo and polish readme (#156)

.gitignore  (vendored, 183 changes)
@@ -1,15 +1,172 @@
-**/*.pyc
-**/__pycache__/
-*.egg-info/
-*.eggs/
-*.so
-*.log
-*.csv
-build/
-docs/build/
-dist/
-
-*.pkl
-*.png
-**/log.txt
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+.idea/
+
+# VSCode
 .vscode/
+
+# DS Store
+.DS_Store
+
+# Results
+*.csv
+
+# Python pickle files
+*.pkl

README.md  (82 changes)
@@ -1,50 +1,84 @@
-# vLLM: Easy, Fast, and Cheap LLM Serving for Everyone
-
-| [**Documentation**](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/) | [**Blog**]() |
-
-vLLM is a fast and easy-to-use library for LLM inference and serving.
-
-## Latest News 🔥
+<p align="center">
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/logos/vllm-logo-text-dark.png">
+<img alt="vLLM" src="./docs/source/assets/logos/vllm-logo-text-light.png" width=55%>
+</picture>
+</p>
+
+<h3 align="center">
+Easy, fast, and cheap LLM serving for everyone
+</h3>
+
+<p align="center">
+| <a href="https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/"><b>Documentation</b></a> | <a href=""><b>Blog</b></a> |
+</p>
+
+---
+
+*Latest News* 🔥
+
 - [2023/06] We officially released vLLM! vLLM has powered [LMSYS Vicuna and Chatbot Arena](https://chat.lmsys.org) since mid April. Check out our [blog post]().
+
+---
+
+vLLM is a fast and easy to use library for LLM inference and serving.
+
+vLLM is fast with:
+
+- State-of-the-art serving throughput
+- Efficient management of attention key and value memory with **PagedAttention**
+- Dynamic batching of incoming requests
+- Optimized CUDA kernels
+
+vLLM is flexible and easy to use with:
+
+- Seamless integration with popular HuggingFace models
+- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
+- Tensor parallelism support for distributed inference
+- Streaming outputs
+- OpenAI-compatible API server
+
+Install vLLM with pip or [from source](https://llm-serving-cacheflow.readthedocs-hosted.com/en/latest/getting_started/installation.html#build-from-source):
+
+```bash
+pip install vllm
+```
+
 ## Getting Started
 
 Visit our [documentation](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/) to get started.
-- [Installation](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/getting_started/installation.html): `pip install vllm`
+- [Installation](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/getting_started/installation.html)
 - [Quickstart](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/getting_started/quickstart.html)
 - [Supported Models](https://llm-serving-cacheflow.readthedocs-hosted.com/_/sharing/Cyo52MQgyoAWRQ79XA4iA2k8euwzzmjY?next=/en/latest/models/supported_models.html)
 
-## Key Features
-
-vLLM comes with many powerful features that include:
-
-- State-of-the-art performance in serving throughput
-- Efficient management of attention key and value memory with **PagedAttention**
-- Seamless integration with popular HuggingFace models
-- Dynamic batching of incoming requests
-- Optimized CUDA kernels
-- High-throughput serving with various decoding algorithms, including *parallel sampling* and *beam search*
-- Tensor parallelism support for distributed inference
-- Streaming outputs
-- OpenAI-compatible API server
-
 ## Performance
 
 vLLM outperforms HuggingFace Transformers (HF) by up to 24x and Text Generation Inference (TGI) by up to 3.5x, in terms of throughput.
 For details, check out our [blog post]().
 
 <p align="center">
-<img src="./assets/figures/perf_a10g_n1.png" width="45%">
-<img src="./assets/figures/perf_a100_n1.png" width="45%">
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a10g_n1_dark.png">
+<img src="./docs/source/assets/figures/perf_a10g_n1_light.png" width="45%">
+</picture>
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a100_n1_dark.png">
+<img src="./docs/source/assets/figures/perf_a100_n1_light.png" width="45%">
+</picture>
 <br>
 <em> Serving throughput when each request asks for 1 output completion. </em>
 </p>
 
 <p align="center">
-<img src="./assets/figures/perf_a10g_n3.png" width="45%">
-<img src="./assets/figures/perf_a100_n3.png" width="45%">
-<br>
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a10g_n3_dark.png">
+<img src="./docs/source/assets/figures/perf_a10g_n3_light.png" width="45%">
+</picture>
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/figures/perf_a100_n3_dark.png">
+<img src="./docs/source/assets/figures/perf_a100_n3_light.png" width="45%">
+</picture> <br>
 <em> Serving throughput when each request asks for 3 output completions. </em>
 </p>

Eight binary figure files changed (before and after sizes identical for each: 267, 285, 259, 276, 244, 260, 255, and 272 KiB).

BIN docs/source/assets/logos/vllm-logo-only-light.png (new file, 53 KiB)
BIN docs/source/assets/logos/vllm-logo-text-dark.png (new file, 86 KiB)
BIN docs/source/assets/logos/vllm-logo-text-light.png (new file, 88 KiB)

docs/source/index.rst
@@ -1,18 +1,43 @@
 Welcome to vLLM!
 ================
 
-**vLLM** is a fast and easy-to-use library for LLM inference and serving.
-Its core features include:
-
-- State-of-the-art performance in serving throughput
-- Efficient management of attention key and value memory with **PagedAttention**
-- Seamless integration with popular HuggingFace models
-- Dynamic batching of incoming requests
-- Optimized CUDA kernels
-- High-throughput serving with various decoding algorithms, including *parallel sampling* and *beam search*
-- Tensor parallelism support for distributed inference
-- Streaming outputs
-- OpenAI-compatible API server
+.. figure:: ./assets/logos/vllm-logo-text-light.png
+   :width: 60%
+   :align: center
+   :alt: vLLM
+   :class: no-scaled-link
+
+.. raw:: html
+
+   <p style="text-align:center">
+   <strong>Easy, fast, and cheap LLM serving for everyone
+   </strong>
+   </p>
+
+   <p style="text-align:center">
+   <a class="github-button" href="https://github.com/WoosukKwon/vllm" data-show-count="true" data-size="large" aria-label="Star skypilot-org/skypilot on GitHub">Star</a>
+   <a class="github-button" href="https://github.com/WoosukKwon/vllm/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch skypilot-org/skypilot on GitHub">Watch</a>
+   <a class="github-button" href="https://github.com/WoosukKwon/vllm/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork skypilot-org/skypilot on GitHub">Fork</a>
+   </p>
+
+vLLM is a fast and easy to use library for LLM inference and serving.
+
+vLLM is fast with:
+
+* State-of-the-art serving throughput
+* Efficient management of attention key and value memory with **PagedAttention**
+* Dynamic batching of incoming requests
+* Optimized CUDA kernels
+
+vLLM is flexible and easy to use with:
+
+* Seamless integration with popular HuggingFace models
+* High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
+* Tensor parallelism support for distributed inference
+* Streaming outputs
+* OpenAI-compatible API server
 
 For more information, please refer to our `blog post <>`_.
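
Both the README and the docs index advertise multiple decoding algorithms (parallel sampling, beam search). A minimal sketch of how those are selected through `SamplingParams`, assuming the early `vllm` interface in which beam search is toggled by a `use_beam_search` flag (model name and argument values are illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # illustrative model choice

# Parallel sampling: several independent stochastic samples per prompt.
parallel = SamplingParams(n=3, temperature=0.8, top_p=0.95)

# Beam search: deterministic search over 3 beams; temperature must be 0.
beam = SamplingParams(n=3, use_beam_search=True, temperature=0.0)

for params in (parallel, beam):
    result = llm.generate(["The future of AI is"], params)[0]
    for candidate in result.outputs:
        print(candidate.text)
```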