2024-12-06 20:33:30 +08:00

224 lines
9.0 KiB
Plaintext

Metadata-Version: 2.1
Name: videosys
Version: 2.0.0
Summary: VideoSys
License: Apache Software License 2.0
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Distributed Computing
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click
Requires-Dist: colossalai
Requires-Dist: contexttimer
Requires-Dist: diffusers==0.30.0
Requires-Dist: einops
Requires-Dist: fabric
Requires-Dist: ftfy
Requires-Dist: imageio
Requires-Dist: imageio-ffmpeg
Requires-Dist: matplotlib
Requires-Dist: ninja
Requires-Dist: numpy<2.0.0
Requires-Dist: omegaconf
Requires-Dist: packaging
Requires-Dist: psutil
Requires-Dist: pydantic
Requires-Dist: ray
Requires-Dist: rich
Requires-Dist: safetensors
Requires-Dist: timm
Requires-Dist: torch>=1.13
Requires-Dist: tqdm
Requires-Dist: transformers
<p align="center">
<img width="55%" alt="VideoSys" src="./assets/figures/logo.png?raw=true">
</p>
<h3 align="center">
An easy and efficient system for video generation
</h3>
</p>
### Latest News 🔥
- [2024/08] 🔥 Evole from [OpenDiT](https://github.com/NUS-HPC-AI-Lab/VideoSys/tree/v1.0.0) to <b>VideoSys: An easy and efficient system for video generation.</b>
- [2024/08] 🔥 <b>Release PAB paper: [Real-Time Video Generation with Pyramid Attention Broadcast](https://arxiv.org/abs/2408.12588).</b>
- [2024/06] Propose Pyramid Attention Broadcast (PAB)[[paper](https://arxiv.org/abs/2408.12588)][[blog](https://oahzxl.github.io/PAB/)][[doc](./docs/pab.md)], the first approach to achieve <b>real-time</b> DiT-based video generation, delivering <b>negligible quality loss</b> without <b>requiring any training</b>.
- [2024/06] Support [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) and [Latte](https://github.com/Vchitect/Latte).
- [2024/03] Propose Dynamic Sequence Parallel (DSP)[[paper](https://arxiv.org/abs/2403.10266)][[doc](./docs/dsp.md)], achieves **3x** speed for training and **2x** speed for inference in Open-Sora compared with sota sequence parallelism.
- [2024/03] Support [Open-Sora: Democratizing Efficient Video Production for All](https://github.com/hpcaitech/Open-Sora).
- [2024/02] 🎉 Release [OpenDiT](https://github.com/NUS-HPC-AI-Lab/VideoSys/tree/v1.0.0): An Easy, Fast and Memory-Efficent System for DiT Training and Inference.
# About
VideoSys is an open-source project that provides a user-friendly and high-performance infrastructure for video generation. This comprehensive toolkit will support the entire pipeline from training and inference to serving and compression.
We are committed to continually integrating cutting-edge open-source video models and techniques. Stay tuned for exciting enhancements and new features on the horizon!
## Installation
Prerequisites:
- Python >= 3.10
- PyTorch >= 1.13 (We recommend to use a >2.0 version)
- CUDA >= 11.6
We strongly recommend using Anaconda to create a new environment (Python >= 3.10) to run our examples:
```shell
conda create -n videosys python=3.10 -y
conda activate videosys
```
Install VideoSys:
```shell
git clone https://github.com/NUS-HPC-AI-Lab/VideoSys
cd VideoSys
pip install -e .
```
## Usage
VideoSys supports many diffusion models with our various acceleration techniques, enabling these models to run faster and consume less memory.
<b>You can find all available models and their supported acceleration techniques in the following table. Click `Doc` to see how to use them.</b>
<table>
<tr>
<th rowspan="2">Model</th>
<th rowspan="2">Train</th>
<th rowspan="2">Infer</th>
<th colspan="2">Acceleration Techniques</th>
<th rowspan="2">Usage</th>
</tr>
<tr>
<th><a href="https://github.com/NUS-HPC-AI-Lab/VideoSys?tab=readme-ov-file#dyanmic-sequence-parallelism-dsp-paperdoc">DSP</a></th>
<th><a href="https://github.com/NUS-HPC-AI-Lab/VideoSys?tab=readme-ov-file#pyramid-attention-broadcast-pab-blogdoc">PAB</a></th>
</tr>
<tr>
<td>Open-Sora [<a href="https://github.com/hpcaitech/Open-Sora">source</a>]</td>
<td align="center">🟡</td>
<td align="center">✅</td>
<td align="center">✅</td>
<td align="center">✅</td>
<td align="center"><a href="./examples/open_sora/sample.py">Code</a></td>
</tr>
<tr>
<td>Open-Sora-Plan [<a href="https://github.com/PKU-YuanGroup/Open-Sora-Plan">source</a>]</td>
<td align="center">/</td>
<td align="center">✅</td>
<td align="center">✅</td>
<td align="center">✅</td>
<td align="center"><a href="./examples/open_sora_plan/sample.py">Code</a></td>
</tr>
<tr>
<td>Latte [<a href="https://github.com/Vchitect/Latte">source</a>]</td>
<td align="center">/</td>
<td align="center">✅</td>
<td align="center">✅</td>
<td align="center">✅</td>
<td align="center"><a href="./examples/latte/sample.py">Code</a></td>
</tr>
<tr>
<td>CogVideoX [<a href="https://github.com/THUDM/CogVideo">source</a>]</td>
<td align="center">/</td>
<td align="center">✅</td>
<td align="center">/</td>
<td align="center">✅</td>
<td align="center"><a href="./examples/cogvideox/sample.py">Code</a></td>
</tr>
</table>
## Acceleration Techniques
### Pyramid Attention Broadcast (PAB) [[paper](https://arxiv.org/abs/2408.12588)][[blog](https://arxiv.org/abs/2403.10266)][[doc](./docs/pab.md)]
Real-Time Video Generation with Pyramid Attention Broadcast
Authors: [Xuanlei Zhao](https://oahzxl.github.io/)<sup>1*</sup>, [Xiaolong Jin]()<sup>2*</sup>, [Kai Wang](https://kaiwang960112.github.io/)<sup>1*</sup>, and [Yang You](https://www.comp.nus.edu.sg/~youy/)<sup>1</sup> (* indicates equal contribution)
<sup>1</sup>National University of Singapore, <sup>2</sup>Purdue University
![method](./assets/figures/pab_method.png)
PAB is the first approach to achieve <b>real-time</b> DiT-based video generation, delivering <b>lossless quality</b> without <b>requiring any training</b>. By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular DiT-based video generation models including [Open-Sora](https://github.com/hpcaitech/Open-Sora), [Latte](https://github.com/Vchitect/Latte) and [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan).
See its details [here](./docs/pab.md).
----
### Dyanmic Sequence Parallelism (DSP) [[paper](https://arxiv.org/abs/2403.10266)][[doc](./docs/dsp.md)]
![dsp_overview](./assets/figures/dsp_overview.png)
DSP is a novel, elegant and super efficient sequence parallelism for [Open-Sora](https://github.com/hpcaitech/Open-Sora), [Latte](https://github.com/Vchitect/Latte) and other multi-dimensional transformer architecture.
It achieves **3x** speed for training and **2x** speed for inference in Open-Sora compared with sota sequence parallelism ([DeepSpeed Ulysses](https://arxiv.org/abs/2309.14509)). For a 10s (80 frames) of 512x512 video, the inference latency of Open-Sora is:
| Method | 1xH800 | 8xH800 (DS Ulysses) | 8xH800 (DSP) |
| ------ | ------ | ------ | ------ |
| Latency(s) | 106 | 45 | 22 |
See its details [here](./docs/dsp.md).
## Contributing
We welcome and value any contributions and collaborations. Please check out [CONTRIBUTING.md](./CONTRIBUTING.md) for how to get involved.
## Contributors
<a href="https://github.com/NUS-HPC-AI-Lab/VideoSys/graphs/contributors">
<img src="https://contrib.rocks/image?repo=NUS-HPC-AI-Lab/VideoSys"/>
</a>
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=NUS-HPC-AI-Lab/VideoSys&type=Date)](https://star-history.com/#NUS-HPC-AI-Lab/VideoSys&Date)
## Citation
```
@misc{videosys2024,
author={VideoSys Team},
title={VideoSys: An Easy and Efficient System for Video Generation},
year={2024},
publisher={GitHub},
url = {https://github.com/NUS-HPC-AI-Lab/VideoSys},
}
@misc{zhao2024pab,
title={Real-Time Video Generation with Pyramid Attention Broadcast},
author={Xuanlei Zhao and Xiaolong Jin and Kai Wang and Yang You},
year={2024},
eprint={2408.12588},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.12588},
}
@misc{zhao2024dsp,
title={DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers},
author={Xuanlei Zhao and Shenggan Cheng and Chang Chen and Zangwei Zheng and Ziming Liu and Zheming Yang and Yang You},
year={2024},
eprint={2403.10266},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2403.10266},
}
@misc{zhao2024opendit,
author={Xuanlei Zhao, Zhongkai Zhao, Ziming Liu, Haotian Zhou, Qianli Ma, and Yang You},
title={OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference},
year={2024},
publisher={GitHub},
url={https://github.com/NUS-HPC-AI-Lab/VideoSys/tree/v1.0.0},
}
```