
<!-- ## **TeaCache4CogVideoX1.5** -->
# TeaCache4CogVideoX1.5

[TeaCache](https://github.com/LiewFeng/TeaCache) can speed up [CogVideoX1.5](https://github.com/THUDM/CogVideo) by 1.8x without much degradation in visual quality, in a training-free manner. The following videos show results generated by TeaCache-CogVideoX1.5 with various `rel_l1_thresh` values: 0 (original), 0.1 (1.3x speedup), 0.2 (1.8x speedup), and 0.3 (2.1x speedup). Image-to-video (i2v) results are also shown, with the following speedups: 0.1 (1.5x speedup), 0.2 (2.2x speedup), and 0.3 (2.7x speedup).

https://github.com/user-attachments/assets/21261b03-71c6-47bf-9769-2a81c8dc452f

https://github.com/user-attachments/assets/5e98e646-4034-4ae7-9680-a65ecd88dac9

## 📈 Inference Latency Comparisons on a Single H100 GPU

| CogVideoX1.5-t2v | TeaCache (0.1) | TeaCache (0.2) | TeaCache (0.3) |
| :--------------: | :------------: | :------------: | :------------: |
|      ~465 s      |     ~322 s     |     ~260 s     |     ~204 s     |

| CogVideoX1.5-i2v | TeaCache (0.1) | TeaCache (0.2) | TeaCache (0.3) |
| :--------------: | :------------: | :------------: | :------------: |
|      ~475 s      |     ~316 s     |     ~239 s     |     ~204 s     |
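
To read the tables as speedup factors, divide the baseline latency by the TeaCache latency. The short sketch below does that arithmetic for the approximate timings listed above; the `latencies` dictionary and the `speedup` helper are illustrative names, not part of the repository. Because the timings are rounded, the computed ratios may not match the speedups quoted in the introduction exactly.

```python
# Approximate latencies in seconds, copied from the tables above.
latencies = {
    "t2v": {"baseline": 465, 0.1: 322, 0.2: 260, 0.3: 204},
    "i2v": {"baseline": 475, 0.1: 316, 0.2: 239, 0.3: 204},
}


def speedup(baseline_s, cached_s):
    """Speedup factor: how many times faster the cached run is."""
    return baseline_s / cached_s


for task, row in latencies.items():
    for thresh in (0.1, 0.2, 0.3):
        ratio = speedup(row["baseline"], row[thresh])
        saved = row["baseline"] - row[thresh]
        print(f"{task} rel_l1_thresh={thresh}: {ratio:.2f}x, ~{saved} s saved")
```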

## Installation

```shell
pip install --upgrade "diffusers[torch]" transformers protobuf tokenizers sentencepiece imageio imageio-ffmpeg
```

## Usage

You can modify `rel_l1_thresh` to obtain your desired trade-off between latency and visual quality, and change `ckpts_path`, `prompt`, and `image_path` to customize the generated video.
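
For rough intuition about why a larger `rel_l1_thresh` lowers latency, the toy sketch below gates per-step computation on an accumulated relative L1 change. It is a simplified illustration under stated assumptions, not the repository's implementation: `rel_l1`, `plan_steps`, and the synthetic `signals` are hypothetical, and the real method may rescale or measure the change differently.

```python
def rel_l1(current, previous):
    """Mean absolute change, relative to the mean magnitude of `previous`."""
    diff = sum(abs(c - p) for c, p in zip(current, previous)) / len(previous)
    scale = sum(abs(p) for p in previous) / len(previous)
    return diff / scale


def plan_steps(signals, rel_l1_thresh):
    """Return one flag per step: True = run the model, False = reuse the cache."""
    decisions = []
    accumulated = 0.0
    previous = None
    last = len(signals) - 1
    for step, signal in enumerate(signals):
        if step == 0 or step == last:
            # Assumption: the first and last steps are always computed.
            compute = True
            accumulated = 0.0
        else:
            accumulated += rel_l1(signal, previous)
            compute = accumulated >= rel_l1_thresh
            if compute:
                accumulated = 0.0  # start accumulating again after a full step
        decisions.append(compute)
        previous = signal
    return decisions


# Hypothetical per-step signals that drift by about 3% per step.
signals = [[1.0 * 1.03**t, 2.0 * 1.03**t] for t in range(10)]
for thresh in (0.0, 0.1, 0.2, 0.3):
    computed = sum(plan_steps(signals, thresh))
    print(f"rel_l1_thresh={thresh}: {computed}/{len(signals)} steps computed")
```

In this toy setting a threshold of 0 never skips a step, which matches the "0 (original)" setting above, while larger thresholds skip more steps at the cost of reusing older results.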

For T2V inference, you can use the following command:

```bash
cd TeaCache4CogVideoX1.5

python3 teacache_sample_video.py \
    --rel_l1_thresh 0.2 \
    --ckpts_path THUDM/CogVideoX1.5-5B \
    --prompt "A clear, turquoise river flows through a rocky canyon, cascading over a small waterfall and forming a pool of water at the bottom. The river is the main focus of the scene, with its clear water reflecting the surrounding trees and rocks. The canyon walls are steep and rocky, with some vegetation growing on them. The trees are mostly pine trees, with their green needles contrasting with the brown and gray rocks. The overall tone of the scene is one of peace and tranquility." \
    --seed 42 \
    --num_inference_steps 50 \
    --output_path ./teacache_results
```

For I2V inference, you can use the following command:

```bash
cd TeaCache4CogVideoX1.5

python3 teacache_sample_video.py \
    --rel_l1_thresh 0.1 \
    --ckpts_path THUDM/CogVideoX1.5-5B-I2V \
    --prompt "A girl gazed at the camera and smiled, her hair drifting in the wind." \
    --seed 42 \
    --num_inference_steps 50 \
    --output_path ./teacache_results \
    --image_path ./image/path
```
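
To compare several thresholds side by side, the commands above can be generated in a loop. The sketch below is one possible way to do that using only the flags shown in this README; `build_command`, `sweep`, the shortened prompt, and the per-threshold output directories are assumptions made for illustration. It defaults to a dry run that only prints the commands, and should be run from inside `TeaCache4CogVideoX1.5` when `dry_run=False`.

```python
import os
import shlex
import subprocess


def build_command(rel_l1_thresh, ckpts_path, prompt, output_path,
                  seed=42, num_inference_steps=50, image_path=None):
    """Assemble the argument list for one teacache_sample_video.py run."""
    cmd = [
        "python3", "teacache_sample_video.py",
        "--rel_l1_thresh", str(rel_l1_thresh),
        "--ckpts_path", ckpts_path,
        "--prompt", prompt,
        "--seed", str(seed),
        "--num_inference_steps", str(num_inference_steps),
        "--output_path", output_path,
    ]
    if image_path is not None:  # only needed for I2V checkpoints
        cmd += ["--image_path", image_path]
    return cmd


def sweep(thresholds, output_root, dry_run=True, **kwargs):
    """Print (and optionally run) one command per threshold."""
    commands = []
    for thresh in thresholds:
        # Assumption: a separate directory per threshold keeps results apart.
        output_path = f"{output_root}/thresh_{thresh}"
        cmd = build_command(thresh, output_path=output_path, **kwargs)
        print(shlex.join(cmd))
        if not dry_run:
            os.makedirs(output_path, exist_ok=True)
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands


commands = sweep(
    [0.1, 0.2, 0.3],
    output_root="./teacache_results",
    ckpts_path="THUDM/CogVideoX1.5-5B",
    prompt="A clear, turquoise river flows through a rocky canyon.",
)
```

Keeping the seed and step count fixed across the sweep, as in the commands above, makes the outputs easier to compare visually.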

## Citation

If you find TeaCache useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

```bibtex
@article{liu2024timestep,
  title={Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model},
  author={Liu, Feng and Zhang, Shiwei and Wang, Xiaofeng and Wei, Yujie and Qiu, Haonan and Zhao, Yuzhong and Zhang, Yingya and Ye, Qixiang and Wan, Fang},
  journal={arXiv preprint arXiv:2411.19108},
  year={2024}
}
```


## Acknowledgements

We would like to thank the contributors to [CogVideoX](https://github.com/THUDM/CogVideo) and [Diffusers](https://github.com/huggingface/diffusers).