# Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Feng Liu<sup>1*</sup> Shiwei Zhang<sup>2</sup> Xiaofeng Wang<sup>1,3</sup> Yujie Wei<sup>4</sup> Haonan Qiu<sup>5</sup>
Yuzhong Zhao<sup>1</sup> Yingya Zhang<sup>2</sup> Qixiang Ye<sup>1</sup> Fang Wan<sup>1</sup>
<sup>1</sup>University of Chinese Academy of Sciences,  <sup>2</sup>Alibaba Group
<sup>3</sup>Institute of Automation, Chinese Academy of Sciences
<sup>4</sup>Fudan University,  <sup>5</sup>Nanyang Technological University
(* Work was done during an internship at Alibaba Group. † Corresponding author.)
Paper | Project Page
![visualization](./assets/tisser.png)

## Introduction

We introduce Timestep Embedding Aware Cache (TeaCache), a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps. For more details and visual results, please visit our [project page](https://github.com/LiewFeng/TeaCache). An illustrative sketch of the caching idea is provided at the end of this README.

## Installation

Prerequisites:

- Python >= 3.10
- PyTorch >= 1.13 (we recommend using a version >= 2.0)
- CUDA >= 11.6

We strongly recommend using Anaconda to create a new environment (Python >= 3.10) to run our examples:

```shell
conda create -n teacache python=3.10 -y
conda activate teacache
```

Install VideoSys:

```shell
git clone https://github.com/LiewFeng/TeaCache
cd TeaCache
pip install -e .
```

## Evaluation of TeaCache

We first generate videos from VBench's prompts, and then compute the VBench, PSNR, LPIPS, and SSIM scores on the generated videos.

1. Generate videos:

```shell
cd eval/teacache
python experiments/latte.py
python experiments/opensora.py
python experiments/open_sora_plan.py
```

2. Calculate the VBench score:

```shell
# VBench is calculated independently.
# Get scores for all metrics.
python vbench/run_vbench.py --video_path aaa --save_path bbb
# Calculate the final score.
python vbench/cal_vbench.py --score_dir bbb
```

3. Calculate the other metrics (a rough PSNR sketch is given at the end of this README):

```shell
# These metrics are computed against the original model:
# the ground-truth videos are those generated by the original model,
# and the generated videos are the results of our method.
python common_metrics/eval.py --gt_video_dir aa --generated_video_dir bb
```

## Citation

```bibtex
@misc{liu2024timestep,
      title={Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model},
      author={Feng Liu and Shiwei Zhang and Xiaofeng Wang and Yujie Wei and Haonan Qiu and Yuzhong Zhao and Yingya Zhang and Qixiang Ye and Fang Wan},
      year={2024},
      eprint={2411.19108},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.19108}
}
```

## Acknowledgement

This repository is built based on [VideoSys](https://github.com/NUS-HPC-AI-Lab/VideoSys). Thanks for their contributions!
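## Illustrative Sketches

The snippet below is a minimal, hypothetical sketch of the caching idea described in the Introduction, not the repository's actual implementation: the change in the timestep-embedding-modulated input between consecutive timesteps serves as a cheap proxy for the change in the model output, and the expensive forward pass is skipped (reusing a cached residual) while the accumulated change stays below a threshold. The function names, the relative-L1 proxy, and the `threshold` value are assumptions for illustration only.

```python
import torch


def relative_l1(curr: torch.Tensor, prev: torch.Tensor) -> float:
    # Relative L1 distance, used as a cheap proxy for how much the
    # model output is expected to change between timesteps (illustrative).
    return (curr - prev).abs().mean().item() / (prev.abs().mean().item() + 1e-8)


def denoise_with_cache(model, latents, modulated_inputs, threshold=0.1):
    # Hypothetical denoising loop: reuse the cached residual while the
    # accumulated proxy difference across timesteps stays below `threshold`.
    accumulated = 0.0
    prev_modulated = None
    cached_residual = None
    for t, modulated in enumerate(modulated_inputs):
        if prev_modulated is not None:
            accumulated += relative_l1(modulated, prev_modulated)
        if cached_residual is None or accumulated >= threshold:
            output = model(latents, t)            # expensive forward pass
            cached_residual = output - latents    # cache the residual
            accumulated = 0.0                     # reset after a real computation
        else:
            output = latents + cached_residual    # cheap reuse of the cache
        latents = output
        prev_modulated = modulated
    return latents
```

Lower thresholds recompute more often (closer to the uncached outputs); higher thresholds skip more timesteps (faster, but with larger deviation from the original model).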
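Evaluation step 3 compares each TeaCache video against the corresponding video from the original (uncached) model. As a rough, self-contained illustration of one of those metrics, the snippet below computes frame-averaged PSNR for two videos stored as uint8 arrays of shape (frames, height, width, channels); the actual `common_metrics/eval.py` script may load videos and compute PSNR, LPIPS, and SSIM differently.

```python
import numpy as np


def frame_psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    # Peak signal-to-noise ratio between two frames of the same shape.
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def video_psnr(gt_video: np.ndarray, gen_video: np.ndarray) -> float:
    # Average PSNR over frames; both inputs: (frames, H, W, C), uint8.
    assert gt_video.shape == gen_video.shape, "videos must have identical shapes"
    return float(np.mean([frame_psnr(a, b) for a, b in zip(gt_video, gen_video)]))
```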