xinyun/ComfyUI-CogVideoXWrapper

mirror of https://git.datalinker.icu/kijai/ComfyUI-CogVideoXWrapper.git synced 2026-06-24 02:47:01 +08:00


snomiao 713af2ce1b chore(publish): Add Github Action for Publishing to Comfy Registry

2024-08-28 13:02:51 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| .github/workflows | chore(publish): Add Github Action for Publishing to Comfy Registry | 2024-08-28 13:02:51 +00:00 |
|  | +1 frames | 2024-08-28 00:06:32 +03:00 |
| \_\_init\_\_.py | initial | 2024-08-06 01:56:25 +03:00 |
| .gitattributes | Initial commit | 2024-08-06 01:54:04 +03:00 |
| .gitignore | example | 2024-08-06 02:02:04 +03:00 |
| nodes.py | Update nodes.py | 2024-08-28 01:16:10 +03:00 |
| pipeline_cogvideox.py | tweaks | 2024-08-27 22:43:57 +03:00 |
| readme.md | Update readme.md | 2024-08-27 18:01:25 +03:00 |
| requirements.txt | Update requirements.txt | 2024-08-27 20:46:40 +03:00 |

readme.md

WORK IN PROGRESS

Update

The 5b model is now also supported for basic text2vid: https://huggingface.co/THUDM/CogVideoX-5b

It is also auto-downloaded to ComfyUI/models/CogVideo/CogVideoX-5b. The text encoder is not needed, as we use the ComfyUI T5.

https://github.com/user-attachments/assets/991205cc-826e-4f93-831a-c10441f0f2ce
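For anyone who wants to fetch the model by hand, the auto-download described above can be approximated with `huggingface_hub`. This is only a sketch, not the wrapper's actual code: the helper names are hypothetical, and the `text_encoder/` and `tokenizer/` folder names are an assumption about the layout of the Hugging Face repo.

```python
from pathlib import Path


def cogvideox_model_dir(comfy_models_dir: str, repo_id: str = "THUDM/CogVideoX-5b") -> Path:
    """Return <models>/CogVideo/<repo name>, matching the path mentioned in the readme."""
    return Path(comfy_models_dir) / "CogVideo" / repo_id.split("/")[-1]


def ensure_model(comfy_models_dir: str, repo_id: str = "THUDM/CogVideoX-5b") -> Path:
    """Download the model once, skipping the bundled text encoder files."""
    target = cogvideox_model_dir(comfy_models_dir, repo_id)
    if not target.exists():
        # Imported lazily so the path helper works without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        snapshot_download(
            repo_id=repo_id,
            local_dir=str(target),
            # Assumption: the T5 weights and tokenizer live in these subfolders.
            ignore_patterns=["text_encoder/*", "tokenizer/*"],
        )
    return target
```

Calling `ensure_model("ComfyUI/models")` would download only when the target folder does not exist yet.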

Requires diffusers 0.30.1 (this is specified in requirements.txt)
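A quick way to check the installed version before loading the nodes is sketched below. It treats 0.30.1 as a minimum, which is an assumption (the readme only says "requires"), and the function names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '0.30.1' into (0, 30, 1); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, required: str = "0.30.1") -> bool:
    """Compare versions numerically rather than as strings."""
    return parse_version(installed) >= parse_version(required)


def check_diffusers() -> str:
    """Report whether the installed diffusers satisfies the assumed minimum."""
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        return "diffusers is not installed"
    if meets_minimum(installed):
        return f"diffusers {installed} is new enough"
    return f"diffusers {installed} is older than 0.30.1"
```

If the check fails, installing from requirements.txt should pull in the right version.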

Uses the same T5 model as SD3 and Flux; fp8 works fine too. Memory requirements depend mostly on the video length. VAE decoding seems to be the only stage that takes a lot of VRAM when everything is offloaded; it peaks at around 13-14GB momentarily. Sampling itself takes only about 5-6GB.

Hacked in img2img to attempt a vid2vid workflow. It works interestingly with some inputs, but is highly experimental.

https://github.com/user-attachments/assets/e6951ef4-ea7a-4752-94f6-cf24f2503d83

https://github.com/user-attachments/assets/9e41f37b-2bb3-411c-81fa-e91b80da2559
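For readers unfamiliar with img2img, the usual idea is to noise the input partway and skip the first part of the denoising schedule. The sketch below shows the step arithmetic commonly used in diffusers-style img2img pipelines. The `strength` parameter and function name are assumptions for illustration, and this is not necessarily how the wrapper implements it.

```python
def img2img_start_step(num_inference_steps: int, strength: float) -> int:
    """Return the index of the first denoising step to run.

    strength=1.0 runs the full schedule (input is ignored);
    strength=0.0 runs no steps (input is returned unchanged).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Number of steps actually executed, capped at the full schedule length.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    # Steps before this index are skipped; the input is noised to match this step.
    return max(num_inference_steps - steps_to_run, 0)
```

With 50 steps and a strength of 0.5, sampling would start at step 25, so half of the input video's structure is preserved by construction.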

Also added temporal tiling as a means of generating endless videos:

https://github.com/kijai/ComfyUI-CogVideoXWrapper

https://github.com/user-attachments/assets/ecdac8b8-d434-48b6-abd6-90755b6b552d
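One generic way to think about temporal tiling is to split a long frame range into overlapping windows, process each window separately, and cross-fade the overlapping frames. The sketch below illustrates that general idea only. The function names, the `overlap` parameter, and the linear blend are assumptions and may differ from what nodes.py actually does.

```python
def temporal_tiles(total_frames: int, tile_size: int, overlap: int) -> list:
    """Split [0, total_frames) into windows of tile_size sharing `overlap` frames."""
    if total_frames <= 0 or tile_size <= 0 or not 0 <= overlap < tile_size:
        raise ValueError("need total_frames > 0, tile_size > 0 and 0 <= overlap < tile_size")
    stride = tile_size - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + tile_size, total_frames)
        tiles.append((start, end))
        if end >= total_frames:
            break  # the last window may be shorter than tile_size
        start += stride
    return tiles


def blend_weights(overlap: int) -> list:
    """Linear cross-fade weights for the incoming tile across the overlapping frames."""
    return [(i + 1) / (overlap + 1) for i in range(overlap)]
```

For example, `temporal_tiles(10, 4, 2)` yields the windows `(0, 4)`, `(2, 6)`, `(4, 8)` and `(6, 10)`. In each overlap the incoming tile would be mixed with weight `w` and the outgoing tile with `1 - w`.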

Original repo: https://github.com/THUDM/CogVideo