Lightricks LTXVideo: Sleeper Hit Open Source Video Generation

With new packages like Mochi and Hunyuan Video now out, a few other video packages have slipped under the radar and definitely deserve some more love. LTXVideo by Lightricks appears to be slept on despite shipping with a state-of-the-art, out-of-the-box generation length of 251 frames, along with text, image, and video prompting methods through ComfyUI workflows that are easy to install and use. Let's look at how to get this set up in a pod.

Although an A100 or H100 is still recommended for maximum video length and quality, the package is optimized for speed and usability and runs comfortably on lower GPU specs; a 48GB card like the A40 is more than capable of using the best the package has to offer.
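Once a pod is running, one quick way to confirm which GPU it has is to ask nvidia-smi for the card name and total memory. The sketch below is only an illustration: the helper names and the 45000 MiB cutoff are assumptions of mine, not an official LTXVideo requirement (cards sold as 48GB typically report somewhat less than 49152 MiB).

```shell
# Assumed cutoff in MiB; override it with MIN_VRAM_MIB=... if you prefer.
MIN_VRAM_MIB="${MIN_VRAM_MIB:-45000}"

# Takes one CSV line such as "NVIDIA A40, 46068" and succeeds if the
# memory figure (the text after the last ", ") is at least MIN_VRAM_MIB.
meets_vram_minimum() {
  mem="${1##*, }"
  [ "$mem" -ge "$MIN_VRAM_MIB" ]
}

# Query every GPU in the pod and print one verdict line per card.
check_gpus() {
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
  while IFS= read -r line; do
    if meets_vram_minimum "$line"; then
      echo "OK:    $line MiB"
    else
      echo "SMALL: $line MiB"
    fi
  done
}
```

Paste the functions into the web terminal and run `check_gpus`. A card flagged as SMALL may still work for shorter or lower resolution clips.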

Startup and Updating ComfyUI

  1. Spin up a ComfyUI pod of any flavor you choose (in this example I'm using ComfyUI and Manager and Downloader by Camenduru). An A100 or H100 is recommended, but a 48GB GPU should be adequate for smaller tasks.
  2. Once the pod is up and running, update it to the latest version. If your interface looks like the screenshot, with the menu on the right, you need to update.

To update, choose the Update ComfyUI option on the menu, hit Restart, and then refresh your browser.

This will get you onto the latest version.

Installing LTXVideo Nodes in ComfyUI

Go to the ComfyUI Manager and install both LTX node groups, and restart again.

Do the same for VideoHelperSuite.
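If you want to double-check from the web terminal that the Manager actually installed the nodes, you can look inside ComfyUI's custom_nodes folder. This is a rough sketch: the function names are hypothetical, the /content/ComfyUI path is taken from the terminal prompts in this post, and the search terms "ltx" and "videohelpersuite" are my guess at how the node folders are named.

```shell
# ComfyUI install location; adjust if your pod template uses another path.
COMFY_ROOT="${COMFY_ROOT:-/content/ComfyUI}"

# Print the folders directly under $1 whose name contains $2,
# ignoring upper/lower case.
list_nodes_matching() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -iname "*$2*" | sort
}

# Succeeds only if at least one folder matches.
has_node() {
  [ -n "$(list_nodes_matching "$1" "$2")" ]
}

# Report on the node packs this guide relies on (assumed folder-name fragments).
report_nodes() {
  for name in ltx videohelpersuite; do
    if has_node "$COMFY_ROOT/custom_nodes" "$name"; then
      echo "found:   $name"
    else
      echo "missing: $name"
    fi
  done
}
```

Run `report_nodes` after the restart. Anything reported as missing is worth reinstalling through the Manager before moving on.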

Downloading the LTX Repo

Download the model files per the repo's instructions. On RunPod, this can be accomplished by:

  1. Connecting to the web terminal and downloading this .safetensors model into your checkpoints directory:
root@deb85afc7e5e:/content/ComfyUI# cd models/
root@deb85afc7e5e:/content/ComfyUI/models# cd checkpoints/
root@deb85afc7e5e:/content/ComfyUI/models/checkpoints# ls
put_checkpoints_here
root@deb85afc7e5e:/content/ComfyUI/models/checkpoints# wget https://huggingface.co/Lightricks/LTX-Video/resolve/main/ltx-video-2b-v0.9.safetensors

This downloads the file with wget. Since it is just one file, this is the easiest way.

  2. Install git-lfs if it's not already installed:
root@deb85afc7e5e:/content/ComfyUI/models/checkpoints# apt-get update
root@deb85afc7e5e:/content/ComfyUI/models/checkpoints# apt-get install git-lfs
  3. Create a text_encoders folder in your models folder and clone the following repo there. The clone will appear to hang, but it is downloading the repo, which is about 20GB (git-lfs isn't great about showing a progress meter). You can ignore the message it prints about Windows.
root@deb85afc7e5e:/content/ComfyUI/models/text_encoders# git clone https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS

Unpacking objects: 100% (85/85), 7.10 MiB | 7.59 MiB/s, done.
Filtering content: 100% (8/8), 4.34 GiB | 16.83 MiB/s, done.
Encountered 2 file(s) that may not have been copied correctly on Windows:
        text_encoder/model-00002-of-00002.safetensors
        text_encoder/model-00001-of-00002.safetensors

Creating your video

You're now ready to start creating videos. LTX has two big advantages over other open source video processes at the moment:

  1. It is very fast - it's able to generate videos in real time, though doing this does require some serious compromises in step count and resolution. Nevertheless, it is easily the fastest open source video package.
  2. It comes with text to video, image to video, and video to video right out of the box, which is a rarity with these packages.

You can download all three workflows from the repo and just drag them into your ComfyUI window.

txt2vid: A meteor shower occurring over a pastoral landscape at dusk in the hills of Ireland. The scene has a dreamy quality to it, and the meteors leave long trails as they streak over the countryside. The landscape is dimly lit with the light from cottages that dot the hills of the countryside. (generated in 12 seconds)

img2vid: A woman smiling and speaking animatedly. The video appears to be an audition for a movie, and the woman is saying the words with conviction. (generated in 24 seconds, image example from thispersondoesnotexist.com)

vid2vid: A woman wearing safety goggles looks through a microscope and makes a stunning discovery. Her expression begins as one of contemplation, but then changes to one of surprise, shock, and elation, as if she was amazed by what she saw. (generated in 24 seconds, source vid)

Conclusion

While the latest crop of open source video packages can all create state-of-the-art video from just text, LTX is the first capable of generating video in real time or close to it, along with a multitude of prompting methods that come ready to use in ComfyUI. We're looking forward to what you create, so feel free to show us your renditions on our Discord!