V 2.0:

ComfyUI workflow update with Florence2 autocaption (instead of BLIP as in V 1.0).

Florence2 node added to the GUI
- Select "caption", "detailed caption" or "more detailed caption".
- Added a text node, "replace "photo/image" caption with", set to "video". It replaces any "photo" or "image" text in the caption with the term "video" to push the prompt toward a video-related description instead of a photo-related one. Instead of "video" you can try a term like "animation" or "clip".
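
The replacement step can be pictured with a short Python sketch. This is not the workflow's actual node code: the function name `replace_media_terms` and the whole-word, case-insensitive matching are assumptions made for illustration only.

```python
import re

# Hypothetical helper: the real ComfyUI text node may match differently.
# Matches the whole words "photo"/"image" and their plurals, in any case.
_MEDIA_TERMS = re.compile(r"\b(photo|image)(s?)\b", re.IGNORECASE)


def replace_media_terms(caption: str, replacement: str = "video") -> str:
    """Swap 'photo'/'image' for the replacement term, keeping a plural 's'."""
    return _MEDIA_TERMS.sub(lambda m: replacement + m.group(2), caption)


caption = "The image shows a photo of a woman walking on a beach."
print(replace_media_terms(caption))
print(replace_media_terms(caption, "clip"))
```

Because only whole words are matched, a word such as "photograph" is left untouched in this sketch.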

V 1.0:

ComfyUI workflow for LTX Image to Video with autocaption and enhanced motion, achieved by applying compression as described by Throttlekitty in this Reddit thread:

https://www.reddit.com/r/StableDiffusion/comments/1h1bb0f/playing_with_the_new_ltx_video_model_pretty/

"By the way, there seems to be a new trick for I2V to get around the "no motion" outputs for the current LTX Video model. It turns out the model doesn't like pristine images, it was trained on videos. So you can pass an image through ffmpeg, use h264 with a CRF around 20-30 to get that compression. Apparently this is enough to get the model to latch on to the image and actually do something with it."

The workflow includes a GUI to manage parameters such as video length, CFG and compression (CRF). As described above, the CRF value affects motion: the compression is meant to prevent the output from being a still image without any motion. Useful values seem to be between 20 and 40. The CRF value can be adjusted in the "video combine" node in the lower left of the GUI. Increase the value if you don't see motion.
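
To try the compression trick outside ComfyUI, the image can be passed through ffmpeg by hand. The Python sketch below only illustrates that idea and is not what the "video combine" node runs internally. The function names and file paths are hypothetical, and ffmpeg with libx264 is assumed to be installed and on the PATH.

```python
import subprocess


def build_crf_commands(src: str, tmp_video: str, dst: str, crf: int = 30):
    """Return two ffmpeg commands: encode the image as one h264 frame, then decode it."""
    if not 0 <= crf <= 51:  # valid CRF range for 8-bit libx264
        raise ValueError("crf must be between 0 and 51")
    encode = [
        "ffmpeg", "-y", "-i", src,
        # yuv420p needs even dimensions, so round width and height down to even
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
        "-frames:v", "1",
        "-c:v", "libx264", "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        tmp_video,
    ]
    decode = ["ffmpeg", "-y", "-i", tmp_video, "-frames:v", "1", dst]
    return encode, decode


def degrade_image(src: str, tmp_video: str, dst: str, crf: int = 30) -> None:
    """Run both commands; raises CalledProcessError if ffmpeg fails."""
    for cmd in build_crf_commands(src, tmp_video, dst, crf):
        subprocess.run(cmd, check=True)


# Show the commands without running them (paths are placeholders).
for cmd in build_crf_commands("input.png", "tmp.mp4", "compressed.png", crf=30):
    print(" ".join(cmd))
```

Calling `degrade_image("input.png", "tmp.mp4", "compressed.png", crf=30)` would write a compressed copy of the image. A higher CRF means stronger compression.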

The autocaption can be extended by adding text before it (Pre Text) and after it (After Text).
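
As a rough picture of how the final prompt is put together, the sketch below joins the optional Pre Text, the autocaption and the optional After Text. The function name `build_prompt` and the single-space separator are assumptions; the workflow's own nodes may concatenate the pieces differently.

```python
def build_prompt(caption: str, pre_text: str = "", after_text: str = "") -> str:
    """Join pre text, autocaption and after text, skipping empty parts."""
    parts = [part.strip() for part in (pre_text, caption, after_text)]
    return " ".join(part for part in parts if part)


print(build_prompt(
    "A woman walks on a beach.",
    pre_text="Slow camera pan to the right.",
    after_text="Waves roll in the background.",
))
```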

The Width/Height and Scale of Input Image settings in the GUI control the size of the picture sent to the sampler, usually 768x512. Applying a scale of 2 works a bit like supersampling (leaving it as is is recommended).

The first time you use it, it might download missing models (LTX, BLIP, upsampler).

Simple workflow description:

Load or drag and drop an image (lower left in the GUI). ComfyUI will upscale it, apply compression, resize it to fit the LTX output and generate a caption, then send the picture to the sampler to generate the video. You can enter text in the "Pre Text" and "After Text" nodes to insert text before/after the autocaption if needed (e.g. to describe camera movement).
