Workflow: Image -> Autocaption (Prompt) -> WAN I2V with Upscale and Frame Interpolation and Video Extension

There are two versions: one that captions with Florence and one that uses the LTX Prompt Enhancer (LTXPE). The LTXPE version needs more VRAM.

V1.0: WAN 2.2 14B Image to Video workflow with LightX2v Lora support for low step counts (4-8 steps)

Wan 2.2 uses two models to process a clip: a High Noise model and a Low Noise model, run in sequence.
Compatible with the LightX2v Lora from Wan2.1, which allows fast clip generation with low step counts.
Compatible with some of the Wan2.1 Loras; these have to be injected twice because of the two-model setup.
See notes in workflow.
GGUF models
A 5 sec clip with 6 steps @ 480p takes about 4 min, including autoprompt, 2x upscaling to 960p and frame interpolation to 30fps (RTX 4080 with 16 GB VRAM, 64 GB RAM, sage attention).
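
To illustrate how a step budget can be shared between the two models that run in sequence, here is a minimal Python sketch. The function name split_steps and the high_noise_fraction parameter are made up for this example, and the even split is an assumption; the actual split used in the workflow is set in its sampler nodes and may differ.

```python
def split_steps(total_steps, high_noise_fraction=0.5):
    """Split a step budget into (start, end) ranges for the two models.

    high_noise_fraction is an assumed knob for this sketch, not a value
    taken from the workflow.
    """
    if total_steps < 2:
        raise ValueError("need at least 2 steps, one per model")
    switch = round(total_steps * high_noise_fraction)
    # Clamp so that both the high noise and the low noise range are non-empty.
    switch = min(max(switch, 1), total_steps - 1)
    return {
        "high_noise": (0, switch),           # first model handles the early steps
        "low_noise": (switch, total_steps),  # second model finishes the clip
    }


if __name__ == "__main__":
    # 6 steps, as in the timing example above
    print(split_steps(6))
```

With 6 steps and the assumed even split, the High Noise model would cover the first 3 steps and the Low Noise model the remaining 3.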

Models can be downloaded here:

WAN 2.2. I2V 5B Model (GGUF) workflow with Florence or LTXPE auto caption

Location to save these files within your ComfyUI folder:

Wan GGUF Model -> models/unet

Text encoder -> models/clip

VAE -> models/vae
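
If you want to double-check where a downloaded file belongs, the folder mapping above can be written as a small Python helper. This is only a sketch: the labels "wan_gguf", "text_encoder" and "vae", the function names, and the example file name are made up for illustration and are not part of the workflow or of ComfyUI.

```python
from pathlib import Path

# Folder mapping taken from the list above; the keys are labels invented for this sketch.
MODEL_DIRS = {
    "wan_gguf": Path("models") / "unet",
    "text_encoder": Path("models") / "clip",
    "vae": Path("models") / "vae",
}


def target_path(comfyui_root, kind, filename):
    """Return where a downloaded file of the given kind should be placed."""
    return Path(comfyui_root) / MODEL_DIRS[kind] / filename


def missing_dirs(comfyui_root):
    """List the expected model folders that do not exist under comfyui_root."""
    root = Path(comfyui_root)
    return [d.as_posix() for d in MODEL_DIRS.values() if not (root / d).is_dir()]


if __name__ == "__main__":
    # "ComfyUI" and the file name are placeholders; use your own install path and file.
    print(target_path("ComfyUI", "vae", "example_vae.safetensors"))
    print(missing_dirs("ComfyUI"))
```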

Tips:

The default LightX2v Lora strength of 0.8 is set up for a more realistic look, so hair and skin look more natural. For an anime or comic-like look you can increase the strength to 1.0 or beyond (black nodes in the workflow).
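
As a rough illustration of this tip, the two strength values can be kept as presets. The preset names and the function below are hypothetical, and applying the same strength to both the High Noise and the Low Noise model is an assumption of this sketch; check the black nodes in the workflow for the actual settings.

```python
# Strength values from the tip above; the preset names are invented for this sketch.
LIGHTX2V_STRENGTH = {
    "realistic": 0.8,  # default, more natural hair and skin
    "anime": 1.0,      # starting point for an anime or comic-like look
}


def lora_strengths(style):
    """Return the LightX2v strength to enter for each of the two models.

    Assumes the same value is used for both models, which the workflow
    may or may not do.
    """
    strength = LIGHTX2V_STRENGTH[style]
    return {"high_noise": strength, "low_noise": strength}


if __name__ == "__main__":
    print(lora_strengths("realistic"))
```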

WAN 2.2 IMAGE to VIDEO with Caption and Postprocessing
