Workflow: Image -> Autocaption (Prompt) by Florence -> WAN I2V with Upscale and Frame Interpolation

Creates video clips at up to 480p resolution (720p with the corresponding model).

V1.0: WAN 2.1 Image to Video with a Florence caption or your own prompt, plus upscale, frame interpolation and clip extend.

The workflow is set up to use a GGUF model. An additional workflow is included that uses a MultiGPU loader to move some models to the CPU or share them between multiple GPUs, plus a clear Vram node for lower Vram usage.

When generating a clip you can choose to apply upscaling and/or frame interpolation. The upscale factor depends on the upscale model used (2x or 4x, see the "load upscale model" node). Frame interpolation is set to increase the frame rate from 16fps (the model standard) to 32fps. The result is shown in the "Video Combine Final" node on the right, while the left node shows the unprocessed clip.
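To get a feel for what the two postprocessing toggles do to a clip, here is a rough sketch of the arithmetic. The function name, the 832x480 input size and the 81-frame clip length are illustrative assumptions and are not taken from the workflow. The frame count also assumes the interpolator only inserts frames between existing ones, so the actual node may produce a slightly different number.

```python
def postprocess_summary(width, height, frames,
                        upscale=True, interpolate=True,
                        upscale_factor=2, base_fps=16, interp_multiplier=2):
    """Estimate the size, frame rate and frame count of the final clip.

    Hypothetical helper for illustration only; it is not part of the workflow.
    """
    # The upscale factor comes from the upscale model (2x or 4x).
    scale = upscale_factor if upscale else 1
    # Interpolation multiplies the frame rate (16fps -> 32fps for a 2x multiplier).
    mult = interp_multiplier if interpolate else 1
    # Assumption: new frames are inserted only between existing frames.
    out_frames = (frames - 1) * mult + 1
    return {
        "width": width * scale,
        "height": height * scale,
        "fps": base_fps * mult,
        "frames": out_frames,
    }


# Example: a 480p clip with both toggles on, using a 2x upscale model.
summary = postprocess_summary(832, 480, 81)
print(summary)
```

With both toggles off the function returns the input values unchanged, which corresponds to the unprocessed clip in the left node.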

Use the switch above Final Video to toggle upscale and frame interpolation. Notes with more details are placed in the workflows.

Recommended: use "Toggle Link visibility" to hide the cables.

Models can be downloaded here:

Wan 2.1. I2V: https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main

Clip (fp8): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/text_encoders

Clip Vision: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision

VAE: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
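As a rough guide to where the downloaded files go, the sketch below maps each download to a folder. The folder names are an assumption based on a default ComfyUI install layout (GGUF loaders conventionally read from models/unet) and are not stated in this workflow, so check the dropdowns of the loader nodes if a file does not show up. The helper name and the "ComfyUI" root path are hypothetical.

```python
from pathlib import Path

# Assumed default ComfyUI folder layout; adjust if your install differs.
MODEL_FOLDERS = {
    "Wan 2.1 I2V (GGUF)": "models/unet",
    "Clip (fp8)": "models/text_encoders",
    "Clip Vision": "models/clip_vision",
    "VAE": "models/vae",
}


def target_dir(comfy_root, kind):
    """Return the folder a downloaded file of the given kind would go into."""
    return Path(comfy_root) / MODEL_FOLDERS[kind]


# Print the assumed destination for each download in the list above.
for kind in MODEL_FOLDERS:
    print(f"{kind}: {target_dir('ComfyUI', kind).as_posix()}")
```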

Description

WAN 2.1 IMAGE to VIDEO with Caption and Postprocessing
