This is a basic workflow for use with WAN video, Sage Attention, and the new TeaCache support. Basic tests with 30-step runs brought my generation time down from 17 minutes to around 6, which is a massive speedup. A few people have had trouble installing it, so I thought I would post my workflow here to help out.
Note - this is the Kijai version, and the models it uses are different from the natively supported ComfyUI ones. Click here for a link to Kijai's walkthrough with Hugging Face links and example workflows.
I am aware that Sage Attention is a pain to set up, and I do not yet have a guide here for it. I can offer advice, though: make sure that your NVIDIA SDK (CUDA Toolkit), Torch version, Python version, and Sage Attention version are all lined up and compatible. This is where I had most of my issues. Start with Sage Attention, check its compatibility requirements, and work backwards from there.
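If it helps, here is a minimal sanity-check sketch (my own addition, not part of the workflow) that prints the versions that need to line up. It assumes PyTorch is installed and that Sage Attention, if present, is importable as `sageattention`:

```python
# Quick environment check: these are the versions that have to be
# compatible with whichever Sage Attention build you install.
import sys
import torch

print("Python       :", sys.version.split()[0])
print("PyTorch      :", torch.__version__)
print("CUDA (torch) :", torch.version.cuda)  # CUDA version PyTorch was built against
print("GPU          :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")

try:
    import sageattention
    # The package may not expose __version__, so fall back gracefully.
    print("SageAttention:", getattr(sageattention, "__version__", "installed (no version attr)"))
except ImportError:
    print("SageAttention: not installed")
```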
I did dig around on YouTube and found a good tutorial on how to get it running (the speaker is not me). Keep in mind again that versioning really matters: when I set it up, I had to downgrade my NVIDIA SDK version so everything was compatible.
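Once it imports cleanly, a tiny smoke test like the sketch below (again my own addition, based on the `sageattn` entry point documented in the SageAttention README) will tell you whether the kernels actually run on your GPU before you wire it into ComfyUI:

```python
# Smoke test: run Sage Attention once on dummy tensors.
# Assumes a CUDA GPU and the documented sageattn(q, k, v, ...) call.
import torch
from sageattention import sageattn

# (batch, heads, seq_len, head_dim) in the "HND" layout; fp16 on the GPU.
q = torch.randn(1, 8, 1024, 128, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 128, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 128, dtype=torch.float16, device="cuda")

out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
print("OK, output shape:", tuple(out.shape))  # expect (1, 8, 1024, 128)
```

If this crashes with a kernel or compilation error, it is almost always the version mismatch described above, not your workflow.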
My test with this setup on an RTX 4090 + 64 GB RAM (30 steps):
100%|██████████| 30/30 [05:56<00:00, 11.89s/it]
My test using SDPA instead:
100%|██████████| 30/30 [17:27<00:00, 34.91s/it]
Six minutes versus 17.5 minutes. Almost 3x faster isn't bad.
Go ahead and upload yours!