Innovate Futures @ Benji
For Patreon Supporters - Wan Fun Control - ImageRestyle V2V (Version 20250328) (Workflow)



Video: https://youtu.be/YiVkevHuXIU

Related Post: https://www.patreon.com/posts/125384420

In this post, I’ll walk you through my ComfyUI workflow for generating AI-enhanced videos using Alibaba’s WAN 2.1 Fun Control models. The workflow restyles an existing video while preserving its motion, which makes it a good fit for influencer content, music videos, or AI-driven animations.

Workflow Overview

The WAN 2.1 Video-to-Video (V2V) workflow leverages ControlNet-guided diffusion to:
Restyle videos (e.g., change outfits, backgrounds, or entire aesthetics).
Preserve motion from reference videos (e.g., dance moves, gestures).
Maintain consistency across frames (reduced flickering and distortion).

Unlike earlier approaches such as AnimateDiff, WAN 2.1 is built on a diffusion transformer architecture, which tends to give smoother output with a more coherent style.

Step-by-Step Workflow

1. Setup & Model Installation






2. Key Components



A. Reference Video & ControlNet


DWPose (for skeletal motion tracking).

Line Art (for outline consistency).

Depth Maps (for spatial depth).



B. Style Transfer with Flux

Text prompts (e.g., "hip-hop dancer in a blue jacket").

LoRA models (for character consistency, if needed).



C. WAN 2.1 Fun Control Processing

CLIP Vision Encode: Embeds the style reference.

KSampler: Generates frames (steps: 20, CFG: 7.5).

Skip Layer Guidance: Enhances details (blocks 9-10).
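
The sampling settings listed above can be kept in one place for reference. The following is a minimal sketch in plain Python, not ComfyUI's API: the dictionary keys and the `validate_settings` helper are hypothetical names chosen for illustration, and only the values (20 steps, CFG 7.5, Skip Layer Guidance on blocks 9-10) come from this post.

```python
# Hypothetical settings record for the Wan 2.1 Fun Control sampling stage.
# Key names are illustrative only; the values are the ones quoted in this post.
WAN_SAMPLER_SETTINGS = {
    "steps": 20,                   # sampler step count
    "cfg": 7.5,                    # classifier-free guidance scale
    "skip_layer_blocks": [9, 10],  # blocks targeted by Skip Layer Guidance
}


def validate_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = []

    steps = settings.get("steps")
    if not isinstance(steps, int) or steps <= 0:
        problems.append("steps must be a positive integer")

    cfg = settings.get("cfg")
    if not isinstance(cfg, (int, float)) or cfg <= 0:
        problems.append("cfg must be a positive number")

    blocks = settings.get("skip_layer_blocks")
    if not isinstance(blocks, list) or not all(
        isinstance(b, int) and b >= 0 for b in blocks
    ):
        problems.append("skip_layer_blocks must be a list of non-negative integers")

    return problems


if __name__ == "__main__":
    print(validate_settings(WAN_SAMPLER_SETTINGS))
```

Keeping the values in a single record like this makes it easier to note what was changed when experimenting with different step counts or guidance scales.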


D. Refinement (Optional)



How It Works




Consistent character designs (thanks to LoRAs).

Stable backgrounds (no flickering).


Example Use Cases

Input: A dancer in casual clothes.

Output: Same moves, but with a cyberpunk outfit and neon-lit backdrop.


Input: A stock video of a person talking.

Output: The same speech delivered by a custom AI avatar.


Input: A rough storyboard animation.

Output: A polished, stylized final render.


Optimization Tips





Conclusion

The WAN 2.1 Video-to-Video workflow in ComfyUI is a breakthrough for AI video generation. By combining ControlNet-guided motion with diffusion-based style transfer, it outperforms older tools like AnimateDiff in consistency and ease of use.

Ready to try it?

Workflow update (2025-03-29):

In the Flux group, I doubled the input resolution of the Empty Latent, because Flux Union Pro ControlNet works better with images above 1024px.

The output image is therefore resized back to 832x480px or 480x832px to serve as the first image frame for Wan.
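
The resolution change in this update comes down to simple arithmetic, sketched below in plain Python. The function names `flux_latent_size` and `resize_back` are hypothetical and are not nodes in the workflow; the sketch only assumes the factor of 2 and the 832x480 / 480x832 frame sizes stated above.

```python
def flux_latent_size(wan_size, scale=2):
    """Scale the Wan frame size up for the Flux Empty Latent."""
    width, height = wan_size
    return (width * scale, height * scale)


def resize_back(flux_size, scale=2):
    """Scale the Flux output back down to the Wan first-frame size."""
    width, height = flux_size
    return (width // scale, height // scale)


if __name__ == "__main__":
    for wan_size in [(832, 480), (480, 832)]:
        flux_size = flux_latent_size(wan_size)
        # e.g. (832, 480) is generated at (1664, 960), then resized to (832, 480)
        print(wan_size, "->", flux_size, "->", resize_back(flux_size))
```

With the doubled size, the long edge of the Flux image is 1664px, which is above the 1024px mark mentioned in the update. The sizes are unpacked into width and height before the arithmetic because Python tuples cannot be multiplied or divided element-wise directly.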


Attached: the Wan 2.1 Fun Control Video2video workflow, Version 20250328.

Comments

Tuple is a data type in Python. Are you using Python 3.10 or above?

Benjamin Law

I am getting this error: unsupported operand type(s) for /: 'tuple' and 'tuple'

Mayur Jha

Then I paid for the Patreon membership; even if the free one worked, it supports you, and you put work into it. That one is complex for a beginner and offers no way to load an image to style the first frame. I don't get it. The beginner workflow does nothing useful for me: I load my first-frame image and it does nothing with it. It just outputs a video of the wireframe on a black background, the same way it looks in DWPose.

Steven Haffley

I am confused. All I want to do is what you did with the man with demon wings. I downloaded your starter workflow, and after 4 days of trying to figure out why it doesn't work with a 5000 series card (you need Triton nightly), I finally got everything to work. At the end it generates the wireframe, where I can see the animation moving, but it does absolutely nothing with the image loaded for the first-frame style. It outputs no video other than the wireframe.

Steven Haffley

