(Workflow) Wan2.1 V2V StyleTransfer img2txt2vid (Full) Ver. 20250323
Added 2025-03-23 13:26:46 +0000 UTC
Tutorial Video: https://youtu.be/aDzFbb-YDbI
Related Post : https://www.patreon.com/posts/124981437
Creating AI Influencer Videos with Wan 2.1 and ComfyUI: A Workflow Breakdown
In this blog post, I’ll walk you through my workflow in ComfyUI using Wan 2.1 and Control LoRA to create AI influencer videos. This workflow is particularly powerful for generating dynamic, high-quality videos with smooth motions and consistent styles. By leveraging Depth Map Control LoRA, we can transfer styles and motions from reference videos to create stunning AI-generated content. Let’s dive into how this works.
The Core Workflow: Depth Map Control LoRA
The workflow revolves around using Depth Map Control LoRA to guide the AI in generating videos with consistent motions and styles. Here’s how it works:
1. Depth Map Control LoRA
The Depth Map Control LoRA is a specialized model that uses depth maps to control the motion and style of AI-generated videos. It’s similar to ControlNet but tailored for video generation.
How it works:
The workflow starts by generating a Depth Map from the reference video using the Depth Anything V2 preprocessor. The Depth Map is a grayscale image, one per frame, in which brightness encodes how far each part of the scene is from the camera.
The Depth Map is then passed into the Control LoRA Depth model, which uses it to guide the AI in generating videos with the same motions as the reference video.
The AI uses text prompts to restyle the video, applying new colors, outfits, and backgrounds while maintaining the original motion.
When to use it:
This technique is ideal for creating style transfers in videos. For example, if you have a reference video of a dancer, you can use the Depth Map Control LoRA to generate a new video with the same dance moves but in a completely different style (e.g., different clothing, hairstyles, or backgrounds).
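To make the idea of a grayscale Depth Map concrete, here is a minimal sketch of how raw per-pixel depth values could be scaled into an 8-bit image. It is not the Depth Anything V2 preprocessor itself; the function name, the tiny 2x3 frame, and the "nearer is brighter" convention are all assumptions chosen for illustration.

```python
def depth_to_grayscale(depth_rows, near_is_bright=True):
    """Scale raw depth values (any numeric range) to 8-bit grayscale (0-255)."""
    flat = [d for row in depth_rows for d in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A perfectly flat scene carries no depth cue; return mid-gray.
        return [[128 for _ in row] for row in depth_rows]
    out = []
    for row in depth_rows:
        new_row = []
        for d in row:
            t = (d - lo) / span      # 0.0 = nearest pixel, 1.0 = farthest pixel
            if near_is_bright:
                t = 1.0 - t          # assumed convention: near objects are bright
            new_row.append(round(t * 255))
        out.append(new_row)
    return out


# Hypothetical 2x3 frame of depths (smaller value = closer to the camera).
frame = [[1.0, 2.0, 3.0],
         [1.0, 5.0, 5.0]]
gray = depth_to_grayscale(frame)
print(gray)
```

Because only the relative depth of each frame survives this scaling, the map carries the shape and motion of the subject but none of its colors or textures, which is why the text prompt is free to restyle everything else.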
2. Text-to-Video Model
The workflow uses Wan 2.1’s Text-to-Video model, which is lightweight and fast. The 1.3 billion-parameter model is particularly efficient, consuming less than 5 GB of VRAM and generating videos in under 2 minutes on high-end GPUs like the NVIDIA RTX 4090.
How it works:
The AI uses text prompts to restyle the video based on the Depth Map. For example, if you input a text prompt describing a "racing girl in a red outfit," the AI will generate a video with that style while maintaining the original motion from the Depth Map.
The InstructPix2Pix Conditioning node processes the Depth Map and text prompts to ensure the generated video matches the desired style.
When to use it:
This model is perfect for quick iterations and testing. If you need higher-quality results, you can switch to the 14 billion-parameter model, which offers more detailed outputs but requires more computational power.
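A rough back-of-the-envelope calculation shows why the two model sizes differ so much in hardware demands. The sketch below assumes 16-bit weights (2 bytes per parameter) and counts the weights only; activations, the text encoder, and the VAE are not included, so real VRAM usage will be higher. The function name is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


small = weight_memory_gb(1.3e9)   # 1.3B parameters, assumed 16-bit
large = weight_memory_gb(14e9)    # 14B parameters, assumed 16-bit

print(f"1.3B weights: ~{small:.1f} GB")
print(f"14B weights:  ~{large:.1f} GB")
print(f"ratio: ~{large / small:.1f}x")
```

Under these assumptions the 1.3B weights come to about 2.6 GB, which fits comfortably within the sub-5 GB figure quoted above, while the 14B weights alone come to about 28 GB.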
Key Features of the Workflow
Depth Map Control:
The Depth Map acts as a motion guide, ensuring that the generated video retains the same movements as the reference video. This is particularly useful for dance videos or any content where motion consistency is crucial.
Style Transfer:
By using text prompts, you can completely change the style of the video (e.g., clothing, hairstyles, backgrounds) while keeping the motion intact. The result is comparable to what IPAdapter does for still images, although here the new style comes from text prompts rather than a reference image.
Smooth Motions:
Unlike AnimateDiff, which can sometimes produce flickering or inconsistent frames, this workflow ensures smooth and coherent motions thanks to the Diffusion Transformer model used in Wan 2.1.
Custom Samplers:
The workflow uses two custom samplers to refine the video generation process:
The first sampler adds noise to the latent data, acting as an upsampling and resampling method.
The second sampler processes the latent data further, refining the details and ensuring high-quality output.
Workflow Steps in ComfyUI
Here’s a step-by-step breakdown of how to set up and use this workflow in ComfyUI:
Load the Reference Video:
Start by loading the reference video into ComfyUI. This video will be used to generate the Depth Map.
Generate the Depth Map:
Use the Depth Anything V2 preprocessor to generate a Depth Map from the reference video. This grayscale image sequence will guide the AI in replicating the motion.
Load the Control LoRA Model:
Load the Control LoRA Depth model into ComfyUI. This model will use the Depth Map to control the motion of the generated video.
Set Up Text Prompts:
Input text prompts to define the style of the generated video (e.g., clothing, hairstyles, backgrounds). The AI will use these prompts to restyle the video while maintaining the original motion.
Run the Workflow:
Once everything is set up, run the workflow. The AI will process the Depth Map and text prompts to generate a new video with the desired style and motion.
Refine the Output:
If needed, you can refine the output by adjusting the text prompts or using additional tools like Skip Layer Guidance and Tile Control LoRA to enhance the video further.
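The steps above form a small dependency graph: some stages can be set up in any order, but each one needs the outputs of the stages before it. The sketch below models that with Python's standard `graphlib` module. The stage names are illustrative labels, not ComfyUI node class names, and the exact wiring is an assumption based on the description in this post.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages whose output it needs.
# Names are illustrative labels only, not real ComfyUI node classes.
stages = {
    "load_reference_video": set(),
    "depth_map": {"load_reference_video"},
    "load_base_model": set(),
    "apply_control_lora": {"load_base_model"},
    "encode_text_prompts": set(),
    "conditioning": {"depth_map", "encode_text_prompts"},
    "sample": {"apply_control_lora", "conditioning"},
    "decode_and_save_video": {"sample"},
}

# static_order() yields one valid execution order that respects every dependency.
order = list(TopologicalSorter(stages).static_order())
for step, name in enumerate(order, 1):
    print(step, name)
```

Several valid orders exist (for example, the prompts can be encoded before or after the Depth Map is built), which matches how ComfyUI lets you wire independent branches of a graph separately before they meet at the sampler.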
When to Use This Workflow
AI Influencer Videos:
This workflow is perfect for creating AI influencer videos, such as dance routines, lifestyle content, or TikTok-style shorts. The ability to transfer styles while maintaining smooth motions makes it ideal for social media content.
Style Transfers:
If you have a reference video and want to create a new version with a different style (e.g., changing the outfit, background, or overall aesthetic), this workflow is a great choice.
Quick Iterations:
The lightweight 1.3 billion-parameter model allows for fast iterations, making it easy to test different styles and ideas without requiring extensive computational resources.

Conclusion
This workflow in ComfyUI using Wan 2.1 and Depth Map Control LoRA is a powerful tool for creating AI-generated videos with consistent motions and dynamic styles. Whether you’re creating AI influencer content or experimenting with style transfers, this workflow offers a flexible and efficient way to generate high-quality videos. By leveraging Depth Maps and text prompts, you can easily create stunning videos that stand out on social media platforms.
Feel free to experiment with the settings and share your results! If you have any questions or need further clarification, drop a comment below. Happy creating!
Comments
Hi, I want to use the 14B model, but it seems it's not compatible with the 1.3B Control LoRA. I always get an error in SamplerCustom. Is there a 14B LoRA, or do you have another solution? The error is: "SamplerCustom The new shape must be larger than the original tensor in all dimensions."
Black Baron
2025-04-02 07:30:41 +0000 UTC
Also wondering. My output wasn't what I wanted. Could one do i2v with this if we matched the first frame?
Kateri W
2025-03-29 18:45:23 +0000 UTC
Is it possible to stack a character LoRA for character consistency? I haven't tried this workflow of yours yet because I'm a bit sleepy right now.
Zazoum
2025-03-25 14:18:12 +0000 UTC