Qwen Image Edit & Wan 2.2 - Create Coherent AI Video Scenes With This!
Added 2025-10-15 14:00:55 +0000 UTC
Tutorial Video : https://youtu.be/YQLq--X--HY
In this video, we explore an AI storytelling pipeline that combines language models, text-to-image generation, and image-to-video workflows to create structured, multi-scene AI videos. Instead of relying on a single reference image or generating random clips, the creator demonstrates how to use Qwen 3 Max to generate a sequence of detailed text prompts for a cohesive 30-second narrative, with each prompt describing one scene by its subject, action, and environment. These prompts are then used to generate consistent character images via Flux Kontext, and each image is turned into a 5-second video clip using Wan 2.2 MoE and the LightX2V image-to-video LoRAs. The result is a cinematic-style AI video composed of six distinct but visually coherent scenes (6 × 5 seconds = 30 seconds), complete with sound design. This method offers far more control than generating one long AI video in a single pass, and it avoids issues like prompt drift and visual inconsistency.
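As a rough illustration of the scene planning described above, here is a minimal Python sketch. It is not the workflow from the video: the `Scene` class, the helper functions, the example story beats, and the `scene_01.mp4` style file names are all hypothetical. It only shows how six prompts that share one subject add up to a 30-second plan, and how the finished clips could be listed for joining with ffmpeg's concat demuxer, which is one possible way to stitch them (the video may use a different editor).

```python
from dataclasses import dataclass

CLIP_SECONDS = 5  # clip length stated in the description


@dataclass
class Scene:
    """One scene prompt made of subject, action, and environment."""
    subject: str
    action: str
    environment: str

    def prompt(self) -> str:
        # Join the three parts into a single text prompt.
        return f"{self.subject}, {self.action}, {self.environment}"


def build_plan(subject, beats):
    """Build scenes that all reuse the same subject description.

    beats is a list of (action, environment) pairs, one per scene.
    Repeating the subject text is a simple way to keep the character
    description identical across prompts (an assumption of this sketch).
    """
    return [Scene(subject, action, env) for action, env in beats]


def total_seconds(scenes, clip_seconds=CLIP_SECONDS):
    """Total runtime if every scene becomes one fixed-length clip."""
    return len(scenes) * clip_seconds


def concat_list(scenes):
    """Text for an ffmpeg concat-demuxer list file.

    File names are hypothetical placeholders for the rendered clips.
    """
    return "\n".join(
        f"file 'scene_{i:02d}.mp4'" for i, _ in enumerate(scenes, start=1)
    )


# Hypothetical six-beat story used only for illustration.
plan = build_plan(
    "a young explorer in a red jacket",
    [
        ("studies an old map", "a cluttered attic at dawn"),
        ("hikes up a trail", "a misty pine forest"),
        ("crosses a rope bridge", "a deep canyon"),
        ("lights a lantern", "the mouth of a cave"),
        ("opens a stone door", "an underground chamber"),
        ("holds up a glowing crystal", "a cavern filled with light"),
    ],
)

for scene in plan:
    print(scene.prompt())
print(total_seconds(plan), "seconds")
print(concat_list(plan))
```

If the printed list is saved as `clips.txt`, a command such as `ffmpeg -f concat -safe 0 -i clips.txt -c copy story.mp4` can join the clips without re-encoding, provided they all share the same codec, resolution, and frame rate.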
Who is This Content Suitable For?
This content is ideal for:
AI creators and digital storytellers looking to build narrative-driven AI videos with structure and continuity.
ComfyUI users who want to master advanced workflows involving multi-scene generation, Flux Kontext, and image-to-video LoRAs.
Content developers interested in AI filmmaking, short-form video creation, and automated animation pipelines.
Marketers, educators, or indie filmmakers seeking scalable ways to produce engaging video content using generative AI.
Anyone frustrated with inconsistent AI video outputs and looking for a reliable, production-level workflow.
Why Does This Matter?
Most AI video models struggle with long-term coherence, often breaking down after 10–15 seconds with random objects, shifting styles, or illogical transitions. This video presents a smarter alternative: treating AI video creation like real filmmaking by planning scenes, maintaining character consistency, and editing clips together. By using LLMs for script breakdowns, controlled image generation, and modular video synthesis, creators can produce coherent narratives instead of chaotic clips. This approach represents a shift from experimental AI demos to practical, repeatable content creation systems, making it easier to produce professional-grade AI videos for storytelling, marketing, or entertainment.
lovis93/next-scene-qwen-image-lora-2509
https://huggingface.co/lovis93/next-scene-qwen-image-lora-2509
lightx2v/Wan2.2-I2V-A14B-Moe-Distill-Lightx2v
https://huggingface.co/lightx2v/Wan2.2-I2V-A14B-Moe-Distill-Lightx2v
HunyuanVideo-Foley Custom Node:
https://github.com/phazei/ComfyUI-HunyuanVideo-Foley
HunyuanVideo-Foley Model Download:
https://huggingface.co/phazei/HunyuanVideo-Foley/tree/main
SRPO LoRA:
https://huggingface.co/Alissonerdx/flux.1-dev-SRPO-LoRas/tree/main
Attached are the 3 workflows mentioned in this tutorial:
Comments
I'd like to thank you for this amazing workflow! This is just what I needed right now, and it saved me a lot of headaches! This is just perfect! May I ask you for another workflow that could do these morphing videos? Same idea, but with Wan VACE and Wan combined. Like in this video: https://www.youtube.com/live/lPMhXfNne0E?si=9YyawfcWejxHIX0p, 37:19 --->
Minna
2025-10-16 15:56:27 +0000 UTC
Fantastic! Forgive me for the question, how many GB do the models weigh for the entire workflow?
Enzo Brand
2025-10-16 15:21:00 +0000 UTC