Wan2.2 Reward LoRAs (MPS & HPS): Test Lab Workflow and a More In-Depth Look
Added 2025-09-10 13:00:23 +0000 UTC
Tutorial Video : https://youtu.be/2xpOCCTeSXo
Related Post: https://www.patreon.com/posts/138580645
About HPS And MPS
https://github.com/tgxs002/HPSv2
https://github.com/Kwai-Kolors/MPS
Alibaba-pai/Wan2.2-Fun-Reward-LoRAs
https://huggingface.co/alibaba-pai/Wan2.2-Fun-Reward-LoRAs
(Download into your models/loras/ folder)
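If you prefer to script the download, a minimal sketch using the huggingface_hub package follows. The install root argument and the "*.safetensors" filter are assumptions, since the post only names the repo and the models/loras/ folder, so check the repo's file list before relying on the filter.

```python
from pathlib import Path

# Repo named in this post.
REPO_ID = "alibaba-pai/Wan2.2-Fun-Reward-LoRAs"


def lora_dir(install_root: str) -> Path:
    """Return the models/loras/ folder under an install root (the root is an assumption)."""
    return Path(install_root) / "models" / "loras"


def download_reward_loras(install_root: str) -> Path:
    """Download the LoRA weight files from the repo into models/loras/."""
    # Imported lazily so lora_dir() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = lora_dir(install_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(
        repo_id=REPO_ID,
        local_dir=str(target),
        allow_patterns=["*.safetensors"],  # assumed file extension; skips README etc.
    )
    return target
```

Calling download_reward_loras("path/to/your/install") places the files where the workflow expects them. Downloading by hand from the Hugging Face page works just as well.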
Here's a breakdown of what Reward LoRAs do, focusing on technical distinctions and practical implications:
1. Advantages of Reward LoRA in the Wan 2.2 Video Model
Wan 2.2 is Alibaba's open video generation model. A Reward LoRA is an add-on trained to align its video generation with human preferences. Here's why it matters:
Key Advantages:
Quality Optimization:
A Reward LoRA is trained with a human-preference reward signal (e.g., from HPSv2 or MPS). This steers Wan 2.2 toward outputs that humans tend to rate as "high quality" (e.g., smoother motion, better coherence, fewer artifacts).
Fixes Common Video Issues:
Without reward guidance, video models often suffer from:
Temporal flickering
Inconsistent object tracking
Unnatural motion
Reward LoRA training discourages these flaws to the extent the reward model scores them poorly. The reward model itself learned from human-rated "good vs. bad" examples.
Efficiency:
LoRA (Low-Rank Adaptation) trains a small set of low-rank update matrices instead of all of the model's weights. This makes fine-tuning cheaper and faster than full-model retraining while preserving the base model's capabilities.
Customization:
You can swap reward signals (e.g., HPSv2 for overall preference, MPS when a specific dimension such as prompt alignment matters) to steer Wan 2.2 toward your specific needs (e.g., cinematic vs. realistic videos).
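To make the efficiency point concrete, here is a small arithmetic sketch. In LoRA, a frozen weight matrix W of shape d_out x d_in gets an update B·A, where B is d_out x r and A is r x d_in, so only r·(d_out + d_in) values are trained. The layer size and rank below are illustrative assumptions, not the actual Wan 2.2 dimensions.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Number of weights in one dense layer (what full fine-tuning would update)."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights in a LoRA update: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Illustrative numbers only; not taken from the Wan 2.2 architecture.
d_out, d_in, rank = 4096, 4096, 16

full = full_param_count(d_out, d_in)        # 16,777,216
lora = lora_param_count(d_out, d_in, rank)  # 131,072
ratio = lora / full                         # 1/128

print(f"full: {full:,}  lora: {lora:,}  trainable share: {ratio:.2%}")
```

For this hypothetical layer the LoRA trains under 1% of the weights, which is also why the LoRA files are small enough to swap in and out.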
How It Works:
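A reward model (like HPSv2 or MPS) scores the content Wan 2.2 generates.
During training, the LoRA weights are updated to raise those scores while the base model stays frozen. The Wan2.2-Fun-Reward-LoRAs model card describes this as reward backpropagation, a preference-alignment approach in the same family as Reinforcement Learning from Human Feedback (RLHF).
Result: with the LoRA applied, Wan 2.2 is biased toward higher-reward outputs.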
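The loop above can be illustrated with a deliberately tiny toy. It is not the actual training code: the "model" is a single number, the reward function is a hypothetical stand-in for HPSv2/MPS, and the adapter is one scalar rather than low-rank matrices. It only shows the pattern of keeping the base frozen and nudging an add-on toward higher reward.

```python
def reward(output: float, preferred: float = 1.0) -> float:
    """Hypothetical reward: highest (zero) when the output matches the preferred value."""
    return -(output - preferred) ** 2


def reward_grad(output: float, preferred: float = 1.0) -> float:
    """Derivative of reward() with respect to the output."""
    return -2.0 * (output - preferred)


def train_adapter(base_weight: float, steps: int = 100, lr: float = 0.1) -> float:
    """Gradient ascent on the reward, updating only the adapter offset."""
    delta = 0.0  # trainable add-on (stands in for the LoRA); base_weight is never changed
    for _ in range(steps):
        output = base_weight + delta        # "generate" with base + adapter
        delta += lr * reward_grad(output)   # move the adapter toward higher reward
    return delta


base = 0.2
delta = train_adapter(base)
print("reward before:", reward(base))
print("reward after: ", reward(base + delta))
```

Removing the adapter (setting delta back to 0) restores the original behaviour, which mirrors how a LoRA can be switched off in a workflow.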


When training a Reward LoRA for Wan 2.2, you'd use HPSv2 or MPS as the reward signal:
HPSv2: Simpler integration; gives one score to optimize for "overall quality".
MPS: More nuanced; you could weight specific dimensions (e.g., prioritize "prompt match" over "aesthetics").
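As a sketch of what "weighting specific dimensions" could look like, the helper below combines per-dimension scores into one reward. The dimension names, score values, and the function itself are hypothetical illustrations, not the MPS repository's API.

```python
def weighted_reward(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores; weights need not sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * scores[name] for name in weights) / total


# Hypothetical per-dimension scores for one generated sample.
scores = {"prompt_match": 0.9, "aesthetics": 0.5}

# Prioritize prompt match over aesthetics, three to one.
weights = {"prompt_match": 3.0, "aesthetics": 1.0}

combined = weighted_reward(scores, weights)
print(f"combined reward: {combined:.2f}")
```

With a single-score model such as HPSv2 there is nothing to weight, which is the trade-off the two lines above describe.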
Attached is the Wan 2.2 Test Lab workflow that I ran in this tutorial: https://youtu.be/2xpOCCTeSXo