Innovate Futures @ Benji

Wan2.2 Reward LoRAs (MPS & HPS): Test Lab Workflow and More, In Depth

Added 2025-09-10 13:00:23 +0000 UTC

Tutorial video: https://youtu.be/2xpOCCTeSXo

About HPS And MPS

https://github.com/tgxs002/HPSv2

https://github.com/Kwai-Kolors/MPS

Alibaba-pai/Wan2.2-Fun-Reward-LoRAs

https://huggingface.co/alibaba-pai/Wan2.2-Fun-Reward-LoRAs

(Download into your models/loras/ folder.)
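If you would rather script the download, here is a minimal Python sketch. It assumes the huggingface_hub package is installed, that your ComfyUI install lives at the hypothetical path COMFY_ROOT, and that the LoRA weights in the repo are .safetensors files. loras_dir and download_reward_loras are illustrative helpers, not part of any tool.

```python
# Minimal sketch, assuming huggingface_hub is installed and the LoRA weights are .safetensors.
from pathlib import Path

REPO_ID = "alibaba-pai/Wan2.2-Fun-Reward-LoRAs"
COMFY_ROOT = Path("ComfyUI")  # hypothetical install location; change it to yours


def loras_dir(comfy_root: Path) -> Path:
    # LoRA files go in <ComfyUI root>/models/loras/ so LoRA loader nodes can find them.
    return comfy_root / "models" / "loras"


def download_reward_loras(comfy_root: Path = COMFY_ROOT) -> str:
    # Imported here so loras_dir() still works when huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    target = loras_dir(comfy_root)
    target.mkdir(parents=True, exist_ok=True)
    # "*.safetensors" matches every LoRA in the repo;
    # narrow the pattern to one filename if you only want a single LoRA.
    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=str(target),
        allow_patterns=["*.safetensors"],
    )
```

Call download_reward_loras() once from a Python session; it returns the local folder the files were saved to.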

Below is a breakdown of Reward LoRAs, focusing on technical distinctions and practical implications:

1. Advantages of Reward LoRAs for the WAN 2.2 Video Model

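WAN 2.2 is Alibaba's open-source video diffusion model; it is not a fine-tune of Stability AI's video model. The alibaba-pai Reward LoRAs, published for the Wan2.2-Fun models, are add-on adapters that align its video generation with human preferences. Here's why they matter: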

Key Advantages:

Quality Optimization:
During training, the LoRA weights are updated by backpropagating a differentiable human-preference reward (from HPSv2 or MPS) through the generated frames, while the base model stays frozen. This teaches WAN 2.2 to favor outputs people tend to rate as high quality (e.g., more appealing frames, finer detail, fewer visual artifacts, closer prompt match).
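Common Video Issues:
Without reward guidance, video models often suffer from:
- Temporal flickering
- Inconsistent object tracking
- Unnatural motion
Reward LoRA addresses these only indirectly. HPSv2 and MPS were trained on human preferences over still images, not videos, so they score frames one at a time; expect the clearest gains in per-frame quality and prompt alignment rather than in temporal consistency.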
Efficiency:
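LoRA (Low-Rank Adaptation) keeps the base model's weights frozen and trains small low-rank matrices that are added to selected layers. This makes fine-tuning cheaper and faster than full-model retraining, preserves the base model's capabilities, and yields a small file you can load or unload at inference time.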
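To put numbers on the efficiency claim, here is a quick parameter count for a single linear layer. It is a minimal sketch: the 4096 x 4096 layer and rank 64 are hypothetical values chosen for illustration, not WAN 2.2's actual layer shapes or the rank of the released LoRAs.

```python
# Minimal sketch: trainable parameters of a full weight update vs. a LoRA update
# for one linear layer. Layer sizes and rank are hypothetical, not WAN 2.2's real shapes.


def full_update_params(d_out: int, d_in: int) -> int:
    # A full fine-tune trains every entry of the d_out x d_in weight matrix W.
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA keeps W frozen and trains B (d_out x rank) and A (rank x d_in);
    # the learned update is delta_W = B @ A, usually scaled by alpha / rank.
    return d_out * rank + rank * d_in


if __name__ == "__main__":
    d_out, d_in, rank = 4096, 4096, 64  # hypothetical layer size and LoRA rank
    full = full_update_params(d_out, d_in)
    lora = lora_update_params(d_out, d_in, rank)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.3f}")
```

With these numbers the LoRA trains 524,288 parameters versus 16,777,216 for the full matrix, roughly 3% of the size.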
Customization:
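Each reward model produces its own LoRA, so you can swap them (e.g., the HPSv2 LoRA for overall human preference, the MPS LoRA for a balance of aesthetics, detail, and prompt alignment) and adjust the LoRA strength to steer WAN 2.2 toward your needs (e.g., cinematic vs. realistic videos).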
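Swapping and scaling can be sketched with the standard LoRA update W' = W + strength * (alpha / r) * (B @ A). This is a minimal pure-Python sketch: lists stand in for weight tensors, the "strength" factor is an assumption modeled on the strength setting common in LoRA loader nodes, and the function names and tiny matrices are hypothetical.

```python
# Minimal sketch of applying a LoRA at a chosen strength, removing it, and swapping in another.
# Lists stand in for tensors; all names and values are hypothetical.
from typing import List

Matrix = List[List[float]]


def matmul(b: Matrix, a: Matrix) -> Matrix:
    # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[sum(b[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for i in range(len(b))]


def apply_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float, strength: float) -> Matrix:
    # W' = W + strength * (alpha / rank) * (B @ A); rank is the inner dimension of B @ A.
    rank = len(a)
    scale = strength * alpha / rank
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def remove_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float, strength: float) -> Matrix:
    # Removing a LoRA subtracts the same scaled update, restoring the base weights.
    return apply_lora(w, b, a, alpha, -strength)


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    hps_b, hps_a = [[0.5], [0.0]], [[1.0, 2.0]]  # hypothetical rank-1 "HPS" LoRA
    mps_b, mps_a = [[0.0], [1.0]], [[0.5, 0.5]]  # hypothetical rank-1 "MPS" LoRA
    w = apply_lora(base, hps_b, hps_a, alpha=1.0, strength=0.7)
    w = remove_lora(w, hps_b, hps_a, alpha=1.0, strength=0.7)  # back to the base (up to rounding)
    w = apply_lora(w, mps_b, mps_a, alpha=1.0, strength=0.7)   # swap in the other LoRA
    print(w)
```

Tools differ in whether they merge the update into the weights or apply it on the fly, but the arithmetic is the same additive update, which is why a LoRA can be dialed up, down, or removed without touching the base checkpoint.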

How It Works:

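- A reward model (HPSv2 or MPS) scores frames of videos generated by WAN 2.2.
- Because the reward model is differentiable, its score is backpropagated through the generation process into the LoRA weights while the base weights stay frozen. This is reward backpropagation: it serves the same goal as Reinforcement Learning from Human Feedback (RLHF), but uses the reward's gradient directly instead of an RL algorithm.
- Result: the LoRA learns to push WAN 2.2 toward higher-reward outputs.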
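The training loop can be illustrated with a toy example of reward backpropagation. Everything here is hypothetical and scaled down so the gradients can be written by hand: the "generator" is a single frozen weight w plus a rank-1 LoRA update (b * a), and the "reward model" is a simple differentiable function standing in for HPSv2 or MPS. It only shows how the reward's gradient flows into the LoRA parameters while the base weight stays fixed.

```python
# Toy sketch of reward backpropagation into a LoRA-style update (all parts hypothetical):
#   generator: y = (w + b * a) * z, with w frozen and (b, a) a rank-1 LoRA update
#   reward:    r(y, z) = -(y - target * z) ** 2, a stand-in for a real preference model


def mean_reward(w: float, a: float, b: float, zs: list, target: float) -> float:
    return sum(-((w + b * a) * z - target * z) ** 2 for z in zs) / len(zs)


def train_reward_lora(w: float, zs: list, target: float,
                      steps: int = 200, lr: float = 0.05) -> tuple:
    a, b = 0.5, 0.0  # LoRA convention: B starts at zero, so training starts from the base model
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for z in zs:
            y = (w + b * a) * z
            dr_dy = -2.0 * (y - target * z)  # derivative of the reward w.r.t. the output
            grad_a += dr_dy * b * z          # chain rule: dy/da = b * z
            grad_b += dr_dy * a * z          # chain rule: dy/db = a * z
        # Gradient *ascent* on the reward; w itself is never updated.
        a += lr * grad_a / len(zs)
        b += lr * grad_b / len(zs)
    return a, b


if __name__ == "__main__":
    zs, w, target = [0.5, 1.0, 1.5], 1.0, 1.5
    a, b = train_reward_lora(w, zs, target)
    print("reward before:", mean_reward(w, 0.5, 0.0, zs, target))
    print("reward after: ", mean_reward(w, a, b, zs, target))
```

Real reward LoRA training differentiates a neural reward model through the video generator, but the gradient path is the same idea: reward, then generator output, then LoRA parameters.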

When training a Reward LoRA for WAN 2.2, you'd use HPSv2 or MPS as the reward signal:
- HPSv2: Simpler integration; it gives a single overall-preference score to optimize.
- MPS: More nuanced; it scores several dimensions (aesthetics, semantic alignment, detail quality, and overall preference), so you could weight specific ones (e.g., prioritize prompt match over aesthetics).
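Because HPSv2 and MPS score still images, a video reward has to be assembled from frame scores. Below is a minimal sketch, assuming frame scores are simply averaged; the score values, dimension names, and weighting scheme are hypothetical illustrations of "weighting dimensions" and do not reproduce either model's real interface.

```python
# Minimal sketch: combine image-level preference scores into one video reward.
# Averaging over frames and the dimension weights are assumptions, not either model's API.
from statistics import mean


def hps_video_reward(frame_scores: list) -> float:
    # HPSv2-style: one overall score per frame, averaged over the clip.
    return mean(frame_scores)


def mps_video_reward(frame_dim_scores: list, weights: dict) -> float:
    # MPS-style (hypothetical interface): each frame has one score per dimension.
    # Combine them with user-chosen weights normalised to sum to 1,
    # then average the per-frame results over the clip.
    total = sum(weights.values())
    per_frame = [sum(w * scores[dim] for dim, w in weights.items()) / total
                 for scores in frame_dim_scores]
    return mean(per_frame)


if __name__ == "__main__":
    # Hypothetical per-frame scores.
    print(hps_video_reward([0.28, 0.30, 0.29]))
    frames = [{"aesthetics": 0.6, "prompt_match": 0.9},
              {"aesthetics": 0.8, "prompt_match": 0.7}]
    # Weight prompt match three times as heavily as aesthetics.
    print(mps_video_reward(frames, {"aesthetics": 1.0, "prompt_match": 3.0}))
```

Averaging treats every frame equally; it says nothing about motion between frames, which is another reason image-based rewards mainly improve per-frame quality.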

Attached is the Wan 2.2 Test Lab workflow I ran in this tutorial: https://youtu.be/2xpOCCTeSXo