Blankage

Sora Review + Update

Added 2025-01-10 03:31:00 +0000 UTC

TLDR: The lip sync on The Right Match (TG/RC) is taking longer than expected, but I hope to be done with it this weekend so I can move on to the sound effects.

Before I launch into my thoughts on Sora, just a quick update on The Right Match. I’m a bit more delayed than I’d like to be for a few reasons. First, this animation is 3x longer than my longest so far and I (for some reason) just didn’t account for that. Second, this is my first time using Runway Act One for facial animation and there’s been a bit of a learning curve. Finally, my real job has been crazy lately, which has been sucking up extra time. The good news is that I’m about 2/3 of the way done and I’ll hopefully be able to post the animation (without sound effects) this weekend! I’m 147 hours in, and I just refuse to rush at this point, so thanks so much for your patience and support!

Now, on to Sora. Disclaimer: Sora is a new model, so nobody really knows how to use it yet. I have generated a lot with it, but these are still early observations.

I am uploading my experiments as I complete them here: https://drive.google.com/drive/folders/18-pBw4bHF7DFttPgHb9CLTFcN7xrACaR?usp=drive_link

To start off with, Sora is too censored to be usable for me at this point. They apply stricter filters to uploaded keyframes, so literally every keyframe I uploaded for my tests got my outputs blocked, including this keyframe, for example.

From a quality and coherence perspective, Sora seems comparable to Kling and Runway. It’s arguable which video model is best, but Sora certainly doesn’t just blow the others away or anything.

Still, Sora is in a class of its own because of its functionality. First of all, it’s the only video model I know of that accepts video as an input. You can upload two videos and Sora will interpolate between them in the same way it interpolates between keyframes. This is a huge deal because it eliminates the jerkiness that would typically result when transitioning between clips.

Second of all, Sora is the only video model I know of that allows you to use a timeline to precisely schedule as many keyframes or segments of footage as you want. It’s perhaps a bit less useful than it sounds because the model still has a mind of its own, and it won’t transition smoothly unless you schedule things properly. Still, there’s obviously a ton of things you can try, and it gives you a lot of flexibility.

Third of all, Sora has what it calls a “remix” feature which acts somewhat like image-to-image but for video. You can take an uploaded or generated video and tell Sora what to change about it. You can also set the strength of the change from 1 to 9. This feature is very useful for gradual transformations, because you can keep the strength fairly low and iteratively remix a clip to slowly move things in a certain direction. The downsides are that it is not very good at keeping facial features consistent, and the background will also morph.

Finally, Sora has a blend feature that takes two clips and mixes them at a weight from 0.00 to 1.00. You have control over the starting and ending weights of the two clips as well as the interpolation curve. Unfortunately, this feature behaved a bit unpredictably for me, but I probably just need to experiment with it a bit more.
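To make the blend controls a bit more concrete, here is a minimal Python sketch of how a starting weight, an ending weight, and an interpolation curve could turn into a weight for each frame. This is purely illustrative: the function name, the curve names, and the smoothstep easing are my own assumptions, not Sora's actual implementation or API.

```python
def blend_weights(start, end, num_frames, curve="linear"):
    """Return one blend weight per frame, moving from `start` to `end`.

    Hypothetical helper for illustration only. A weight of 0.00 means
    "all clip A" and 1.00 means "all clip B" in this sketch.
    """
    if not (0.0 <= start <= 1.0 and 0.0 <= end <= 1.0):
        raise ValueError("weights must be between 0.00 and 1.00")
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")

    weights = []
    for i in range(num_frames):
        # Normalized position in the clip, from 0.0 (first frame) to 1.0 (last).
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        if curve == "ease_in_out":
            # Smoothstep: slow at both ends, faster in the middle.
            t = t * t * (3 - 2 * t)
        elif curve != "linear":
            raise ValueError(f"unknown curve: {curve}")
        # Round to two decimals to match the 0.00 to 1.00 weight range.
        weights.append(round(start + (end - start) * t, 2))
    return weights


if __name__ == "__main__":
    print(blend_weights(0.0, 1.0, 5))
    print(blend_weights(0.0, 1.0, 5, curve="ease_in_out"))
```

With a linear curve the weight shifts evenly from one clip to the other, while an eased curve lingers on each clip at the ends and does most of the mixing in the middle. Thinking about the settings this way may make it easier to predict what a given combination should do.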

I will conclude with two negatives. First, Sora doesn’t seem to be able to pair keyframes with text in the same way Kling or Runway do. With other video models, you can use a starting and/or ending keyframe, and use text to guide the transition. Sora doesn’t seem to be able to do this. Instead, it appears to treat your text as if it’s another keyframe. I had the best results by simply using image inputs with no text, but then you don’t have as much control over how the transition unfolds.

Second, Sora is very picky about which keyframes you use. If they are low-quality, it will simply transition to something that looks better at the first opportunity. It also often added film grain to my keyframes, which I found a bit odd. These are minor negatives, but together, they really made me want to use text-to-video instead of image-to-video as I would with other models. Fortunately, Sora’s other features mean that using text-to-video is much more viable. I’d love to explore all the different workflows that Sora enables, but unfortunately, it’s too expensive and censored at the moment for me to continue using it.