Veo 2 vs Sora ... then Veo o3?

Added 2025-01-03 18:30:12 +0000 UTC

Comments

Antoine Ferrere

2025-02-01 20:08:15 +0000 UTC

Sion Boguszewicz

2025-01-10 01:34:47 +0000 UTC

2025-01-06 17:57:23 +0000 UTC

2025-01-06 16:59:17 +0000 UTC

Martin Percy

2025-01-05 11:12:51 +0000 UTC

2025-01-05 10:31:03 +0000 UTC

2025-01-05 10:30:11 +0000 UTC

William Woof

2025-01-04 21:12:13 +0000 UTC

Martin Percy

2025-01-04 17:31:21 +0000 UTC

Grant Singleton

2025-01-04 15:56:25 +0000 UTC

Matthias Blank

2025-01-04 09:12:22 +0000 UTC

Matthias Blank

2025-01-04 09:11:39 +0000 UTC

2025-01-04 09:09:15 +0000 UTC

2025-01-04 09:08:56 +0000 UTC

2025-01-04 05:01:27 +0000 UTC

Anouar Mansour

2025-01-04 04:16:57 +0000 UTC

Daniel Henderson

2025-01-03 23:53:21 +0000 UTC

Lukas Bentkamp

2025-01-03 22:49:29 +0000 UTC

2025-01-03 22:44:37 +0000 UTC

Barnaby Golden

2025-01-03 22:43:17 +0000 UTC

Judy Hitchcock

2025-01-03 21:43:19 +0000 UTC

Daniel A Barbatti

2025-01-03 20:51:02 +0000 UTC

2025-01-03 20:48:52 +0000 UTC

2025-01-03 20:46:09 +0000 UTC

Matthias Blank

2025-01-03 20:42:56 +0000 UTC

2025-01-03 20:36:03 +0000 UTC

2025-01-03 19:38:33 +0000 UTC

John Merkowsky

2025-01-03 19:02:45 +0000 UTC

2025-01-03 18:55:02 +0000 UTC

Daniel A Barbatti

2025-01-03 18:52:30 +0000 UTC

2025-01-03 18:48:39 +0000 UTC

Comments

Amazing video!

Agreed. My point is that this is a poor test or benchmark. The insight with o3 is that anything with a deterministic, easy-to-measure benchmark will be easy to saturate.

I would have assumed that "how close are the predicted frames to the next N frames of the reference video" is how they are already training video models? Can't see how else you'd do it.
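
To make the question above concrete, here is a minimal sketch of one possible "how close are the predicted frames to the next N frames" signal: plain per-pixel mean squared error. This is an illustrative assumption only, not a description of how Veo 2 or Sora are actually trained; the function name `next_frames_mse` and the flat-list frame format are hypothetical.

```python
def next_frames_mse(predicted, reference):
    """Mean squared error between predicted and reference frames.

    Each argument is a list of frames; a frame is a flat list of pixel
    values (floats). Hypothetical format, chosen only for illustration.
    """
    if len(predicted) != len(reference):
        raise ValueError("need the same number of predicted and reference frames")
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(predicted, reference):
        if len(pred_frame) != len(ref_frame):
            raise ValueError("frames must have the same number of pixels")
        # Accumulate the squared error over every pixel of this frame pair.
        for p, r in zip(pred_frame, ref_frame):
            total += (p - r) ** 2
            count += 1
    # Average over all pixels in all N frames; 0.0 for empty input.
    return total / count if count else 0.0
```

A lower value means the predicted frames sit closer to the reference clip. Note that a pixel-level score like this says nothing directly about whether the motion is physically plausible, which is part of what the rest of the thread is debating.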

Thank you, Martin! I tried these, but it loses coherence very quickly. For example, the man never gets out of the carriage in any of the four videos: https://drive.google.com/file/d/1HLmjrAPyxCXoodwYZWKBurj8xUjg8vIO/view?usp=sharing

Yes, in a way this is the ultimate question. Or, reframed, are there any circumstances where we don't have that signal, given enough work?

Tangential but I wish more effort was put into models that scored high on agentic behaviors. I wish Claude computer use would get an upgrade. I even wonder if reasoning models would be better at computer use since what button to click on has an objectively correct answer.

Ah. Of course. My mistake

Thanks. Looks a bit better than early Star Trek. Haha. 1 year and it will be great

It seems it can't do a SpaceX rocket specifically, due to copyright.

https://drive.google.com/file/d/1TafSUMN60b2lY6KJJtIk4YSRCVvdh-gA/view?usp=sharing

Looking forward to this documentary!

Sora looks like a dream/nightmare; Veo 2 looks more stable. Very interesting about the inference-time verifier, I would love to know what is really happening in the background. Looking forward to your documentary. Happy New Year.

Seems they are, but before that the RL made the model more likely to produce logically coherent answers. Otherwise it'd just be Cost.

Are you suggesting that the verifier model will include a physics model? Or that the verifier model will learn a physics model from video training?

Possibly, though if the data is clean enough, even a simple signal like 'did it replicate' could be enough to brute-force tremendous progress, even if many of the errors are realistic alternatives. I'm sure many of the failed chains of thought for o3 contain brilliant ideas.
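
As a rough illustration of the "simple pass/fail signal plus brute force" idea in this comment, here is a minimal best-of-N sketch: keep sampling candidates until a binary verifier accepts one. It is a toy under stated assumptions, not a description of how o3 works internally; `best_of_n`, `generate`, and `verify` are hypothetical names.

```python
import random

def best_of_n(generate, verify, n, seed=0):
    """Sample up to n candidates and stop at the first one the verifier accepts.

    generate(rng) -> candidate   (hypothetical sampler, given a seeded RNG)
    verify(candidate) -> bool    (hypothetical binary signal, e.g. "did it replicate")
    Returns (candidate, attempts_used); candidate is None if nothing passed.
    """
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        candidate = generate(rng)
        if verify(candidate):
            return candidate, attempt
    return None, n

# Toy usage: guess a digit until the checker says yes.
answer, attempts = best_of_n(
    generate=lambda rng: rng.randint(0, 9),
    verify=lambda x: x == 7,
    n=1000,
)
```

The rejected candidates are simply discarded here, which is the point the comment makes: a pass/fail signal throws away any partial merit in the failed attempts.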

This was the closest one: https://drive.google.com/file/d/1ZbygOOvaJr4ODQCGfqA_f0uxP0IkP2GL/view?usp=sharing

Yes to the first two, using high-quality data. And maybe insights from eclectic domains of data

In your opinion, is the best way to train a model to better answer open-ended, non-verifiable questions, like philosophical ones, going to be scaling up parameter counts and pretraining data, or this new paradigm of test-time compute (TTC)? Or do you see a third way?

That first video looks very CGI to me. There is a disconnect for lighting and art style between the hippo and the renaissance background.

So the o-series models are brute-forcing solutions. Still impressive results, though.
