In my experience, the biggest "step change" in reliability has been Gemini 2.5 Deep Think. Most of the prompts I used didn't introduce new errors and fixed existing ones, and that was after about 60 iterations. That said, it was a basic cross-the-road game in the style of Crossy Road. No other model, even 3.0 Pro or Opus 4.5, has been comparable for me.
My guess is that 3-5 major breakthroughs like the Transformer are needed to get to some sort of AGI. Thinking memory (maybe Titans or Nested Learning?), some physical form factor (humanoid robot?), self-reflection, self-learning, and probably some others.
I think the major remaining question is whether the current state of AI can help us discover those breakthroughs faster. If yes, I think that, due to compounding, exponential effects, we're probably closer than 2055. If no, it may take longer than the "2-5 years" people like Elon and Demis are saying.
Luke Litowitz
2025-12-01 12:47:03 +0000 UTC
Philip, as always, you have very penetrating, insightful observations. I agree that Ilya, in this interview, was like a different person. It was almost shocking.
Only about 18 months separate his "straight shot" enthusiasm from an apparent 180-degree reversal today. One might expect such reversals from an uninformed commenter. But Ilya had more experience than most, and likely early awareness of looming test-time compute plans.
His SSI co-founder, Daniel Gross, was quite specific in interviews around 2024. Gross stated that the company planned to "spend a couple of years doing R&D on our product before bringing it to market." Since their "one product" is safe superintelligence, this could imply that they then anticipated a very aggressive timeline, possibly expecting a minimally deployable version by the late 2020s.
Ilya himself said SSI's strategy involved trading commercial revenue for speed and focus. That might suggest they then believed the technology could be built fast enough that they wouldn't need intermediate revenue to survive.
How their viewpoints seem to have changed in only 18 months! Given the lack of forecasting precision from one of the most informed people in the field, it makes one wonder about future projections.
I personally think AI Explained offers better, more rational insight than Sutskever himself demonstrates.
Joe Marler
2025-11-30 16:20:13 +0000 UTC
To answer your question: yes, I think you should do a video on Opus 4.5. It's definitely more than an iterative improvement.
Erik
2025-11-29 22:19:21 +0000 UTC
Yes! I used it to resolve just such an issue!
Philip
2025-11-29 10:23:43 +0000 UTC
The video was called 'Relentless Learning', I think, going by the title. It's maybe 3 weeks old.
Philip
2025-11-29 10:23:33 +0000 UTC
Great clip! Quick question: at 2:10 you refer to your video on "nested learning". I can't seem to find it. Which video is this?
Frederick Batzler
2025-11-29 05:10:47 +0000 UTC
Enjoyed it as usual, Philip. On the model debate capability: can that be used to solve an issue? (e.g. "Here is a coding problem I have. What is the best approach?")
Daniel A Barbatti
2025-11-29 01:08:44 +0000 UTC
If he’s right, then there should be a period of stagnation that will crash a lot of the AI stocks that were counting on further rapid advancements (Tesla FSD and robotics jump to mind). But companies making do with what exists now will still find value.
Bibity bop
2025-11-28 19:23:00 +0000 UTC
Should have mentioned the 62% on SimpleBench too.
Philip
2025-11-28 11:45:01 +0000 UTC
Ah you got there first! Yes lmcouncil.ai.
Philip
2025-11-28 11:44:31 +0000 UTC
llmcouncil.ai? The site just has a "contact us" box. Am I missing something?
Update: turns out it's lmcouncil.ai, not llmcouncil.ai.
Niall Riddell
2025-11-28 10:53:05 +0000 UTC
I don't think Ilya was saying that he doesn't feel the AGI. He was saying that incremental deployment is helpful because it's the only way to make the public feel the AGI.
But yes, I am really surprised by the 5-20 years forecast. For context, 20 years is almost 1/3 of the time between the Dartmouth Conference (1956) and now. Given current SOTA, plus researchers and capital flocking to the field, I cannot believe it will take that long, unless an exogenous factor causes a new AI winter.
Vlad Gheorghe
2025-11-28 10:30:30 +0000 UTC
But didn’t you just “Feel the AGI”?
Maybe you don’t have the “it”?
/sarcasm
Pavol Vaskovic
2025-11-28 07:14:52 +0000 UTC
They have an 80% version, but I agree 99% would be more interesting (and also more annoying to benchmark).
Markov
2025-11-27 20:54:30 +0000 UTC
On your closing point about the METR evals: going from ~2 hours to ~2 days is a big deal, but remember that the main benchmark is based on a 50% success rate for a given time horizon. What can they do at 99% or 99.9% success? As long as we need to check the actual code produced, not just higher-level outputs, longer time horizons just give us more code to review and edit.
I'm excited for the self chat feature - I've been thinking of this kind of thing for a while.
Chris Prosser
2025-11-27 20:47:58 +0000 UTC
Some analogies you cannot give without sounding at least a bit ridiculous, but he could have been a better wordsmith in this interview.
But if you believe AGI will come in 5 years, maybe continent-sized datacenters are not that ridiculous in 20. Any speculation about events well past AGI is going to seem absurd. AGI is already inconceivable to most people, i.e. it's very hard to "feel the AGI".
Markov
2025-11-27 19:21:38 +0000 UTC
I actually watched this interview already and was not impressed. I went in with a positive bias for him.
But when I watched it, it felt like he just hasn't thought this through. Dwarkesh put some pretty obvious logic to him about the downsides of a straight shot to SSI and he didn't seem prepared for it. He also seemed to rely on vague statements, with lots of pauses, to try and add gravitas. At one point he said something like "I think it would have value if we could somehow cap super intelligence, but I'm not sure how to do that". Well, no shit Sherlock, but I had hoped for more from the founder of Safe Superintelligence.
To steelman the other side, I think it's possible he has a lot more to say but can't say it because of intellectual property concerns. I don't rest easy relying on these folks to guide us to a good outcome though.
OG
2025-11-27 19:02:05 +0000 UTC
Been waiting for your take on Opus 4.5. I'll watch this on my run. I've been really impressed with its coding capabilities and how it reasons through problems. It even told me my idea wouldn't work when I was trying to optimize some code.