AIExplained

METR Doubling-Times Star Joel Becker, on Developer Slowdown With AI & Amodei Automation Prediction

Added 2025-08-20 12:20:52 +0000 UTC

Highlights from an interview with one of the most productive AI researchers around, Joel Becker, who contributed to the famed METR doubling times analysis and the recent eye-opening study on developers being slowed down when using AI. Billions of life-decisions and trillions of investment ride on the exact contours of AI progress, so it was definitely worth a chat. With paper context, my analysis, and more.

Clips:

06:03 - Models Used

06:30 - Incremental Progress Hides Exponentials

07:20 - Will 100% of coding be automated in 12 months?

08:55 - What about agents/robotics?

09:24 - Timelines vs AI-2027

METR Developer Study: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Paper: https://arxiv.org/pdf/2507.09089

METR Doubling Times: https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-across-domains/#:~:text=METR%20found%20that%20the%20time,greatly%20between%20different%20task%20types.

Time Magazine: https://time.com/7302351/ai-software-coding-study/

AI 2027 Updated Timelines: https://ai-2027.com/research/timelines-forecast

NYT Essay: https://www.nytimes.com/2025/08/19/opinion/ai-job-loss-deindustrialization.html

Amodei Quote: https://officechai.com/ai/90-of-coding-could-be-done-by-ai-in-3-6-months-100-possible-in-a-year-anthropic-ceo/

Francois Chollet Quote: https://x.com/fchollet/status/1870169764762710376?lang=en-GB

Sam Altman Bubble: https://www.theverge.com/ai-artificial-intelligence/759965/sam-altman-openai-ai-bubble-interview

Comments

I simply don't see how this architecture can ever become AGI as long as models hallucinate and we have to fix bugs. The tasks we're going to have them do will get increasingly complex, and if they're still hallucinating or introducing bugs even 1% of the time, the time and effort spent finding those errors is going to outweigh any benefits.

Mike Hindes

2025-08-26 23:40:23 +0000 UTC

Dear Philip, it seems to me that it's clear now that the flattening of the improvement curve is real, and that maybe Gary Marcus and Yann LeCun are indeed right when it comes to the question whether today's Transformer based LLM architecture is getting us to AGI just by scaling even more, or no - and it seems not. Therefore it would be really great and it would be very helpful if you could do a video about possible ways out of this local maximum we are approaching. New architectures, new paradigms, interesting papers and experiments that have potential to get us to AGI? Exploring all ideas... Thanks!

Reinhold Gabloner

2025-08-23 09:26:42 +0000 UTC

Hi, thanks for this great video! I feel like not enough people talk about Warp. I tried Cursor, Lovable, etc., but Warp and Claude Code allowed me to create my first mobile app in Flutter, which is about to launch in the App Store and Play Store. As someone who can't code at all, this is a big achievement. I find it so remarkable that these agents can work for minutes at a time and correct their own mistakes until the test passes.

generousB

2025-08-23 07:46:18 +0000 UTC

Always happy to see an AI Explained video cutting through the discussion. Excellent work, Philip.

Christian Hendriksen

2025-08-21 19:58:11 +0000 UTC

Sometimes I wonder whether there is a frontier where smarter models become harder to work with, require more specific prompts, or need more advanced context. Because they are more advanced, we would initially consider them less smart or useful, and "we", the users, would increasingly become the limiting factor in using them (leading to diminishing returns). The model will know multiple interpretations of the same question, or understand more about the intricacies and nuances of the field you are familiar with by combining it with knowledge from domains you are not familiar with, leading to answers you initially don't understand or accept.

Albert Jan van Hoek

2025-08-20 19:15:36 +0000 UTC

Thank you, I've been anticipating your thoughts on this topic! My YouTube feed has been inundated with takes on this study specifically and agentic productivity generally. Once again I feel like I have gained a deeper perspective that will help me cut through the noise!

Blake Chambers

2025-08-20 17:46:13 +0000 UTC

Pretty much in the same camp as Shawn above... I tend to use it for single functions, non-production utilities and maybe algorithms or regex I don't feel like figuring out at the time. The biggest issue is when it can't get you to 100% or you need to modify it later yourself: you have to dig in and figure out everything it has done (gee, I wouldn't have done it that way), and by the time you do that you probably could have written it yourself LOL. Also, as mentioned, it can sometimes hallucinate and do something insane. A while back I was working on a utility with a model. It was having trouble debugging an issue and finally it just deleted that whole section of the code! (Thank God for backups.) I have been writing code since 1974, so I kind of have my own mad methods anyway ;) All that being said, it is wonderful for what I use it for, and I look forward to continued progress.

Daniel A Barbatti

2025-08-20 15:07:18 +0000 UTC

On the productivity end, as someone who can code, at least for my own purposes, but it's not a key part of my job role (as of now), I'm seeing it slowly start to dawn on non-coders that you can get the computer to do tedious things for you, and AI is here to help. So it's things like "Oh, I don't have to find 'someone who can code' in the office to have a computer check these five thousand spreadsheet entries?" Reminds me of a model of AI takeup I've seen that it'll be like how spreadsheets allowed everyone with a computer to do at least some data analysis and bookkeeping. I also had a non-coder ask me if this meant that coders on Fiverr will soon be able to do more for his business...seemed like a reasonable prediction but we'll see. So this might raise the floor more than the ceiling, at least relatively speaking.

Alfred Wallace

2025-08-20 13:52:17 +0000 UTC

Great discussion and an interesting study! I think it's also perhaps worth discussing that the engineers who participated in the study were largely unfamiliar with Cursor and IIRC only one had more than a week of experience using it. I think that engineer actually did see a performance gain and was surprised that wasn't the conclusion at the end of the study for the others. So while it's pretty clear that you can't just plug in AI and see instant productivity gains, I do think that there is a skill bar for using tools like Cursor more effectively and you might see a different task time improvement profile if there could be grouping by familiarity and experience with the tools in question. Some good discussion here: https://thezvi.substack.com/p/on-metrs-ai-coding-rct

Shane Mitchell

2025-08-20 13:48:19 +0000 UTC

Great advice to anyone

Philip

2025-08-20 13:43:00 +0000 UTC

Nope, the only forecast I gave once, two years ago, was of proto-AGI in 2028. The proto- being extremely expensive, slow, unsafe [continuous learning has huge safety risks] and not for public consumption. The AGI being, as discussed in other videos, better than the average human at most tasks, broadly construed. Still seems reasonable to me, but a more relevant description for transformative AGI is one that is quick enough to be easily useable, and widely available, for which 2030 seems a touch too soon.

Philip

2025-08-20 13:42:48 +0000 UTC

Prompt engineering (and context engineering) makes a HUGE difference in the results you get from using agents for coding. I wonder how much experience those developers had with using coding agents properly. I use it constantly at work and both speed and quality have increased. I work on 4+ tasks at the same time with 4+ IDEs open, switching between them. I would definitely say my productivity has increased (though I guess I could be wrong, as shown by the study).

Etienne Beaulac

2025-08-20 13:31:34 +0000 UTC

When I use AI to code, I usually only ask it to write single functions or classes and show the model any relevant parts of my code base. I'm always afraid AI will introduce unintended changes if I rely on them too much. That way, I know exactly what I want the result to be and I can easily debug it if something is wrong.

Shawn Rosofsky

2025-08-20 13:28:49 +0000 UTC

Oh man. I used to be in camp “short timelines, fast takeoff.” For a while there, the scaling laws (pretraining, RL training, test time compute) seemed to point to more money, more AGI-like. Now though, I see things differently. I think, and maybe this is just my hunch and not particularly proven, that we won’t easily buy our way to AGI. The first clue was seeing what DeepSeek did with a much smaller budget. The second clue was seeing just how minor of an improvement GPT-4.5 was for so much more compute budget. The third clue was seeing hallucination rates basically hover in place in spite of CapEx going parabolic. Long story short, I think we need a new paradigm breakthrough. I don’t know what it is, but I believe it has to basically eliminate hallucinations. I’d guess some sort of human brain matching architecture, something much more data efficient, and something that stores perfect representations of concepts instead of rough approximations of concepts. I don’t know the who, when, or how of this happening but imho we are in a sort of AI winter until then. By that I mean AI won’t be reliable and so it will remain generally not useful for important work. On the bright side, with so many data centres being built out now, once we discover the paradigm we should have more than enough compute to train up AGI assuming the paradigm is at least as efficient as today’s transformers, and ideally much more efficient.

John Merkowsky

2025-08-20 13:18:27 +0000 UTC

Again the focus is on shoehorning AI into existing ways of working. I want to see how current models perform in an organisation that has built itself around AI coding, that is, one using TDD, specification by example and a clearly defined machine-readable system architecture.

Barnaby Golden

2025-08-20 13:10:33 +0000 UTC

Did you just update your own AGI timeline? I thought it was also around 2027? 😜 Or am I misremembering?

Phillip Yao-Lakaschus

2025-08-20 12:46:41 +0000 UTC
