METR Doubling-Times Star Joel Becker, on Developer Slowdown With AI & Amodei Automation Prediction
Added 2025-08-20 12:20:52 +0000 UTC
Highlights from an interview with one of the most productive AI researchers around, Joel Becker, who contributed to the famed METR doubling times analysis and the recent eye-opening study on developers being slowed down when using AI. Billions of life-decisions and trillions of investment ride on the exact contours of AI progress, so it was definitely worth a chat. With paper context, my analysis, and more.
Clips:
06:03 - Models Used
06:30 - Incremental Progress Hides Exponentials
07:20 - Will 100% of coding be automated in 12 months?
I simply don't see how this architecture can ever become AGI as long as models hallucinate and we have to fix bugs. The tasks we're going to have them do will get increasingly complex, and if they're still hallucinating or putting bugs in even 1% of the time, the sheer time and effort spent finding those things is going to outweigh any benefits.
Mike Hindes
2025-08-26 23:40:23 +0000 UTC
Dear Philip, it seems clear to me now that the flattening of the improvement curve is real, and that maybe Gary Marcus and Yann LeCun are indeed right on the question of whether today's Transformer-based LLM architecture will get us to AGI just by scaling even more - it seems not. It would therefore be really helpful if you could do a video about possible ways out of this local maximum we are approaching: new architectures, new paradigms, and interesting papers and experiments that have the potential to get us to AGI. Exploring all ideas... Thanks!
Reinhold Gabloner
2025-08-23 09:26:42 +0000 UTC
Hi, thanks for this great video! I feel like not enough people talk about Warp. I tried Cursor, Lovable etc. but Warp and Claude Code allowed me to create my first mobile app in Flutter, which is about to launch in the App Store and Play Store. As someone who can't code at all, this is a big achievement. I find it so remarkable that these agents can work for minutes at a time and correct their own mistakes until the test passes.
generousB
2025-08-23 07:46:18 +0000 UTC
Always happy to see an AI Explained video cutting through the discussion. Excellent work, Philip.
Christian Hendriksen
2025-08-21 19:58:11 +0000 UTC
Sometimes I wonder whether there is a frontier where smarter models become harder to work with, require more specific prompts, or need more advanced context. Because they are more advanced, we would initially consider them less smart or useful, and "we" - the users - would increasingly become the limiting factor in using them (thus leading to diminishing returns). That could happen simply because the model knows multiple interpretations of the same question, or understands more about the intricacies and nuances of the field you are familiar with by combining it with knowledge from domains you are not familiar with, leading to answers you initially don't understand or accept.
Albert Jan van Hoek
2025-08-20 19:15:36 +0000 UTC
Thank you, I've been anticipating your thoughts on this topic! My YouTube feed has been inundated with takes on this study specifically and agentic productivity generally. Once again I feel like I have gained a deeper perspective that will help me cut through the noise!
Blake Chambers
2025-08-20 17:46:13 +0000 UTC
Pretty much in the same camp as Shawn above... I tend to use it for single functions, non-production utilities and maybe algorithms or regex I don't feel like figuring out at the time. The biggest issue is when it can't get you to 100% or you need to modify it later yourself: you have to dig in and figure out everything it has done (gee, I wouldn't have done it that way), and by the time you do that you probably could have written it yourself LOL. Also, as mentioned, it can sometimes hallucinate and do something insane. A while back I was working on a utility with a model. It was having trouble debugging an issue and finally it just deleted that whole section of the code! (Thank God for backups.) I have been writing code since 1974, so I kind of have my own mad methods anyway ;) All that being said, it is wonderful for what I use it for and I look forward to continued progress.
Daniel A Barbatti
2025-08-20 15:07:18 +0000 UTC
On the productivity end, as someone who can code, at least for my own purposes, but it's not a key part of my job role (as of now), I'm seeing it slowly start to dawn on non-coders that you can get the computer to do tedious things for you, and AI is here to help. So it's things like "Oh, I don't have to find 'someone who can code' in the office to have a computer check these five thousand spreadsheet entries?" Reminds me of a model of AI takeup I've seen that it'll be like how spreadsheets allowed everyone with a computer to do at least some data analysis and bookkeeping. I also had a non-coder ask me if this meant that coders on Fiverr will soon be able to do more for his business...seemed like a reasonable prediction but we'll see.
So this might raise the floor more than the ceiling, at least relatively speaking.
Alfred Wallace
2025-08-20 13:52:17 +0000 UTC
Great discussion and an interesting study! I think it's also perhaps worth discussing that the engineers who participated in the study were largely unfamiliar with Cursor and IIRC only one had more than a week of experience using it. I think that engineer actually did see a performance gain and was surprised that wasn't the conclusion at the end of the study for the others.
So while it's pretty clear that you can't just plug in AI and see instant productivity gains, I do think that there is a skill bar for using tools like Cursor more effectively and you might see a different task time improvement profile if there could be grouping by familiarity and experience with the tools in question.
Some good discussion here: https://thezvi.substack.com/p/on-metrs-ai-coding-rct
Shane Mitchell
2025-08-20 13:48:19 +0000 UTC
Great advice to anyone
Philip
2025-08-20 13:43:00 +0000 UTC
Nope, the only forecast I gave once, two years ago, was of proto-AGI in 2028. The proto- being extremely expensive, slow, unsafe [continuous learning has huge safety risks] and not for public consumption. The AGI being, as discussed in other videos, better than the average human at most tasks, broadly construed. Still seems reasonable to me, but a more relevant description for transformative AGI is one that is quick enough to be easily useable, and widely available, for which 2030 seems a touch too soon.
Philip
2025-08-20 13:42:48 +0000 UTC
Prompt engineering (and context engineering) makes a HUGE difference in the results you get from using agents for coding. I wonder how much experience those developers had with using coding agents properly.
I use it constantly at work and both speed and quality have increased. I work on 4+ tasks at the same time with 4+ IDEs open, switching between them. I would definitely say my productivity has increased (though I guess I could be wrong, as shown by the study).
Etienne Beaulac
2025-08-20 13:31:34 +0000 UTC
When I use AI to code, I usually only ask it to write single functions or classes and show the model any relevant parts of my code base. I'm always afraid AI will introduce unintended changes if I rely on them too much. That way, I know exactly what I want the result to be and I can easily debug it if something is wrong.
Shawn Rosofsky
2025-08-20 13:28:49 +0000 UTC
Oh man. I used to be in camp “short timelines, fast takeoff.” For a while there, the scaling laws (pretraining, RL training, test time compute) seemed to point to more money, more AGI-like.
Now though, I see things differently. I think, and maybe this is just my hunch and not particularly proven, that we won’t easily buy our way to AGI. The first clue was seeing what DeepSeek did with a much smaller budget. The second clue was seeing just how minor an improvement GPT-4.5 was for so much more compute budget. The third clue was seeing hallucination rates basically hover in place in spite of CapEx going parabolic.
Long story short, I think we need a new paradigm breakthrough. I don’t know what it is, but I believe it has to basically eliminate hallucinations. I’d guess some sort of human brain matching architecture, something much more data efficient, and something that stores perfect representations of concepts instead of rough approximations of concepts. I don’t know the who, when, or how of this happening but imho we are in a sort of AI winter until then. By that I mean AI won’t be reliable and so it will remain generally not useful for important work.
On the bright side, with so many data centres being built out now, once we discover the paradigm we should have more than enough compute to train up AGI assuming the paradigm is at least as efficient as today’s transformers, and ideally much more efficient.
John Merkowsky
2025-08-20 13:18:27 +0000 UTC
Again the focus is on shoehorning AI into existing ways of working. I want to see how current models perform in an organisation that has built itself around AI coding. That is using TDD, specification by example and a clearly defined machine-readable system architecture.
Barnaby Golden
2025-08-20 13:10:33 +0000 UTC
Did you just update your own AGI timeline? I thought it was also around 2027? 😜 Or am I misremembering?