AIExplained


Fired OpenAI researcher - 'OpenAI Planned to Sell AGI to China' and 'It's Coming by 2027' - Full Analysis of 165 page Doc

Recently fired OpenAI researcher Leopold Aschenbrenner has produced an essay that will either confirm him to be absolutely crazy, a target of an OpenAI lawsuit, or bizarrely prophetic. I went through all 165 pages, plus his recent 4.5 hr interview (and other less recent material) to bring you just the highlights. Going deeper, I share my framework for judging whether scaling is enough. Join me on a journey into the murky, controversial depths of AGI speculation.

Link for Download and Off-line Viewing: https://drive.google.com/file/d/10MYpeAPpkmCVLFlBNHCOwxVpw9KfqViq/view?usp=sharing

Situational Awareness: https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf

Interview: https://www.youtube.com/watch?v=zdbVtZIn9IM

AGI Investment Fund: https://www.forourposterity.com/

Nvidia Antitrust: https://www.nytimes.com/2024/06/05/technology/nvidia-microsoft-openai-antitrust-doj-ftc.html

Schulman Interview: https://www.youtube.com/watch?v=Wo95ob_s_NI

Sholto Douglas 'All': https://x.com/_sholtodouglas/status/1798052154709852198

But Some Don't See It That Way: https://x.com/NielsRogge/status/1798276170330567126

Why He Was Fired: https://www.theinformation.com/briefings/ex-openai-researcher-leopold-aschenbrenner-starts-agi-focused-investment-firm?rc=sy0ihq

Anthropic Government Link: https://time.com/6980000/anthropic/

Old Aschenbrenner Interview '20% growth is crazy': https://www.youtube.com/watch?v=LT--MRXr4HE

MATH Benchmark: https://arxiv.org/pdf/2103.03874
Counterfactuals: https://arxiv.org/pdf/2307.02477
Functional Benchmarks: https://arxiv.org/pdf/2402.19450v1


Comments

Great analysis. The guy is too young and his thoughts are too deterministic. It takes time, and seeing many of your own predictions fail, to get a better sense of how wide a variety of scenarios is actually possible.

Ján Kmeťko

People here might find this GitHub repo useful. It has a huge number of links related to AGI, and the authors have just released a paper: https://github.com/ulab-uiuc/agi-survey

Kol Tregaskes

I feel like I need to put time into watching Leopold's podcast and reading his document but alas I don't have that time outside of work. So thank you for this summary, Philip. I've watched a few summaries on this so it really is getting a lot of coverage. His new investment firm should do well from this. ;-) For sure, I don't see why LLMs don't use tools more. Plug in Wolfram Alpha for mathematics, plug in AlphaFold etc etc. I don't know why all skills need to solely be in the AI. Leave language models to, well, language. Concentrate on fixing that then use other tools - tools we have spent years and decades building - for skills that LLMs are not good at. I (perhaps over-optimistically) predicted in an earlier comment that I feel AGI will be here by 2028. I stand by this prediction (ask me tomorrow and I'll probably have changed my mind. ;-)), more because I'd rather be over-optimistic and early than under-optimistic and late. I would still not expect to see any real-world impact for some time after this though. I'm not even expecting AGI to be released instantly. AGI will be a hugely powerful tool. Sure, OpenAI, Elon, Meta, Anthropic etc. will achieve AGI this decade (plausible) but I can't see it just being launched to the world. No way are governments going to let that out of the bag so quickly. Why would they give us normal people that much power? We would immediately plug into the stock market, make a gazillion dollars and crash the system. Why would rich people want poor people to get rich too? Nah. And yes, exponential curves are not only never smooth, they don't go on and on forever. Again, as a crypto follower, I'm well aware of this. ;-) And LOL, "invest in curtains". Curtains are great security guards. ;-) Do we think his document is a promotion for his new investment firm? Does it have to sound bullish to get investors? ;-) Forgive me if I've missed it but a pod or video on the scaling laws, and analysis thereof, would be interesting.
There are clearly many elements to AGI; the scaling of hardware and compute is just one of them. NVIDIA will continue to build better and better cards but we don't know if that will equate to better and better AI.

Kol Tregaskes

Stockfish can beat anyone with an asterisk. It needs time to search. If forced to play in a hyper time limited game, it can't search very deep without a lot of compute. That said, compute has gotten a lot better and humans haven't gotten a "compute upgrade" so.. Maybe now it can win the shortest bullet games.

Ryan Friedrich

Where can I learn more about "test-time compute overhang"? Does someone have a good summary in the context of AdaptiveML or AutoML?

Blake Chambers

user: Hey ChatGPT, could you please perform topic modeling on this publication?
assistant: [Lists two topics]
user: Could you please use the code interpreter and sklearn library?
assistant: [Writes import statements] topic1 = ["... topic2 = ["... [Writes function to display topic1 and topic2] [executes code]
user: [opens new chat] Hey ChatGPT, could you please perform topic modeling on this publication, making sure to use the code interpreter for this task and sklearn's implementation of LDA?
---
I can't wait for more tools 🙃

Blake Chambers

As of now, how is chronology considered when preparing data for training? Is it reasonable to assume that today's techniques only evaluate chronological congruity and consequently adjust the training process? This would explain why ChatGPT kept giving me the wrong TypeScript/Webpack configurations (like 2015 Node.js-hype-era settings sprinkled into what is boilerplate today). Or is there a better explanation?

Blake Chambers

I said I could beat GPT-4! And that Stockfish could beat it easily too. No one can beat Stockfish!

Philip

22:20 - Wait... you can beat Stockfish?! As of June 2024, Stockfish has an estimated Elo rating of 3600. That would make you a chess grandmaster. Am I missing something or are you really that good at chess?

Joshua Davis

I just found this news this morning: an ex-NSA head joins OpenAI. So the US government is probably considering how to control these AI startups. https://www.perplexity.ai/page/Ex-NSA-Head-_pkQNXf2QHelrm8HTRJt7A

Arek Stryjski

I believe that a significant advancement in AI could come from a shift in architecture. Transformers, while powerful, often feel too granular and brute-force in their approach. I’m intrigued by the idea of using sparse autoencoders to identify features, which seems more computationally efficient and somewhat reminiscent of human-like processing. Imagine feeding these features back into the model, making it aware of its own reasoning. This could lead to a more efficient architecture. Using the same compute resources we have today, this approach might accelerate us along a steeper exponential curve in AI development.

Pablo Rodríguez

When you said you're going into his background it sounded as if you were just going through his career, and my literal thought was "I'd also like to know whether there is a money scheme that could potentially profit from hype". Also: I really wonder about the phrase "feel the AGI". This seems incredibly anchored in the culture of the Superalignment team around Ilya Sutskever. And I get weird semi-religious group dynamics vibes from it. I'd love to know more about where it comes from and what they actually mean by it.

Jörg Weiß

@Mike: Yeah, at some point it will flatten. My personal guess is that four more datasets will get ingested first:
1. How to use apps
2. What do humans do at work
3. Math
4. Robots
The first point means: we will be able to record "Semantic Macros", which are essentially a screencast+audio in which we demo how to do something in an app. It's the same thing you would do with me when you show me your new app: you would show me how to use it. This will teach the models to operate all kinds of apps and to perform complex semantic sequences of steps.
The second point is that hundreds of millions of people use the AI from 1. to help in their work, thus implicitly giving detailed explanations of what people do at their workplace. Every day potentially a hundred million hours of new material could get collected (but let’s not forget data protection laws/settings). After just days the models will have seen/recorded every relevant scenario, and this will become the new training set. I guess that from 2035 on the air is getting very thin for humans to keep their jobs.
Math proofs can get auto-generated. Such a proof may consist of one to, dunno, a hundred transformations of an original symbol sequence. Adding more parameters to a model may result in emergent properties, such as planning and having a scratch board, while signals move through the network.
For robots we can auto-generate data in simulations and just let the bots do stuff. OpenAI can put experts into a body suit and let them do their work, such as cooking, ironing, repairing a car motor, dental operations, open-heart surgery, and so on.
All those data sets will be coming in the next 5-10 years, along with a likely massive growth in compute.

André Thieme

I don't have a source for that being what Aschenbrenner thinks, I just think that is the most natural reading of what he wrote. I think this is the case because:
- He mentions training for multiple epochs in the previous paragraph as a way to make some progress, but says that results in quick diminishing returns.
- In contrast to multi-epoch training, he then talks about how it could be possible to make models much more sample-efficient/able to get more information from a small amount of data.
- This is why he brings up the textbook example: through things like re-reading, internal monologues, trying out problems, etc. humans can learn more from data than just reading through it once or multiple times. AIs may be able to do the same. Aschenbrenner analogizes reading quickly through a dense textbook once or multiple times to standard self-supervised learning with one or multiple epochs, and active reading to other more sample-efficient algorithms you can use to make an AI learn.
Broadly speaking, I think Aschenbrenner is envisioning an active process where models are given data, they reason about that data, extract what relevant information they can learn from the data, and then somehow update the model with that extracted information. This is what he is talking about in the "missing middle" footnote.

k

I love the idea of training an LLM only on texts from the 1800s! That would be a dope time-traveling experiment. How smart would we find a model that hasn’t memorized Wikipedia?

Pavol Vaskovic

Character voices with ChatGPT-4o voice https://youtu.be/4w0Pqs3CuWk

Pavol Vaskovic

Agreed Anouar!

Philip

Noted!

Philip

Interesting, got a source for that k?

Philip

Thanks Markus, and I agree. Tacit data, as mentioned in a previous comment, is crucial and underrated for that final 3%. Finding that balance between 'AGI next year' and 'AI is all hype' requires actually reading many of the most relevant/insightful papers, or at least keeping some degree of perspective.

Philip

Agreed Steve

Philip

Terence Tao just backed us up Sean! Tacit knowledge is absolutely key, and so underrated: https://www.scientificamerican.com/article/ai-will-become-mathematicians-co-pilot/

Philip

Great question, Brian. This post from an OpenAI researcher - https://nonint.com/2024/06/03/general-intelligence-2024/ - which I read after making this video, sums it up quite well. I think that LLMs make embarrassing mistakes - and cannot be relied upon, in most domains - because of a lack of a System 2 set of verifiers. That's not the only problem though, plenty of tacit data was not in their training data set (which was a rather simplistic sweep of the de-duplicated Web for GPT-4), though this is being gradually remedied at multi-billion-dollar scale via ... Scale AI. Then of course, all these models were undertrained/under-scaled just through lack of money and time. Put all of this together, and I think an incredibly compute-heavy, 'overpriced', AGI-like system (using my understanding of that term) would be demonstrated in circa 2028. With no cost-effective embodiment, and relying on widespread corporate consent for the requisite data to perform job roles, it will be more a 'demo-AGI' than something that will transform the world overnight. However, as we enter the 2030s, I think such a system will be game-theory 'dominant', in many job roles, and for many positions, less a productivity-enhancer and more a replacement. And this is when there might be some wider consensus that AI is becoming transformative or, perhaps, worthy of the term AGI. GPT-5 should be one step in this direction, but 6-12 months from now is too early for all of the above to be in place, as best I can tell. Hope that helps explain my perspective!

Philip

Btw when I say agentic workflows don't work yet I'm referring to the results from aider.chat https://aider.chat/2024/06/02/main-swe-bench.html

GGuy

Great analysis! I definitely agree that the essay is strong overall, but it gets a bit lost in the assumptions.

Anouar Mansour

Thanks for the analysis, Philip! Also great to see a longer video from you, no need to always constrain yourself to 20 min 🙂

Mike

Maybe the next generation (e.g. GPT-5) shows that we were able to scale, but I wonder if it will actually quell the scaling skeptics. Won't the argument be that it's still a sigmoid, and now we've *really* reached the point where progress starts to flatten? In a sense, I am sympathetic to that argument. Progress will flatten at some point, the question is if it's at the GPT-5 or GPT-8 generation, and that could make a massive difference in whether we actually get AGI (much less superintelligence).

Mike
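The "is it still a sigmoid?" worry above can be made concrete with a small sketch. It assumes, purely for illustration, that progress follows a logistic curve; the ceiling, rate, and midpoint values are made up and do not come from the video or the essay. Well before the midpoint, a logistic curve is almost indistinguishable from a pure exponential, which is why early data cannot settle where the flattening starts.

```python
import math


def logistic(t, ceiling=1.0, rate=1.0, midpoint=10.0):
    """S-shaped curve that saturates at `ceiling` (hypothetical parameters)."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def exponential(t, ceiling=1.0, rate=1.0, midpoint=10.0):
    """Pure exponential that matches the logistic curve's early behaviour."""
    return ceiling * math.exp(rate * (t - midpoint))


def relative_gap(t, ceiling=1.0, rate=1.0, midpoint=10.0):
    """Fraction by which the exponential overshoots the logistic at time t."""
    exp_value = exponential(t, ceiling, rate, midpoint)
    log_value = logistic(t, ceiling, rate, midpoint)
    return (exp_value - log_value) / exp_value


if __name__ == "__main__":
    # Early on the two curves agree closely; the gap only opens near the midpoint.
    for t in (0.0, 4.0, 8.0, 10.0):
        print(f"t={t:>4}: relative gap = {relative_gap(t):.5f}")
```

With these made-up parameters the two curves differ by well under 1% until about halfway to the midpoint, so whether the flattening arrives at "GPT-5" or "GPT-8" is not something the early part of the curve can reveal.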

I don't think he's talking about epochs at 17:30. Instead, he's talking about using different training methods that allow models to learn substantially more from data than standard self-supervised learning does, similar to humans.

k

If it wasn't for the pip install issue this would already work 🤔

GGuy

"Hey ChatGPT, run some code that executes 3 moves for a game chess with only pawns using the stockfish engine."

GGuy

Agentic workflows don't work, in part, because breaking up a hard problem into 4 high-school-level problems doesn't statistically increase the chances of success: the model is not 4x less likely to get each of the subtasks wrong. It feels like GPT-5 or GPT-6 is highly likely to solve that problem. At that point can everything be solved agentically?

GGuy
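The arithmetic behind the point about splitting a hard problem into four subtasks can be sketched as follows. This is a minimal illustration under a stated assumption: the subtasks succeed or fail independently, which real agentic pipelines may not satisfy. The function names and the probabilities are hypothetical.

```python
def chain_success(p_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return p_step ** steps


def break_even_step_success(p_whole: float, steps: int) -> float:
    """Per-step success rate at which the chain merely matches a one-shot attempt."""
    return p_whole ** (1.0 / steps)


if __name__ == "__main__":
    # Hypothetical numbers: a 50% one-shot success rate on the hard problem,
    # versus four easier subtasks solved at 80% or 90% each.
    print(f"break-even per-step rate: {break_even_step_success(0.5, 4):.3f}")
    for p_step in (0.8, 0.9):
        print(f"p_step={p_step}: chain success = {chain_success(p_step, 4):.3f}")
```

Under the independence assumption, decomposition only helps if each subtask is solved more reliably than the fourth root of the one-shot success rate. A subtask that is merely "easier" is not enough, because the errors compound across the chain.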

I think he worked on Ilya’s alignment and safety team and it’s possible that being an engineer is not necessary to make a contribution to those issues. For example, the whole question of where to build the power plants for the server farms being talked about now is one of his central points in this paper, and that is not an engineering question but a geopolitical one.

Swoquix

I have been thinking, now that the uber-hype is slowing a bit, that the path to true AGI (meaning autonomous & creative AI) is longer than many in the AI bubble anticipate. It's just a hunch, but I believe the last 2-3% are incredibly hard to achieve. Sure, AI will change our world and reshape certain parts of the economy dramatically. But we won't be creating digital gods anytime soon, if ever. If it were easy, I am sure somebody else would have done it somewhere already, and we would be able to observe the effects from afar. But here it is again: The Fermi Paradox. Or, all this has been done already, and all we are is a simulation which doesn't bother to waste resources on aliens. ;) In any scenario, I am looking forward to whatever the future may bring. Thanks for your hard work Philip.

Markus Heinsohn

It’ll also be interesting to see how much embodiment or other forms of learning on the fly can help with this. It is one thing to try to record our tacit knowledge and feed that into the training data. It is another if the model can experiment and discover some of that tacit knowledge on its own.

Shawn Fumo

To me, the more interesting question isn’t if it can beat Stockfish (like you, I doubt it can beat something so specialized on such a specialized task), but if it can do decently well at a large range of different kinds of games. And how competitive against humans it is at those many games?

Shawn Fumo

André, I’m pretty sure it isn’t actually the current system under the hood at least in terms of: speech to text, text to text, and then text to speech. There’s still the practical consideration of the software wrapping around the model, so probably something separate is listening for audio and cutting off the current response versus it constantly getting audio input at the same time it outputs (and waiting for some amount of silence before replying). Still, I wouldn’t be surprised if it could choose not to respond if you said to wait for a certain phrase or something. I also wouldn’t be surprised if it can change speed or voices in the middle of a response. As you said, hopefully we’ll see soon! I think the image output might be a sleeper hit. They haven’t emphasized it at all, but it seems like it is way more cohesive over time than anything we’ve seen before.

Shawn Fumo

As I was going through the paper I thought it was weird that an engineer deeply involved in building OpenAI's models could have such misconceptions about the value of benchmarks and the limitations of the underlying tech (or its capabilities: where it shines vs. where it doesn't). Then I figured out he's not an engineer; his background is more in business and economics. He does not appear to have made any contributions to building and training models, nor is there any research output from him. His time at OpenAI was short, and his contribution to the field is not clear.

Donato Capitella

It was dedicated to Ilya which makes me wonder...what the heck DID Ilya see? He obviously had a big impact on Aschenbrenner. And my overall feeling towards this paper reminds me of a favorite quote, often attributed to Mark Twain: "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." Nobody knows how this will play out and the most probable scenario is that it will be different than we expect. In fairness, Aschenbrenner does acknowledge this in the final section "What if we're right?" but it's like his rational side quickly gets devoured by his emotional side after doing so. Philip, this is a great review and the turnaround time breathtakingly fast. I upvoted the request for this on the "Content Suggestions" channel on Discord but I didn't think you would pull it off in 24 hours. Well done, sir.

Steve DeMoss

Whether GPT-6 can build a better chess engine than Stockfish just depends on how smart GPT-6 is in 2026 (which includes whether or not the "test-time compute overhang" is solved). No one knows, yet.

Brian Crabtree

Philip - thank you for a fantastic critique. Please do not stop going on about the central importance of tacit knowledge to human general intelligence, and how LLMs’ inability to be trained on tacit knowledge puts into question their ability to achieve AGI and ASI anywhere near the timelines suggested by Aschenbrenner. The concept was originally introduced by Michael Polanyi in 1966; he described tacit knowledge as “we know more than we can tell”. Ikujiro Nonaka helped popularise tacit knowledge in a 1991 HBR article. He described how the Matsushita Electric Company’s efforts to invent a bread-making machine had failed repeatedly because prototypes could not knead the dough correctly, resulting in uneven cooking. The company even tried comparing X-rays of machine-kneaded dough with the dough made by professional bakers, without success. It was only when Matsushita’s lead software developer spent time observing and interacting with an expert bread-maker (tacit learning) that she gained the tacit knowledge of how to knead correctly. She then was able to articulate the right product specifications. This is just one of millions of instances in which humans know more than we can tell. LLMs are trained on explicit data and, as increasingly smart as they are, without tacit knowledge *combined* with the ability to translate that into explicit knowledge, their understanding of the real world will remain limited for some time.

Sean Gallagher

You say, “My current working assumption is that we get a demonstration of remote worker-esque abilities by 2028.” If GPT-5 on the ChatGPT Mac app can pilot your mouse and keyboard while you watch the screen and talk with it back and forth, that seems quite “remote worker-esque” to me. Anything more than a single click would involve a chained set of actions requiring planning. While a human remote worker can manage tasks over weeks with low oversight, the GPT-5 ChatGPT Mac app might (initially) only work over seconds or minutes with high oversight. Though limited, I would call that “remote worker-esque,” and I expect it to be GPT-5’s headline grabbing, marquee feature in Nov/Dec 2024. Can you clarify what you mean by “remote worker-esque”?

Brian Crabtree

Run Stockfish with comparable compute against GPT-6 and Stockfish will dominate. The only way GPT can win is by building a better Chess engine, which it can’t do efficiently during a game, if at all.

Alexis Olson

You assert GPT-6 couldn't beat Stockfish without calling an external tool like Stockfish. Are you assuming they will not solve the "test-time compute overhang" by GPT-6? I would not bet against GPT-6 pondering each move for the equivalent of multiple months vs 2024 Stockfish. And if "pondering" still isn't enough, it could use those equivalent multiple months to build its own Stockfish from first principles (as opposed to calling a previously built, external tool). So even with the equivalent of months of pondering, which may include spinning up custom tools, you still think it would lose against 2024 Stockfish?

Brian Crabtree

This is also my thought. I’ve anecdotally noticed a lot more “AI is all hype” mongering that seems to have developed in response to ChatGPT’s influence on public opinion. I still have friends who have only touched 3.5. I think the push to freely release 4o is as much an attempt at “curtain revealing” as it is PR work for OpenAI. The release of GPT-5 with significant improvements would certainly put many of the negative claims to rest, and I agree with Philip that the next 1-2 years of progress are already baked in at this point.

Will Bogusz

Incredibly well-measured response and critiques. Thanks for remaining the most consistently high quality voice I can find on these topics. Not sure if it was planned but this dropped just 2 days after I mentioned it in the discord, love the turnaround time, whether intentional or not 😁

Will Bogusz

Thank you, Philip, for creating real value. I had the 165-page document open and was on page 40, with the 4.5-hour Dwarkesh interview open in the next tab, when I saw your analysis. Watching your take on it has shortened my research by a couple of hours. I share your view that there has to be a middle ground between Aschenbrenner and Gary Marcus. Unfortunately, that is very difficult to find, perhaps because it is hard for us to understand intelligence. Something that passes tests is easily perceived as intelligent. Something I am thinking a lot about these days is how likely it is that the improvement in raw intelligence could be a step function. What are your thoughts?

SteveHaupt

It will be interesting to discover if Omni with voice is actually useful at all. It certainly sounds entertaining to demo this to a few people. But under the hood I guess it still is the current system with prompts. As soon as it recognizes a human voice while generating (and playing) an audio output, it hits the stop button and waits for a moment of silence before it replies. Also it may not be able to follow all instructions, but only some that have been programmed into it and are part of the training. But I for example would love to say: Computer, don’t interrupt me while I am talking. I may make very long breaks. When I’m done I’ll let you know and only then you start talking. Hopefully in a few weeks we’ll see. Can it output multiple voices within one prompt? Can it speed up only parts of its output? In the “Count to 10 fast” demo the whole reply has been sped up, including the preamble where Omni said something like “Yes of course, I will now count to ten much faster”. It did not exclusively speed up the counting part.

André Thieme

It feels to me like this year may be somewhat pivotal in terms of seeing what changes and what doesn’t. There’s so many claiming that there is currently a plateau. Putting aside that I think 4o may be a bit more impressive than it seems (once we have full access to the voice and image capabilities), what is the actual next step that happens? If we do get a “GPT-5” from someone in terms of something that seems strikingly better than what we have now, that would really put me even more in the mindset of thinking big changes are afoot soon.

Shawn Fumo

Excellent job Phil. I started reading this document but only made slow progress. I really don’t want to read all this stuff. Good that you did it for me and summarized what I would have found interesting. Happy to be a member of this community.

André Thieme

You did not claim that he did get discredited because of his young age — and he did not get discredited because of his young age. Instead a thorough analysis has been presented, and it’s the real world that is discrediting to some extent. Let’s wait until GPT-5 and see if the trend goes more into the direction of Gary Marcus or Leo.

André Thieme

Could be, Lee. Thanks so much!

Philip

Hope I gave some backed-up technical reasons too!

Philip

Philip, I love your output. I think we, outside the cutting-edge AI labs and without access to their currently trained models (like the one Sam A is already using daily according to his own words, if we can believe those: the whale model from their Microsoft outing), need to consider that with the increased attention within much larger context windows, the maths compute may have increased. The Wolfram language plug-in definitely improved Math performance for GPT4 in the coherence and relevance of its responses. Memory is also being trialled in the wild, another step towards better agentic performance in the future. On the point of language being the best performance output for LLMs, I think that's true, but we are moving to multimodal input and output, and imagine how this improves what AI can solve. Just my 2p's worth, and I have much to learn, which I do with the Insiders videos and Discord community.

Lee FRASER

Tbf he is a kid, brain is not mature until about 25. Not surprising to hear not so well thought out musings from a 22 year old, especially given the kind of bubble he lives in. Edit: Have since discovered that this is somewhat of a "myth" but I feel like it's somehow grounded in truth based on the experience I and my wife and friends have had, of just feeling quite different after that age - more grounded and open to being wrong. In any case, I am sure not everything he said is exaggerated, but a lot is clearly not well considered, and he probably would have benefitted from reading it a few times before posting.

Daniel Henderson

Sounds a bit salty. I am not a fan of Aschenbrenner but I do not like just discrediting him because of his young age.

Jörg Eitner

Ooh this sounds juicy..

Daniel Henderson

Oh this is gonna be spicy

David Shapiro

