AIExplained posts

o1 can 'self-correct'. That's kinda significant.

Drawing on 3 new articles, an interview in the last 48 hours, and 4 papers, I'll argue that we should not let o1's 'ability to self-correct' go sailing past in the night.

Link for Offline V...

2024-10-03 12:42:49 +0000 UTC View Post

Pod 8: Do we have a straight shot to AGI? 'Don't teach, incentivize' - Let's Think Sip by Sip

Are 'Benchmarks All You Need'? And do we have any conceptual breakthroughs to go before text-based AGI? I bring in the latest OpenAI quotes and reflect deeply on what it all means.

Link for ...

2024-09-30 14:58:11 +0000 UTC View Post

Is o1 No Longer a LLM? LeCun + New 'LRM' paper explained (+ exclusive interview clips)

A new paper from the last few days has dropped, and it's a good one. LLMs can now be said to plan, and I have all the analysis as well as exclusive clips from my interview with the lead author. And...

2024-09-24 14:05:00 +0000 UTC View Post

'Humanity's Last Exam' - I Doubt It

Less than 24 hours ago we got the claim that a multi-million dollar 'final test' for AI was being put together. But I ask questions about what it will achieve, drawing on evidence from 3 papers, Si...

2024-09-17 18:12:45 +0000 UTC View Post

The Struggle to Define 'AGI' - Controversial Terms in AI, Explained

Surely now that companies like OpenAI, whose only goal is to create an AGI, are worth $100B+, we have a settled definition of 'AGI' itself? No? Or even a set of rival definitions, each of which are...

2024-09-09 16:00:08 +0000 UTC View Post

10,000x Scaling Deep Dive, and a 5-year LLM Roadmap

A 20,000-word new report on AI scaling, and yes, I read it all to bring you the highlights. What are the biggest unanswered questions for whether we will scale models 10,000x and is there a deeper ...

2024-09-01 21:04:42 +0000 UTC View Post

Simple Bench Exclusive Tour: I couldn’t find a good reasoning benchmark, so I made one.

Full results from the first Simple Bench run (including latest model updates), the new website, more insight into the questions and what the gaping hole in basic reasoning means, plus my plans goin...

2024-08-19 14:44:44 +0000 UTC View Post

'The Bitter Lesson' - Controversial Terms in AI, Explained - New Series

The term 'bitter lesson' is thrown about a lot, but what does it actually mean? Does it leave humans irrelevant or is it about something deeper? Drawing on lessons from MuZero and the annotated ori...

2024-08-12 09:21:24 +0000 UTC View Post

Pod 7: The Story Behind SIMPLE Bench, More Results, and Next Steps - Let's Think Sip by Sip

Mistral Large flops hard, but what exactly is this benchmark, what are some more of its questions, why is it different, and what is next? All, or at least some, of these questions will be answered....

2024-07-30 18:50:25 +0000 UTC View Post

'Emergent Behaviors' - Controversial Terms in AI, Explained - New Series

One of my favorite videos in the series! A new bonus series explaining some of the most controversial terms in artificial intelligence, this time covering the term 'emergent behaviors'. Decidi...

2024-07-15 15:31:45 +0000 UTC View Post

Can ChatGPT Do Task X? It’s Surprisingly Hard to Answer

‘Can any model do [insert task]?’ is a much harder question than it seems. I’m going to give you five vivid categories, with unambiguous examples, drawing on 6 new papers, of the kind of deta...

2024-07-04 18:17:18 +0000 UTC View Post

Pod 6: No One Agrees @ OpenAI if GPT-4o is 'a smart highschooler' + My Take on Murati, Altman and Sutskever - Let's Think Sip by Sip

There is a clear dividing line emerging at the height of OpenAI, and in AGI labs more broadly. This pod reflects on the 'reasoning' and 'scale' axes, including fascinating new comments from OpenAI ...

2024-06-23 17:35:01 +0000 UTC View Post

'Open Source' - Controversial Terms in AI, Explained - New Series

A new bonus series (2/8 episodes) explaining some of the most controversial terms in artificial intelligence, this time covering the term 'open source'. In some quarters, it's the most controversia...

2024-06-19 12:52:18 +0000 UTC View Post

Fired OpenAI researcher - 'OpenAI Planned to Sell AGI to China' and 'It's Coming by 2027' - Full Analysis of 165 page Doc

Recently fired OpenAI researcher Leopold Aschenbrenner has produced an essay that will either confirm him to be absolutely crazy, a target of an OpenAI lawsuit, or bizarrely prophetic. I went throu...

2024-06-07 19:40:24 +0000 UTC View Post

'Stochastic Parrot' - Controversial Terms in AI, Explained - New Series

A new bonus series (8 episodes) explaining some of the most controversial terms in artificial intelligence, starting with an OG term for LLMs: 'stochastic parrots'. Find out where the term came ...

2024-06-03 13:02:00 +0000 UTC View Post

New Benchmark Madness, But Hope on the Horizon

This video won’t just show you the problem with a range of the most popular benchmarks (though it will do that, from MMLU-Pro to GPQA, GSM8K, LMSYS and more). It will show you a useable path forw...

2024-05-20 14:25:21 +0000 UTC View Post

Prompt Injections in the AI Agent Era - Donato Capitella

Exclusive: The second, eye-opening instalment of the AI Insiders tutorial series on Prompt Injections - Donato Capitella on what the threat is, how it is changing, and what you can do about it, at ...

2024-05-17 12:56:20 +0000 UTC View Post

Pod 5: GPT 4o Reflections, Cryptic OpenAI Tweet, When to Declare AGI, and New Guests - Let's Think Sip by Sip

Let's take a moment to reflect on the import of GPT 4o and the cascading social ramifications of development and after development. Then, I investigate an interesting OpenAI tweet, talk about forthc...

2024-05-15 20:47:35 +0000 UTC View Post

Reflections on Sam Altman’s Recent Expectation-Setting on GPT-5

I believe the model that will end up being popularly known as GPT-5 has finished training. That comes not just from the analysis in my...

2024-04-28 13:38:44 +0000 UTC View Post

Many-Shot Magic: 2 New Papers + 1 Failed Bet Show What Can Be Done with LLMs

Two recent papers (DeepMind + Anthropic tag-team) and a failed $10k bet have reminded people not to underestimate what models can learn from the data you give them in the prompt. Let me show you ho...

2024-04-23 14:56:12 +0000 UTC View Post

SmartGPT Website Demo and Community Project

I have always wanted to have a web demo of SmartGPT, to show anyone how powerful basic prompting scaffolds can be. But I wanted it to be even more interesting than what I showed last year, so the i...

2024-04-12 15:09:48 +0000 UTC View Post

Perplexity CEO on the Future of Search, and Why He's Not Scared of OpenAI or Google

Highlights from the interview with Aravind Srinivas, co-founder and CEO of Perplexity. Plus the news today not only of the first hints of instant search from OpenAI but of Google's epochal shift to a...

2024-04-04 20:12:42 +0000 UTC View Post

AI Jobs Warning: 36 Hours Later, Author Interviewed, Paper Analysed in Full, and Why I am Still Somewhat Optimistic

Yesterday’s dramatic Bloomberg headlines showcased an ‘AI Jobs Apocalypse’, warnings of ‘millions of jobs lost in next 3-4 years’, triggered by a new 44-page paper from London. I intervie...

2024-03-28 23:25:52 +0000 UTC View Post

Pod 4: Unpredictability: AI, Content Creation, Timelines and Vernor Vinge - Let's Think Sip by Sip

The only theme for this episode is unpredictability, from the swirling new rumours of GPT-5 release dates from Business Insider, to the challenges of promoting interviews that don't happen, behind-...

2024-03-24 19:09:25 +0000 UTC View Post

A Note on Not Being Shocking, and Making Connections

I don’t often do personal updates, I just sprinkle them in, on the off chance anyone wants a bit more behind-the-scenes. Two things come to mind to mention today: the repercussions of not being <...

2024-03-15 13:42:37 +0000 UTC View Post

The AGI Lawsuit

What are the most interesting details from the Musk-Altman Lawsuit? Can Gemini 1.5 help me sort through the morass of relevant tweets? I want to give you the history of the battle over the def...

2024-03-03 21:58:16 +0000 UTC View Post

AI Professional Tips and Networking

This month has seen the launch of a Discord channel that I am very excited about. We have hundreds of incredible people on here at the bleeding edge of implementing and understanding AI, and so nat...

2024-02-25 18:55:29 +0000 UTC View Post

$7 Trillion, a Bioweapon and a Nuke In Space - Under-the-Radar AI Safety Papers

Everything you missed in the world of AI threats because of Sora and Gemini. From Compute Overhang @Sama, to a laudable Bioweapon study from OpenAI, and from state-actors using GPT-4 to the future ...

2024-02-22 15:38:51 +0000 UTC View Post

Deepfakes - The Peril and Potential

Take a 14-minute tour with me of the cutting edge of deepfakes, from speech-to-speech to politics, YouTube and business. We'll discuss the upsides, including with a senior figure at ElevenLabs - an...

2024-02-15 17:20:08 +0000 UTC View Post

Perplexity CEO - Any questions?

As always, first dibs on questions for my interview guests goes to you guys. And I am lucky enough to be able to have Aravind Srinivas, Perplexity founder and CEO, formerly of OpenAI,...

2024-02-08 11:56:25 +0000 UTC View Post