Drawing on 3 new articles, an interview in the last 48 hours, and 4 papers, I'll argue that we should not let o1's 'ability to self-correct' go sailing past in the night.
Link for Offline V...
2024-10-03 12:42:49 +0000 UTC
Are 'Benchmarks All You Need'? And do we have any conceptual breakthroughs to go before text-based AGI? I bring in the latest OpenAI quotes and reflect deeply on what it all means.
Link for ...
2024-09-30 14:58:11 +0000 UTC
A new paper from the last few days has dropped, and it's a good one. LLMs can now be said to plan, and I have all the analysis as well as exclusive clips from my interview with the lead author. And...
2024-09-24 14:05:00 +0000 UTC
Less than 24 hours ago we got the claim that a multi-million dollar 'final test' for AI was being put together. But I ask questions about what it will achieve, drawing on evidence from 3 papers, Si...
2024-09-17 18:12:45 +0000 UTC
Surely now that companies like OpenAI, whose only goal is to create an AGI, are worth $100B+, we have a settled definition of 'AGI' itself? No? Or even a set of rival definitions, each of which are...
2024-09-09 16:00:08 +0000 UTC
A 20,000-word new report on AI scaling, and yes, I read it all to bring you the highlights. What are the biggest unanswered questions for whether we will scale models 10,000x and is there a deeper ...
2024-09-01 21:04:42 +0000 UTC
Full results from the first Simple Bench run (including latest model updates), the new website, more insight into the questions and what the gaping hole in basic reasoning means, plus my plans goin...
2024-08-19 14:44:44 +0000 UTC
The term 'bitter lesson' is thrown about a lot, but what does it actually mean? Does it leave humans irrelevant or is it about something deeper? Drawing on lessons from MuZero and the annotated ori...
2024-08-12 09:21:24 +0000 UTC
Mistral Large flops hard, but what exactly is this benchmark, what are some more of its questions, why is it different, and what is next? All, or at least some, of these questions will be answered....
2024-07-30 18:50:25 +0000 UTC
One of my favorite videos in the series! A new bonus series explaining some of the most controversial terms in artificial intelligence, this time covering the term 'emergent behaviors'. Decidi...
2024-07-15 15:31:45 +0000 UTC
‘Can any model do [insert task]?’ is a much harder question than it seems. I’m going to give you five vivid categories, with unambiguous examples, drawing on 6 new papers, of the kind of deta...
2024-07-04 18:17:18 +0000 UTC
There is a clear dividing line emerging at the height of OpenAI, and in AGI labs more broadly. This pod reflects on the 'reasoning' and 'scale' axes, including fascinating new comments from OpenAI ...
2024-06-23 17:35:01 +0000 UTC
A new bonus series (2/8 episodes) explaining some of the most controversial terms in artificial intelligence, this time covering the term 'open source'. In some quarters, it's the most controversia...
2024-06-19 12:52:18 +0000 UTC
Recently fired OpenAI researcher Leopold Aschenbrenner has produced an essay that will either confirm him to be absolutely crazy, a target of an OpenAI lawsuit, or bizarrely prophetic. I went throu...
2024-06-07 19:40:24 +0000 UTC
A new bonus series (8 episodes) explaining some of the most controversial terms in artificial intelligence, starting with an OG term for LLMs: 'stochastic parrots'. Find out where the term came ...
2024-06-03 13:02:00 +0000 UTC
This video won’t just show you the problem with a range of the most popular benchmarks (though it will do that, from MMLU-Pro to GPQA, GSM8K, LMSYS and more). It will show you a useable path forw...
2024-05-20 14:25:21 +0000 UTC
Exclusive: The second, eye-opening instalment of AI Insiders the tutorial series on Prompt Injections - Donato Capitella on what the threat is, how it is changing, and what you can do about it, at ...
2024-05-17 12:56:20 +0000 UTC
Let's take a moment to reflect on the import of GPT-4o and the cascading social ramifications of development and after development. Then, I investigate an interesting OpenAI tweet, talk about forthc...
2024-05-15 20:47:35 +0000 UTC
I believe the model that will end up being popularly known as GPT-5 has finished training. That comes not just from the analysis in my...
2024-04-28 13:38:44 +0000 UTC
Two recent papers (DeepMind + Anthropic tag-team) and a failed $10k bet have reminded people not to underestimate what models can learn from the data you give them in the prompt. Let me show you ho...
2024-04-23 14:56:12 +0000 UTC
I have always wanted to have a web demo of SmartGPT, to show anyone how powerful basic prompting scaffolds can be. But I wanted it to be even more interesting than what I showed last year, so the i...
2024-04-12 15:09:48 +0000 UTC
Highlights from the interview with Aravind Srinivas, co-founder and CEO of Perplexity. Plus the news today not only of the first hints of instant search from OpenAI but of Google's epochal shift to a...
2024-04-04 20:12:42 +0000 UTC
Yesterday’s dramatic Bloomberg headlines showcased an ‘AI Jobs Apocalypse’, warnings of ‘millions of jobs lost in next 3-4 years’, triggered by a new 44-page paper from London. I intervie...
2024-03-28 23:25:52 +0000 UTC
The only theme for this episode is unpredictability, from the swirling new rumours of GPT-5 release dates from Business Insider, to the challenges of promoting interviews that don't happen, behind-...
2024-03-24 19:09:25 +0000 UTC
I don’t often do personal updates; I just sprinkle them in, on the off chance anyone wants a bit more behind-the-scenes. Two things come to mind to mention today: the repercussions of not being ...
2024-03-15 13:42:37 +0000 UTC
What are just the most interesting details from the Musk-Altman Lawsuit? Can Gemini 1.5 help me sort through the morass of relevant tweets? I want to give you the history of the battle over the def...
2024-03-03 21:58:16 +0000 UTC
This month has seen the launch of a Discord channel that I am very excited about. We have hundreds of incredible people on here at the bleeding edge of implementing and understanding AI, and so nat...
2024-02-25 18:55:29 +0000 UTC
Everything you missed in the world of AI threats because of Sora and Gemini. From Compute Overhang @Sama, to a laudable Bioweapon study from OpenAI, and from state-actors using GPT-4 to the future ...
2024-02-22 15:38:51 +0000 UTC
Take a 14-minute tour with me of the cutting edge of deepfakes, from speech-to-speech to politics, YouTube and business. We'll discuss the upsides, including with a senior figure at ElevenLabs - an...
2024-02-15 17:20:08 +0000 UTC
As always, first dibs on questions for my interview guests goes to you guys. And I am lucky enough to be able to have Aravind Srinivas, Perplexity founder and CEO, formerly of OpenAI,...
2024-02-08 11:56:25 +0000 UTC