AIExplained

Next-level reasoning: The Good News and Bad News - 2 new papers analysed

Paper 1: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? https://arxiv.org/pdf/2504.13837

Paper 2: Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/pdf/2502.01839

Tweet: https://x.com/YangYue_THU/status/1914690345964855566

Noam Brown Interview: https://www.youtube.com/watch?v=c675KAlmo8k

[Download Link]: https://drive.google.com/file/d/1ierwwx3KiKuTl1X7Lt8oDQ1iHiAXyczQ/view?usp=sharing

Comments

o3 is the corporate suit: rational, useful, closed-minded. RL (school) has killed his free spirit 😂 4o is the hippie: often wrong, open-minded, and creative. I want 100 hippies brainstorming, with one corporate drone hearing them out and weighing the wild approaches he would never come up with.

Bob Rein

So we still haven’t figured out how to get novel knowledge out of LLMs.

Grant Singleton

It's crazy how simple and obvious self-verifying seems in retrospect.

Eagleshadow

This reminds me heavily of program search techniques, which have been a focus of many attempts to work on ARC. In program search you assume that there is some space of programs to search, and you are looking for efficient ways to find one of the correct ones. Because of combinatorial explosion, it is great to prune the space, but any effort to do so may cause you to exclude the correct answer. When the problem is framed in this way, it's not such a surprising or bad thing that the 'reasoning' models only represent efficient sampling of the base model. It is a feature, not a bug.

Tristan Reid
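
The pruning trade-off described in the comment above can be illustrated with a toy program search. This is only a minimal sketch under stated assumptions: the four-operation DSL, the `search` function, and the example task are all hypothetical and are not taken from either paper or from any ARC solver. It enumerates short programs against input/output examples, then repeats the search with one operation pruned away.

```python
from itertools import product

# Hypothetical toy DSL: each program is a sequence of these unary operations.
OPS = {
    "add1": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

def run(program, x):
    """Apply each named operation in order to x."""
    for name in program:
        x = OPS[name](x)
    return x

def search(examples, max_len, allowed=None):
    """Enumerate programs up to max_len, shortest first.

    Returns (program, candidates_tried); program is None if nothing fits.
    `allowed` restricts the operations, which prunes the search space.
    """
    names = [n for n in OPS if allowed is None or n in allowed]
    tried = 0
    for length in range(1, max_len + 1):
        for program in product(names, repeat=length):
            tried += 1
            if all(run(program, x) == y for x, y in examples):
                return program, tried
    return None, tried

# Hidden target for this illustration: f(x) = (2 * (x + 1)) ** 2
EXAMPLES = [(1, 16), (2, 36), (3, 64)]

full, full_tried = search(EXAMPLES, max_len=3)
pruned, pruned_tried = search(
    EXAMPLES, max_len=3, allowed={"add1", "double", "negate"}
)

print("full search:  ", full, "after", full_tried, "candidates")
print("pruned search:", pruned, "after", pruned_tried, "candidates")
```

The pruned search looks at fewer candidates, but because the only correct program needs `square`, it comes back empty. That is the sense in which narrowing the space makes sampling more efficient while risking the loss of answers that were reachable before.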
