AIExplained

Next-level reasoning: The Good News and Bad News - 2 new papers analysed

Added 2025-04-25 15:13:52 +0000 UTC

Paper 1: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? https://arxiv.org/pdf/2504.13837

Paper 2: Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/pdf/2502.01839

Tweet: https://x.com/YangYue_THU/status/1914690345964855566

Noam Brown Interview: https://www.youtube.com/watch?v=c675KAlmo8k

[Download Link]: https://drive.google.com/file/d/1ierwwx3KiKuTl1X7Lt8oDQ1iHiAXyczQ/view?usp=sharing

Comments

o3 is the corporate suit: rational, useful, closed-minded. RL (school) has killed his free spirit 😂. 4o is the hippie: often wrong, open-minded, and creative. I want 100 hippies brainstorming, with one corporate drone hearing them out and weighing the wild approaches he would never come up with.

Bob Rein

2025-05-04 21:56:19 +0000 UTC

So we still haven’t figured out how to get novel knowledge out of LLMs

Grant Singleton

2025-05-03 16:28:13 +0000 UTC

It's crazy how simple and obvious self-verifying seems in retrospect.

Eagleshadow

2025-04-26 09:47:24 +0000 UTC

This reminds me heavily of program search techniques, which have been the focus of many attempts to solve ARC. In program search you assume that there is some space of programs to search, and you look for efficient ways to find one of the correct ones. Because of combinatorial explosion, pruning the space is valuable, but any effort to do so may cause you to exclude the correct answer. Framed this way, it is neither surprising nor bad that the 'reasoning' models only represent more efficient sampling of the base model. It is a feature, not a bug.

Tristan Reid

2025-04-25 23:52:44 +0000 UTC
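
The program-search framing in the comment above can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the four-primitive DSL, the `search` helper, and the `allowed` argument that stands in for a pruning heuristic. It is not taken from either paper or from any ARC solver. It shows the trade-off the comment describes: a good prune finds the same program with fewer candidates tried, while a bad prune removes the correct program from the space entirely.

```python
from itertools import product

# Hypothetical toy DSL: each primitive maps an int to an int.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "neg": lambda x: -x,
}

def run(program, x):
    """Apply the primitives named in `program` to x, left to right."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def search(examples, max_len, allowed=None):
    """Enumerate programs shortest-first and return (program, candidates_tried).

    `allowed` is a stand-in for a pruning heuristic: it restricts which
    primitives may appear. Returns (None, candidates_tried) if nothing fits.
    """
    names = [n for n in PRIMITIVES if allowed is None or n in allowed]
    tried = 0
    for length in range(1, max_len + 1):
        for program in product(names, repeat=length):
            tried += 1
            if all(run(program, x) == y for x, y in examples):
                return program, tried
    return None, tried

# Input/output pairs for the hidden target f(x) = (x + 1) ** 2.
EXAMPLES = [(1, 4), (2, 9), (3, 16)]

full = search(EXAMPLES, max_len=2)                                   # no pruning
good_prune = search(EXAMPLES, max_len=2, allowed={"inc", "square"})  # keeps the answer
bad_prune = search(EXAMPLES, max_len=2, allowed={"inc", "double"})   # drops "square"

print(full)        # (('inc', 'square'), 7)
print(good_prune)  # (('inc', 'square'), 4)
print(bad_prune)   # (None, 6)
```

On this reading, a model that samples more efficiently from an existing space plays the role of `good_prune`, and the risk the comment points to is the `bad_prune` case.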
