SakeTami
AIExplained


o1 can 'self-correct'. That's kinda significant.

Drawing on 3 new articles, an interview in the last 48 hours, and 4 papers, I'll argue that we should not let o1's 'ability to self-correct' go sailing past in the night.

Link for Offline Viewing and Download: https://drive.google.com/file/d/1iULiA-t8yEniM5UkRGEHq99-ZB_7uqX8/view?usp=sharing

Noam Brown (and co) Interview: https://www.youtube.com/watch?v=jPluSXJpdrA

Nature piece: https://www.nature.com/articles/d41586-024-03169-9?utm_medium=Social&utm_campaign=nature&utm_source=Twitter&s=09#Echobox=1727860786-1

Reflexion Paper: https://arxiv.org/pdf/2303.11366

Forest of Jumbled Thoughts: https://x.com/rao2z/status/1733847480985661933

AlphaChip: https://www.nature.com/articles/s41586-024-08032-5

Old Altman: https://x.com/sama/status/1616857158100078592/photo/2

New Altman: https://fortune.com/2024/10/01/openai-sam-altman-mira-murati-gpt-4o-o1-chatgpt-turbulent-year/

Google Also Self-correcting: https://www.bloomberg.com/news/articles/2024-10-02/google-is-working-on-reasoning-ai-chasing-openai-s-efforts

o1 Learning to Reason: https://openai.com/index/learning-to-reason-with-llms/

React: https://arxiv.org/pdf/2210.03629

Dera: https://arxiv.org/pdf/2303.17071


Comments

Is there any guidance or discussion yet on fine-tuning o1-like models? My use case is that I want to take, say, 100,000 expert answers to farming-related questions (complete with how the problem was solved step by step, including function calling) and then fine-tune an o1-like model, such that the step-by-step thinking process is more domain-relevant (and function-aware). This need for o1-like fine-tuning feels relevant to most industries, for companies who believe they have good or proprietary data about a given process.

Sean Devs

Thank you 🙏 Put lots of love and care and AI into it.

Enrico Ros

@enrico love your GitHub project !!!

GOLDEN AMERICAN

This makes me think of a cache hierarchy of LLM calls. It feels like there's a lot of room to engineer here, and I'm curious what the limit of optimizing and deploying a system like this at scale could be.

AD Mohanraj

Well said

Daniel Henderson

Yes, I commented on that on one of your YouTube videos. I don’t need AGI, I need something 100% reliable on simple tasks. NotebookLM seems pretty good at that (no hallucination, respects the prompt, does not stop halfway…) but I need to do more testing to be sure. Then it won’t replace people in my team but it could save a lot of time. Maybe we will get there through agentic workflows, with agents verifying the work of the others. Until then, if I have to double-check every output, the productivity gain is limited.

Eddine Maiza

The 100% reliability thing is so under-appreciated. One thing some humans have over o1 is that they can stick at a task until it's 99.9% certain to be correct (re-checking sources, re-running numbers, on-the-ground confirmation etc.), and so even if they take much longer the final product is much more reliable. So even 98% accuracy arrived at cheaply, quickly and impressively at hard tasks still leaves o1 and a future NotebookLM 5, for example, short.

Philip

It does, many congrats Bob, keep us updated. And thank you.

Philip

Great points

Philip

Yes, stumbled. We could've probed the search space for 2 decades and nobody would've batted an eye. How phenomenal it is that we are finding these easily implementable multipliers to scaling compute so rapidly after one another should not be understated.

r

Consider that RLHF optimized for confidence in the answer, so it's possible that self-correction was just wiped from the model in favor of "making the user happy" (as you point out, the model would change its mind if the user questioned something truthful). Additionally, by using 50% of the tokens for reasoning, the model can change its mind multiple times, as only the final answer is shown to the user.

Enrico Ros

The quality, thought and research are tremendous. One exceptional source outweighs endless decent ones. I mirrored this at work when sharing AI coding assistants. We already had several long mediocre videos of largely redundant use cases. Instead, I made concise polished videos on specific new use cases. Hundreds watched and I received surprisingly emphatic feedback from executives and engineers. Quality matters :)

Bob Rein

The new untapped scaling dimension is pretty exciting, and underhyped imo (an AI rarity lol).

Bob Rein

That was definitely an interesting video. I am not sure what to make of it though. Does the model really learn, or does it stay limited to the context of a specific chat? I have seen some great output with o1 (e.g. for brainstorming, coding) but also some really bad output when testing for professional use cases (including summarization). Sonnet 3.5 is more reliable overall, and for some use cases I had the best results with NotebookLM. What Google did is quite impressive (and I am not talking about the podcast aspect, which is cool but not relevant to me). To help me be more productive I don’t need AGI or advanced reasoning, I just need a model that is 100% reliable on simple tasks, i.e. one that is not lazy, respects the prompt and does not hallucinate. NotebookLM is the closest to that when it comes to working on my own documents.

Eddine Maiza

Timing just before a major capital raise is probably not a coincidence. Whether there were concerns around quality or safety, and if these were justified, is hard to know. I guess it is the CEO’s job to make that final call.

Eddine Maiza

Stumbled? That seems unfair.

Jason Tangen

Your content is unmatched. Keep on keeping on!

Doodiligent

I guess we are slowly getting to see what Ilya famously saw in July 2023. At the time it seemed to scare him into alignment research so I wonder what the internal viewpoint is on just how far this can scale.

Steve DeMoss

Major milestone, no doubt!

Sean Betts

I believe that Noam Brown has been saying that the reinforcement learning is on model-generated chains of thought, and that the model itself serves as the verifier generating reinforcement signals. I may have gotten some detail or another wrong, but I'm sure that the system is not being trained only to reproduce exogenously generated chains of thought (though that may still be a preliminary step).

Tom English

In hindsight, none of this is surprising. Good to hear that OpenAI basically stumbled on the solution "Just let it think longer" as the answer.

David Shapiro

The OpenAI plot of test-time performance on math-competition questions as a function of "thinking" time, taken along with Noam Brown's indication that the results apply more generally, suggests that demand for computation will be increasing faster than previously expected. It's fairly clear that OpenAI would like to give relatively small specialists like o1-mini a chance to think through responses, and resort to use of the full-scale reasoning model only when they fail. This might lead to a large reduction in computational costs at test time. (Note that I'm not saying "inference time," inasmuch as I cannot tell you what is being inferred.) It is interesting that OpenAI has concurrently worked on distillation of custom models and prompt caching. Evidently, reduction of GPU usage by customers is a subgoal to be achieved in reaching the goal of AGI. It frees up a critical resource for use in research.

Tom English
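
The routing idea in Tom English's comment above (let a small model try first, and fall back to the full-scale model only when it fails) can be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of how OpenAI actually routes requests: the `Model` and `Verifier` types, the `cascade` function, the `max_small_attempts` parameter and all three stubs are hypothetical names invented for the example.

```python
from typing import Callable

# Hypothetical interfaces (assumptions for this sketch, not a real API):
# a model maps a prompt to an answer; a verifier judges an answer.
Model = Callable[[str], str]
Verifier = Callable[[str, str], bool]


def cascade(
    prompt: str,
    small: Model,
    large: Model,
    verify: Verifier,
    max_small_attempts: int = 2,
) -> tuple[str, str]:
    """Try the cheap model first; escalate only if it keeps failing.

    Returns (answer, name_of_tier_that_answered).
    """
    for _ in range(max_small_attempts):
        answer = small(prompt)
        if verify(prompt, answer):
            return answer, "small"
    # The small model never produced an acceptable answer: pay for the big one.
    return large(prompt), "large"


# Stand-in stubs so the sketch runs without any external service.
def small_stub(prompt: str) -> str:
    # Pretend the small model only knows one easy question.
    return "4" if prompt == "2+2" else "unsure"


def large_stub(prompt: str) -> str:
    # Pretend the large model knows both questions.
    return {"2+2": "4", "17*23": "391"}[prompt]


def verify_stub(prompt: str, answer: str) -> bool:
    # Toy check: accept any purely numeric answer.
    return answer.isdigit()


easy = cascade("2+2", small_stub, large_stub, verify_stub)
hard = cascade("17*23", small_stub, large_stub, verify_stub)
```

In this toy run the easy question is answered by the small tier and the hard one escalates. Any real saving would depend on how often the small model succeeds and on how reliable and cheap the verifier is, which is the open question several comments here raise.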

I resonated with your comment on "unbounded verification" or the ability of the model to go off into the real world (or internet, or reality sim) and truly verify things it doesn't innately know. That for me is when the test time compute paradigm will become truly useful. I don't enjoy that today's model is still limited by what it already knows and no amount of chain or tree of thought will fix that.

John Merkowsky

This decade can't get more interesting. Thanks as always Philip for what you do.

Jonathan Kirk

My understanding is it rewards the model for successful chain of thought so it learns to make more intelligent chains of thought enabling it to solve harder problems. Basically the former of the two situations you presented.

Bill Ray

I see, I was uninformed as to this aspect. Also Newsom's explanation does seem a bit disingenuous. It's unfortunate that this bill was killed like this. I wonder how much difference this bill could have made? Do you think that there will be a consensus in the future that will see this veto as the moment where things went, "off the rails?" Curious to hear your thoughts on this as it has not been widely covered.

Joshua Davis

I echo this, definitely set apart from the AI Bro crowd. I hope you build the Patreon following you deserve Philip @AI_Explained

Lee FRASER

What's not clear to me is whether training on successful chain-of-thought examples teaches the models how to do chain-of-thought in general or if it just gives them templates for chain-of-thought.

Barnaby Golden

Thanks so much prefer!

Philip

There are definitely mixed opinions on this within OpenAI, agreed. See: https://scottaaronson.blog/?p=8367

Philip

"Altman pushed for the product to be released almost immediately." - I feel that statement needs a bit of context. Maybe he only wanted to release the product to those people who had invested in OpenAI. OpenAI have always had staged rollouts. Not everyone gets access to the latest features and updates all at once. Because we are lacking this context, it's hard for me to really say if his perspective is absent any concerns for safety. The reality was that o1 was extensively red-teamed, and having listened to some of the interviews I feel that their efforts were very thorough. https://www.youtube.com/watch?v=98V8wVir2HI

Joshua Davis

Man I love your content! I am at a point now where every other YouTuber I follow feels like a gramophone, parroting AI news, but you always go deeper and have insights no one else has. Thanks Philip!

prefer

