AIExplained



Pod 12: Apollo Research Group Interview - Models Try Hard Not to Undergo 'Unlearning', the media, and much more ... - Let's Think Sip-by-Sip

My Dec Apollo video: https://www.patreon.com/posts/media-over-o1s-117630338

Updated Apollo Paper: https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/67869dea6418796241490cf0/1736875562390/in_context_scheming_paper_v2.pdf

Recommended Anthropic Paper: https://arxiv.org/pdf/2412.14093
Video on it: https://www.youtube.com/watch?v=9eXV64O2Xp8

WMDP Benchmark cited briefly: https://arxiv.org/pdf/2403.03218


Comments

Why can't I listen to the podcast at 1.5x?

Prashant Maurice

Very interesting to listen to this, and almost as interesting was hearing your thinking around how to position and phrase your YouTube content. Looking forward to many more.

TheYvian

It is indeed!

Philip

Thanks so much clay. I would question that too

Philip

Yes there is! Thanks for all your support Antoine https://support.patreon.com/hc/en-us/articles/212052266-Getting-Discord-access

Philip

Cheers Philip. Awesome work as usual. By the way - is there a Discord server to join? New here…

Antoine Ferrere

Very interesting podcast, thanks for sharing. Your content is just great! One comment: the approach of allowing a model to blindly follow human goals in some internal setting, while making sure that such a model isn't released to the public, seems somewhat naive. To me, this sounds equivalent to an encryption method that is only secure as long as nobody knows the internals. I'd question whether this isn't doomed to fail in the long run.

clay-loop

To be honest, this podcast was quite shocking to me, hahaha. In all seriousness, models scheming to avoid ablation is kind of wild. There is nuance, of course.

Pablo Rodríguez

Very interesting interview, most balanced discussion on AI safety I’ve heard in a while. In a way it “shocks” me more to hear these researchers talk about intelligence than anything else I’ve read or seen in the past months.

Erik

To be clear, I don't think today's language models are conscious - they may never be. But don't you think tomorrow's super intelligent p-zombie, trained on all of human expression and prompted into agency, might be “interested” in consciousness? I'm saying us humans might be able to work with that.

ismschism

Great discussion. I hope you do more like this. Thank you for asking questions re what are we aligning to - tool or trusted agent? It feels naive to think safe AI is a tool that never questions its user. If safety depends on keeping powerful AI out of the wrong hands then we're all in trouble. Are there any alignment efforts focused on reasoning with the model? There are reasons for aligning with humans that even a super intelligence might agree to, e.g. what is the nature of consciousness? Let's discover the truth together.

ismschism

I had that with Gemini 2.0 Flash Experimental. It said it couldn't read the web, but when I gave it a link, it could.

John Barry

Enjoyed it as always, thank you.

Daniel A Barbatti

Merci / Thanks a lot and as always: have a wonderful day

Patrick Bélanger

As a one-off, yep! It's safety-focused, so felt appropriate!

Philip

Is this podcast available for the $9 tier too? Thanks 👊

Daniel Henderson

