AIExplained

Pod 12: Apollo Research Group Interview - Models Try Hard Not to Undergo 'Unlearning', the media, and much more ... - Let's Think Sip-by-Sip

Added 2025-01-22 18:35:36 +0000 UTC

My Dec Apollo video: https://www.patreon.com/posts/media-over-o1s-117630338

Updated Apollo Paper: https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/67869dea6418796241490cf0/1736875562390/in_context_scheming_paper_v2.pdf

Recommended Anthropic Paper: https://arxiv.org/pdf/2412.14093
Video on it: https://www.youtube.com/watch?v=9eXV64O2Xp8

WMDP Benchmark cited briefly: https://arxiv.org/pdf/2403.03218

Comments

Why can't I listen to the podcast at 1.5x?

Prashant Maurice

2025-02-15 08:16:56 +0000 UTC

Very interesting to listen to this, and almost as interesting was hearing your thinking about how to position and phrase your YouTube content. Looking forward to many more.

TheYvian

2025-02-14 23:49:17 +0000 UTC

It is indeed!

Philip

2025-02-02 16:06:07 +0000 UTC

Thanks so much, clay. I would question that too.

Philip

2025-02-02 16:05:58 +0000 UTC

Yes there is! Thanks for all your support Antoine https://support.patreon.com/hc/en-us/articles/212052266-Getting-Discord-access

Philip

2025-02-02 16:05:40 +0000 UTC

Cheers Philip. Awesome work as usual. By the way - is there a Discord server to join? New here…

Antoine Ferrere

2025-02-01 19:53:58 +0000 UTC

Very interesting podcast, thanks for sharing. Your content is just great! One comment: the approach of allowing a model to blindly follow human goals in an internal setting, while making sure that such a model isn't released to the public, seems somewhat naive. To me, this sounds equivalent to an encryption method that is only secure as long as you don't know its internals. I'd question whether this isn't doomed to fail in the long run.

clay-loop

2025-02-01 18:32:21 +0000 UTC

To be honest, this podcast was quite shocking to me, hahaha. In all seriousness, models scheming to avoid ablation is kind of wild. There is nuance, of course.

Pablo Rodríguez

2025-01-26 22:26:37 +0000 UTC

Very interesting interview, most balanced discussion on AI safety I’ve heard in a while. In a way it “shocks” me more to hear these researchers talk about intelligence than anything else I’ve read or seen in the past months.

Erik

2025-01-23 21:43:40 +0000 UTC

To be clear, I don't think today's language models are conscious - they may never be. But don't you think tomorrow's super intelligent p-zombie, trained on all of human expression and prompted into agency, might be “interested” in consciousness? I'm saying us humans might be able to work with that.

ismschism

2025-01-23 15:43:27 +0000 UTC

Great discussion. I hope you do more like this. Thank you for asking questions re what are we aligning to - tool or trusted agent? It feels naive to think safe AI is a tool that never questions its user. If safety depends on keeping powerful AI out of the wrong hands then we're all in trouble. Are there any alignment efforts focused on reasoning with the model? There are reasons for aligning with humans that even a super intelligence might agree to, e.g. what is the nature of consciousness? Let's discover the truth together.

ismschism

2025-01-23 01:02:20 +0000 UTC

I had that with Gemini 2.0 Flash Experimental. It said it couldn't read the web, but when I gave it a link, it could.

John Barry

2025-01-22 21:33:23 +0000 UTC

Enjoyed it as always, thank you.

Daniel A Barbatti

2025-01-22 21:05:55 +0000 UTC

Merci / Thanks a lot and as always: have a wonderful day

Patrick Bélanger

2025-01-22 19:07:30 +0000 UTC

As a one-off, yep! It's safety-focused, so felt appropriate!

Philip

2025-01-22 19:05:25 +0000 UTC

This podcast is available for the $9 tier too? Thanks 👊

Daniel Henderson

2025-01-22 19:01:07 +0000 UTC