AIExplained

Last 24 Hours: Signs of Introspection in LLMs

Added 2025-10-30 18:40:22 +0000 UTC

Before this paper from Anthropic, out on the 29th, I was a lot more skeptical about LLMs self-reporting their internal states. This is not proof that they can, but it is partial proof of circuits showing true introspective capability and, more than that, of the ability to map a question about that capability to those circuits.

Plus a big update to lmcouncil.ai.

https://www.anthropic.com/research/introspection

Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms

Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Release Post: https://x.com/AnthropicAI/status/1983584136972677319

Comments

That's so great to hear! Can you just confirm whether the scrolling hassle comes from using it on mobile, or on what type of screen? If mobile, after how many models does this expansion become a chore: the normal 3-4, or more? I can try to implement a solution in the next 24 hours.

Philip

2025-11-09 10:34:22 +0000 UTC

Really loving lmcouncil.ai! Feature request: when I have a large number of council members, scrolling left and right to review responses is quite cumbersome. Is it possible to add some scroll arrows (perhaps something like what you see in the headline image from this blog: https://www.experienceux.co.uk/ux-blog/a-ux-perspective-on-horizontal-scrolling/ )?

Bret Brizzee

2025-11-08 19:57:58 +0000 UTC

Done! Just for you!

Philip

2025-11-05 12:32:14 +0000 UTC

Thanks Joshua! And for being a Max sub...

Philip

2025-11-05 12:32:09 +0000 UTC

Damn, not too much I can do about that!

Philip

2025-11-05 12:31:57 +0000 UTC

In LM Council, please add an option to "delete all" past conversations. Thank you!

Riley Thomson

2025-11-05 01:02:33 +0000 UTC

Great content Philip!

Joshua Davis

2025-11-03 04:47:12 +0000 UTC

The vocal fry in your voice is a bit disturbing.... 😅

Kishore Kumar

2025-11-02 21:35:21 +0000 UTC

This makes me wonder whether human introspection is a result of the physical capabilities of the brain or whether it comes about as a result of lived human experiences. If it is a result of the physical brain, that would suggest the possibility that an artificial intelligence could stumble across introspection.

Barnaby Golden

2025-11-02 12:57:59 +0000 UTC

I'm wondering if this is not just a case of writing "bread" or "treasure" or "dust" with a vector instead of with words, followed by the same normal model inference as always. I'm hypothesizing that, for example, simply because the model was not trained on "Quick fox jumped over a lazy dog" with "bread"-skewed activations, the context-filling circuits would kick in. After all, their job is to extract information that is not written directly. Combine that with a suggestive question, et voilà. So to me the testing was clever and shows an interesting way of doing inference, but it was not rigorous enough to conclude what they tentatively did. Oh, and congrats on the lmcouncil expansion and development! Will most probably give it a try!

Paweł Pieniacki

2025-10-31 10:08:52 +0000 UTC

Again, very fascinating research by Anthropic, although I am not sure what to make of it. When they inject activity patterns for specific concepts, we would expect (as seen in their prior work on monosemanticity) the outputs to be steered towards that concept. I feel like the combination of the given prompt, where the researchers reveal that the model is being tested, and the actual activation injection produces a result we would expect even if a model had no capability to introspect. It would be more impressive if the model detected these injections without getting any hints. Also, it's not clear how introspection could work in a feed-forward network. One could imagine that parts of the state represented by earlier layers are analyzed by later layers, but this would be a bare-bones version of introspection. Anyway, very interesting research, even though I don't find it super convincing for now.

Phillip Yao-Lakaschus

2025-10-30 21:17:41 +0000 UTC

Thanks Philip, I was hoping you’d cover the details of that Anthropic research after I ran out of time to read more than the summary myself! I hope you can keep doing these videos and don’t get too distracted by building - it’s very addictive!

Erik

2025-10-30 20:56:08 +0000 UTC
