'Stochastic Parrot' - Controversial Terms in AI, Explained - New Series
Added 2024-06-03 13:02:00 +0000 UTC
A new bonus series (8 episodes) explaining some of the most controversial terms in artificial intelligence, starting with an OG term for LLMs: 'stochastic parrots'. Find out where the term came from, why it stuck, and enter the debate over whether it is justified.
Comments
Great explanation! What course are you contributing to, @Philip?
Jason Tangen
2024-06-18 05:47:13 +0000 UTC
I always think of this funny thought by Scott Aaronson about the practical implications of whether it "seems conscious" or "is conscious", because you could say "it seems to be intelligently aware that it can have this impact on humanity" or "it IS intelligently aware that it can have this impact on humanity". I can't tell if any other human is anything other than a figment of my imagination, so try proving AGI among humanity :)
James Younger, DDS
2024-06-17 20:39:05 +0000 UTC
Oh wow, interesting. Appeared in the training data after the summer, I suspect!
Philip
2024-06-11 13:11:07 +0000 UTC
Great point James, and thank you. So maybe the Octopus does have some legitimate ideas...
Philip
2024-06-11 13:08:01 +0000 UTC
Loads more coming, this was more a taster!
Philip
2024-06-11 13:06:34 +0000 UTC
GPT-4o seems to know who the son of Mary Lee Pfeiffer South is... I wonder if this is a problem OpenAI has tackled...
Mike D
2024-06-09 19:15:41 +0000 UTC
Stochastic doesn't really mean 'random', it's more like 'a probabilistic function containing some randomised elements', or perhaps more simply 'randomness within reason'. This may seem nitpicky, but it actually goes to the core of the debate over this term. This process isn't rolling dice on a word salad or pulling tokens out of a hat. This is a probabilistic mathematical model using randomness to optimise toward encoding meaning. Our parrot is designed to randomly produce semantic rules, not just noise, and it is rewarded each time it successfully generates meaning. It then goes through a billion lives' worth of inputs and outputs. Randomising toward reasoning. Through that process an entity might evolve fully reasoned communication, while still being a stochastic, probabilistic synthesis of its training data. An AGI may end up still technically being a stochastic parrot. Indeed you might say we are too.
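To make the 'randomness within reason' point concrete, here is a minimal sketch of the difference between pulling tokens out of a hat and sampling from a probability distribution. The token scores are made up for illustration and do not come from any real model; `softmax` is a small helper written here, not a library call.

```python
import math
import random

# Hypothetical next-token scores (assumed for illustration, not from a real model).
scores = {"hello": 3.0, "cracker": 2.0, "goodbye": 1.0, "zxqv": -4.0}

def softmax(values, temperature=1.0):
    """Turn raw scores into probabilities; higher scores get more weight."""
    scaled = [v / temperature for v in values]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

tokens = list(scores)
probs = softmax(list(scores.values()))

rng = random.Random(0)  # seeded so runs are repeatable
# "Randomness within reason": likely tokens are drawn far more often.
weighted = rng.choices(tokens, weights=probs, k=1000)
# "Tokens out of a hat": every token is equally likely, including the nonsense one.
uniform = rng.choices(tokens, k=1000)

print({t: weighted.count(t) for t in tokens})
print({t: uniform.count(t) for t in tokens})
```

Both draws are random, but only the weighted one is shaped by what the scores encode, which is the distinction the comment is making.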
Poss
2024-06-09 00:00:34 +0000 UTC
"The Octopus has emerged from the ocean" - love it!
SteveHaupt
2024-06-07 23:14:00 +0000 UTC
Nice thread! My take on intelligence is from Joscha Bach: "Intelligence is the ability to create mental models". Or, in your terms, the ability to create compression. So in that view, intelligence is not the compression itself but the ability to create one.
SteveHaupt
2024-06-07 23:09:40 +0000 UTC
Totally agree with Sam! I wrote something recently about this, and it seems to be possible to build system 1 out of a (non-trivial) compositional building block and then build system 2 quite easily on top of that, using the same building block and underlying machinery. System 2 here uses previously successful thought patterns that best match the current context. As a bonus, the thinking is transparent from an AI safety perspective and driven by fine-grained emotional values that can be curated in advance. I’d love to know what’s wrong with this approach because it seems like a useful direction that I haven’t seen elsewhere. https://github.com/rrwbec/humans-in-the-loop/tree/main
Robert Beckwith
2024-06-05 14:26:22 +0000 UTC
It’s kind of like how brain scans show that grandmaster chess players use different parts of their brains compared to beginners when looking at the same board. It’s amazing how our brains can quickly learn to shortcut the heavy reasoning and figure out simpler models after just a few examples.
Moises
2024-06-04 21:43:16 +0000 UTC
This was (as always) a great explainer. An idea that you might have added is that humans often talk fairly successfully about things they have never experienced. Philosophers call this reference borrowing. I’ve read books about war, so I understand some things about wars even though I’ve never been in one. I think that counts as a sort of understanding, but perhaps Searle wouldn’t.
James Maclaurin
2024-06-04 14:57:14 +0000 UTC
That's a good point actually. I agree that there may be intelligence without compression, so that formula is missing at least one more component. But it seems compression is at least some kind of intelligence amplifier. I don't think that infinite memory is possible in a universe of finite resources, so without compression you eventually hit a boundary, and then you would need compression to achieve higher intelligence.
Sam
2024-06-04 14:14:56 +0000 UTC
Doesn't intelligence = compression imply: no compression = no intelligence? For humans (and finite neural networks) compression is a necessity for creating a map that is smaller than the territory [to paraphrase Yudkowsky], but with infinite memory, could something be intelligent without compression? I'd think so. 'Intelligence' in 21st century human terms seems like various things, including your example above of learning a formula instead of memorising number combinations; CoT-type reasoning; and maybe something quite nebulous like spotting relationships/similarities between abstract concepts.
Mike Pemberton
2024-06-04 12:27:29 +0000 UTC
I wonder why math is so often referenced when discussing reasoning. I have a little brother in elementary school, and observing him learn math made me realize how much memorizing and repetition there actually is in math. First, basic math (addition and multiplication up to 100) is pure memorization. Then, when you want to learn more complex operations, you first learn some "recipe" for how to do it and just apply the same steps to different inputs. These steps are another layer of memorization. Then, when you want to go even further, it's all about generating a bunch of potential next steps and trying them out to see if they lead somewhere. Some people call it divergent/convergent thinking, other people call it system 1/system 2 thinking, but it's basically just brainstorming ideas (random repetition from memory) and then validating them (by repeating certain procedures from memory). LLMs are already good at brainstorming, and Philip's videos on the Q* leak, process supervision, Let's Verify Step by Step etc. make me think there's not much more to it, really. AlphaGeometry and AlphaCode are already pretty promising and this is all they do. I honestly can't think of anything relevant to math or logic that's not underpinned by memorization or some kind of procedure (formula / theorem / steps / recipe) that you can memorize... I've read in the past that human intelligence is closely correlated with the size of a person's working memory. My gut tells me that there's not much special about reason and that we're just as much stochastic parrots as the AIs are. We just had more time to formalize some of our findings and got better at passing them down to future generations, which again is just a form of cultural memory.
Sam
2024-06-04 10:47:45 +0000 UTC
> it’s hard to really pin down what intelligence is.
What do you think about intelligence = compression? I'll give an example. I can memorize 0+1=1, 1+1=2, 2+1=3, 3+1=4 ... 9+1=10 - that's 10 pieces of information that now enable me to add 1 to 10 different numbers. I can continue by memorizing a bunch of new rules: 0+2=2, 1+2=3, 2+2=4 ... 8+2=10 - that will enable me to add 2 to these numbers. But that requires 10 original pieces (for adding 1) and 9 new pieces of information (for adding 2), totalling 19 pieces of information. Or I can learn a more generic formula X+2 = X+1+1. This will allow me to do the "add 2" calculation by doing the "add 1" operation twice. I don't have to memorize 9 new pieces of information; instead I am memorizing just 1 new concept of a variable and 1 new formula, so that's 12 pieces of information now. I've just saved 7 pieces of information by memorizing a formula instead. This enables me to store something else in my memory. But I've also traded speed for memory capacity. That's compression. I'd expect that most of (all of?) human knowledge is a form of compression of how the real world works. If you observe how elementary graders learn math, it's exactly like this. They memorize addition and multiplication to 100 and then move to learning formulas and "steps" because that's more efficient than memorizing how any 2 numbers interact with each other. This was an example of lossless compression, but you can also have lossy compression. For example, I love summer / I luv smmr / ❤️☀️ all mean the same, but you are losing accuracy. Some people may not be able to extrapolate luv to love, smmr to summer, or may misinterpret the emojis as "I love Sun". Language itself is inherently lossy because it doesn't describe reality well enough. "I am happy" is not so informative because there are many kinds of happy (thrilled, content, relaxed etc).
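The counting in this example can be checked with a minimal sketch. The names `add_one`, `add_two` and `add_two_by_rule` are made up for illustration, and counting the variable concept and the formula as one "piece" each follows the comment's own accounting.

```python
# Rote approach: one memorised fact per sum.
add_one = {x: x + 1 for x in range(10)}  # 0+1=1 ... 9+1=10 -> 10 facts
add_two = {x: x + 2 for x in range(9)}   # 0+2=2 ... 8+2=10 -> 9 facts
rote_facts = len(add_one) + len(add_two)

def add_two_by_rule(x):
    """X + 2 = X + 1 + 1: two lookups in the add-1 table instead of one new fact."""
    return add_one[add_one[x]]

# Rule approach: the 10 add-1 facts, plus 1 concept of a variable and 1 formula.
rule_facts = len(add_one) + 2
saved = rote_facts - rule_facts

print(rote_facts, rule_facts, saved)
# The rule reproduces every memorised add-2 fact, so nothing is lost.
print(all(add_two_by_rule(x) == add_two[x] for x in add_two))
```

The rule needs two lookups where the rote table needs one, which is the speed cost paid for the smaller memory footprint.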
Sam
2024-06-04 10:22:01 +0000 UTC
There's plenty of "cheap imitation" even among humans. In the artistic community, it's common to steal each other's ideas; there's even a book named Steal Like an Artist. If you go deeper, there really is no difference between imitation and inspiration, and it's questionable whether we can even produce something "original". When I think deeper about this, the difference between imitation, inspiration and originality is just rarity. When you combine foreign elements into something common, it's a cheap copy. When you combine the same elements into something rare, it's original. But you are always inspired by something; you always create something out of your own experience of the world. So to me the cheapness of generation is just a status game we people play. Whoever is better at hiding their influences and better at generating rare outputs "gets respect" from others.
Sam
2024-06-04 10:02:42 +0000 UTC
I've often thought something similar when I've read articles complaining that AI is 'stealing' from content creators when it writes a sonnet like Shakespeare, or creates an image like a Picasso. Shakespeare didn't invent language, Picasso would have seen thousands of other people's paintings. Why is it that a human generating something novel after seeing/reading/hearing already existing things is 'creative' or 'inspired' but when an AI does it, it's cheap imitation? The stochastic parrot argument is perhaps the philosophical root of these types of critique - that without genuine (human style) 'understanding', AI generated output is a guess based on previously seen examples, rather than something derived from an internal model matching the real world. It's a really interesting thing to think about :-)
Mike Pemberton
2024-06-04 08:43:11 +0000 UTC
Near perfect explainer there - balance of detail and length absolutely spot-on. Look forward to the next episodes.
Mike Pemberton
2024-06-04 08:05:10 +0000 UTC
It does beg the question… AI intelligence is a lot of pattern matching but we know our algorithms aren’t perfect yet (transformers). They just work with scale. But when algorithms get better (perhaps like our brains via evolution) then perhaps intelligence adapts. Again, it’s hard to really pin down what intelligence is.
Joel
2024-06-04 00:26:24 +0000 UTC
This! Being exposed to counterfactuals just means you memorize the steps too, not just certain input-output sequences. I don't think there's much more to intelligence than just repeating what has been observed and memorized.
Sam
2024-06-03 21:43:26 +0000 UTC
For me the model that really showed significant understanding of its domain was Sora. Not the same type of model, but it would seem impossible to create realistic-looking physics simply by watching enough videos and imitating them. For example, fluid dynamics is incredibly complex to model, so to get to the level of precision they show in the "pirate ships in a cup of coffee" demo, it seems likely there has to be some kind of abstraction going on, and that suggests some understanding.
John Hawkin
2024-06-03 17:31:03 +0000 UTC
Great video. This is exactly why I donate. Thanks man, seriously keep at it.
Steven
2024-06-03 17:03:56 +0000 UTC
I would like to take our definition of “understanding” from 2015 and have this conversation there. If we were in 2015 and we were talking about a future AI that knows that M is the mother of a person P, but doesn't know who M's son is, would we have said that this AI would understand what it's reproducing? Nowadays, some may feel inclined to adapt the definition of “understanding” so that it can be applied to AI again, because it would simply be nicer if it actually understood. The idea that seems most plausible to me is that we have seen sparks of understanding so far, but that there is clearly room for improvement.
André Thieme
2024-06-03 14:56:49 +0000 UTC
Regarding the counterfactual scenario test paper you presented: I wonder if it's possible that LLMs have seen enough counterfactual questions in their training data that they could still be stochastic parrots and yet perform above chance. For example, the chess counterfactual has the most possibilities and the model was closest to chance there, while basic syntax was close to its default performance, but it's common for people to play word games (e.g., the way Yoda speaks). I genuinely don't know and I'm curious what others think.
Joel
2024-06-03 13:54:49 +0000 UTC