'The Bitter Lesson' - Controversial Terms in AI, Explained - New Series
Added 2024-08-12 09:21:24 +0000 UTC
The term 'bitter lesson' is thrown about a lot, but what does it actually mean? Does it leave humans irrelevant, or is it about something deeper? Drawing on lessons from MuZero and the annotated original essay by Rich Sutton.
Link for Offline-viewing: https://drive.google.com/file/d/1q5hewZoI1zHJ8nP8uZCuyybYj1H7wkjq/view?usp=sharing
Bitter Lesson Essay
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
MuZero
https://arxiv.org/pdf/1911.08265
MuZero in the Airforce?
AlphaTensor
https://deepmind.google/discover/blog/discovering-novel-algorithms-with-alphatensor/
Noam Brown (OpenAI) Assessment
https://x.com/polynoamial/status/1789381426187546644
Comments
My interpretation of the Bitter Lesson as it applies to AlphaGo is not about AlphaGo being trained on human games while AlphaGo Zero wasn't, but rather about the fact that previous Go programs (e.g. GNU Go) were collections of carefully hand-crafted heuristics and rules, based on the programmer's understanding of playing the game, whereas AlphaGo (even the original) involved a far more general, neural-network-based approach with loads of self-play reinforcement learning compute thrown at it. The fact that existing *amateur* human games were used to bootstrap AlphaGo before the self-play training phase is largely irrelevant, since the main points are the neural net, search, and the self-play training (loads of compute) of AlphaGo versus the old-school hand-crafted approach of stuff like GNU Go. As Rich Sutton himself said of Go: "Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale." This ties into the second part of the lesson, which is that people aren't actually very good at understanding how their own minds work anyway, and so meta-methods that learn how to solve problems for themselves are better. :P PS "AlphaGo Zero" was actually the first one to learn completely from scratch; everyone who's not obsessed with Go AI seems to forget that and only mentions "AlphaZero" for some reason, though that came later. lol
Dark_Eternal
2024-10-19 07:52:13 +0000 UTC
This is actually scary. With the amount of compute we have amassed and the sheer number of highly intelligent people getting all the money our systems can pump into their research, it does seem to imply that at any moment someone might find a method that is vastly more efficient than what we do in transformers. I'm usually not hyperparanoid about things going ZOOOOM, but this reminds me of the danger. It's also scary to see how quickly all of this seeps into our infrastructure, especially our military infrastructure. Something is a bit more efficient? We take it and push for more. And it is oh so much harder to undo steps once they are taken.
Jörg Weiß
2024-10-14 17:30:10 +0000 UTC
I totally agree with your conclusion. Even scaling 20x, as Microsoft said yesterday, feels like it would only get you one or two steps forward, not even close to the finish line. I think it needs more than transformers, more even than neural networks. A monstrous, hybrid set of architectures perhaps.
Philip
2024-08-16 16:22:03 +0000 UTC
I've been obsessed with this idea my whole life. Like the Game of Life and trying to find randomness, or how systems with simple rules can create complex group behaviors - ants, humans, you name it. I don't think they mentioned it explicitly, but they did talk about the bitter lesson here: https://www.youtube.com/watch?v=Gg-w_n9NJIE This solidifies my idea about AGI and mixture of experts. Sure, mixture of experts is going to allow you to grow faster, but you'll hit saturation way quicker than by just feeding "everything" to the model and letting it figure it out. It's also a reason to support natively multimodal models. I think that exponential curve we were seeing isn't so exponential anymore. Not sure if companies stopped releasing, or if we've reached saturation with current approaches. This is one of the reasons I think we need a change in architecture/paradigm that affects the model from the ground up. I believe that one of the biggest paradigm shifts will come from mechanistic interpretability and feature interpretation. I think this more closely resembles the level of metacognition we humans have, and transformer models are too granular, in my opinion, in processing every single token. More important, though, would be to allow for that internal feedback for metacognition: give the model all the tokens, and let it figure out the granularity of the features for its own awareness, so we don't run into the bitter lesson again. Also, I don't think our brains have the computational power that these models need to train and infer, even over our whole lifetime of training. It makes me wonder if we're missing something fundamental about efficient learning and computation, given the differences between analog and digital systems. Just thinking out loud here, but it feels like we're at a point where we need to rethink some basics to keep pushing forward. I'm totally open to being wrong about this, though; these are just my thoughts.
Pablo Rodríguez
2024-08-14 20:48:18 +0000 UTC
This is a good series 👍🏿😊 I look forward to your every upload. They all provide excellent insight.
r
2024-08-13 16:04:47 +0000 UTC
One of my takeaways is the frame of a "human-centric" approach ~~opposing~~ conjugate to a "scaled-computation" approach (I don't like the dichotomy bend of "opposing" because it seems like an oversimplification of a nuanced orthogonality). I know it's hard to identify and explain the mechanisms, but I want to find out what has been gleaned so far from deep-learning solutions. Although it's an observation of outcomes, one example is how new corner patterns used by AlphaGo are now being used by Go players. So, with 4 steps being trimmed from a large matrix multiplication scenario, have new discoveries been made from understanding how the search approach changed as the computational scale grew? I like that these advancements are leading to more "how" questions.
Blake Chambers
2024-08-12 17:02:11 +0000 UTC
Simulators are key, as you saw, and their increasing fidelity will dictate a lot of the pace of further AI improvements. There needs to be a step-change, though, to properly model human social dynamics.
Philip
2024-08-12 13:54:58 +0000 UTC
Exactly!
Philip
2024-08-12 13:53:48 +0000 UTC
If we really want to learn anything from nature/science, it is that incredibly complex things emerge from simple building blocks at scale. Choose your building blocks carefully!
Machiel Reyneke
2024-08-12 13:45:56 +0000 UTC
So better simulations and simulation evaluations are needed if we want an AI to learn by itself / against itself. Chess and Go are constrained environments, and the drone can be simulated in flight simulators, but what about the models that need to learn about interaction with humans (language and culture)?
Adin Softic
2024-08-12 10:48:42 +0000 UTC