Drawing on 3 exclusive interviews and weeks of research, I uncover 4 underappreciated trends as we debate whether embodiment is needed for AGI.
Do you have a link to a downloadable version of this video for offline viewing?
r
2024-05-13 21:49:05 +0000 UTC
:)
Philip
2024-01-22 13:15:55 +0000 UTC
Great video. Thanks as always for tying all these different pieces together.
Shawn Fumo
2024-01-21 17:43:35 +0000 UTC
Yeah I backed rabbit’s r1 mostly because I was curious about the UI-driven nature of it. Even from a purely practical standpoint, there are little things I’d like to automate but the site doesn’t have an API or the app doesn’t integrate with iOS shortcuts or doesn’t give access to the particular thing I want. Even if there is an API, do I really want to spend time figuring out how to work with it just to do a little automation?
I get how they are doing websites, using a VM browser to drive it. I’m less sure how the phone apps can work. Do they have virtual iPhones? And how does that work with your own data like iCloud? Or is it only self-contained apps? Lots of questions.
I certainly can see why they’d want their own hardware though. I imagine Apple would be nervous about a service that explicitly automates sites/apps. They claim they will be good citizens and act like a human vs DoSing, but would Apple want to be associated in case it doesn’t work out?
I do wonder how apps/sites will feel about it in general. Like even if they do handle it very well, you aren’t seeing some ads or such that the app is throwing up. Will some end up banning r1 access?
Shawn Fumo
2024-01-21 17:43:05 +0000 UTC
Yeah and there may be advantages to both kinds. Like embodiment helps for not just robotics but various aspects of how the world works in a physical way that can show up in subtle ways in even virtual realms. But at the same time, not being constrained by embodiment can give a different perspective that might be more imaginative in certain ways.
Like the artifacts that can happen in image models because they don’t really understand the visual patterns they encode on a deep level. That’s a problem if you want realism, but can also be their unique source of “creativity”.
Shawn Fumo
2024-01-21 17:32:20 +0000 UTC
And the Altman/Ive smartphone coming too ...
Philip
2024-01-11 09:41:41 +0000 UTC
Rabbit's "Large Action Model" just very publicly advanced the SoTA on AI GUI manipulation. Now it's in the public imagination. Anthropic has a policy of not being the first to release new capabilities (à la ChatGPT) so the seal is broken for them. And I sense OpenAI felt some guilt about "shooting the industry out of a railgun" with ChatGPT as Sam Altman said. So, this time around, they may have been sitting on GUI manipulation until someone else showed it off first. I think Rabbit's 12-person team just did that. So I predict we'll see ChatGPT get server-side GUI manipulation within 3 months and other frontier labs will follow. I also think Imbue and Adept will demo similar GUI tech within a month to stay relevant.
Brian Crabtree
2024-01-11 07:03:14 +0000 UTC
Just stumbled upon this paper and thought it may be relevant: “Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers”: https://huggingface.co/papers/2401.01974
solarapparition
2024-01-05 04:54:23 +0000 UTC
Yeah, maybe more technically we can say most digital capabilities are orthogonal to physical capabilities. Except for where they overlap, of course.
solarapparition
2024-01-05 04:51:14 +0000 UTC
Love this discussion on whether AGI requires embodiment. I don’t think there is a right or wrong answer as there is (currently) no right or wrong answer to the question ‘what is AGI?’
Having said that - I believe that there will be a version of AGI, let’s call it ‘Digital AGI’, that won’t require embodiment and at this point I’m increasingly convinced that it’s more of an engineering problem than a research problem - we have all the pieces, just haven’t worked out how to put them all together yet.
For ‘Physical AGI’ I think there are still some research problems (and plenty of engineering problems!) to solve. I think we can see the path, but there are a few more obstacles to overcome.
With the current rate of progress, it might be we see both these flavours of AGI in a similar timeframe (within 12 months of each other) but I do think they will each be very different.
P.S. loving Jim Fan’s ‘the world is a series of tokens’ perspective - if you push that thought to its conclusion, maybe superintelligence will be able to predict the future!
Sean Betts
2024-01-04 20:46:09 +0000 UTC
Oh yeah, I saw the Tencent thing. Seems like Apple is going to be facing some innovator’s dilemma soon…
solarapparition
2024-01-04 18:03:43 +0000 UTC
Nice, yah I've seen that one👍 Also check out AppAgent from Tencent, released 2 weeks ago: https://appagent-official.github.io It can do some cool stuff like self-exploring GUIs, learning from human examples, and documenting what it learns for future reference, which reminds me of Voyager. And I'm certain Andrej Karpathy is way beyond this internally at OpenAI (he created the original World of Bits agent benchmark in 2017).
Brian Crabtree
2024-01-04 17:18:16 +0000 UTC
So agreed that embodiment would be necessary to have the feedback needed for models to perform *physical* tasks. But to me, most knowledge work only really requires “digital” embodiment, as in, the ability to interact with and get feedback from the digital world. For example, if I had the ability to directly interact with my computer using code, I could perform every single part of my work as a software engineer without any physical body or feedback from the physical world, except maybe if there’s a power outage and I need to restart my machine.
I may just be biased because of my profession, but personally I think the most transformative aspects of gen AI will happen long before embodiment. Or, you know, a few years before.
solarapparition
2024-01-04 15:49:06 +0000 UTC
If you haven’t yet, check out Self Operating Computer: https://github.com/OthersideAI/self-operating-computer
Still in a very early phase, but already capable of some rudimentary things along those lines.
solarapparition
2024-01-04 15:34:05 +0000 UTC
Excellent video. Tesla certainly looks to be the furthest ahead. The fluidity of movement in the Gen 2 Optimus arms and hands is quite remarkable! If you watch all of the Tesla Optimus videos back to back you'll see they have made incredible progress in a short amount of time. In recent months they have been hiring people to work specifically on this during the night shift in their factories. That means their AI/Robotics teams are likely working on developing Optimus 24 hours a day, 7 days a week. Elon Musk has repeatedly said that Tesla is an AI/Robotics company that currently makes cars. My prediction is that within 5 years they will be producing as many robots as cars. It really looks like Tesla intends to use the Optimus robot to help drive down the cost of their next generation car so that they can finally produce compelling electric cars that sell for less than $25k.
Jeff Thom
2024-01-04 15:29:57 +0000 UTC
If current AI systems can pilot a robot hand to spin a pen at superhuman levels, how far are we from AI systems that can pilot Chrome and Windows to control webpages and software at superhuman levels? Because, for this decade at least, an AGI that can pilot a mouse and keyboard is more valuable than an AGI that can pilot a robot. Why? Because the quantity of AI-pilotable robots is tiny and bottlenecked while the quantity of AI-pilotable VMware sessions is basically infinite and only bottlenecked by data center allocation and capacity. So I'm keeping my eye on advancements in GUI manipulation because unlocking that capability is like spawning a billion remote knowledge workers who all instantly get smarter with each new frontier model release.
Brian Crabtree
2024-01-04 12:28:43 +0000 UTC
A perfect way to start my morning, thank you! Practical robots with general reasoning capabilities would be a very big deal in my industry, looking forward to seeing what’s next!