The goal of building humanoids is fascinating, albeit divisive within the community.
A lot of academics, investors, and industry leaders balk at it, arguing that simpler non-anthropomorphic solutions, along with changes to the environment, are more practical. But this is akin to redoing infrastructure to make autonomous cars work. In practice, there will be an equilibrium between autonomous capability and environment design.
The problem of smart robot design lies as much in building more capable robot hardware (sensors, actuators, and power systems) as it does in building smarter brains.
Car companies such as Honda and Toyota have in the past brought their manufacturing backgrounds to building general-purpose humanoid robots, which enables practical decisions such as design for manufacturability and economies of scale. Yet the question remains: why is this time going to be different? One could argue that the confluence of accelerated computing, perception, and data, along with the tools of deep learning, warrants revisiting this problem.
The Optimus effort is, in a sense, no different from ASIMO, Atlas, T-HR3, or Digit. All of them are backed by expert roboticists with years of design experience. However, the Tesla robotics team has done a commendable job of iterating quickly on the robot's design from the ground up: a solid experimental study of actuator requirements, new actuator designs, and an integrated system. This should allow meaningful power output while conserving energy, unlike the hydraulic actuators used by Boston Dynamics' Atlas. The current locomotion stack does not use any machine learning; instead it relies on trajectory optimization with reference controllers. That is a sensible choice to get up and running, but it will need to be redone for longer-term operation. Their simulation-based testing stack currently appears limited, and they might be using Drake for controller design.
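To make the "trajectory optimization plus reference controller" recipe concrete, here is a minimal sketch on a toy system. Everything in it is illustrative, not Tesla's actual stack: the dynamics are a 1-D double integrator standing in for a walking robot, the costs and gains are made up, and direct shooting with `scipy.optimize.minimize` stands in for an industrial solver like Drake.

```python
# Toy sketch: offline trajectory optimization, then online tracking of
# the optimized reference with a simple PD controller. Illustrative only.
import numpy as np
from scipy.optimize import minimize

DT, N = 0.1, 30             # timestep [s] and horizon length
X0 = np.array([0.0, 0.0])   # initial (position, velocity)
XG = np.array([1.0, 0.0])   # goal: reach position 1.0 at rest

def rollout(u):
    """Simulate double-integrator dynamics under control sequence u."""
    x = np.zeros((N + 1, 2))
    x[0] = X0
    for k in range(N):
        pos, vel = x[k]
        x[k + 1] = [pos + DT * vel, vel + DT * u[k]]
    return x

def cost(u):
    x = rollout(u)
    effort = DT * np.sum(u ** 2)                  # penalize control effort
    terminal = 100.0 * np.sum((x[-1] - XG) ** 2)  # reach the goal at rest
    return effort + terminal

# Direct shooting: optimize the open-loop control sequence offline.
res = minimize(cost, np.zeros(N), method="L-BFGS-B")
u_ref, x_ref = res.x, rollout(res.x)

# Online: track the optimized reference with a PD controller, starting
# from a slightly perturbed state (the feedback absorbs the disturbance).
def track(x0, kp=20.0, kd=8.0):
    x = np.array(x0, dtype=float)
    for k in range(N):
        u = u_ref[k] + kp * (x_ref[k, 0] - x[0]) + kd * (x_ref[k, 1] - x[1])
        x = np.array([x[0] + DT * x[1], x[1] + DT * u])
    return x

final = track(X0 + [0.05, 0.0])
print(final)  # ends close to the goal (1.0, 0.0) despite the perturbation
```

The design point this illustrates is the one made above: the optimization happens offline over a known model, and a comparatively dumb feedback law tracks the result at runtime. No learning is involved anywhere, which is exactly why it is quick to stand up and why it struggles as conditions drift from the modeled ones.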
The hand also looks exciting: a metallic cable-driven system with four fingers and a thumb. In the brief demo, it showed a reasonably high load capacity (holding watering cans, lifting bars of aluminum in the factory). However, the cable-driven design means slower response times, makes learning-based control harder, and offers no back-drivability in autonomous mode. This will make general-purpose autonomous manipulation challenging.
Tesla would benefit a lot from collaborating with the community. The argument that the technology could be misused is a misdirection: if Tesla indeed makes progress, it will likely not be alone, at least not for long. The technology will be replicated in some form out in the open sooner rather than later, as has been the case with language models and diffusion-based image generators. Tesla already has the community's attention, and by being more open, Elon would only empower more people to work on a problem that is, after all, "beneficial to humanity," the driving principle he himself has claimed. Designing the whole stack of hardware, simulation, and data infrastructure requires Tesla to reinvent the wheel on many fronts. They could release limited encrypted designs that would let the community adopt the Tesla Bot as an experimental platform and build around it. For instance, large-scale, enterprise-grade simulation for robot training in both control and perception could speed up development. They could also collaborate with academics on better tools, such as differentiable simulation and planners for multi-finger grasping.
Yet the challenges of learning-based control and behavior chart a long road with a long tail of problems. Autonomous driving (AV) offered a mirage of being just around the corner, and we are still chasing that dream over a decade later. The AV domain suffers from a long tail of corner cases, which has turned it into a game of whack-a-mole, as Tesla's AV team demonstrated yesterday when handling a car parked at an intersection required a full model update. The other hard problem is modeling human behavior, which has been a focus of AV companies for several years now. Overall, we should take a lesson in humility from the challenges of building deployable autonomy in the real world before approaching humanoid AGI with unbridled enthusiasm.
Autonomous general-purpose manipulation is arguably a harder problem: it needs perception and agent modeling, but also carefully controlled interaction with the environment. A testament to this is the state of the simulation engines used in the two industries. AV stacks use rendering and procedural scene creation with no physics at all, since there is no contact-based interaction, while robotics simulators must model controlled contact in addition to facing many of the same challenges as AV.
However, with cars, Tesla can pursue AV development while still generating revenue from electric vehicles. Humanoid robots, in contrast, are a zero-to-one problem with no revenue until the system works well enough to deliver value. Given what we now know about the long tail of problems, the limitations of current machine learning tools, and the need for new robot designs, planning for a fully autonomous solution as the first product seems daunting. The flip side is that robots, in the short term, have use cases in places where they do not need to interact with humans and where the consequences of failure are less severe. The community needs to find a revenue-positive pathway to support this development, and it could come from behind-the-scenes uses of robot manipulation in warehouses, retail stores, food preparation, and manufacturing. While fixed-automation solutions are still being pursued in these settings, it remains to be seen how a general-purpose hardware solution would stack up; the place to look is low-volume, high-variability products that require quick adaptation.
Overall, the current design is a very good first step. Interest in building such systems is welcome: Tesla's and Elon Musk's involvement brings attention, talent, and resources to the problem, setting in motion a flywheel of progress. The community should greet this effort with cautious optimism, for the compass points in the right direction, and Elon brings with him the heft of Tesla's engineers as we trek through the AI/robotics jungle.
Thank you for writing this. You hit it when you asked "what's going to be different this time?" Everyone is still hunting for that answer.
Great to see you on Substack, Animesh. I have been following your work, and your perspective here will be interesting. Your comparison with AV is spot on, though I am less optimistic. Tesla has been marketing full autonomy for its drivers and charging them an arm and a leg for years. While they could get away with this 5 years ago, the EV market now is competitive and the edge cases make Tesla FSD still unusable for anyone outside the Bay Area (regulatory restrictions also do not help). I don't see this changing anytime soon. With this track record, my expectation for the Robot - with a market date of 2027 - is modest. Not for their stock prices, though. And in the end that may be the whole point. Still, open sourcing the "brains" of this humanoid may lead to far more progress in the community than a product 5 years away.