Nvidia CEO Jensen Huang believes Physical AI will be the next big thing. Emerging robots will take many forms, all powered by AI.
Recently Nvidia has been extolling a future where robots will be everywhere. Intelligent machines will be in the kitchen, the factory, the doctor’s office, and on the highways, just to name a few settings where repetitive tasks will increasingly be done by smart machines. And Jensen’s company, of course, will provide all the AI software and hardware needed to teach and run the needed AIs.
What is Physical AI?
Jensen describes our current phase of AI as pioneering AI, creating the foundation models and the tools needed to refine them for specific roles. The next phase, which is already underway, is Enterprise AI, where chatbots and AI models are improving the productivity of enterprise employees, partners and customers. At the culmination of this phase, everyone will have a personal AI assistant, or even a collection of AIs to assist in performing specific tasks.
In these two phases, AI tells us things, or shows us things, by generating the likely next word in a sequence of words, or tokens. But the third and final phase, according to Jensen, is physical AI, where the intelligence occupies a physical form and interacts with the world around it. Doing this well requires integrating input from sensors and manipulating objects in three-dimensional space.
“Building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today,” said Jensen Huang, founder and CEO of NVIDIA. “The enabling technologies are coming together for leading roboticists around the world to take giant leaps towards artificial general robotics.”
OK, so you have to design the robot and its brain. Clearly a job for AI. But how do you test the robot against an infinite number of circumstances it could encounter, many of which cannot be anticipated or perhaps replicated in the physical world? And how will we control it? You guessed it: we will use AI to simulate the world the ‘bot will occupy, and the myriad devices and creatures with which the robot will interact.
“We’re going to need three computers… one to create the AI… one to simulate the AI… and one to run the AI,” said Jensen.
The Three Computer Problem
Jensen is, of course, talking about Nvidia’s portfolio of hardware and software solutions. The process starts with Nvidia H100 and B100 servers to create the AI, workstations and servers using Nvidia Omniverse with RTX GPUs to simulate and test the AI and its environment, and Nvidia Jetson (soon with Blackwell GPUs) to provide the on-board real-time sensing and control.
Nvidia has also introduced GR00T, which stands for Generalist Robot 00 Technology, to design, understand and emulate movements by observing human actions. GR00T will learn coordination, dexterity and other skills in order to navigate, adapt and interact with the real world. In his GTC keynote, Huang demonstrated several such robots on stage.
Two new AI NIMs will allow roboticists to develop simulation workflows for generative physical AI in NVIDIA Isaac Sim, a reference application for robotics simulation built on the NVIDIA Omniverse platform. First, the MimicGen NIM microservice generates synthetic motion data based on recorded tele-operated data using spatial computing devices like Apple Vision Pro. Second, the RoboCasa NIM microservice generates robot tasks and simulation-ready environments in OpenUSD, the universal framework that underpins Omniverse for developing and collaborating within 3D worlds.
Finally, NVIDIA OSMO is a cloud-native managed service that allows users to orchestrate and scale complex robotics development workflows across distributed computing resources, whether on premises or in the cloud.
OSMO helps simplify robot training and the creation of simulation workflows, cutting deployment and development cycle times from months to less than a week. Users can visualize and manage a range of tasks — like generating synthetic data, training models, conducting reinforcement learning and testing at scale for humanoids, autonomous mobile robots and industrial manipulators.
So, how do you design a robot that can grab objects without crushing or dropping them? Nvidia Isaac Manipulator provides state-of-the-art dexterity and AI capabilities for robotic arms, built on a collection of foundation models. Among early ecosystem partners are Yaskawa, Universal Robots (a Teradyne company), PickNik Robotics, Solomon, READY Robotics and Franka Robotics.
OK, how do you train a robot to “see”? Isaac Perceptor provides multi-camera, 3D surround-vision capabilities, which are increasingly being used in autonomous mobile robots in manufacturing and fulfillment operations to improve efficiency and worker safety while reducing error rates and costs. Early adopters include ArcBest, BYD and KION Group as they aim to achieve new levels of autonomy in material handling operations and more.
For operating robots, the new Jetson Thor SoC includes a Blackwell GPU with a transformer engine delivering 800 teraflops of 8-bit floating-point AI performance to run multimodal generative AI models like GR00T. Equipped with a functional safety processor, a high-performance CPU cluster and 100GB of Ethernet bandwidth, it significantly simplifies design and integration efforts.
Conclusions
Just when you thought it might be safe to go back in the water, da dum. Da dum. Da dum. Here come the robots. Jensen believes that robots will need to take human form because the factories and environments in which they will operate were all designed for human operators. It’s far more economical to design humanoid robots than to redesign the factories and spaces in which they will be used.
Even if it’s just your kitchen.