Researchers used the system, called LucidSim, to train a robot dog in parkour, getting it to scramble over a box and climb stairs even though it had never seen any real-world data. The approach demonstrates how helpful generative AI could be when it comes to teaching robots to do challenging tasks. It also raises the possibility that we could ultimately train them in entirely virtual worlds. The research was presented at the Conference on Robot Learning (CoRL) last week.
“We’re in the middle of an industrial revolution for robotics,” says Ge Yang, a postdoc at MIT’s Computer Science and Artificial Intelligence Laboratory, who worked on the project. “This is our attempt at understanding the impact of these [generative AI] models outside of their original intended purposes, with the hope that it will lead us to the next generation of tools and models.”
LucidSim uses a combination of generative AI models to create the visual training data. First the researchers generated thousands of prompts for ChatGPT, getting it to create descriptions of a range of environments that represent the conditions the robot would encounter in the real world, including different types of weather, times of day, and lighting conditions. These included “an ancient alley lined with tea houses and small, quaint shops, each displaying traditional ornaments and calligraphy” and “the sun illuminates a somewhat unkempt lawn dotted with dry patches.”
These descriptions were fed into a system that maps 3D geometry and physics data onto AI-generated images, creating short videos mapping a trajectory for the robot to follow. The robot draws on this information to work out the height, width, and depth of the things it has to navigate—a box or a set of stairs, for example.
The researchers tested LucidSim by instructing a four-legged robot equipped with a webcam to complete several tasks, including locating a traffic cone or soccer ball, climbing over a box, and walking up and down stairs. The robot performed consistently better than when it ran a system trained on traditional simulations. In 20 trials to locate the cone, LucidSim had a 100% success rate, versus 70% for systems trained on standard simulations. Similarly, LucidSim reached the soccer ball in another 20 trials 85% of the time, and just 35% for the other system.
Finally, when the robot was running LucidSim, it successfully completed all 10 stair-climbing trials, compared with just 50% for the other system.
These results are likely to improve even further in the future if LucidSim draws directly from sophisticated generative video models rather than a rigged-together combination of language, image, and physics models, says Phillip Isola, an associate professor at MIT who worked on the research.
The researchers’ approach to using generative AI is a novel one that will pave the way for more interesting new research, says Mahi Shafiullah, a PhD student at New York University who is using AI models to train robots. He did not work on the project.