AI News Bureau
Written by: CDO Magazine Bureau
Updated 12:17 PM UTC, Wed November 6, 2024
Representative image by vecstock on freepik.
MIT researchers have created a versatile technique that combines a huge amount of heterogeneous data from multiple sources into a single system to better train general-purpose robots.
Instead of relying on the typical limited, task-specific data to train robots, this approach scales up, drawing on vast amounts of information similar to the extensive datasets used to train large language models (LLMs). The team looked to models like GPT-4 for a kind of brute force data approach to problem-solving.
“In the language domain, the data are all just sentences,” says Lirui Wang, the new paper’s lead author. “In robotics, given all the heterogeneity in the data, if you want to pretrain in a similar manner, we need a different architecture.”
The team introduced a new architecture called Heterogeneous Pretrained Transformers (HPT), which pulls together information from different sensors and different environments. A transformer model then aggregates that data for training. Users input the robot's design, configuration, and the task they want performed.
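The shared-trunk idea behind HPT can be illustrated with a minimal sketch: each sensor modality gets its own projection ("stem") into a common token space, a single set of trunk weights is shared across robots, and a robot-specific head maps the shared representation to actions. All names, dimensions, and the plain linear trunk below are illustrative stand-ins, not the actual HPT implementation, which uses a transformer trunk and learned tokenizers.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # shared embedding width (illustrative)

# Modality-specific "stems": each sensor stream gets its own
# projection into the shared token space (hypothetical feature dims).
stems = {
    "camera": rng.normal(0, 0.1, (128, D)),   # e.g. visual features
    "proprio": rng.normal(0, 0.1, (7, D)),    # e.g. joint angles
}

# Shared trunk weights, reused across all robots and modalities
# (a single linear layer standing in for HPT's transformer trunk).
W_trunk = rng.normal(0, 0.1, (D, D))

# Robot-specific action head (hypothetical 7-DoF arm).
W_head = rng.normal(0, 0.1, (D, 7))

def policy(obs):
    """Map a dict of heterogeneous observations to an action vector."""
    tokens = [obs[name] @ W for name, W in stems.items() if name in obs]
    pooled = np.mean(np.stack(tokens), axis=0)  # crude token aggregation
    h = np.tanh(pooled @ W_trunk)               # shared representation
    return h @ W_head                           # robot-specific action

obs = {"camera": rng.normal(size=128), "proprio": rng.normal(size=7)}
action = policy(obs)
```

Because the trunk is shared while stems and heads are swappable, data from very different robots and sensors can all contribute to training the same core model, which is the scaling property the article describes.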
“Our dream is to have a universal robot brain that you could download and use for your robot without any training at all,” CMU associate professor David Held said of the research. “While we are just in the early stages, we are going to keep pushing hard and hope scaling leads to a breakthrough in robotic policies, like it did with large language models.”