Tesla Optimus's latest training progress revealed: motion capture abandoned in favor of pure visual data collection!

Wallstreetcn
2025.11.03 07:38


Tesla is training Optimus on pure visual data, so that the robot learns to understand the world through its "eyes."

According to a recent report by Business Insider, Tesla has shifted the training method for its humanoid robot Optimus from motion capture to pure camera data collection. Dozens of data collection employees are repeatedly performing daily actions in the laboratory to provide video training material for the robot to learn human behavior.

The report states that since June of this year, Tesla has abandoned the previously used motion capture suits and remote operation methods, opting instead for a data collection method that relies solely on cameras. Workers wear helmets equipped with five cameras and carry equipment packs weighing 30-40 pounds, repeatedly performing basic actions such as wiping tables, lifting cups, and pulling curtains.

Musk stated during the third-quarter earnings call that Optimus "has the potential to become the largest product in history," and he expects the company to eventually produce 1 million robots per year. He also mentioned that Optimus could one day account for about 80% of the automaker's value.

Training Method Fully Shifted to Camera Data Collection

In a glass-walled laboratory at Tesla's engineering headquarters, data collection workers perform repetitive actions that look simple but demand great precision. Each action must be repeated hundreds of times during an 8-hour shift, and every movement is recorded by the five helmet cameras and the backpack equipment.

In June of this year, after project director Milan Kovac left the company, employees were informed that the company would transition from motion capture suits and remote operation to solely using cameras for data collection. Workers reported that the team was told this method would allow for faster scaling of data collection.

In addition to the cameras worn by the workers, Tesla has also set up fixed cameras around the work area. Jonathan Aitken, a robotics expert at the University of Sheffield, stated that these fixed camera towers can provide a broader environmental perspective to supplement the data from the cameras worn by the workers.

Workers are sometimes also equipped with haptic gloves to track subtle hand movements. Musk has stated that Tesla has invested significant effort in developing humanoid hands for Optimus, calling it "an extremely difficult engineering challenge."

AI-Generated Task Instructions Cover Complex Action Scenarios

Tesla has begun using AI-generated prompts to help train the robots. In certain training exercises, workers receive a series of AI-generated instructions through a headset connected to their backpack and must complete each action within 3 to 5 seconds.

According to workers, these exercises include squatting, doing the "chicken dance," imitating a gorilla, pretending to vacuum, sprinting short distances, and pretending to play golf. Some tasks even resemble toddler learning games, such as stacking rings by size and color or fitting shapes into matching slots.

Two data collectors said that some AI-generated tasks made them uncomfortable, including requests to crawl on all fours or remove clothing. However, experts believe that these seemingly random tasks may help Tesla identify areas that need improvement.

At the Fremont factory, data collectors also work on the assembly line organizing vehicle parts while wearing the helmets and backpacks. Experts say that collecting varied data for the same task is very helpful for training.

The Actual Performance of Robots Still Faces Technical Challenges

Although company videos show Optimus walking, folding clothes, performing kung fu moves, and handing out candy in Times Square, its performance during actual training falls well short of those demonstrations.

According to the report, two workers said the robot falls about half the time when performing tasks that require bending or leaning, and sometimes damages expensive equipment. Unless a task requires moving more than a few feet, the robot is usually tethered to a support frame to keep it upright.

Aitken stated that in a controlled environment like Tesla's office, the robot should be able to easily maintain its upright position. "Getting it to stand up and maintain balance should be one of the first problems you solve."

Alan Fern, an AI and robotics expert at Oregon State University, pointed out that robot demonstrations "are always the best demonstrations they can show you." When Optimus performs kung fu, for example, it appears to be doing something intelligent, but "it is just reacting to the environment, with no cognitive thought behind it."

More than 100 people have taken part in the data collection work so far, but the company laid off dozens of data collectors after the semi-annual performance review in September. Workers are scored on task performance, and each shift must yield at least 4 hours of usable video footage.