HumanPlus
A Research Summary of the Stanford University Project
Company Background
HumanPlus is a research project developed by a team at Stanford University, primarily led by Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, and Chelsea Finn. The project operates within the academic research framework of Stanford University, with support from The AI Institute and ONR grant N00014-21-1-2685. Inspire-Robots and Unitree Robotics have provided extensive support on hardware and low-level firmware for the project.
Robot History and Development Timeline
The HumanPlus project, recognized as a Best Paper Award Finalist at CoRL 2024, addresses the challenge of enabling humanoid robots to learn from human data. It leverages large existing corpora of human motion and skill data to train robots, mitigating the scarcity of robot-specific training data. Key 2024 developments include a full-stack system for learning motion and autonomous skills, demonstrated on a customized 33-DoF, 180cm humanoid.
Key Technical Specifications
The HumanPlus robot is a customized 180cm tall humanoid with 33 degrees of freedom (DoF). It is equipped with two egocentric RGB cameras mounted on its head to provide binocular vision and two 6-DoF dexterous hands. The system uses a low-level policy trained in simulation via reinforcement learning on 40 hours of human motion data from the existing AMASS dataset. This pose-conditioned low-level policy performs whole-body control and transfers to the real world zero-shot.
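The pose-conditioned control loop can be illustrated with a minimal sketch. All names here (Observation, policy_step, the DoF count split, the gain) are illustrative assumptions, not from the HumanPlus codebase; the real policy is a transformer trained with reinforcement learning in simulation, not the toy proportional rule below.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """Illustrative observation for a pose-conditioned low-level policy."""
    joint_pos: List[float]    # proprioception: current joint angles (rad)
    joint_vel: List[float]    # proprioception: current joint velocities
    target_pose: List[float]  # retargeted human pose the policy must track

def policy_step(obs: Observation, kp: float = 0.5) -> List[float]:
    """Toy stand-in for the learned policy: emit joint-position targets
    that move toward the retargeted human pose. A real policy maps the
    same kind of observation to targets via a learned network."""
    return [q + kp * (t - q) for q, t in zip(obs.joint_pos, obs.target_pose)]

obs = Observation(joint_pos=[0.0, 0.0], joint_vel=[0.0, 0.0],
                  target_pose=[1.0, 1.0])
targets = policy_step(obs)  # joint-position targets sent to the actuators
```

The key design point this sketch captures is the interface: the policy consumes only proprioception plus a target pose, which is what allows the same low-level controller to serve both teleoperated shadowing and autonomous skill execution.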
AI/Software Stack
The HumanPlus AI/software stack is a full-stack system for learning motion and autonomous skills from human data. It comprises three components:

- Humanoid Shadowing Transformer (HST): a task-agnostic low-level policy trained on human poses retargeted from the AMASS dataset, enabling real-time shadowing of human motion captured by a single RGB camera.
- Humanoid Imitation Transformer (HIT): a vision-based skill policy trained through supervised behavior cloning; its transformer architecture blends action prediction with forward dynamics prediction over egocentric binocular RGB inputs to predict whole-body humanoid poses.
- Human body and hand pose estimation algorithms: modules that estimate human motion in real time and retarget it into pose targets for the low-level policy.
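HIT's training objective, as described, blends two terms: a behavior-cloning loss on predicted whole-body poses and a forward-dynamics loss on predicted future image features. The sketch below shows one plausible way to combine them; the function name, mean-squared-error form, and the `lam` weighting are assumptions for illustration, not the paper's exact formulation.

```python
from typing import Sequence

def hit_loss(pred_actions: Sequence[float], demo_actions: Sequence[float],
             pred_feats: Sequence[float], target_feats: Sequence[float],
             lam: float = 0.5) -> float:
    """Illustrative combined objective: behavior cloning on whole-body
    pose targets plus a forward-dynamics term on future image features.
    Both terms are plain mean-squared errors here for clarity."""
    bc = sum((p - d) ** 2 for p, d in zip(pred_actions, demo_actions)) / len(pred_actions)
    fd = sum((p - t) ** 2 for p, t in zip(pred_feats, target_feats)) / len(pred_feats)
    return bc + lam * fd
```

The forward-dynamics term acts as an auxiliary objective: by forcing the network to also predict what it will see next, it grounds the visual representation, which is a common motivation for pairing dynamics prediction with behavior cloning.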
Real-World Deployments or Pilots
The HumanPlus system, demonstrated on its 33-DoF, 180cm humanoid, autonomously performs tasks such as wearing shoes, walking, object manipulation, typing, and greeting other robots, achieving 60-100% success rates with up to 40 demonstrations per task. Human operators can also teleoperate the humanoid to collect whole-body data for diverse real-world tasks, including boxing, playing piano, table tennis, and object handling.
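During teleoperated data collection, estimated human poses must be retargeted to the humanoid's joints before the low-level policy can track them. A minimal sketch of that step, assuming a direct joint-angle correspondence with clipping to joint limits (real retargeting must also bridge morphology and actuation differences between human and robot):

```python
from typing import List, Tuple

def retarget(human_angles: List[float],
             joint_limits: List[Tuple[float, float]]) -> List[float]:
    """Illustrative retargeting: copy each estimated human joint angle to
    the corresponding robot joint and clip it to that joint's limits.
    The one-to-one joint mapping is a simplifying assumption."""
    return [min(max(angle, lo), hi)
            for angle, (lo, hi) in zip(human_angles, joint_limits)]

# A shoulder angle beyond the robot's range is clipped to the limit.
robot_pose = retarget([2.0, -1.0], [(-1.0, 1.0), (-0.5, 0.5)])
```

The retargeted pose then becomes the target-pose input to the low-level whole-body policy, the same interface used at deployment time.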
Pricing (if known): As HumanPlus is a research project from Stanford University, there is no publicly available pricing information for the robot itself. The project is supported by grants and fellowships, indicating its academic, non-commercial nature.
Notable Achievements
HumanPlus was recognized as a Best Paper Award Finalist at CoRL 2024. Its pose-conditioned low-level policy transfers from simulation to the real world zero-shot, and its autonomous skills reach 60-100% success rates on multi-step tasks such as wearing shoes and then walking, learned from as few as 40 demonstrations per task.
Criticisms or Limitations
Despite these advances, HumanPlus faces challenges inherent to humanoid robotics: complexity in perception and control, morphological and actuation gaps between humanoids and humans, and the absence of a fully accessible data pipeline for whole-body teleoperation. The system currently relies on human operators to collect data via teleoperation, highlighting the need for more autonomous data generation. The paper also notes that comparable commercial humanoid systems often lack public technical details and show limited autonomous demonstrations, indicating a broader industry challenge.
Future Roadmap
The future roadmap for HumanPlus focuses on further leveraging human data to train humanoids for more complex and diverse autonomous skills. The emphasis on imitation learning from human demonstrations points towards achieving more generalized robot intelligence. Future research will likely enhance the robustness of low-level policies, improve vision-based skill policies, and reduce reliance on direct human teleoperation for data collection, moving towards autonomous data acquisition. Ongoing work on HST and HIT underscores continuous advancements in the AI/software stack.