HumanPlus

A Comprehensive Research Summary of a Stanford University Project


Company Background

HumanPlus is a research project developed by a team at Stanford University, primarily led by Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, and Chelsea Finn. The project operates within the academic research framework of Stanford University, with support from The AI Institute and ONR grant N00014-21-1-2685. Inspire-Robots and Unitree Robotics have provided extensive support on hardware and low-level firmware for the project.

Robot History and Development Timeline

The HumanPlus project, recognized as a Best Paper Award Finalist at CoRL 2024, addresses the challenge of enabling humanoid robots to learn from human data. It leverages the large volume of existing human motion and skill data to train robots, mitigating the scarcity of robot-specific demonstration data. Key 2024 developments include a full-stack system for learning both low-level motion control and autonomous skills, demonstrated on a customized 33-DoF, 180 cm humanoid.

Key Technical Specifications

The HumanPlus robot is a customized humanoid, 180 cm tall with 33 degrees of freedom (DoF). It is equipped with two egocentric RGB cameras mounted on its head for binocular vision and two 6-DoF dexterous hands. A low-level policy is trained in simulation via reinforcement learning on 40 hours of human motion data from the existing AMASS dataset. This pose-conditioned low-level policy performs whole-body control and transfers to the real world zero-shot.
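To make the pose-conditioned control idea concrete, the sketch below shows the shape of the interface: a retargeting step that maps an estimated human pose into the robot's 33-dimensional joint space, and a control step in which the low-level policy consumes proprioception plus the target whole-body pose. The joint limits, function names, and the clamping-only retargeting are illustrative assumptions, not the published HumanPlus implementation.

```python
import numpy as np

# Hypothetical joint limits (radians) for a 33-DoF humanoid; the real
# HumanPlus limits are not given in this summary, so these are placeholders.
NUM_DOF = 33
JOINT_LOWER = np.full(NUM_DOF, -1.5)
JOINT_UPPER = np.full(NUM_DOF, 1.5)

def retarget_pose(human_pose: np.ndarray) -> np.ndarray:
    """Map an estimated human pose onto the robot's joint space.

    Minimal illustration: copy corresponding joint angles and clamp them
    to the robot's limits. The actual pipeline retargets estimated human
    body and hand poses to robot joints; only the clamping step is shown.
    """
    assert human_pose.shape == (NUM_DOF,)
    return np.clip(human_pose, JOINT_LOWER, JOINT_UPPER)

def control_step(policy, proprioception: np.ndarray,
                 target_pose: np.ndarray) -> np.ndarray:
    """One step of pose-conditioned low-level control: the policy sees
    proprioceptive state plus the target whole-body pose and emits joint
    targets for the motors."""
    return policy(np.concatenate([proprioception, target_pose]))
```

Because the policy is conditioned only on proprioception and a target pose, the same policy trained in simulation can be driven at test time by retargeted human motion, which is what enables zero-shot shadowing.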

AI/Software Stack

The HumanPlus AI/software stack is a full-stack system for learning motion and autonomous skills from human data. Its main components are:

  • Humanoid Shadowing Transformer (HST): a task-agnostic low-level policy trained on human poses retargeted from the AMASS dataset, enabling real-time shadowing of human motion captured by a single RGB camera.
  • Humanoid Imitation Transformer (HIT): a vision-based skill policy trained through supervised behavior cloning. Its transformer-based architecture combines action prediction with forward dynamics prediction, using egocentric binocular RGB vision to predict humanoid poses.
  • Human body and hand pose estimation algorithms, which estimate human motion in real time and retarget it into targets for the low-level policy.
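The combination of action prediction and forward dynamics prediction in HIT can be illustrated as a joint training objective: a behavior-cloning term on predicted actions plus a dynamics term on predicted future observations. This is a minimal sketch of that idea; the function name, mean-squared-error losses, and the `beta` weighting are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def hit_style_loss(pred_actions: np.ndarray, true_actions: np.ndarray,
                   pred_next_obs: np.ndarray, true_next_obs: np.ndarray,
                   beta: float = 0.5) -> float:
    """Illustrative joint objective in the spirit of HIT.

    bc  : behavior-cloning term, penalizing action prediction error.
    dyn : forward-dynamics term, penalizing error in predicted future
          observations, which regularizes the learned representation.
    beta is a hypothetical weighting between the two terms.
    """
    bc = float(np.mean((pred_actions - true_actions) ** 2))
    dyn = float(np.mean((pred_next_obs - true_next_obs) ** 2))
    return bc + beta * dyn
```

The design intuition is that asking the same transformer to also predict what it will observe next forces it to model task dynamics, rather than merely memorizing demonstrated actions.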

Real-World Deployments or Pilots

The HumanPlus system, demonstrated on its 33-DoF, 180 cm humanoid, autonomously performs tasks such as putting on shoes, walking, object manipulation, typing, and greeting another robot, achieving 60-100% success rates using up to 40 demonstrations. Human operators can also teleoperate the humanoid to collect whole-body data for diverse real-world tasks, including boxing, playing piano, table tennis, and object handling.

Pricing

As HumanPlus is a research project from Stanford University, there is no publicly available pricing information for the robot itself. The project is supported by grants and fellowships, reflecting its academic, non-commercial nature.

Notable Achievements

  • Best Paper Award Finalist (top 6) at CoRL 2024: This highlights the significant impact and novelty of the HumanPlus research within the robotics community.
  • Efficient Data Collection: The shadowing mechanism provides an efficient data-collection pipeline for diverse real-world tasks, bypassing the sim-to-real gap in RGB perception.
  • High Success Rates in Complex Tasks: The robot has demonstrated high success rates (60-100%) in autonomously performing complex tasks such as putting on shoes, walking, object manipulation, and typing, based on a limited number of human demonstrations.
  • Full-Stack System for Learning from Human Data: HumanPlus introduces a comprehensive system that addresses the challenges of humanoid perception, control, and data pipelines for learning autonomous skills from egocentric vision.


Criticisms or Limitations

Despite its advancements, HumanPlus faces inherent challenges in humanoid robotics, such as complexities in perception and control, morphological and actuation gaps between humanoids and humans, and the absence of a fully accessible data pipeline for whole-body teleoperation. The system currently relies on human operators for data collection via teleoperation, highlighting a need for more autonomous data generation. The paper also points out that similar commercial humanoid systems often lack public details and have limited autonomous demonstrations, indicating a broader industry challenge.

Future Roadmap

The future roadmap for HumanPlus focuses on further leveraging human data to train humanoids for more complex and diverse autonomous skills. The emphasis on imitation learning from human demonstrations points towards achieving more generalized robot intelligence. Future research will likely enhance the robustness of low-level policies, improve vision-based skill policies, and reduce reliance on direct human teleoperation for data collection, moving towards autonomous data acquisition. Ongoing work on HST and HIT underscores continuous advancements in the AI/software stack.