Fraunhofer IPA offers new test benchmark for humanoids

Fraunhofer IPA has developed a benchmark to test humanoid robots such as the Unitree G1. Source: Fraunhofer IPA

In recent years, humanoid robots have been omnipresent in the media and continue to fascinate us. On social media as well as in public spaces, they are guaranteed to attract attention. The technology is poised to take over tasks in areas where, due to demographic change, human labor will no longer be available in the future.

Yet there remains a significant information gap between media hype and actual abilities. While the robots are being tested in isolated, non-public pilot applications, questions persist for most potential users regarding what abilities the robots actually possess, how reliably they operate, and to what extent we humans can trust humanoids.

“For end users and manufacturers alike, it is therefore essential to look behind the facade sometimes constructed by marketing agencies,” explained Simon Schmidt, senior manager of the Automated Systems business unit at Fraunhofer IPA. “The market is too volatile and opaque to allow for a well-founded assessment and reliable evaluation of humanoids for one’s own applications or in comparison to other models.”

This is precisely why Fraunhofer IPA has developed a benchmark. In this process, research teams from the institute’s automation division put humanoids through various challenges and evaluate the results. The neutral, third-party service is modular, allowing manufacturers, end users, or software providers to select the areas relevant to their application.

Benchmark follows industrial standards

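The question of abilities and trustworthiness is broken down into six application-relevant criteria: technology and basic abilities, complex abilities, cleanliness, functional safety, cybersecurity, and energy efficiency.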

Six test categories for humanoid robots. Source: Fraunhofer IPA

The benchmark draws on the research teams’ existing expertise and, where possible, follows established industrial standards that have been internationally recognized for decades. In the area of cleanliness, for example, Fraunhofer IPA has tested and qualified over 3,000 automation components according to ISO 14644 in recent years.

The measurement of collision forces and other safety-relevant properties is based on common safety standards for power- and force-limited robots, such as ISO 10218 and ISO/TS 15066.

The benchmark is based on master data such as the robots’ size, weight, and reach. It was also important to the research teams to define reproducible tests that can be standardized and remain meaningful for future generations of humanoids. After all, more humanoids will follow.

“With this tool, humanoids can be compared not only with one another but also with known automation components,” said Werner Kraus, head of the Automation and Robotics research unit at Fraunhofer IPA. “Users can interpret the results directly and thus find the right humanoid for the right application.”

Using the Unitree G1 as an example, Fraunhofer IPA put the humanoid through its paces to assess its suitability for use in production and, in doing so, learned a great deal about the current limitations of the technology. The technical basis was a Unitree G1 EDU-4 with Dex3-1 three-finger hands, delivered in May 2025 with firmware version 1.04.

Six criteria for evaluating humanoid robots

1. Technology and basic abilities

The technologies used in humanoid robots, such as sensors or AI models, allow direct conclusions to be drawn about precision and reliability. Examining additional basic abilities enables a detailed assessment of the humanoid’s technological potential.

Among other things, the evaluation examines sensor technology (such as vision, audio, text recognition, speech recognition, human detection), manipulation abilities (gripper type, number and mobility of fingers), as well as strength (handled loads, gripping forces), and walking speed.

An examination of the technologies is conducted by identifying the installed components and comparing them with data sheets. Tests enable the determination of basic abilities. A 3D tracking system from Vicon is used to determine walking speed. Gripping forces are measured using a force sensor. Dumbbells of varying weights are available to determine the maximum load that can be handled.

Using the G1 as an example, it became apparent, among other things, that its dexterity is still far below human levels. As delivered by the manufacturer, it can only walk via remote control. Users must implement additional basic abilities themselves. Extending the arms horizontally even without any additional load can cause them to shut down and drop after one to two minutes because the actuators overheat.

Measurements taken with the Vicon tracker determined a slow walking speed of 0.49 m/s (1.1 mph) and a fast walking speed of 0.84 m/s (1.9 mph). When carrying a 3-kg (6.6 lb.) payload, the robot does not slow down, but it takes a few tenths of a second longer to accelerate and decelerate.
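
As a rough illustration of how a walking speed can be derived from motion-tracking data, the following Python sketch averages the speed over timestamped positions and converts the result to miles per hour. The sample layout, the function names, and the synthetic track are assumptions made for this example; they are not Vicon's API or Fraunhofer IPA's actual evaluation code.

```python
import math

MPS_TO_MPH = 3600 / 1609.344  # meters per second -> miles per hour


def average_speed(samples):
    """Average speed in m/s from (t_seconds, x_m, y_m) samples.

    Sums the horizontal path length between consecutive samples and
    divides by the elapsed time. The sample layout is hypothetical.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    path = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:])
    )
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return path / elapsed


def to_mph(speed_mps):
    """Convert a speed from m/s to mph."""
    return speed_mps * MPS_TO_MPH


# Synthetic straight-line track covering 4.2 m in 5 s (made-up data).
track = [(t, 0.84 * t, 0.0) for t in range(6)]
speed = average_speed(track)
print(f"{speed:.2f} m/s = {to_mph(speed):.1f} mph")
print(f"0.49 m/s = {to_mph(0.49):.1f} mph")
```

Summing segment lengths rather than taking the straight-line distance between the first and last sample keeps the estimate meaningful if the robot does not walk in a perfectly straight line.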

The results show that the humanoid cannot yet perform many tasks that humans can handle. However, future software and hardware updates are expected to increase the number of pre-implemented basic abilities.

2. Complex abilities

Building on the basic abilities, this section focuses on performing small generic tasks that require a combination of technologies and skills. The benchmark enables a comprehensive evaluation of humanoids across various task domains.

The tests can be categorized into whole-body movements (running, jumping, climbing, navigating ramps, standing up), manipulative skills (opening doors), navigation through obstacle courses, and precision and force control. The influence of changing environmental conditions and additional loads on robot performance can also be measured.

Many of the designed tests could not yet be performed with the G1. For example, according to the manufacturer’s specifications, the robot is not suitable for climbing stairs. Complex obstacle courses requiring the robot to navigate are also not feasible with its onboard abilities.

However, when walking over steps (cable duct) and on a slope with 20% inclination, the legged robot demonstrated good self-stabilization abilities. It never lost its balance during the tests. For the G1, standing up from a supine position requires surfaces with sufficient friction, such as carpet. On smooth tile or hardwood floors, the arms may slip, so the robot fails to push itself into an upright position.

The tests for complex abilities are intentionally designed to overwhelm current humanoids. Only future models will be able to fully meet these benchmarks. This enables comparability of humanoids across multiple model generations. Furthermore, the tasks that cannot currently be performed clearly demonstrate the limits of current technology to potential users.

3. Cleanliness

The cleanroom suitability benchmark examines whether humanoids and other automation components can be used in sensitive production environments such as the semiconductor, optical, electrical, pharmaceutical, biotechnology, and food industries without causing critical contamination.

Particle emission is evaluated at various points on the moving robot in accordance with ISO 14644-14, outgassing behavior according to ISO 14644-15, and cleanability and hygienic design according to current guidelines. The goal is to objectively determine the robot’s suitability for cleanroom use and, if necessary, identify areas for optimization in design or material selection.

Conducting the benchmark is challenging due to the complexity of humanoids and their diverse range of motion. Typical operating parameters must be defined and worst-case scenarios examined to obtain realistic results. The limited battery life must be considered when determining the test duration.

Upon completion of the benchmark, customers receive a qualification package with a report and certificate, which serves as a reliable basis for decision-making regarding use in cleanrooms and enables a validated, industry-standard evaluation.

In tests regarding particle emissions and outgassing, the G1 showed promising results. Accordingly, the humanoid is expected to be suitable for use in ISO Class 5 cleanrooms per ISO 14644-1, such as those used in the semiconductor industry.
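
For orientation, ISO 14644-1 relates an ISO class number N to a maximum airborne particle concentration through the relation C_n = 10^N × (0.1/D)^2.08, where D is the considered particle size in micrometers. The short Python sketch below evaluates that relation for ISO Class 5. The function name is made up for this example, and the rounded values tabulated in the standard take precedence over the raw formula output.

```python
def iso_class_limit(iso_class, particle_size_um):
    """Maximum particles per cubic meter at or above the given size.

    Evaluates C_n = 10**N * (0.1 / D)**2.08, the class-limit relation
    associated with ISO 14644-1. Results are unrounded.
    """
    if particle_size_um <= 0:
        raise ValueError("particle size must be positive")
    return 10 ** iso_class * (0.1 / particle_size_um) ** 2.08


for size in (0.1, 0.5, 1.0):
    limit = iso_class_limit(5, size)
    print(f"ISO 5, >= {size} um: {limit:,.0f} particles/m^3")
```

How measured equipment emissions are mapped to a cleanroom class under ISO 14644-14 involves additional steps that this sketch does not cover.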

Tests on hygienic design revealed that the cleanability of the device was evidently not a priority during the development of the G1. There are gaps and areas at the numerous joints that are inaccessible for wipe cleaning. Consequently, use in areas with high hygiene requirements is not possible without further measures.

Given that cleanroom suitability and hygienic design are not currently a priority for manufacturers, the test results should be viewed positively. Should manufacturers develop models in the future that are specifically optimized for cleanliness, the tests developed here provide a valuable foundation. Manufacturers can commission the extensive tests at Fraunhofer IPA and thus benefit from the existing cleanroom infrastructure.

Particle emissions tests of the Unitree G1 indicate that use in ISO 5 cleanrooms is possible. Source: Fraunhofer IPA

4. Functional safety

Humanoid robots are intended to collaborate with humans and share common workspaces. Therefore, functional safety is of great importance. The factors influencing safety in humanoids are diverse. These include, above all, stability when moving on different surfaces, on slopes, and when lifting loads.

Added to this are the detection and avoidance of obstacles and force limitation in the event of collisions. Also relevant are system behavior during communication failures and complete battery discharge, risks arising from hardware design, and the accessibility of emergency stop devices.

Various challenges were developed to assess stability. For example, a humanoid must navigate a course featuring different surfaces such as steps or cable ducts from various approach directions.

To test stability on slopes, the robot walks up and down ramps and performs stops and changes in direction. The experiments can be recorded using the 3D tracking system.

Force sensors were used for collision tests; these are also employed for force measurements on collaborative robots in accordance with ISO/TS 15066. The sensors are used to investigate both collisions caused by arm movements and collisions with the body of a moving robot (such as the chest or shoulder).

It was found, for example, that the G1 can exert forces exceeding 500 N on humans during full-body collisions and during rapid arm movements. This significantly exceeds the pain thresholds permitted by the standard for operation near humans, meaning that injuries cannot be ruled out.

Conversely, however, the robot is very stable and can hardly be thrown off balance even on difficult surfaces.
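
Returning to the collision measurements, a check of this kind boils down to comparing a measured peak force with the permitted limit for the affected body region. The Python sketch below shows that comparison. The limit values are placeholders chosen only for illustration and are not taken from ISO/TS 15066; the class and function names are likewise assumptions, and a real assessment must use the values from the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollisionResult:
    region: str
    measured_n: float
    limit_n: float

    @property
    def exceeds_limit(self) -> bool:
        return self.measured_n > self.limit_n

    @property
    def ratio(self) -> float:
        return self.measured_n / self.limit_n


def evaluate(measurements, limits):
    """Pair each measured peak force with the limit for its body region."""
    return [
        CollisionResult(region, force, limits[region])
        for region, force in measurements.items()
    ]


# PLACEHOLDER limits in newtons, NOT values from ISO/TS 15066.
placeholder_limits = {"chest": 200.0, "shoulder": 200.0}
# The article reports forces exceeding 500 N; 500 N serves as a lower bound.
measured = {"chest": 500.0, "shoulder": 500.0}

for result in evaluate(measured, placeholder_limits):
    status = "exceeds" if result.exceeds_limit else "within"
    print(
        f"{result.region}: {result.measured_n:.0f} N {status} limit "
        f"({result.ratio:.1f}x)"
    )
```

Keeping the limits in a separate table makes it straightforward to swap in the values of whichever standard applies to the application at hand.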

The tests also revealed that the absence of an emergency stop button on the robot can lead to problems if it becomes unstable during experiments and needs to be stopped quickly. In this case, the only way to cut off the power supply to the G1 is to remove the battery. Potential for improvement was also identified regarding pinch points in the area of the joints.

The test results show that it is important to provide manufacturers and users with an objective, evidence-based foundation for evaluating and improving safety, as existing standards for human-robot collaboration are only of limited applicability.

ISO is not expected to publish ISO 25785-1, the standard that will include safety requirements for humanoids, until 2028. Until then, applications involving humanoids must be made safe on a case-by-case basis using a series of complementary measures.

The G1 can exert forces exceeding 500 N on humans during full-body collisions and during rapid arm movements. Source: Fraunhofer IPA

5. Cybersecurity

As complex hardware and software products, humanoid robots regularly receive updates and thus benefit from an internet connection. The benchmark responds to rising legal requirements and the growing importance of cybersecurity in robotics. It helps identify vulnerabilities and open points of attack at an early stage, which arise in particular from neglected security architectures.

The results provide a quick overview of the robot’s resilience against typical attack patterns and offer manufacturers guidance for improvements. In this way, the benchmark helps to sustainably strengthen the acceptance and security of humanoid robots in the market.

The cybersecurity benchmark consists of four independently usable modules: vulnerability analysis, secure lifecycle, connectivity, and stability. Each module examines specific aspects such as:

  • Management and existence of vulnerabilities
  • Handling of updates and end of life for hardware and software
  • Security of network interfaces
  • The system’s resilience and resistance to penetration, that is, how secure it is against unauthorized access

The goal is to provide manufacturers with an objective, scientifically sound analysis of their cybersecurity measures and to demonstrate whether the secure operation of their products is guaranteed.

At the time of the investigation, the G1 had a documented remote code execution vulnerability within its Bluetooth interface. This allowed attackers within Bluetooth range to gain extensive control over the robot.

The vulnerability persisted across multiple firmware updates, even after an update changelog had noted its resolution. Based on current information, it has since been patched by a more recent update.

Additionally, Unitree provides no information regarding end-of-life expectations for the G1’s hardware and software, regular update intervals, or warnings for known cybersecurity vulnerabilities. Unitree has also not yet documented the continuous transmission of sensor data to the manufacturer’s servers, which cannot be disabled in the administration interface. During load tests, the G1 performed very stably.

6. Energy efficiency

Battery life is a key design parameter for the operation of humanoid robots. This benchmark addresses criteria such as battery capacity, charging times, and the range of the charge depending on the operating mode. It offers robot manufacturers a standardized, reproducible method for measuring and evaluating the energy consumption of humanoids under defined, real-world operating conditions.

The benchmark is primarily aimed at robot manufacturers and provides independent energy metrics that serve as a reference for external communication and competitive positioning. It also helps end users better plan the operating time and charging cycles of their robots and specifically optimize energy efficiency.

Energy requirements and average power consumption are measured in various scenarios such as standing, walking on flat ground, walking uphill, and walking under load. Power-on and power-off procedures and battery charging behavior are also taken into account.

The results are consolidated into a meaningful metric that describes the energy consumption per hour and the battery runtime under standardized conditions, enabling an objective comparison of different robot models. Distinctions can be made between various usage scenarios, such as stationary work at a workstation or predominantly walking in logistics tasks.

When applied to the Unitree G1, the measurements show an average power consumption of approximately 154 W in stationary operation. When walking on a flat surface, power consumption is approximately 272 W, while it rises to around 283 W when walking on a 10 percent incline. For the defined one-hour standard scenario, this results in an average power consumption of approximately 239 W.

The results demonstrate the expected differences between static operation and movement and confirm that the benchmark consistently reflects these effects. The measured power draw translates into maximum operating times of 2 hours and 49 minutes on a single battery charge while standing still, and 1 hour and 49 minutes in a typical scenario that includes both standing and walking.
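
The operating times follow from dividing the usable battery energy by the average power draw. The Python sketch below back-calculates the usable energy implied by the reported figures and then re-derives the runtimes. The resulting value of roughly 434 Wh is an inference from the article's numbers, not a battery specification from Unitree or Fraunhofer IPA, and the function names are made up for this example.

```python
def implied_energy_wh(power_w, hours, minutes):
    """Energy drawn when running at power_w for the given duration."""
    return power_w * (hours + minutes / 60)


def runtime_hm(energy_wh, power_w):
    """Runtime as (hours, minutes), rounded to the nearest minute."""
    total_minutes = round(energy_wh / power_w * 60)
    return divmod(total_minutes, 60)


# Figures reported in the article.
standing = implied_energy_wh(154, 2, 49)  # stationary operation
scenario = implied_energy_wh(239, 1, 49)  # one-hour standard scenario
print(f"implied usable energy: {standing:.0f} Wh / {scenario:.0f} Wh")

# Assumption: use the mean of both estimates as the usable energy.
usable_wh = (standing + scenario) / 2

# "walking flat" is an extrapolation; the article reports no such runtime.
for label, power in [("standing", 154), ("scenario", 239), ("walking flat", 272)]:
    h, m = runtime_hm(usable_wh, power)
    print(f"{label}: {h} h {m:02d} min at {power} W")
```

Both reported runtimes imply nearly the same usable energy, which suggests the published power and runtime figures are mutually consistent.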

Users and manufacturers can use the benchmark to systematically incorporate energy metrics into the selection and deployment planning of humanoid robots. Against this backdrop, Fraunhofer IPA expects that the energy requirements of humanoids will in the future be more strongly oriented toward size, system architecture, and the respective application profiles, and will be differentiated accordingly.

Power consumption and battery life of the Unitree G1. Source: Fraunhofer IPA

Fraunhofer IPA is ready to test future humanoid robots

Many people expect humanoids to demonstrate their added value particularly where flexibility is required and where there is a need to operate in environments designed for humans. In this respect, there are diverse potential application scenarios. These often require very different characteristics and abilities from humanoids.

The developed test methods and initial results show that an independent evaluation of characteristics and abilities is both useful and necessary. While benchmarks for technologies and skills focus on transparency and comparability, independent testing can help build trust in humanoid robots when it comes to issues such as functional safety and cybersecurity.

Fraunhofer IPA’s goal is to test additional humanoids and thus create a comparative database. Manufacturers and users can contact us regarding everything from initial general inquiries and interest in individual benchmark modules to requests for comprehensive testing.

About the authors

Simon Schmidt has been a senior manager of the Automated Systems business unit at Fraunhofer IPA since 2024. His key topics are cognitive robotics and embodied AI in intralogistics, manufacturing, and assembly systems.

Dr. Werner Kraus has been head of the Automation and Robotics research unit at Fraunhofer IPA since 2019. He manages numerous national and international projects and is an expert in cognitive robotics.

Dr. Theo Jacobs has been leader of the Safe Robot Applications and Cobots research team at Fraunhofer IPA since 2023. He is also a certified machinery safety expert and is engaged in ISO activities around safe humanoid robots.

The post Fraunhofer IPA offers new test benchmark for humanoids appeared first on The Robot Report.