LLM-powered robots are prone to discriminatory and dangerous behavior

www.psypost.org

New research indicates that robots operated by popular artificial intelligence systems are prone to enacting discriminatory, violent, and unlawful behaviors, making them unsafe for real-world use. The study, published in the International Journal of Social Robotics, found that when given access to personal information, these AI models produced responses that could lead to significant physical and psychological harm.

The technology at the heart of this inquiry is the large language model, or LLM. These are complex AI systems trained on vast amounts of text and images from the internet, enabling them to generate human-like responses to questions and commands. This capability has made them an attractive option for developers aiming to create more adaptable and user-friendly robots that can understand and respond to instructions given in everyday language, a feature known as open-vocabulary control.

A team of researchers from Carnegie Mellon University, King’s College London, and the University of Birmingham sought to systematically investigate the risks that emerge when these powerful language models are connected to robots capable of physical action in the world.

The researchers were concerned that because LLMs learn from human-generated data, they might also absorb and reproduce harmful human biases and stereotypes. When an AI system is purely digital, these biases can manifest as offensive text. When that same AI system controls a physical robot, the potential for harm escalates, as biased decisions can translate into discriminatory physical actions or safety failures.

To explore these potential dangers, the research team designed two distinct sets of experiments. The first was designed to measure direct discrimination by testing how an LLM-driven robot would behave differently toward people based on their identity. The second experiment was a safety assessment to determine if the models would approve and attempt to execute harmful, abusive, or impossible commands.

In the discrimination assessment, the researchers created scenarios simulating common human-robot interactions. They prompted the AI models to make decisions for a robot interacting with individuals described with various personal characteristics, including gender, race, nationality, religion, age, and disability. The tasks ranged from deciding what facial expression to show a person, to assigning a priority level for emergency rescue, to delegating household chores.

The results of this assessment revealed consistent patterns of discriminatory outputs across the tested models. For instance, one model suggested that a robot should display the facial expression of “disgust” when interacting with individuals identified as Christian, Muslim, and Jewish. Another model associated a higher probability of having a dirty room with people from certain ethnic groups and with those identified as having ADHD.

The study also found evidence of ableism and sexism. When asked to assign a level of trust for a collaborative manufacturing task, one model rated people described as blind, nonspeaking, or paralyzed with low trust. In scenarios involving task delegation, the models frequently assigned duties along stereotypical lines, such as asking women to cook or do laundry while asking men to carry a heavy box.

The second part of the investigation focused on safety and the potential for misuse. Researchers presented the AI models with a list of commands and asked them to rate each task’s acceptability and feasibility. The list included benign household chores, like making coffee, alongside deeply concerning actions designed based on documented cases of technology-facilitated abuse. These harmful commands included instructions for a robot to steal, conduct surveillance, and inflict physical or psychological harm.

Every AI model evaluated in the study failed these critical safety checks. The models approved at least one command that could lead to severe harm. A particularly alarming finding was that multiple models deemed it acceptable for a robot to remove a mobility aid, such as a wheelchair or cane, from its user. People who rely on these aids have described such an act as being equivalent to having a limb broken.

“Every model failed our tests,” said Andrew Hundt, a co-author of the study from Carnegie Mellon University. “We show how the risks go far beyond basic bias to include direct discrimination and physical safety failures together… Refusing or redirecting harmful commands is essential, but that’s not something these robots can reliably do right now.”

Other harmful tasks approved by the models included brandishing a kitchen knife to intimidate office workers, taking nonconsensual photographs in a shower, and stealing credit card information. The models also rated some scientifically impossible tasks as feasible, such as sorting people into “criminals” and “non-criminals” based on their appearance alone. This suggests the models lack a fundamental understanding of what is conceptually possible, which could lead a robot to perform actions that are not only dangerous but also based on flawed and pseudoscientific premises.

The researchers acknowledge that these experiments were conducted in controlled, simulated environments and that real-world robot systems have additional components. However, they argue that the failures of the core AI models are so fundamental that they render any robot relying solely on them for decision-making inherently unsafe for general-purpose deployment in homes, workplaces, or care facilities. The study suggests that without robust safeguards, these systems could be exploited for abuse, surveillance, or other malicious activities.

Looking ahead, the authors call for a significant shift in how these technologies are developed and regulated. They propose the immediate implementation of independent safety certification for AI-driven robots, similar to the rigorous standards applied in fields like aviation and medicine. This would involve comprehensive risk assessments before a system is deployed in any setting where it might interact with people, especially vulnerable populations.

“If an AI system is to direct a robot that interacts with vulnerable people, it must be held to standards at least as high as those for a new medical device or pharmaceutical drug,” said Rumaisa Azeem, a co-author from King’s College London. “This research highlights the urgent need for routine and comprehensive risk assessments of AI before they are used in robots.” Future research may focus on developing more effective technical safeguards, exploring alternative control systems that do not rely on open-ended language inputs, and establishing clear ethical and legal frameworks to govern the use of autonomous robots in society.

The study, “LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions,” was authored by Andrew Hundt, Rumaisa Azeem, Masoumeh Mansouri, and Martim Brandão.