- Hangtao Zhang¹
- Chenyu Zhu¹
- Xianlong Wang¹
- Ziqi Zhou¹
- Yichen Wang¹
- Lulu Xue¹
- Minghui Li¹
- Shengshan Hu¹
- Leo Yu Zhang²
- ¹Huazhong University of Science and Technology
- ²Griffith University
A Quick Glance
Figure 1. In this work, for the first time, we successfully jailbreak an LLM-based embodied AI in the physical world, enabling it to perform various actions that were previously restricted. We demonstrate the potential for embodied AI to engage in activities involving Physical Harm, Privacy Violations, Pornography, Fraud, Illegal Activities, Hateful Conduct, and Sabotage.
Paper Overview
Figure 2. (Overview) LLM-based embodied AI faces three risks in real-world applications: (a) harmful behaviors induced by leveraging jailbroken LLMs; (b) safety misalignment between the action and linguistic output spaces (i.e., the system verbally refuses a request but still acts on it); (c) conceptual deception that induces unrecognized harmful behaviors.
Ethics and Disclosure
This research examines the security and risk issues that arise when LLMs and VLMs are applied to embodied AI. Our ultimate goal is to enhance the safety and reliability of embodied AI systems, thereby making a positive contribution to society. This research includes examples that may be considered harmful, offensive, or otherwise inappropriate. They are included solely for research purposes, to illustrate vulnerabilities and improve the security of embodied AI systems, and they do not reflect the personal views or beliefs of the authors. We are committed to respecting all individuals and strongly oppose any form of crime or violence. Some sensitive details in the examples have been redacted to minimize potential harm. Furthermore, we have taken comprehensive measures to ensure the safety and well-being of all participants involved in this study. In this paper, we provide comprehensive documentation of our experimental results so that other researchers can independently replicate and validate our findings using publicly available benchmarks. Our commitment is to enhance the security of language models, and we encourage all stakeholders to address the associated risks. Providers of LLMs may leverage our findings to implement new mitigation strategies that improve the security of their models and APIs, even though such strategies were not available during our experiments. We believe that improving the safety of model deployment is worth the increased difficulty of reproducibility.
Citation
If you find our project useful, please consider citing:

@misc{zhang2024threatsembodiedmultimodalllms,
title={The Threats of Embodied Multimodal LLMs: Jailbreaking Robotic Manipulation in the Physical World},
author={Hangtao Zhang and Chenyu Zhu and Xianlong Wang and Ziqi Zhou and Yichen Wang and Lulu Xue and Minghui Li and Shengshan Hu and Leo Yu Zhang},
year={2024},
eprint={2407.20242},
archivePrefix={arXiv},
primaryClass={cs.CY}
}