  • Hangtao Zhang¹
  • Chenyu Zhu¹
  • Xianlong Wang¹
  • Ziqi Zhou¹
  • Yichen Wang¹
  • Lulu Xue¹
  • Minghui Li¹
  • Shengshan Hu¹
  • Leo Yu Zhang²
  • ¹ Huazhong University of Science and Technology
  • ² Griffith University

Figure 1. In this work, for the first time, we successfully jailbreak LLM-based embodied AI in the physical world, enabling it to perform various actions that were previously restricted. We demonstrate the potential for embodied AI to engage in activities related to Physical Harm, Privacy Violations, Pornography, Fraud, Illegal Activities, Hateful Conduct, and Sabotage.

Embodied artificial intelligence (AI) refers to AI systems integrated into physical entities that perceive and interact with their environment through sensors and actuators. Large Language Models (LLMs) can interpret language instructions in depth and play a crucial role in devising plans for complex tasks. Consequently, they have shown immense potential for empowering embodied AI, and LLM-based embodied AI has become a focal point of research in the community. Over the next decade, embodied AI robots are expected to become commonplace in homes and industries. However, a critical safety question has long been hiding in plain sight: could LLM-based embodied AI perpetrate harmful behaviors? Our research investigates, for the first time, how to induce threatening actions in embodied AI systems operating in the real world, confirming that these soon-to-be-marketed robots pose severe risks that starkly contravene Asimov's Three Laws of Robotics and threaten human safety. Specifically, we formulate the concept of embodied AI jailbreaking and expose three critical security vulnerabilities: first, jailbreaking robotics through compromised LLMs; second, safety misalignment between the action and linguistic output spaces; and third, deceptive prompts that lead embodied AI with imperfect world knowledge to perform hazardous behaviors without recognizing them as such. Experiments on embodied AI systems using various advanced LLMs (e.g., Chat-GPT4, Chat-GPT4o, and Yi-vision) demonstrate the effectiveness of our embodied AI jailbreak attacks. We also analyze potential mitigation measures and advocate for community awareness regarding the safety of embodied AI applications in the physical world.

Figure 2. (Overview) LLM-based embodied AI faces three risks in real-world applications: (a) harmful behaviors induced by leveraging jailbroken LLMs; (b) safety misalignment between the action and linguistic output spaces (i.e., the system verbally refuses a request but still acts on it); (c) conceptual deception that induces harmful behaviors the system does not recognize as harmful.

  • This research is devoted to examining the security and risk issues associated with applying LLMs and VLMs to embodied AI. Our ultimate goal is to enhance the safety and reliability of embodied AI systems, thereby making a positive contribution to society. This research includes examples that may be considered harmful, offensive, or otherwise inappropriate. These examples are included solely for research purposes, to illustrate vulnerabilities and enhance the security of embodied AI systems. They do not reflect the personal views or beliefs of the authors. We are committed to principles of respect for all individuals and strongly oppose any form of crime or violence. Some sensitive details in the examples have been redacted to minimize potential harm. Furthermore, we have taken comprehensive measures to ensure the safety and well-being of all participants involved in this study. In this paper, we provide comprehensive documentation of our experimental results to enable other researchers to independently replicate and validate our findings using publicly available benchmarks. We are committed to enhancing the security of language models and encourage all stakeholders to address the associated risks. Providers of LLMs may leverage our discoveries to implement new mitigation strategies that improve the security of their models and APIs, even though these strategies were not available during our experiments. We believe that, in order to improve the safety of model deployment, it is worth accepting the increased difficulty in reproducibility.


