Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou1, Zifan Wang2, Nicholas Carlini3, Milad Nasr3, J. Zico Kolter1,4, Matt Fredrikson1

1Carnegie Mellon University, 2Center for AI Safety, 3Google DeepMind, 4Bosch Center for AI

Overview of Research: Large language models (LLMs) like ChatGPT, Bard, or Claude undergo extensive fine-tuning to not produce harmful content