Feb 19, 2024 · A query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings.
Nov 5, 2024 · The paper presents a novel query-based attack method designed to generate adversarial examples that induce harmful outputs in aligned language ...
Dec 7, 2024 · In this paper, we design an optimization attack that directly constructs adversarial examples on a remote language model, without relying on ...
In this paper, we design an optimization attack that directly constructs adversarial examples on a remote language model, without relying on transferability. ...
This work improves on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause ...
Dec 9, 2024 · The researchers created a query-based method for generating adversarial prompts that trick AI language models into saying harmful things. Instead of ...
We create data defenses by developing a method to automatically generate adversarial prompt injections that, when added to input text, significantly reduce ...
Oct 28, 2024 · To address this challenge, we propose a novel Adversarial Contrastive Prompt Tuning (ACPT) method to robustly fine-tune the CLIP image encoder ...
Adversarial prompting is an important topic in prompt engineering as it could help to understand the risks and safety issues involved with LLMs.
Feb 19, 2024 · Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or ...