Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–4 of 4 results for author: Yona, I

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.02373  [pdf, other

    cs.AI

    Operationalizing Contextual Integrity in Privacy-Conscious Assistants

    Authors: Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle

    Abstract: Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-shar… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

  2. arXiv:2407.00106  [pdf, other

    cs.LG cs.AI cs.CL cs.CR

    UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

    Authors: Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan

    Abstract: Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently unlearning is often discussed as an approach for removal of impermissible knowledge i.e. knowledge that the model should not poss… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  3. arXiv:2403.06634  [pdf, other

    cs.CR

    Stealing Part of a Production Language Model

    Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Itay Yona, Eric Wallace, David Rolnick, Florian Tramèr

    Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \… ▽ More

    Submitted 9 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  4. arXiv:2402.05526  [pdf, other

    cs.CR cs.LG

    Buffer Overflow in Mixture of Experts

    Authors: Jamie Hayes, Ilia Shumailov, Itay Yona

    Abstract: Mixture of Experts (MoE) has become a key ingredient for scaling large foundation models while keeping inference costs steady. We show that expert routing strategies that have cross-batch dependencies are vulnerable to attacks. Malicious queries can be sent to a model and can affect a model's output on other benign queries if they are grouped in the same batch. We demonstrate this via a proof-of-c… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.