Daniel Paleka is qualified to endorse.

Refusal in Language Models Is Mediated by a Single Direction

Andy Arditi: Is registered as an author of this paper.
Not currently an endorser.
Daniel Paleka: Is registered as an author of this paper.
Can endorse for cs.AI, cs.CL, cs.CR, cs.CV, cs.CY, cs.LG, stat.ML.
Nina Panickssery: Is registered as an author of this paper.
Not currently an endorser.

Oscar Obeso, Aaquib Syed, Wes Gurnee and Neel Nanda are not registered as owners of this paper.