David Krueger 0001
Person information
- affiliation: University of Cambridge, UK
- affiliation (former): University of Montréal, MILA, Canada
2020 – today
- 2024
- [j2] Shoaib Ahmed Siddiqui, David Krueger, Yann LeCun, Stéphane Deny:
Blockwise Self-Supervised Learning at Scale. Trans. Mach. Learn. Res. 2024 (2024)
- [c23] Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung:
Visibility into AI Agents. FAccT 2024: 958-973
- [c22] Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas A. Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell:
Black-Box Access is Insufficient for Rigorous AI Audits. FAccT 2024: 2254-2272
- [c21] Thomas Coste, Usman Anwar, Robert Kirk, David Krueger:
Reward Model Ensembles Help Mitigate Overoptimization. ICLR 2024
- [c20] Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tim Rocktäschel, Edward Grefenstette, David Scott Krueger:
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks. ICLR 2024
- [c19] Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Kacper Mlodozeniec, Tegan Maharaj, David Krueger:
Implicit meta-learning may lead language models to trust more reliable sources. ICML 2024
- [i48] Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung:
Visibility into AI Agents. CoRR abs/2401.13138 (2024)
- [i47] Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Alexander Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell:
Black-Box Access is Insufficient for Rigorous AI Audits. CoRR abs/2401.14446 (2024)
- [i46] James Urquhart Allingham, Bruno Kacper Mlodozeniec, Shreyas Padhy, Javier Antorán, David Krueger, Richard E. Turner, Eric T. Nalisnick, José Miguel Hernández-Lobato:
A Generative Model of Symmetry Transformations. CoRR abs/2403.01946 (2024)
- [i45] Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen:
Safety Cases: How to Justify the Safety of Advanced AI Systems. CoRR abs/2403.10462 (2024)
- [i44] Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, José Hernández-Orallo, Lewis Hammond, Eric J. Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob N. Foerster, Florian Tramèr, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger:
Foundational Challenges in Assuring Alignment and Safety of Large Language Models. CoRR abs/2404.09932 (2024)
- [i43] Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David Krueger:
Stress-Testing Capability Elicitation With Password-Locked Models. CoRR abs/2405.19550 (2024)
- [i42] Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schröder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung:
IDs for AI Systems. CoRR abs/2406.12137 (2024)
- [i41] Akash R. Wasil, Joshua Clymer, David Krueger, Emily Dardaman, Simeon Campos, Evan R. Murphy:
Affirmative safety: An approach to risk management for high-risk AI. CoRR abs/2406.15371 (2024)
- [i40] Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse:
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret. CoRR abs/2406.15753 (2024)
- [i39] Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas M. Breuel, Jan Kautz, David Krueger, Pavlo Molchanov:
A deeper look at depth pruning of LLMs. CoRR abs/2407.16286 (2024)
- [i38] Neel Alex, Shoaib Ahmed Siddiqui, Amartya Sanyal, David Krueger:
Protecting against simultaneous data poisoning attacks. CoRR abs/2408.13221 (2024)
- [i37] Jakub Vrábel, Ori Shem-Ur, Yaron Oz, David Krueger:
Input Space Mode Connectivity in Deep Neural Networks. CoRR abs/2409.05800 (2024)
- [i36] Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella Béguelin:
Permissive Information-Flow Analysis for Large Language Models. CoRR abs/2410.03055 (2024)
- 2023
- [j1] Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphaël Ségerie, Micah Carroll, Andi Peng, Phillip J. K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca D. Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell:
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Trans. Mach. Learn. Res. 2023 (2023)
- [c18] Micah Carroll, Alan Chan, Henry Ashton, David Krueger:
Characterizing Manipulation from AI Systems. EAAMO 2023: 6:1-6:13
- [c17] Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine M. Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj:
Harms from Increasingly Agentic Algorithmic Systems. FAccT 2023: 651-666
- [c16] Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger:
Broken Neural Scaling Laws. ICLR 2023
- [c15] Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker:
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics. ICLR 2023
- [c14] Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Scott Krueger, Hidenori Tanaka:
Mechanistic Mode Connectivity. ICML 2023: 22965-23004
- [c13] Stephen Chung, Ivan Anokhin, David Krueger:
Thinker: Learning to Plan and Act. NeurIPS 2023
- [c12] Cindy Wu, Ekdeep Singh Lubana, Bruno Kacper Mlodozeniec, Robert Kirk, David Krueger:
What Mechanisms Does Knowledge Distillation Distill? UniReps 2023: 60-75
- [i35] Lev McKinney, Yawen Duan, David Krueger, Adam Gleave:
On The Fragility of Learned Reward Functions. CoRR abs/2301.03652 (2023)
- [i34] Shoaib Ahmed Siddiqui, David Krueger, Yann LeCun, Stéphane Deny:
Blockwise Self-Supervised Learning at Scale. CoRR abs/2302.01647 (2023)
- [i33] Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine M. Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj:
Harms from Increasingly Agentic Algorithmic Systems. CoRR abs/2302.10329 (2023)
- [i32] Xander Davies, Lauro Langosco, David Krueger:
Unifying Grokking and Double Descent. CoRR abs/2303.06173 (2023)
- [i31] Micah Carroll, Alan Chan, Henry Ashton, David Krueger:
Characterizing Manipulation from AI Systems. CoRR abs/2303.09387 (2023)
- [i30] Shoaib Ahmed Siddiqui, David Krueger, Thomas M. Breuel:
Investigating the Nature of 3D Generalization in Deep Neural Networks. CoRR abs/2304.09358 (2023)
- [i29] Stephen Chung, Ivan Anokhin, David Krueger:
Thinker: Learning to Plan and Act. CoRR abs/2307.14993 (2023)
- [i28] Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Tong Wang, Samuel Marks, Charbel-Raphaël Ségerie, Micah Carroll, Andi Peng, Phillip J. K. Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca D. Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell:
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. CoRR abs/2307.15217 (2023)
- [i27] Thomas Coste, Usman Anwar, Robert Kirk, David Krueger:
Reward Model Ensembles Help Mitigate Overoptimization. CoRR abs/2310.02743 (2023)
- [i26] Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger:
Meta- (out-of-context) learning in neural networks. CoRR abs/2310.15047 (2023)
- [i25] Yoshua Bengio, Geoffrey E. Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian K. Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atilim Günes Baydin, Sheila A. McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca D. Dragan, Philip H. S. Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann:
Managing AI Risks in an Era of Rapid Progress. CoRR abs/2310.17688 (2023)
- [i24] Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger:
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks. CoRR abs/2311.12786 (2023)
- [i23] Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger:
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models. CoRR abs/2312.14751 (2023)
- 2022
- [c11] Lauro Langosco di Langosco, Jack Koch, Lee D. Sharkey, Jacob Pfau, David Krueger:
Goal Misgeneralization in Deep Reinforcement Learning. ICML 2022: 12004-12019
- [c10] Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger:
Defining and Characterizing Reward Gaming. NeurIPS 2022
- [i22] Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker:
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics. CoRR abs/2209.10015 (2022)
- [i21] Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger:
Defining and Characterizing Reward Hacking. CoRR abs/2209.13085 (2022)
- [i20] Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan:
Towards Out-of-Distribution Adversarial Robustness. CoRR abs/2210.03150 (2022)
- [i19] Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger:
Broken Neural Scaling Laws. CoRR abs/2210.14891 (2022)
- [i18] Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Scott Krueger, Hidenori Tanaka:
Mechanistic Mode Connectivity. CoRR abs/2211.08422 (2022)
- [i17] Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger:
Domain Generalization for Robust Model-Based Offline Reinforcement Learning. CoRR abs/2211.14827 (2022)
- 2021
- [c9] David Krueger, Ethan Caballero, Jörn-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Rémi Le Priol, Aaron C. Courville:
Out-of-Distribution Generalization via Risk Extrapolation (REx). ICML 2021: 5815-5826
- [i16] Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, Tegan Maharaj, Noa Zilberman:
Filling gaps in trustworthy development of AI. CoRR abs/2112.07773 (2021)
- [i15] Enoch Tetteh, Joseph D. Viviano, Yoshua Bengio, David Krueger, Joseph Paul Cohen:
Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models. CoRR abs/2112.13734 (2021)
- 2020
- [i14] David Krueger, Ethan Caballero, Jörn-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Rémi Le Priol, Aaron C. Courville:
Out-of-Distribution Generalization via Risk Extrapolation (REx). CoRR abs/2003.00688 (2020)
- [i13] Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian K. Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, J. B. Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung:
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. CoRR abs/2004.07213 (2020)
- [i12] Andrew Critch, David Krueger:
AI Research Considerations for Human Existential Safety (ARCHES). CoRR abs/2006.04948 (2020)
- [i11] David Krueger, Tegan Maharaj, Jan Leike:
Hidden Incentives for Auto-Induced Distributional Shift. CoRR abs/2009.09153 (2020)
- [i10] David Krueger, Jan Leike, Owain Evans, John Salvatier:
Active Reinforcement Learning: Observing Rewards at a Cost. CoRR abs/2011.06709 (2020)
2010 – 2019
- 2018
- [c8] Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron C. Courville:
Neural Autoregressive Flows. ICML 2018: 2083-2092
- [i9] Joel Ruben Antony Moniz, David Krueger:
Nested LSTMs. CoRR abs/1801.10308 (2018)
- [i8] Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron C. Courville:
Neural Autoregressive Flows. CoRR abs/1804.00779 (2018)
- [i7] Alexandre Lacoste, Boris N. Oreshkin, Wonchang Chung, Thomas Boquet, Negar Rostamzadeh, David Krueger:
Uncertainty in Multitask Transfer Learning. CoRR abs/1806.07528 (2018)
- [i6] Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg:
Scalable agent alignment via reward modeling: a research direction. CoRR abs/1811.07871 (2018)
- 2017
- [c7] Joel Ruben Antony Moniz, David Krueger:
Nested LSTMs. ACML 2017: 530-544
- [c6] David Krueger, Nicolas Ballas, Stanislaw Jastrzebski, Devansh Arpit, Maxinder S. Kanwal, Tegan Maharaj, Emmanuel Bengio, Asja Fischer, Aaron C. Courville:
Deep Nets Don't Learn via Memorization. ICLR (Workshop) 2017
- [c5] David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron C. Courville, Christopher J. Pal:
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. ICLR (Poster) 2017
- [c4] Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron C. Courville, Yoshua Bengio, Simon Lacoste-Julien:
A Closer Look at Memorization in Deep Networks. ICML 2017: 233-242
- [i5] Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron C. Courville, Yoshua Bengio, Simon Lacoste-Julien:
A Closer Look at Memorization in Deep Networks. CoRR abs/1706.05394 (2017)
- [i4] David Krueger, Chin-Wei Huang, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron C. Courville:
Bayesian Hypernetworks. CoRR abs/1710.04759 (2017)
- [i3] Alexandre Lacoste, Thomas Boquet, Negar Rostamzadeh, Boris N. Oreshkin, Wonchang Chung, David Krueger:
Deep Prior. CoRR abs/1712.05016 (2017)
- 2016
- [c3] David Krueger, Roland Memisevic:
Regularizing RNNs by Stabilizing Activations. ICLR 2016
- [i2] David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Aaron C. Courville, Chris Pal:
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. CoRR abs/1606.01305 (2016)
- 2015
- [c2] Laurent Dinh, David Krueger, Yoshua Bengio:
NICE: Non-linear Independent Components Estimation. ICLR (Workshop) 2015
- [c1] Roland Memisevic, Kishore Reddy Konda, David Krueger:
Zero-bias autoencoders and the benefits of co-adapting features. ICLR (Poster) 2015
- [i1] Philip Bachman, David Krueger, Doina Precup:
Testing Visual Attention in Dynamic Environments. CoRR abs/1510.08949 (2015)
last updated on 2024-11-08 20:29 CET by the dblp team
all metadata released as open data under CC0 1.0 license