DOI: 10.1145/3406325.3451131
research-article
Public Access

When is memorization of irrelevant training data necessary for high-accuracy learning?

Published: 15 June 2021

Abstract

Modern machine learning models are complex and frequently encode surprising amounts of information about individual inputs. In extreme cases, complex models appear to memorize entire input examples, including seemingly irrelevant information (social security numbers from text, for example). In this paper, we aim to understand whether this sort of memorization is necessary for accurate learning. We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. This remains true even when the examples are high-dimensional and have entropy much higher than the sample size, and even when most of that information is ultimately irrelevant to the task at hand. Further, our results do not depend on the training algorithm or the class of models used for learning.
Our problems are simple and fairly natural variants of the next-symbol prediction and the cluster labeling tasks. These tasks can be seen as abstractions of text- and image-related prediction problems. To establish our results, we reduce from a family of one-way communication problems for which we prove new information complexity lower bounds.
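To make the flavor of these lower bounds concrete, here is a hedged sketch in my own notation (an illustration of the kind of statement the abstract describes, not the paper's exact theorem):

```latex
% Illustrative sketch, notation mine. Let S = (X_1, \dots, X_n) be the
% training sample, each example X_i carrying on the order of d bits of
% entropy, and let M = A(S) be the model produced by any training
% algorithm A that is sufficiently accurate on the task. The results
% say that M must retain essentially all the information about a
% constant fraction of the examples:
\[
  I(M ; S) \;=\; \Omega(n \cdot d),
\]
% even when only a vanishing fraction of each example's d bits is
% relevant to the prediction task, and regardless of the algorithm A
% or the model class from which M is drawn.
```

Here I(M; S) denotes mutual information between the model and the training sample; the constants and the precise accuracy threshold depend on the specific task (next-symbol prediction or cluster labeling) and are spelled out in the paper.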




Published In

STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing
June 2021
1797 pages
ISBN:9781450380539
DOI:10.1145/3406325
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Information Complexity
  2. Memorization
  3. Overparameterization

Qualifiers

  • Research-article

Conference

STOC '21

Acceptance Rates

Overall acceptance rate: 1,469 of 4,586 submissions (32%)

Article Metrics

  • Downloads (last 12 months): 472
  • Downloads (last 6 weeks): 71
Reflects downloads up to 26 Sep 2024

Cited By

  • (2024) Forget me not: memorisation in generative sequence models trained on open source licensed code. SSRN Electronic Journal. DOI: 10.2139/ssrn.4720990.
  • (2024) SoK: Unintended Interactions among Machine Learning Defenses and Risks. 2024 IEEE Symposium on Security and Privacy (SP), pages 2996–3014. DOI: 10.1109/SP54263.2024.00243. Published 19 May 2024.
  • (2024) A Systematic Review of Adversarial Machine Learning Attacks, Defensive Controls, and Technologies. IEEE Access, 12, pages 99382–99421. DOI: 10.1109/ACCESS.2024.3423323.
  • (2024) AI model disgorgement: Methods and choices. Proceedings of the National Academy of Sciences, 121(18). DOI: 10.1073/pnas.2307304121. Published 19 April 2024.
  • (2024) Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges. International Journal of Multimedia Information Retrieval, 13(3). DOI: 10.1007/s13735-024-00334-8. Published 25 June 2024.
  • (2023) Deconstructing data reconstruction. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 51515–51535. DOI: 10.5555/3666122.3668365. Published 10 December 2023.
  • (2023) Machine Unlearning: A Survey. ACM Computing Surveys, 56(1), pages 1–36. DOI: 10.1145/3603620. Published 28 August 2023.
  • (2023) Memory-Query Tradeoffs for Randomized Convex Optimization. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1400–1413. DOI: 10.1109/FOCS57990.2023.00086. Published 6 November 2023.
  • (2023) Near Optimal Memory-Regret Tradeoff for Online Learning. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1171–1194. DOI: 10.1109/FOCS57990.2023.00069. Published 6 November 2023.
  • (2023) Data Reconstruction Attack Against Principal Component Analysis. Security and Privacy in Social Networks and Big Data, pages 79–92. DOI: 10.1007/978-981-99-5177-2_5. Published 14 August 2023.
