What Can We Learn Privately?

Published: 01 June 2011

Abstract

Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask, What concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. Our goal is a broad understanding of the resources required for private learning in terms of samples, computation time, and interaction. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (nonprivate) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private probabilistically approximately correct learner for the class of parity functions. This result dispels the similarity between learning with noise and private learning (both must be robust to small changes in inputs), since parity is thought to be very hard to learn given random classification noise. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Therefore, for local private learning algorithms, the similarity to learning with noise is stronger: local learning is equivalent to SQ learning, and SQ algorithms include most known noise-tolerant learning algorithms. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms. Because of the equivalence to SQ learning, this result also separates adaptive and nonadaptive SQ learning.
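The local model described in the abstract generalizes classic randomized response, the survey technique due to Warner in which each respondent perturbs their own answer before releasing it. As a minimal illustration (not the paper's construction; function names and parameters are ours), here is a Python sketch where each respondent randomizes a single bit, and an analyst debiases the aggregate to estimate the population mean. Each individual report satisfies ε-differential privacy on its own, since the probability of any output changes by at most a factor of e^ε when the true bit flips.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    Each report is epsilon-differentially private on its own, which is what
    makes the protocol 'local' in the abstract's sense."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p else 1 - bit

def estimate_mean(reports, epsilon):
    """Debias the noisy reports: if mu is the true mean of the bits, the
    expected report mean is (1 - p) + mu * (2p - 1), so invert that map."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

# Simulate 100,000 respondents whose true bits have mean 0.3.
random.seed(0)
data = [1] * 30_000 + [0] * 70_000
reports = [randomized_response(b, epsilon=1.0) for b in data]
print(estimate_mean(reports, epsilon=1.0))  # close to 0.3
```

The debiasing step is why local protocols pay in sample complexity: the noise added per respondent shrinks the signal by a factor of 2p - 1, so roughly 1/(2p - 1)^2 times more samples are needed than without privacy, which is consistent with the abstract's equivalence between local learning and SQ learning.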


Published In

SIAM Journal on Computing, Volume 40, Issue 3, May 2011, 356 pages

Publisher: Society for Industrial and Applied Mathematics, United States

Author Tags

1. data privacy
2. differential privacy
3. probabilistically approximately correct learning
4. statistical query learning
