Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3132747.3132769acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article
Open access

Prochlo: Strong Privacy for Analytics in the Crowd

Published: 14 October 2017 Publication History

Abstract

The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture---Encode, Shuffle, Analyze (ESA)---for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring.
With ESA, the privacy of monitored users' data is guaranteed by its processing in a three-step pipeline. First, the data is encoded to control scope, granularity, and randomness. Second, the encoded data is collected in batches subject to a randomized threshold, and blindly shuffled, to break linkability and to ensure that individual data items get "lost in the crowd" of the batch. Third, the anonymous, shuffled data is analyzed by a specific analysis engine that further prevents statistical inference attacks on analysis results.
ESA extends existing best-practice methods for sensitive-data analytics, by using cryptography and statistical techniques to make explicit how data is elided and reduced in precision, how only common-enough, anonymous data is analyzed, and how this is done for only specific, permitted purposes. As a result, ESA remains compatible with the established workflows of traditional database analysis.
Strong privacy guarantees, including differential privacy, can be established at each processing step to defend against malice or compromise at one or more of those steps. Prochlo develops new techniques to harden those steps, including the Stash Shuffle, a novel scalable and efficient oblivious-shuffling algorithm based on Intel's SGX, and new applications of cryptographic secret sharing and blinding. We describe ESA and Prochlo, as well as experiments that validate their ability to balance utility and privacy.

Supplementary Material

MP4 File (prochlo.mp4)

References

[1]
Abadi, M. Trusted Computing, Trusted Third Parties, and Verified Communications. In Security and Protection in Information Processing Systems (2004), pp. 291--308.
[2]
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. Tensorflow: A system for large-scale machine learning. In Proc. of 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2016), pp. 265--283.
[3]
Abadi, M., Boneh, D., Mironov, I., Raghunathan, A., and Segev, G. Message-Locked Encryption for Lock-Dependent Messages. In Advances in Cryptology--CRYPTO (2013), pp. 374--391.
[4]
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep Learning with Differential Privacy. In Proc. of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS) (2016), pp. 308--318.
[5]
Abadi, M., Erlingsson, Ú., Goodfellow, I., McMahan, H. B., Papernot, N., Mironov, I., Talwar, K., and Zhang, L. On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches. In Proc. of IEEE 30th Computer Security Foundations Symposium (CSF) (2017), pp. 1--6.
[6]
Avent, B., Korolova, A., Zeber, D., Hovden, T., and Livshits, B. BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model. In Proc. of the 26th USENIX Security Symposium (2017), pp. 747--764.
[7]
Bassily, R., Nissim, K., Stemmer, U., and Thakurta, A. Practical Locally Private Heavy Hitters. CoRR abs/1707.0498 (2017). http://arxiv.org/abs/1707.04982.
[8]
Batcher, K. E. Sorting Networks and their Applications. In AFIPS Spring Joint Computer Conference (1968), vol. 32, pp. 307--314.
[9]
Bellare, M., Keelveedhi, S., and Ristenpart, T. Message-Locked Encryption and Secure Deduplication. In Advances in Cryptology---EUROCRYPT (2013), pp. 296--312.
[10]
Bindschaedler, V., Shokri, R., and Gunter, C. A. Plausible Deniability for Privacy-Preserving Data Synthesis. PVLDB 10, 5 (2017), 481--492.
[11]
Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., and Seth, K. Practical Secure Aggregation for Privacy Preserving Machine Learning. In Proc. of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS) (2017).
[12]
Bulck, J. V., Weichbrodt, N., Kapitza, R., Piessens, F., and Strackx, R. Telling Your Secrets without Page Faults: Stealthy Page Table-Based Attacks on Enclaved Execution. In Proc. of 26th USENIX Security Symposium (2017), pp. 1041--1056.
[13]
Buse, R. P. L., and Zimmermann, T. Information Needs for Software Development Analytics. In Proc. of the 34th International Conference on Software Engineering (ICSE) (2012), pp. 987--996.
[14]
Chaudhry, G., Hamon, E. A., and Cormen, T. H. Relaxing the Problem-Size Bound for Out-Of-Core Column-Sort. In Proc. of the Fifteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA) (2003), pp. 250--251.
[15]
Chen, R., Reznichenko, A., Francis, P., and Gehrke, J. Towards Statistical Queries over Distributed Private User Data. In Proc. of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI) (2012), pp. 169--182.
[16]
Chromium Projects. RAPPOR (Randomized Aggregatable Privacy Preserving Ordinal Responses). https://www.chromium.org/developers/design-documents/rappor, 2017.
[17]
Cito, J., Leitner, P., Fritz, T., and Gall, H. C. The Making of Cloud Applications: An Empirical Study on Software Development for the Cloud. In Proc. of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE) (2015), pp. 393--403.
[18]
Corrigan-Gibbs, H., and Boneh, D. Prio: Private, Robust, and Scalable Computation of Aggregate Statistics. In Proc. of the 14th USENIX Symposium on Networked Systems Design and Implementation (NDSI) (2017), pp. 259--282.
[19]
Corrigan-Gibbs, H., Boneh, D., and Mazières, D. Riposte: An Anonymous Messaging System Handling Millions of Users. In Proc. of the 2015 IEEE Symposium on Security and Privacy (2015), pp. 321--338.
[20]
Dang, H., Dinh, T. T. A., Chang, E.-C., and Ooi, B. C. Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute. Proceedings on Privacy Enhancing Technologies, 3 (2017), 18--35.
[21]
Demetriou, S., Merrill, W., Yang, W., Zhang, A., and Gunter, C. A. Free for All! Assessing User Data Exposure to Advertising Libraries on Android. In Proc. of the 23nd Annual Network and Distributed System Security Symposium (NDSS) (2016).
[22]
Denning, D. E. R. Cryptography and Data Security. Addison-Wesley, Boston, MA, USA, 1982.
[23]
Dinh, T. T. A., Saxena, P., Chang, E.-C., Ooi, B. C., and Zhang, C. M2R: Enabling Stronger Privacy in MapReduce Computation. In Proc. of the 24th USENIX Security Symposium (2015), pp. 447--462.
[24]
Douceur, J. R. The Sybil attack. In The First International Workshop on Peer-to-Peer Systems (IPTPS) (2002), pp. 251--260.
[25]
Dwork, C. Differential Privacy: A Survey of Results. In International Conference on Theory and Applications of Models of Computation (TAMC) (2008), pp. 1--19.
[26]
Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Proc. of the Third Conference on Theory of Cryptography (TCC) (2006), pp. 265--284.
[27]
Dwork, C., and Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9, 3--4 (Aug. 2014), 211--407.
[28]
Erlingsson, Ú., Pihur, V., and Korolova, A. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proc. of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS) (2014), pp. 1054--1067.
[29]
Evfimievski, A., Gehrke, J., and Srikant, R. Limiting Privacy Breaches in Privacy Preserving Data Mining. In Proc. of the Twenty-second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) (2003), pp. 211--222.
[30]
Fanti, G., Pihur, V., and Erlingsson, U. Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries. Proceedings on Privacy Enhancing Technologies, 3 (2016), 41--61.
[31]
Gaboardi, M., Honaker, J., King, G., Nissim, K., Ullman, J., and Vadhan, S. P. PSI (Ψ): A Private data Sharing Interface. CoRR abs/1609.04340 (2016). http://arxiv.org/abs/1609.04340.
[32]
Ganta, S. R., Kasiviswanathan, S. P., and Smith, A. Composition Attacks and Auxiliary Information in Data Privacy. In Proc. of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2008), pp. 265--273.
[33]
Gehrke, J., Hay, M., Lui, E., and Pass, R. Crowd-Blending Privacy. In Advances in Cryptology---CRYPTO (2012), pp. 479--496.
[34]
Glerum, K., Kinshumann, K., Greenberg, S., Aul, G., Orgovan, V., Nichols, G., Grant, D., Loihle, G., and Hunt, G. Debugging in the (Very) Large: Ten Years of Implementation and Experience. In Proc. of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP) (2009), ACM, pp. 103--116.
[35]
Google. gRPC: A High Performance, Open-Source Universal RPC Framework. http://grpc.io, 2017.
[36]
Intel® Software Guard Extensions Programming Reference. https://software.intel.com/sites/default/files/managed/48/88/329298-002.pdf, Oct. 2014.
[37]
Intel® Software Guard Extensions (Intel® SGX) SDK. https://software.intel.com/en-us/sgx-sdk, 2017.
[38]
Johnson, N., Near, J. P., and Song, D. Towards Practical Differential Privacy for SQL Queries. CoRR abs/1706.09479 (2017). https://arxiv.org/abs/1706.09479.
[39]
Jovic, M., Adamoli, A., and Hauswirth, M. Catch Me if You Can: Performance Bug Detection in the Wild. In Proc. of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA) (2011), pp. 155--170.
[40]
Klonowski, M., and Kutyłowski, M. Provable Anonymity for Networks of Mixes. In Information Hiding: 7th International Workshop (2005), pp. 26--38.
[41]
Koerner, B. I. Inside the Cyberattack That Shocked the US Government. Wired (Oct. 2016). https://www.wired.com/2016/10/inside-cyberattack-shocked-us-government/.
[42]
Laurie, B. Certificate Transparency. Commun. ACM 57, 10 (Oct. 2014), 40--46. https://github.com/google/certificate-transparency.
[43]
Lee, S., Shih, M.-W., Gera, P., Kim, T., Kim, H., and Peinado, M. Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing. In Proc. of the 26th USENIX Security Symposium (2017), pp. 557--574.
[44]
Leighton, T. Tight Bounds on the Complexity of Parallel Sorting. IEEE Trans. Comput. 34, 4 (Apr. 1985), 344--354.
[45]
Li, N., Qardaji, W., and Su, D. On Sampling, Anonymization, and Differential Privacy or, k-anonymization Meets Differential Privacy. In Proc. of the 7th ACM Symposium on Information, Computer and Communications Security (ASIACCS) (2012), pp. 32--33.
[46]
Liblit, B., Naik, M., Zheng, A. X., Aiken, A., and Jordan, M. I. Scalable Statistical Bug Isolation. In Proc. of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (2005), pp. 15--26.
[47]
Lie, D., and Maniatis, P. Glimmers: Resolving the Privacy/Trust Quagmire. In Proc. of the 16th Workshop on Hot Topics in Operating Systems (HotOS) (2017), pp. 94--99.
[48]
Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasubramaniam, M. l-diversity: Privacy Beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1, 1 (Mar. 2007).
[49]
Maniatis, P., and Baker, M. Secure History Preservation Through Timeline Entanglement. In Proc. of the 11th USENIX Security Symposium (2002), pp. 297--312.
[50]
Maniatis, P., Mironov, I., and Talwar, K. Oblivious Stash Shuffle. CoRR abs/1709.07553 (2017). https://arxiv.org/abs/1709.07553.
[51]
McSherry, F., and Mahajan, R. Differentially-Private Network Trace Analysis. SIGCOMM Comput. Commun. Rev. 40, 4 (Aug. 2010), 123--134.
[52]
McSherry, F. D. Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis. In Proc. of the 2009 ACM SIGMOD International Conference on Management of Data (2009), pp. 19--30.
[53]
Mohan, P., Thakurta, A., Shi, E., Song, D., and Culler, D. GUPT: Privacy Preserving Data Analysis Made Easy. In Proc. of the 2012 ACM SIGMOD International Conference on Management of Data (2012), pp. 349--360.
[54]
Narayanan, A., and Shmatikov, V. Robust De-anonymization of Large Sparse Datasets. In Proc. of the 2008 IEEE Symposium on Security and Privacy (2008), pp. 111--125.
[55]
National Institute of Standards and Technology. Digital Signature Standard (DSS). FIPS PUB 186-2, Jan. 2000.
[56]
Newman, L. H. How to Protect Yourself from that Massive Equifax Breach. Wired (Sept. 2017). https://www.wired.com/story/how-to-protect-yourself-from-that-massive-equifax-breach/.
[57]
Ohrimenko, O., Costa, M., Fournet, C., Gkantsidis, C., Kohlweiss, M., and Sharma, D. Observing and Preventing Leakage in MapReduce. In Proc. of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS) (2015), pp. 1570--1581.
[58]
Ohrimenko, O., Goodrich, M. T., Tamassia, R., and Upfal, E. The Melbourne Shuffle: Improving Oblivious Storage in the Cloud. In Automata, Languages, and Programming: 41st International Colloquium, ICALP 2014, Part II (2014), pp. 556--567.
[59]
Oliner, A. J., Iyer, A. P., Stoica, I., Lagerspetz, E., and Tarkoma, S. Carat: Collaborative Energy Diagnosis for Mobile Devices. In Proc. of the 11th Conference on Embedded Networked Sensor Systems (SenSys) (2013), ACM, pp. 10:1--10:14.
[60]
Papadopoulos, E. P., Diamantaris, M., Papadopoulos, P., Petsas, T., Ioannidis, S., and Markatos, E. P. The Long-Standing Privacy Debate: Mobile Websites vs Mobile Apps. In Proc. of the 26th International Conference on World Wide Web (WWW) (2017), pp. 153--162.
[61]
Ravindranath, L., Padhye, J., Agarwal, S., Mahajan, R., Obermiller, I., and Shayandeh, S. AppInsight: Mobile App Performance Monitoring in the Wild. In Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI) (2012), pp. 107--120.
[62]
Reis, C., Barth, A., and Pizano, C. Browser Security: Lessons from Google Chrome. Commun. ACM 52, 8 (Aug. 2009), 45--49.
[63]
Roy, I., Setty, S. T. V., Kilzer, A., Shmatikov, V., and Witchel, E. Airavat: Security and Privacy for MapReduce. In Proc. of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2010), pp. 297--312.
[64]
Saltzer, J. H., and Schroeder, M. D. The Protection of Information in Computer Systems. Proc. of the IEEE 63, 9 (1975), 1278--1308.
[65]
Samarati, P. Protecting Respondents' Identities in Micro-data Release. IEEE Trans. on Knowl. and Data Eng. 13, 6 (Nov. 2001), 1010--1027.
[66]
Samarati, P., and Sweeney, L. Generalizing Data to Provide Anonymity when Disclosing Information (Abstract). In Proc. of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS) (1998), p. 188.
[67]
Schuster, F., Costa, M., Fournet, C., Gkantsidis, C., Peinado, M., Mainar-Ruiz, G., and Russinovich, M. VC3: Trustworthy Data Analytics in the Cloud Using SGX. In Proc. of the 2015 IEEE Symposium on Security and Privacy (2015), pp. 38--54.
[68]
Shamir, A. How to Share a Secret. Comm. of the ACM 22, 11 (1979), 612--613.
[69]
Spensky, C., Stewart, J., Yerukhimovich, A., Shay, R., Trachtenberg, A., Housley, R., and Cunningham, R. K. SoK: Privacy on Mobile Devices---It's Complicated. Proc. on Privacy Enhancing Technologies, 3 (2016), 96--116.
[70]
Tang, J., Korolova, A., Bai, X., Wang, X., and Wang, X. Privacy Loss in Apple's Implementation of Differential Privacy on macOS 10.12. arXiv:1709.02753 (2017). https://arxiv.org/abs/1709.02753.
[71]
Vallina-Rodriguez, N., Sundaresan, S., Razaghpanah, A., Nithyanand, R., Allman, M., Kreibich, C., and Gill, P. Tracking the Trackers: Towards Understanding the Mobile Advertising and Tracking Ecosystem. CoRR abs/1609.07190 (2016). http://arxiv.org/abs/1609.07190.
[72]
van den Hooff, J., Lazar, D., Zaharia, M., and Zeldovich, N. Vuvuzela: Scalable Private Messaging Resistant to Traffic Analysis. In Proc. of the 25th Symposium on Operating Systems Principles (SOSP) (2015), pp. 137--152.
[73]
Viega, J., Chandra, P., and Messier, M. Network Security with OpenSSL, 1st ed. O'Reilly & Associates, Inc., Sebastopol, CA, USA, 2002.
[74]
Wang, T., Blocki, J., Li, N., and Jha, S. Locally Differentially Private Protocols for Frequency Estimation. In Proc. of the 26th USENIX Security Symposium (2017), pp. 729--745.
[75]
Warner, S. L. Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. J. of the American Statistical Association 60, 309 (1965), 63--69.
[76]
Xu, Y., Cui, W., and Peinado, M. Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems. In Proc. of the 2015 IEEE Symposium on Security and Privacy (2015), pp. 640--656.
[77]
Zhang, L., Bild, D. R., Dick, R. P., Mao, Z. M., and Dinda, P. Panappticon: Event-Based Tracing to Measure Mobile Application and Platform Performance. In Proc. of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) (2013), pp. 33:1--33:10.
[78]
Zheng, W., Dave, A., Beekman, J. G., Popa, R. A., Gonzalez, J. E., and Stoica, I. Opaque: An Oblivious and Encrypted Distributed Analytics Platform. In Proc. of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2017), pp. 283--298.

Cited By

View all
  • (2025)Shuffle Model of Differential Privacy: Numerical Composition for Federated LearningApplied Sciences10.3390/app1503159515:3(1595)Online publication date: 5-Feb-2025
  • (2025)Unknown Worker Recruitment With Long-Term Incentive in Mobile CrowdsensingIEEE Transactions on Mobile Computing10.1109/TMC.2024.347156924:2(999-1015)Online publication date: Feb-2025
  • (2025)Simple Binary Hypothesis Testing Under Local Differential Privacy and Communication ConstraintsIEEE Transactions on Information Theory10.1109/TIT.2024.346136171:1(592-617)Online publication date: Jan-2025
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles
October 2017
677 pages
ISBN:9781450350853
DOI:10.1145/3132747
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 October 2017

Check for updates

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SOSP '17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Upcoming Conference

SOSP '25
ACM SIGOPS 31st Symposium on Operating Systems Principles
October 13 - 16, 2025
Seoul , Republic of Korea

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)681
  • Downloads (Last 6 weeks)98
Reflects downloads up to 11 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Shuffle Model of Differential Privacy: Numerical Composition for Federated LearningApplied Sciences10.3390/app1503159515:3(1595)Online publication date: 5-Feb-2025
  • (2025)Unknown Worker Recruitment With Long-Term Incentive in Mobile CrowdsensingIEEE Transactions on Mobile Computing10.1109/TMC.2024.347156924:2(999-1015)Online publication date: Feb-2025
  • (2025)Simple Binary Hypothesis Testing Under Local Differential Privacy and Communication ConstraintsIEEE Transactions on Information Theory10.1109/TIT.2024.346136171:1(592-617)Online publication date: Jan-2025
  • (2025)GoCrowd: Obliviously Aggregating Crowd Wisdom With Quality Awareness in CrowdsourcingIEEE Transactions on Dependable and Secure Computing10.1109/TDSC.2024.341500322:1(710-722)Online publication date: Jan-2025
  • (2025)Differentially Private Numerical Vector Analyses in the Local and Shuffle ModelIEEE Transactions on Dependable and Secure Computing10.1109/TDSC.2023.334017822:1(1-15)Online publication date: Jan-2025
  • (2025)YSZ-Based In₂O₃/SnO₂ Electrochemical Sensing Device for Highly Selective Tracking HexanalIEEE Sensors Journal10.1109/JSEN.2024.351010225:2(3404-3411)Online publication date: 15-Jan-2025
  • (2025)FedShufde: A privacy preserving framework of federated learning for edge-based smart UAV delivery systemFuture Generation Computer Systems10.1016/j.future.2025.107706166(107706)Online publication date: May-2025
  • (2025)Melbourne Shuffle, TheEncyclopedia of Cryptography, Security and Privacy10.1007/978-3-030-71522-9_1791(1513-1516)Online publication date: 8-Jan-2025
  • (2025)Secure Computation of Differentially Private MechanismsEncyclopedia of Cryptography, Security and Privacy10.1007/978-3-030-71522-9_1714(2221-2224)Online publication date: 8-Jan-2025
  • (2024)LottoProceedings of the 33rd USENIX Conference on Security Symposium10.5555/3698900.3698920(343-360)Online publication date: 14-Aug-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media