research-article

PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining

Published: 07 August 2018

    Abstract

    Prior research shows that only a tiny percentage of users actually read the online privacy policies they implicitly agree to while using a website. Prior research also suggests that users ignore privacy policies because these policies are lengthy and, on average, require 2 years of college education to comprehend. We propose a novel technique that tackles this problem by automatically extracting summaries of online privacy policies. We use data mining models to analyze the text of privacy policies and answer 10 basic questions concerning the privacy and security of user data, what information is gathered from users, and how this information is used. To train the data mining models, we thoroughly study the privacy policies of 400 companies (covering 10% of all listings on the NYSE, Nasdaq, and AMEX stock markets) across industries. Our free Chrome browser extension, PrivacyCheck, uses the data mining models to summarize any HTML page that contains a privacy policy. PrivacyCheck stands out from currently available counterparts because it is readily applicable to any online privacy policy. Cross-validation results show that PrivacyCheck summaries are accurate 40% to 73% of the time. Over 400 independent Chrome users are currently using PrivacyCheck.
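
    The workflow described above (train a text classifier for each question, then estimate its accuracy with cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which models PrivacyCheck uses, so a small naive Bayes classifier stands in for them, and the example snippets, the "low"/"high" risk labels, and the names `NaiveBayes`, `cross_validate`, and `SNIPPETS` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training snippets for one question, e.g.
# "Does the site share your information with third parties?"
SNIPPETS = [
    ("We never share your personal information with third parties.", "low"),
    ("We may sell your personal information to advertisers and partners.", "high"),
    ("Your data is not shared or sold to anyone.", "low"),
    ("We share your data with marketing partners and advertisers.", "high"),
    ("We do not disclose or share personal information.", "low"),
    ("Information may be sold or disclosed to third party advertisers.", "high"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(texts, labels, k):
    """Return accuracy pooled over k folds; fold i holds out items with index % k == i."""
    pairs = list(zip(texts, labels))
    correct = 0
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
        correct += sum(model.predict(t) == y for t, y in held_out)
    return correct / len(pairs)


texts = [t for t, _ in SNIPPETS]
labels = [y for _, y in SNIPPETS]
print("pooled cross-validation accuracy:", cross_validate(texts, labels, k=3))
```

    With only six toy snippets the reported accuracy says nothing about PrivacyCheck itself; the sketch only shows the shape of the train-then-cross-validate loop, which would be repeated once per question.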

    Supplementary Material

    a53-zaeem-apndx.pdf (zaeem.zip)
    Supplemental movie, appendix, image, and software files for PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining

    References

    [1]
    Alessandro Acquisti, Curtis Taylor, and Liad Wagman. 2016. The economics of privacy. Journal of Economic Literature 54, 2 (2016), 442--492.
    [2]
    AdblockPlus. 2015. Adblock Plus: Surf the web without annoying ads! Retrieved June 3, 2015, from https://adblockplus.org/.
    [3]
    Waleed Ammar, Shomir Wilson, Norman Sadeh, and Noah A. Smith. 2012. Automatic categorization of privacy policies: A pilot study. Research Showcase @ CMU.
    [4]
    AT&T. 2002. Privacy Bird. Retrieved June 15, 2015, from http://www.privacybird.org.
    [5]
    BBBOnline. 2015. Better Business Bureau. Retrieved June 15, 2015, from http://www.bbb.org/central-texas/bbb-education-foundation.
    [6]
    BuiltWith. 2015. P3P policy usage statistics. Retrieved June 3, 2015, from http://trends.builtwith.com/docinfo/P3P-Policy.
    [7]
    Nathan Clarke, Steven Furnell, Julio Angulo, Simone Fischer-Hübner, Erik Wästlund, and Tobias Pulls. 2012. Towards usable privacy policy display and management. Information Management & Computer Security 20, 1 (2012), 4--17.
    [8]
    Lorrie Faith Cranor. 2012. Necessary but not sufficient: Standardized mechanisms for privacy notice and choice. Journal on Telecommunications & High Technology Law 10 (2012), 273.
    [9]
    Lorrie Faith Cranor, Praveen Guduru, and Manjula Arjula. 2006a. User interfaces for privacy agents. ACM Transactions on Computer-Human Interaction (TOCHI) 13, 2 (2006), 135--178.
    [10]
    Lorrie Cranor, Marc Langheinrich, Massimo Marchiori, Martin Presler-Marshall, and Joseph Reagle. 2006b. The Platform for Privacy Preferences 1.1 (P3P1.1) Specification.
    [11]
    Datanyze. 2015. Truste market share in the Alexa top 1M. Retrieved June 3, 2015, from https://www.datanyze.com/market-share/security/truste-market-share.
    [12]
    Tatiana Ermakova, Annika Baumann, Benjamin Fabian, and Hanna Krasnova. 2014. Privacy policies and users’ trust: Does readability matter? In 20th Americas Conference on Information Systems (AMCIS’14).
    [13]
    FTC. 2000. Privacy online: Fair information practices in the electronic marketplace: A Federal Trade Commission report to Congress. Retrieved October 21, 2015, from https://www.ftc.gov/reports/privacy-online-fair-information-practices-electronic-marketplace-federal-trade-commission.
    [14]
    FTC. 2010. Exploring privacy: An FTC roundtable discussion. Retrieved May 21, 2015, from https://www.ftc.gov/sites/default/files/documents/public_events/exploring-privacy-roundtable-series/privacyroundtable_march2010_transcript.pdf.
    [15]
    FTC. 2012. Protecting consumer privacy in an era of rapid change: Recommendations for businesses and policymakers. Retrieved May 21, 2015, from https://www.ftc.gov/reports/protecting-consumer-privacy-era-rapid-change-recommendations-businesses-policymakers.
    [16]
    Ghostery. 2015. Join over 40 million Ghostery users and download the web’s most popular privacy tool. Retrieved June 3, 2015, from https://www.ghostery.com/en/home.
    [17]
    Google. 2014a. Google Prediction API v 1.6. Retrieved June 3, 2015, from https://cloud.google.com/prediction/docs.
    [18]
    Google. 2014b. Google search engine. Retrieved November 13, 2014, from https://www.google.com/?gws_rd=ssl#q=privacy+policy.
    [19]
    Mark A. Graber, Donna M. D'Alessandro, and Jill Johnson-West. 2002. Reading level of privacy policies on internet health web sites. Journal of Family Practice 51, 7 (2002), 642--642.
    [20]
    ICB. 2006. Industry Classification Benchmark (ICB): A single standard defining the market. Retrieved October 7, 2015, from http://www.icbenchmark.com.
    [21]
    Patrick Gage Kelley, Joanna Bresee, Lorrie Faith Cranor, and Robert W. Reeder. 2009. A nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security. ACM, 4.
    [22]
    Patrick Gage Kelley, Lucian Cesca, Joanna Bresee, and Lorrie Faith Cranor. 2010. Standardizing privacy notices: An online study of the nutrition label approach. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1573--1582.
    [23]
    Alfred Kobsa. 2007. Privacy-enhanced web personalization. In The Adaptive Web. Springer, 628--670.
    [24]
    Ron Kohavi. 2001. Mining e-commerce data: The good, the bad, and the ugly. In International Conference on Knowledge Discovery and Data Mining. ACM, 8--13.
    [25]
    Aleecia M. McDonald and Lorrie Faith Cranor. 2008. The cost of reading privacy policies. I/S: A Journal of Law and Policy for the Information Society 4 (2008), 543.
    [26]
    David B. Meinert, Dane K. Peterson, John R. Criswell, and Martin D. Crossland. 2006. Privacy policy statements and consumer willingness to provide personal information. Journal of Electronic Commerce in Organizations 4, 1 (2006), 1.
    [27]
    George R. Milne and Mary J. Culnan. 2004. Strategies for reducing online privacy risks: Why consumers read (or don’t read) online privacy notices. Journal of Interactive Marketing 18, 3 (2004), 15--29.
    [28]
    George R. Milne, Mary J. Culnan, and Henry Greene. 2006. A longitudinal assessment of online privacy notice readability. Journal of Public Policy & Marketing 25, 2 (2006), 238--249.
    [29]
    Nasdaq. 2015. Nasdaq. Retrieved September 3, 2015, from http://www.nasdaq.com.
    [30]
    Robert W. Reeder, Patrick Gage Kelley, Aleecia M. McDonald, and Lorrie Faith Cranor. 2008. A user study of the expandable grid applied to P3P privacy policy visualization. In Proceedings of the 7th ACM Workshop on Privacy in the Electronic Society. ACM, 45--54.
    [31]
    OECD. 1980. Recommendation of the Council concerning guidelines governing the protection of privacy and transborder flows of personal data.
    [32]
    Disconnect Me. 2014. Disconnect Me privacy icons. Retrieved March 15, 2016, from https://disconnect.me/icons.
    [33]
    Usable Privacy. 2016. Usable Privacy Project website. Retrieved September 28, 2016, from https://usableprivacy.org/.
    [34]
    UT CID. 2015. PrivacyCheck. Retrieved May 16, 2016, from https://chrome.google.com/webstore/detail/privacycheck/poobeppenopkcbjejfjenbiepifcbclg.
    [35]
    Norman Sadeh, Alessandro Acquisti, Travis D. Breaux, Lorrie Faith Cranor, Aleecia M. McDonald, Joel R. Reidenberg, Noah A. Smith, Fei Liu, N. Cameron Russell, Florian Schaub, and Shomir Wilson. 2013. The Usable Privacy Policy Project. Technical Report, CMU-ISR-13-119, Carnegie Mellon University.
    [36]
    Nili Steinfeld. 2016. I agree to the terms and conditions: (How) do users read privacy policies online? An eye-tracking experiment. Computers in Human Behavior 55 (2016), 992--1000.
    [37]
    ToS;DR. 2012. Terms of Service; Didn’t Read. Retrieved March 4, 2015, from https://tosdr.org.
    [38]
    TRUSTe. 2015. TRUSTe. Retrieved March 4, 2015, from http://www.truste.com.
    [39]
    Shomir Wilson, Florian Schaub, Aswarth Dara, Sushain K. Cherivirala, Sebastian Zimmeck, Mads Schaarup Andersen, Pedro Giovanni Leon, Eduard Hovy, and Norman Sadeh. 2016a. Demystifying privacy policies using language technologies: Progress and challenges. In LREC Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS’16).
    [40]
    Shomir Wilson, Florian Schaub, Aswarth Abhilash Dara, Frederick Liu, Sushain Cherivirala, Pedro Giovanni Leon, Mads Schaarup Andersen, Sebastian Zimmeck, Kanthashree Mysore Sathyendra, N. Cameron Russell, Thomas B. Norton, Eduard Hovy, Joel Reidenberg, and Norman Sadeh. 2016b. The creation and analysis of a website privacy policy corpus. In Annual Meeting of the Association for Computational Linguistics. 1330--1340.
    [41]
    Shomir Wilson, Florian Schaub, Rohan Ramanath, Norman Sadeh, Fei Liu, Noah A. Smith, and Frederick Liu. 2016c. Crowdsourcing annotations for websites’ privacy policies: Can it really work? In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 133--143.
    [42]
    Sebastian Zimmeck. 2014. Privee Chrome extension. Retrieved July 13, 2015, from https://chrome.google.com/webstore/detail/privee/lmhnkfilbojonenmnagllnoiganihmnl.
    [43]
    Sebastian Zimmeck and Steven M. Bellovin. 2014. Privee: An architecture for automatically analyzing web privacy policies. In 23rd USENIX Security Symposium (USENIX Security’14). USENIX Association, San Diego, CA, 1--16. Retrieved from https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/zimmeck.

        Published In

        ACM Transactions on Internet Technology  Volume 18, Issue 4
        Special Issue on Computational Ethics and Accountability, Special Issue on Economics of Security and Privacy and Regular Papers
        November 2018
        348 pages
        ISSN:1533-5399
        EISSN:1557-6051
        DOI:10.1145/3210373
        Editor: Munindar P. Singh
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 07 August 2018
        Accepted: 01 July 2017
        Revised: 01 July 2017
        Received: 01 October 2016
        Published in TOIT Volume 18, Issue 4

        Author Tags

        1. Privacy policy
        2. classification
        3. data mining

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • State of Texas IDWise Project
        • Center for Identity's Strategic Partners

        Article Metrics

        • Downloads (Last 12 months): 155
        • Downloads (Last 6 weeks): 16
        Reflects downloads up to 12 Aug 2024

        Citations

        Cited By

        • (2024) IoT Privacy Risks Revealed. Entropy 26:7 (561). DOI: 10.3390/e26070561. Online publication date: 29-Jun-2024
        • (2024) A User-Centered Privacy Policy Management System for Automatic Consent on Cookie Banners. Computers 13:2 (43). DOI: 10.3390/computers13020043. Online publication date: 1-Feb-2024
        • (2024) Bring Privacy To The Table: Interactive Negotiation for Privacy Settings of Shared Sensing Devices. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-22). DOI: 10.1145/3613904.3642897. Online publication date: 11-May-2024
        • (2024) CSChecker: Revisiting GDPR and CCPA Compliance of Cookie Banners on the Web. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-12). DOI: 10.1145/3597503.3639159. Online publication date: 20-May-2024
        • (2024) PrivacyChat: Utilizing Large Language Model for Fine-Grained Information Extraction over Privacy Policies. Wisdom, Well-Being, Win-Win (223-231). DOI: 10.1007/978-3-031-57850-2_17. Online publication date: 15-Apr-2024
        • (2023) Understanding Website Privacy Policies—A Longitudinal Analysis Using Natural Language Processing. Information 14:11 (622). DOI: 10.3390/info14110622. Online publication date: 19-Nov-2023
        • (2023) Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications. Information 14:9 (472). DOI: 10.3390/info14090472. Online publication date: 22-Aug-2023
        • (2023) Personalized Privacy Assistant: Identity Construction and Privacy in the Internet of Things. Entropy 25:5 (717). DOI: 10.3390/e25050717. Online publication date: 26-Apr-2023
        • (2023) An Analytical Review of Industrial Privacy Frameworks and Regulations for Organisational Data Sharing. Applied Sciences 13:23 (12727). DOI: 10.3390/app132312727. Online publication date: 27-Nov-2023
        • (2023) A GDPR Compliant Approach to Assign Risk Levels to Privacy Policies. Computers, Materials & Continua 74:3 (4631-4647). DOI: 10.32604/cmc.2023.034039. Online publication date: 2023
