DOI: 10.1145/3106426.3106427
research-article

Large-scale readability analysis of privacy policies

Published: 23 August 2017

Abstract

Online privacy policies inform the users of a website how their personal information is collected, processed, and stored. Against the background of rising privacy concerns, privacy policies appear to be an influential instrument for increasing customer trust and loyalty. In practice, however, consumers seem to read privacy policies only rarely, possibly reflecting the common assumption that policies are hard to comprehend. By designing and implementing an automated extraction and readability analysis toolset that incorporates a diverse set of established readability measures, we present the first large-scale study that provides current empirical evidence on the readability of nearly 50,000 privacy policies of popular English-language websites. The results confirm empirically that, on average, current privacy policies are still hard to read. Furthermore, this study offers new theoretical insights for readability research, in particular on the extent to which practical readability measures are correlated. Specifically, it shows the redundancy of several well-established readability metrics such as SMOG, RIX, LIX, GFI, FKG, ARI, and FRES, which eases future choices among measures and comparisons between readability studies, and calls for research towards a framework of readability measures. Moreover, a more sophisticated privacy policy extractor and analyzer as well as a solid policy text corpus are provided for further research.
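
To illustrate how measures of this kind are computed and compared, the following minimal Python sketch implements three of the metrics named above (LIX, RIX, and ARI) from simple surface counts, together with a Pearson correlation helper. It is not the toolset described in the abstract: the function names, the regex-based sentence and word splitting, and the letters-only character count for ARI are assumptions made for this example. The syllable-based metrics (SMOG, GFI, FKG, FRES) are left out because they require a syllable counter.

```python
import math
import re


def text_stats(text):
    """Count sentences, words, letters, and long words (more than 6 letters).

    Assumption: sentences end at '.', '!' or '?', and a word is a run of
    ASCII letters. Real policy text would need a more robust tokenizer.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": sum(len(w) for w in words),
        "long_words": sum(1 for w in words if len(w) > 6),
    }


def lix(s):
    """LIX: mean sentence length plus percentage of long words."""
    return s["words"] / s["sentences"] + 100.0 * s["long_words"] / s["words"]


def rix(s):
    """RIX: long words per sentence."""
    return s["long_words"] / s["sentences"]


def ari(s):
    """Automated Readability Index (characters approximated by letters)."""
    return (4.71 * s["letters"] / s["words"]
            + 0.5 * s["words"] / s["sentences"]
            - 21.43)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Calling `text_stats` on each policy text and passing the resulting LIX and RIX score lists to `pearson` gives a rough sense of how closely two measures track each other. Both depend on the share of long words, so a high correlation between them would not be surprising.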

Published In

WI '17: Proceedings of the International Conference on Web Intelligence
August 2017
1284 pages
ISBN:9781450349512
DOI:10.1145/3106426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. privacy
  2. readability
  3. user experience

Qualifiers

  • Research-article

Conference

WI '17

Acceptance Rates

WI '17 Paper Acceptance Rate: 118 of 178 submissions, 66%
Overall Acceptance Rate: 118 of 178 submissions, 66%

Article Metrics

  • Downloads (last 12 months): 209
  • Downloads (last 6 weeks): 24

Reflects downloads up to 23 Dec 2024.

Cited By

  • (2024) Use & Abuse of Personal Information, Part II: Robust Generation of Fake IDs for Privacy Experimentation. Journal of Cybersecurity and Privacy 4:3 (546-571). DOI: 10.3390/jcp4030026. Online publication date: 11-Aug-2024
  • (2024) Person-based design and evaluation of MIA, a digital medical interview assistant for radiology. Frontiers in Artificial Intelligence 7. DOI: 10.3389/frai.2024.1431156. Online publication date: 16-Aug-2024
  • (2024) A Systematic Review of Privacy Policy Literature. ACM Computing Surveys 57:2 (1-43). DOI: 10.1145/3698393. Online publication date: 1-Oct-2024
  • (2024) ToneCheck: Unveiling the Impact of Dialects in Privacy Policy. Proceedings of the 29th ACM Symposium on Access Control Models and Technologies (7-18). DOI: 10.1145/3649158.3657035. Online publication date: 24-Jun-2024
  • (2024) The Online Identity Help Center: Designing and Developing a Content Moderation Policy Resource for Marginalized Social Media Users. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-30). DOI: 10.1145/3637406. Online publication date: 26-Apr-2024
  • (2024) "Why is Everything in the Cloud?": Co-Designing Visual Cues Representing Data Processes with Children. Proceedings of the 23rd Annual ACM Interaction Design and Children Conference (517-532). DOI: 10.1145/3628516.3655819. Online publication date: 17-Jun-2024
  • (2024) Security screening metrics for information-sharing partnerships. Risk Analysis 44:7 (1560-1572). DOI: 10.1111/risa.14267. Online publication date: 21-Jan-2024
  • (2024) Beware: Processing of Personal Data—Informed Consent Through Risk Communication. IEEE Transactions on Professional Communication 67:1 (4-25). DOI: 10.1109/TPC.2024.3361328. Online publication date: Mar-2024
  • (2024) The rise of “security and privacy”: bibliometric analysis of computer privacy research. International Journal of Information Security 23:2 (863-885). DOI: 10.1007/s10207-023-00761-4. Online publication date: 1-Apr-2024
  • (2024) Japanese Users’ (Mis)understandings of Technical Terms Used in Privacy Policies and the Privacy Protection Law. HCI for Cybersecurity, Privacy and Trust (245-264). DOI: 10.1007/978-3-031-61379-1_16. Online publication date: 1-Jun-2024
