Abstract
The need for privacy protection on the Internet is well recognized. Every day, users are asked to release personal information in order to use online services and applications. Service providers do not always need all the data they gather in order to offer a service. Users should therefore be aware of what data a provider collects, so that they can judge whether it is more than the service requires. Providers are obliged to describe how they treat personal data in privacy policies. By reading a policy, users could discover, among other things, what personal data they agree to give away when they choose to use a service. Unfortunately, privacy policies are long legal documents that users notoriously refuse to read. In this paper we propose a solution that automatically analyzes the text of a privacy policy and shows what personal information is collected. Our solution is based on Information Extraction techniques and represents a step towards the more ambitious aim of automated grading of privacy policies.
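The abstract describes the approach only at a high level. To illustrate what "extracting the collected personal data from policy text" can mean in practice, the following is a minimal rule-based sketch in Python. It is not the authors' system. The term list `PERSONAL_DATA_TERMS`, the trigger verbs, the naive negation check and the function names are all hypothetical choices for illustration. A fuller Information Extraction pipeline would typically add proper tokenization, linguistic annotation and coreference handling.

```python
import re

# Hypothetical gazetteer of personal-data terms (illustrative, far from complete).
PERSONAL_DATA_TERMS = [
    "credit card number", "postal address", "date of birth", "email address",
    "phone number", "ip address", "location", "name",
]

# Longest terms first so multi-word terms win over shorter alternatives.
TERM_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(PERSONAL_DATA_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

# Hypothetical trigger verbs signalling that data is being collected.
VERB_PATTERN = re.compile(r"\b(?:collect|gather|obtain|record)(?:s|ed|ing)?\b", re.IGNORECASE)

# Very naive negation cue: skips sentences such as "We do not collect ...".
NEGATION_PATTERN = re.compile(r"\b(?:not|never)\b|n't\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_collected_data(policy_text: str) -> set[str]:
    """Return the personal-data terms mentioned in non-negated collection sentences."""
    found: set[str] = set()
    for sentence in split_sentences(policy_text):
        if not VERB_PATTERN.search(sentence):
            continue  # sentence does not talk about collecting data
        if NEGATION_PATTERN.search(sentence):
            continue  # sentence states that something is NOT collected
        for match in TERM_PATTERN.finditer(sentence):
            found.add(match.group(1).lower())
    return found


if __name__ == "__main__":
    policy = (
        "We collect your name and email address when you register. "
        "Our servers automatically record your IP address. "
        "We do not collect your date of birth. "
        "Your location is used to show local offers."
    )
    print(sorted(extract_collected_data(policy)))  # ['email address', 'ip address', 'name']
```

Restricting matches to sentences with a collection verb keeps the sketch from reporting data that is merely mentioned, as with "location" in the last example sentence; the keyword-based negation check is deliberately crude and would misfire on more complex phrasing.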
This work has been partially funded by the THeCS project in the Dutch National COMMIT program.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Costante, E., den Hartog, J., Petković, M. (2013). What Websites Know About You. In: Di Pietro, R., Herranz, J., Damiani, E., State, R. (eds) Data Privacy Management and Autonomous Spontaneous Security. DPM 2012, SETOP 2012. Lecture Notes in Computer Science, vol 7731. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35890-6_11
DOI: https://doi.org/10.1007/978-3-642-35890-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35889-0
Online ISBN: 978-3-642-35890-6
eBook Packages: Computer Science, Computer Science (R0)