Article

You are what you say: privacy risks of public mentions

Authors:

Dan Frankowski,

John Riedl

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 565 - 572

https://doi.org/10.1145/1148170.1148267

Published: 06 August 2006

Abstract

In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address.

This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.
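To make the idea of re-identification in a sparse relation space concrete, here is a minimal Python sketch. It is not the authors' algorithm, which this page does not describe; it assumes a simple rarity-weighted overlap score (an IDF-style weight), and the function name `rank_candidates` and the toy data are hypothetical. The intuition it illustrates is the one stated in the abstract: items rated by only a few users are the most identifying.

```python
import math
from collections import Counter


def rank_candidates(mentions, ratings_by_user):
    """Rank private-dataset users by rarity-weighted overlap with public mentions.

    Assumption: an IDF-style weight log(n_users / popularity) per item.
    This is an illustrative scoring rule, not the paper's method.
    """
    # Popularity = number of users who rated each item.
    popularity = Counter(
        item for items in ratings_by_user.values() for item in items
    )
    n_users = len(ratings_by_user)

    def weight(item):
        # Rarely rated items contribute more evidence than popular ones.
        return math.log(n_users / popularity[item]) if popularity[item] else 0.0

    scores = {
        user: sum(weight(item) for item in mentions & items)
        for user, items in ratings_by_user.items()
    }
    # Highest score first; ties broken by user id for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical toy data: a private ratings dataset and one user's public mentions.
ratings = {
    "u1": {"Star Wars", "Titanic", "Obscure Film A", "Obscure Film B"},
    "u2": {"Star Wars", "Titanic"},
    "u3": {"Star Wars", "Obscure Film A"},
    "u4": {"Titanic"},
}
mentions = {"Star Wars", "Obscure Film A", "Obscure Film B"}

ranking = rank_candidates(mentions, ratings)
print(ranking)
```

In this toy example the two obscure films carry most of the weight, so the user who rated both ranks first even though several users share the popular title.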

Reviews

Reviewer: Anthony L. Clapes

Identification data for most Internet users exists in numerous data sets on numerous servers in their home countries and often in foreign countries as well. Sometimes, the identifying information is explicit and provided by the user: name, address, phone number, and credit card number. Often, however, the information is indirect. For example, user Paradiso is one of several people who review foreign movies at a certain Web site. The reviews indicate Paradiso's tastes, sensibilities, and perhaps much more, but without directly identifying him. What if we wanted to find out who the mysterious Paradiso actually was? The authors have considered this question in the context of two data sets: a sparse relation space of people who rate movies for a particular Web site, and a public forum for discussing movies. They discuss methods for narrowing the range of possible identities for a particular reviewer who, like Paradiso, also participates in the forum, and evaluate the effectiveness of each method. They also consider methods by which users such as Paradiso or data set owners such as the movie review Web site can protect personal anonymity in the face of algorithmic matching techniques. It turns out that the most successful method is for Paradiso to discuss a number of popular domestic films in the public forum, rather than only foreign films. The paper concludes by noting that our knowledge of how much people reveal in public communications, and how they can preserve anonymity, is still rudimentary. This is a considerable understatement.

Online Computing Reviews Service
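The narrowing of possible identities, and the effect of misdirection on it, can be sketched in a few lines of Python. This is an illustration under a stated assumption, not the paper's method: the matcher here keeps only users whose rated items include every mentioned item (a strict subset test). The function name `candidates`, the user ids, and the film titles are all hypothetical.

```python
def candidates(mentions, ratings_by_user):
    """Return users whose rated items include every mentioned item.

    Assumption: a strict subset test stands in for the matching
    techniques the review refers to.
    """
    return sorted(
        user for user, items in ratings_by_user.items() if mentions <= items
    )


# Hypothetical private ratings dataset.
ratings_by_user = {
    "paradiso": {"Foreign Film A", "Foreign Film B"},
    "user2": {"Foreign Film A", "Popular Film X"},
    "user3": {"Popular Film X", "Popular Film Y"},
}

# Mentioning only films he has rated narrows the candidates to one user.
honest_mentions = {"Foreign Film A", "Foreign Film B"}
print(candidates(honest_mentions, ratings_by_user))  # ['paradiso']

# Misdirection: also mention a popular film he has NOT rated.
misdirected_mentions = honest_mentions | {"Popular Film X"}
print(candidates(misdirected_mentions, ratings_by_user))  # []
```

Under this strict rule, a single mention of an unrated popular film removes the true user from the candidate set. A more tolerant matcher would not exclude him outright, but the popular item would still add evidence for the many other users who did rate it.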

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

August 2006

768 pages

ISBN:1595933697

DOI:10.1145/1148170

General Chair: Efthimis N. Efthimiadis (University of Washington)

Program Chairs: Susan Dumais (Microsoft Research, Redmond), David Hawking (CSIRO ICT Centre, Canberra, Australia), Kalervo Järvelin (University of Tampere, Finland)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR06

SIGIR06: The 29th Annual International SIGIR Conference

August 6 - 11, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 58
Total Downloads: 1,474
Downloads (Last 12 months): 23
Downloads (Last 6 weeks): 1

Reflects downloads up to 17 Oct 2024

Cited By

O'Donovan N (2024). The regulator’s trilemma: On the limits of technocratic governance in digital markets. Competition & Change 28:5 (643-662). Online publication date: 25-Jul-2024.
https://doi.org/10.1177/10245294241266048
Alshuhail A, Bhatia S (2023). An Improved Partitioning Method via Disassociation towards Environmental Sustainability. Sustainability 15:9 (7447). Online publication date: 30-Apr-2023.
https://doi.org/10.3390/su15097447
Hutt S, Baker R, Ashenafi M, Andres‐Bray J, Brooks C (2022). Controlled outputs, full data: A privacy‐protecting infrastructure for MOOC data. British Journal of Educational Technology 53:4 (756-775). Online publication date: 11-May-2022.
https://doi.org/10.1111/bjet.13231
Zheng Z, Li X, Tang M, Xie F, Lyu M (2022). Web Service QoS Prediction via Collaborative Filtering: A Survey. IEEE Transactions on Services Computing 15:4 (2455-2472). Online publication date: 1-Jul-2022.
https://doi.org/10.1109/TSC.2020.2995571
Lu C, Li Y, Xiangli Y, Li Z (2020). Nowhere to Hide: Cross-modal Identity Leakage between Biometrics and Devices. Proceedings of The Web Conference 2020 (212-223). Online publication date: 20-Apr-2020.
https://dl.acm.org/doi/10.1145/3366423.3380108
Zigomitros A, Casino F, Solanas A, Patsakis C (2020). A Survey on Privacy Properties for Data Publishing of Relational Data. IEEE Access 8 (51071-51099). Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.2980235
Zhao W, Tan S, Guan Z, Zhang B, Gong M, Cao Z, Wang Q (2018). Learning to Map Social Network Users by Unified Manifold Alignment on Hypergraph. IEEE Transactions on Neural Networks and Learning Systems 29:12 (5834-5846). Online publication date: Dec-2018.
https://doi.org/10.1109/TNNLS.2018.2812888
Boukoros S, Katzenbeisser S (2017). Measuring privacy in high dimensional microdata collections. Proceedings of the 12th International Conference on Availability, Reliability and Security (1-8). Online publication date: 29-Aug-2017.
https://dl.acm.org/doi/10.1145/3098954.3098977
Zhang J, Li H, Liu X, Luo Y, Chen F, Wang H, Chang L (2017). On Efficient and Robust Anonymization for Privacy Protection on Massive Streaming Categorical Information. IEEE Transactions on Dependable and Secure Computing 14:5 (507-520). Online publication date: 1-Sep-2017.
https://doi.org/10.1109/TDSC.2015.2483503
Katarya R, Parihar A (2017). Study on privacy preserving recommender systems datasets. 2017 International Conference on Inventive Computing and Informatics (ICICI) (335-338). Online publication date: Nov-2017.
https://doi.org/10.1109/ICICI.2017.8365367
