DOI: 10.1145/1148170.1148267
Article

You are what you say: privacy risks of public mentions

Published: 06 August 2006

Abstract

In today's data-rich networked world, people express many aspects of their lives online. It is common to segregate different aspects in different places: you might write opinionated rants about movies in your blog under a pseudonym while participating in a forum or web site for scholarly discussion of medical ethics under your real name. However, it may be possible to link these separate identities, because the movies, journal articles, or authors you mention are from a sparse relation space whose properties (e.g., many items related to by only a few users) allow re-identification. This re-identification violates people's intentions to separate aspects of their life and can have negative consequences; it also may allow other privacy violations, such as obtaining a stronger identifier like name and address.

This paper examines this general problem in a specific setting: re-identification of users from a public web movie forum in a private movie ratings dataset. We present three major results. First, we develop algorithms that can re-identify a large proportion of public users in a sparse relation space. Second, we evaluate whether private dataset owners can protect user privacy by hiding data; we show that this requires extensive and undesirable changes to the dataset, making it impractical. Third, we evaluate two methods for users in a public forum to protect their own privacy, suppression and misdirection. Suppression doesn't work here either. However, we show that a simple misdirection strategy works well: mention a few popular items that you haven't rated.
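To make the idea of re-identification in a sparse relation space concrete, here is a minimal sketch. It is not the paper's algorithm: the function name `rank_candidates`, the log inverse-frequency weight, and the toy data are all assumptions chosen for illustration. It only shows why an item rated by few users narrows the field far more than a popular one.

```python
import math
from collections import Counter

def rank_candidates(mentions, ratings):
    """Rank users of a private ratings dataset against one public user's mentions.

    mentions: set of item ids the public user mentioned.
    ratings:  dict mapping user id -> set of item ids that user rated.
    Each shared item adds log(N / number_of_raters), so rare items weigh more.
    This weighting is an illustrative assumption, not taken from the paper.
    """
    n_users = len(ratings)
    raters = Counter(item for items in ratings.values() for item in items)
    scores = {}
    for user, items in ratings.items():
        shared = mentions & items
        scores[user] = sum(math.log(n_users / raters[i]) for i in shared)
    # Highest score first; ties keep the dataset's original order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: everyone rated "blockbuster", only u1 rated "obscure_doc".
ratings = {
    "u1": {"blockbuster", "obscure_doc", "indie_a"},
    "u2": {"blockbuster", "indie_a"},
    "u3": {"blockbuster"},
}
ranking = rank_candidates({"blockbuster", "obscure_doc"}, ratings)
print(ranking[0][0])  # u1
```

In this toy setting the popular item contributes a weight of zero because every user rated it, so the single rare mention is what singles out `u1`.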

Cited By

  • (2024) The regulator’s trilemma: On the limits of technocratic governance in digital markets. Competition & Change 28:5 (643-662). DOI: 10.1177/10245294241266048. Online publication date: 25-Jul-2024
  • (2023) An Improved Partitioning Method via Disassociation towards Environmental Sustainability. Sustainability 15:9 (7447). DOI: 10.3390/su15097447. Online publication date: 30-Apr-2023
  • (2022) Controlled outputs, full data: A privacy‐protecting infrastructure for MOOC data. British Journal of Educational Technology 53:4 (756-775). DOI: 10.1111/bjet.13231. Online publication date: 11-May-2022

Reviews

Anthony L. Clapes

Identification data for most Internet users exists in numerous data sets on numerous servers in their home countries and often in foreign countries as well. Sometimes, the identifying information is explicit and provided by the user: name, address, phone number, and credit card number. Often, however, the information is indirect. For example, user Paradiso is one of several people who review foreign movies at a certain Web site. The reviews indicate Paradiso's tastes, sensibilities, and perhaps much more, but without directly identifying him. What if we wanted to find out who the mysterious Paradiso actually was?

The authors have considered this question in the context of two data sets: a sparse relation space of people who rate movies for a particular Web site, and a public forum for discussing movies. They discuss methods for narrowing the range of possible identities for a particular reviewer who, like Paradiso, also participates in the forum, and evaluate the effectiveness of each method. They also consider methods by which users such as Paradiso or data set owners such as the movie review Web site can protect personal anonymity in the face of algorithmic matching techniques. It turns out that the most successful method is for Paradiso to discuss a number of popular domestic films in the public forum, rather than only foreign films.

The paper concludes by noting that our knowledge of how much people reveal in public communications, and how they can preserve anonymity, is still rudimentary. This is a considerable understatement.

Online Computing Reviews Service
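The narrowing and misdirection described in this review can be illustrated with a small sketch. It assumes a deliberately naive matcher that keeps only users who rated every mentioned item. The function `candidates`, the user ids, and the item names are hypothetical, and the paper's own matching techniques may behave differently.

```python
def candidates(mentions, ratings):
    """Return the users whose rated items include every mentioned item."""
    return sorted(user for user, items in ratings.items() if mentions <= items)

# Hypothetical ratings dataset; "paradiso" stands for the reviewer's private record.
ratings = {
    "paradiso": {"foreign_a", "foreign_b"},
    "u2": {"foreign_a", "popular_x"},
    "u3": {"popular_x", "popular_y"},
}

# One mention leaves two possible identities.
one_mention = candidates({"foreign_a"}, ratings)

# A second, rarer mention narrows the range to a single user.
two_mentions = candidates({"foreign_a", "foreign_b"}, ratings)

# Misdirection: mentioning a popular film the reviewer never rated
# removes the reviewer from the candidate set and points at someone else.
misdirected = candidates({"foreign_a", "popular_x"}, ratings)

print(one_mention)   # ['paradiso', 'u2']
print(two_mentions)  # ['paradiso']
print(misdirected)   # ['u2']
```

A matcher that tolerates mismatches would be harder to mislead with a single extra mention, so this sketch shows only the direction of the effect, not its size.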

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN:1595933697
DOI:10.1145/1148170
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. k-anonymity
  2. k-identification
  3. datasets
  4. mentions
  5. privacy
  6. re-identification
  7. sparse relation space

Qualifiers

  • Article

Conference

SIGIR06: The 29th Annual International SIGIR Conference
August 6 - 11, 2006
Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 23
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 17 Oct 2024

Citations

Cited By

  • (2022) Web Service QoS Prediction via Collaborative Filtering: A Survey. IEEE Transactions on Services Computing 15:4 (2455-2472). DOI: 10.1109/TSC.2020.2995571. Online publication date: 1-Jul-2022
  • (2020) Nowhere to Hide: Cross-modal Identity Leakage between Biometrics and Devices. Proceedings of The Web Conference 2020 (212-223). DOI: 10.1145/3366423.3380108. Online publication date: 20-Apr-2020
  • (2020) A Survey on Privacy Properties for Data Publishing of Relational Data. IEEE Access 8 (51071-51099). DOI: 10.1109/ACCESS.2020.2980235. Online publication date: 2020
  • (2018) Learning to Map Social Network Users by Unified Manifold Alignment on Hypergraph. IEEE Transactions on Neural Networks and Learning Systems 29:12 (5834-5846). DOI: 10.1109/TNNLS.2018.2812888. Online publication date: Dec-2018
  • (2017) Measuring privacy in high dimensional microdata collections. Proceedings of the 12th International Conference on Availability, Reliability and Security (1-8). DOI: 10.1145/3098954.3098977. Online publication date: 29-Aug-2017
  • (2017) On Efficient and Robust Anonymization for Privacy Protection on Massive Streaming Categorical Information. IEEE Transactions on Dependable and Secure Computing 14:5 (507-520). DOI: 10.1109/TDSC.2015.2483503. Online publication date: 1-Sep-2017
  • (2017) Study on privacy preserving recommender systems datasets. 2017 International Conference on Inventive Computing and Informatics (ICICI) (335-338). DOI: 10.1109/ICICI.2017.8365367. Online publication date: Nov-2017
