DOI: 10.1145/1281192.1281195

Challenges in mining social network data: processes, privacy, and paradoxes

Published: 12 August 2007

Abstract

The proliferation of rich social media, on-line communities, and collectively produced knowledge resources has accelerated the convergence of technological and social networks, producing environments that reflect both the architecture of the underlying information systems and the social structure of their members. In studying the consequences of these developments, we are faced with the opportunity to analyze social network data at unprecedented levels of scale and temporal resolution; this has led to a growing body of research at the intersection of the computing and social sciences.
We discuss some of the current challenges in the analysis of large-scale social network data, focusing on two themes in particular: the inference of social processes from data, and the problem of maintaining individual privacy in studies of social networks. While early research on this type of data focused on structural questions, recent work has extended this to consider the social processes that unfold within the networks. Particular lines of investigation have focused on processes in on-line social systems related to communication [1, 22], community formation [2, 8, 16, 23], information-seeking and collective problem-solving [20, 21, 18], marketing [12, 19, 24, 28], the spread of news [3, 17], and the dynamics of popularity [29]. There are a number of fundamental issues, however, for which we have relatively little understanding, including the extent to which the outcomes of these types of social processes are predictable from their early stages (see e.g. [29]), the differences between properties of individuals and properties of aggregate populations in these types of data, and the extent to which similar social phenomena in different domains have uniform underlying explanations.
The second theme we pursue is concerned with the problem of privacy. While much of the research on large-scale social systems has been carried out on data that is public, some of the richest emerging sources of social interaction data come from settings such as e-mail, instant messaging, or phone communication in which users have strong expectations of privacy. How can such data be made available to researchers while protecting the privacy of the individuals represented in the data? Many of the standard approaches here are variations on the principle of anonymization: the names of individuals are replaced with meaningless unique identifiers, so that the network structure is maintained while private information is suppressed.
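The anonymization principle described above can be illustrated with a minimal sketch. It assumes the network is given as a list of name pairs; the function names `anonymize` and `degree_sequence`, the `u0`, `u1`, ... identifier scheme, and the sample names are hypothetical choices made for illustration, not part of the original work.

```python
import random

def anonymize(edges, seed=None):
    """Replace each node name with a meaningless unique identifier.

    Returns the relabeled edge set and the (secret) name -> id mapping.
    """
    names = sorted({name for edge in edges for name in edge})
    ids = [f"u{i}" for i in range(len(names))]
    random.Random(seed).shuffle(ids)  # break any link between name order and id
    mapping = dict(zip(names, ids))
    anon_edges = {frozenset((mapping[a], mapping[b])) for a, b in edges}
    return anon_edges, mapping

def degree_sequence(edges):
    """Sorted list of node degrees; unchanged by relabeling."""
    degree = {}
    for edge in edges:
        for node in edge:
            degree[node] = degree.get(node, 0) + 1
    return sorted(degree.values())

# Hypothetical toy network.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
anon_edges, mapping = anonymize(edges, seed=7)

print(sorted(tuple(sorted(e)) for e in anon_edges))
# The structure survives relabeling: same number of edges, same degrees.
print(degree_sequence(edges) == degree_sequence(anon_edges))  # True
```

The names are gone from the released edge set, but every structural property of the graph is intact, which is exactly what makes the released data useful to researchers.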
In recent joint work with Lars Backstrom and Cynthia Dwork, we have identified some fundamental limitations on the power of network anonymization to ensure privacy [7]. In particular, we describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes. The attacks are based on the uniqueness of small random subgraphs embedded in an arbitrary network, using ideas related to those found in arguments from Ramsey theory [6, 14]. Combined with other recent examples of privacy breaches in data containing rich textual or time-series information [9, 26, 27, 30], these results suggest that anonymization contains pitfalls even in very simple settings. In this way, our approach can be seen as a step toward understanding how techniques of privacy-preserving data mining (see e.g. [4, 5, 10, 11, 13, 15, 25] and the references therein) can inform how we think about the protection of even the most skeletal social network data.
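A toy sketch may help convey how a structural attack of this kind could proceed. It is not the construction from [7]: the attacks there rely on small random subgraphs that are unique with high probability, whereas this sketch substitutes a fixed 4-clique planted in a tiny graph known to contain no other 4-clique, so that the example is deterministic. It also assumes the adversary can create a few accounts and links before the data is released. All names (`run_attack`, `x0` to `x3`, the sample people) are hypothetical.

```python
import random
from itertools import combinations

def anonymize(edges, seed=0):
    """Relabel every node with a meaningless identifier."""
    names = sorted({n for e in edges for n in e})
    ids = [f"u{i}" for i in range(len(names))]
    random.Random(seed).shuffle(ids)
    m = dict(zip(names, ids))
    return {frozenset((m[a], m[b])) for a, b in edges}

def adjacency(edges):
    """Build a node -> set-of-neighbors map from an edge set."""
    adj = {}
    for e in edges:
        a, b = tuple(e)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Hypothetical original network; it contains no 4-clique.
SOCIAL = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
          ("carol", "dave"), ("dave", "erin")]

def run_attack(target_a, target_b, seed=3):
    """Return True/False for 'is there an edge between the two targets?',
    using only the anonymized release, or None if the pattern is ambiguous."""
    # Step 1 (before release): plant four accounts forming a clique and
    # attach one planted account to each target.
    planted = ["x0", "x1", "x2", "x3"]
    attack_edges = list(combinations(planted, 2))
    attack_edges += [("x0", target_a), ("x1", target_b)]
    released = anonymize(SOCIAL + attack_edges, seed=seed)

    # Step 2 (after release): locate the planted pattern by brute force.
    adj = adjacency(released)
    cliques = [c for c in combinations(sorted(adj), 4)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    if len(cliques) != 1:
        return None  # pattern not unique; this toy attack gives up
    clique = set(cliques[0])

    # Step 3: the only outside neighbors of the clique are the two targets.
    targets = set()
    for node in clique:
        targets |= adj[node] - clique
    if len(targets) != 2:
        return None
    t1, t2 = targets
    return t2 in adj[t1]

print(run_attack("alice", "bob"))   # True: alice and bob are linked
print(run_attack("alice", "dave"))  # False: alice and dave are not
```

Brute-force search over all 4-node subsets is only feasible on a toy graph; the point of the sketch is that relabeling alone does not hide a pattern the adversary already knows how to recognize.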

Supplementary Material

Low Resolution (p4-kleinberg-200.mov)
High Resolution (p4-kleinberg-768.mov)

References

[1]
Lada A. Adamic and Eytan Adar. How to search a social network. Social Networks, 27(3):187--203, 2005.
[2]
Lada A. Adamic, Orkut Buyukkokten, and Eytan Adar. A social network caught in the web. First Monday, 8(6), 2003.
[3]
Eytan Adar, Li Zhang, Lada A. Adamic, and Rajan M. Lukose. Implicit structure and the dynamics of blogspace. In Workshop on the Weblogging Ecosystem, 2004.
[4]
Dakshi Agrawal and Charu C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proc. 20th ACM Symposium on Principles of Database Systems, 2001.
[5]
Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. In Proc. ACM SIGMOD International Conference on Management of Data, pages 439--450, 2000.
[6]
Noga Alon and Joel Spencer. The Probabilistic Method. John Wiley & Sons, second edition, 2000.
[7]
Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore art thou R3579X? Anonymized social networks, hidden patterns, and structural steganography. In Proc. 16th International World Wide Web Conference, 2007.
[8]
Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. Group formation in large social networks: Membership, growth, and evolution. In Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[9]
Michael Barbaro and Tom Zeller Jr. A face is exposed for AOL searcher no. 4417749. New York Times, 9 August 2006.
[10]
Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: The SuLQ framework. In Proc. 24th ACM Symposium on Principles of Database Systems, pages 128--138, 2005.
[11]
Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In Proc. 22nd ACM Symposium on Principles of Database Systems, pages 202--210, 2003.
[12]
Pedro Domingos and Matt Richardson. Mining the network value of customers. In Proc. 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57--66, 2001.
[13]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proc. 3rd Theory of Cryptography Conference, pages 265--284, 2006.
[14]
Paul Erdős. Some remarks on the theory of graphs. Bulletin of the AMS, 53:292--294, 1947.
[15]
Alexandre V. Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proc. 22nd ACM Symposium on Principles of Database Systems, pages 211--222, 2003.
[16]
Scott A. Golder, Dennis Wilkinson, and Bernardo A. Huberman. Rhythms of social interaction: Messaging within a massive online network. In Proc. 3rd International Conference on Communities and Technologies, 2007.
[17]
Daniel Gruhl, David Liben-Nowell, R. V. Guha, and Andrew Tomkins. Information diffusion through blogspace. In Proc. 13th International World Wide Web Conference, 2004.
[18]
Michael Kearns, Siddharth Suri, and Nick Montfort. An experimental study of the coloring problem on human subject networks. Science, 313(5788):824--827, 2006.
[19]
David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence in a social network. In Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137--146, 2003.
[20]
Jon Kleinberg. Complex networks and decentralized search algorithms. In Proc. International Congress of Mathematicians, 2006.
[21]
Jon Kleinberg and Prabhakar Raghavan. Query incentive networks. In Proc. 46th IEEE Symposium on Foundations of Computer Science, pages 132--141, 2005.
[22]
Gueorgi Kossinets and Duncan Watts. Empirical analysis of an evolving social network. Science, 311:88--90, 2006.
[23]
Ravi Kumar, Jasmine Novak, and Andrew Tomkins. Structure and evolution of online social networks. In Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 611--617, 2006.
[24]
Jure Leskovec, Lada Adamic, and Bernardo Huberman. The dynamics of viral marketing. In Proc. 7th ACM Conference on Electronic Commerce, 2006.
[25]
Nina Mishra and Mark Sandler. Privacy via pseudorandom sketches. In Proc. 25th ACM Symposium on Principles of Database Systems, pages 143--152, 2006.
[26]
Arvind Narayanan and Vitaly Shmatikov. How to break anonymity of the Netflix Prize dataset, October 2006. arXiv:cs/0610105.
[27]
Jasmine Novak, Prabhakar Raghavan, and Andrew Tomkins. Anti-aliasing on the web. In Proc. 13th International World Wide Web Conference, pages 30--39, 2004.
[28]
Matt Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 61--70, 2002.
[29]
Matthew Salganik, Peter Dodds, and Duncan Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311:854--856, 2006.
[30]
Latanya Sweeney. Weaving technology and policy together to maintain confidentiality. J. Law Med. Ethics, 25, 1997.

    Published In

    KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2007
    1080 pages
    ISBN:9781595936097
    DOI:10.1145/1281192
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 August 2007

    Author Tags

    1. anonymization
    2. data mining
    3. diffusion of innovations
    4. privacy in data mining
    5. social networks

    Qualifiers

    • Article

    Conference

    KDD07

    Acceptance Rates

    KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 26
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 21 Sep 2024

    Citations

    Cited By

    • (2024) Positive connotations of map-matching based on sub-city districts for trajectory data analytics. Internet of Things, 28 (101338). DOI: 10.1016/j.iot.2024.101338. Online publication date: Dec-2024
    • (2024) Prediction and evaluation of wireless network data transmission security risk based on machine learning. Wireless Networks. DOI: 10.1007/s11276-024-03773-7. Online publication date: 28-May-2024
    • (2022) Optimized multi‐label convolutional neural network using modified genetic algorithm for popularity based personalized news recommendation system. Concurrency and Computation: Practice and Experience, 34:19. DOI: 10.1002/cpe.7033. Online publication date: 10-May-2022
    • (2021) Privacy Preserving Approaches for Online Social Network Data Publishing. Digital Transformation and Challenges to Data Security and Privacy (119-132). DOI: 10.4018/978-1-7998-4201-9.ch007. Online publication date: 2021
    • (2021) A Subgraph Isomorphism-based Attack Towards Social Networks. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (520-528). DOI: 10.1145/3498851.3499024. Online publication date: 14-Dec-2021
    • (2021) Graph Matching Based Privacy-Preserving Scheme in Social Networks. Security and Privacy in Social Networks and Big Data (110-118). DOI: 10.1007/978-981-16-7913-1_8. Online publication date: 15-Nov-2021
    • (2020) Data Anonymization in Social Networks State of the Art, Exposure of Shortcomings and Discussion of New Innovations. 2020 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET) (1-10). DOI: 10.1109/IRASET48871.2020.9092064. Online publication date: Apr-2020
    • (2020) National Security Intelligence through Social Network Data Mining. 2020 IEEE International Conference on Big Data (Big Data) (2270-2273). DOI: 10.1109/BigData50022.2020.9377940. Online publication date: 10-Dec-2020
    • (2020) Interaction and Visualization Design for User Privacy Interface on Online Social Networks. SN Computer Science, 1:5. DOI: 10.1007/s42979-020-00314-9. Online publication date: 10-Sep-2020
    • (2020) Network-theoretic modeling of complex activity using UK online sex advertisements. Applied Network Science, 5:1. DOI: 10.1007/s41109-020-00275-1. Online publication date: 18-Jun-2020
