DOI: 10.1145/1348549.1348562
research-article

Automated social hierarchy detection through email network analysis

Published: 12 August 2007

Abstract

    This paper presents a novel algorithm for automatically extracting social hierarchy data from electronic communication behavior. The algorithm mines user behavior to automatically analyze and catalog patterns of communication between entities in an email collection and so extract social standing. The advantage of such automatic methods is that they extract relevancy between hierarchy levels and are dynamic over time.
    We illustrate the algorithm on real-world data using the Enron Corporation's email archive. The results show great promise when compared with the corporation's work chart and with the judicial proceedings analyzing the major players.



      Published In

      WebKDD/SNA-KDD '07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis
      August 2007
      125 pages
      ISBN: 9781595938480
      DOI: 10.1145/1348549
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Enron
      2. behavior profile
      3. data
      4. link mining
      5. mining
      6. social network

      Qualifiers

      • Research-article

      Conference

      KDD07

      Cited By

      • (2024) Information Dissemination Evolution Driven by Hierarchical Relationship. IEEE Transactions on Computational Social Systems 11:2 (1967-1978). DOI: 10.1109/TCSS.2023.3293058. Online publication date: Apr-2024
      • (2024) Towards privacy-aware exploration of archived personal emails. International Journal on Digital Libraries. DOI: 10.1007/s00799-024-00394-5. Online publication date: 21-Feb-2024
      • (2021) Dynamic Structural Role Node Embedding for User Modeling in Evolving Networks. ACM Transactions on Information Systems 40:3 (1-21). DOI: 10.1145/3472955. Online publication date: 22-Nov-2021
      • (2021) Beyond high and low: Obstacles and opportunities associated with conceptualizing middle power and other middle‐range effects. Social and Personality Psychology Compass 15:7. DOI: 10.1111/spc3.12607. Online publication date: 25-May-2021
      • (2021) Hyperbolic node embedding for temporal networks. Data Mining and Knowledge Discovery. DOI: 10.1007/s10618-021-00774-4. Online publication date: 23-Jun-2021
      • (2021) Which Group Do You Belong To? Sentiment-Based PageRank to Measure Formal and Informal Influence of Nodes in Networks. Complex Networks & Their Applications IX (623-636). DOI: 10.1007/978-3-030-65351-4_50. Online publication date: 5-Jan-2021
      • (2020) Classification Of Power Relations Based On Email Exchange. 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON) (486-489). DOI: 10.1109/GUCON48875.2020.9231072. Online publication date: 2-Oct-2020
      • (2020) Automating Internal Controls Assessment. Artificial Intelligence for Audit, Forensic Accounting, and Valuation (165-188). DOI: 10.1002/9781119601906.ch9. Online publication date: 3-Aug-2020
      • (2019) Les échos du pouvoir. Réseaux n° 214-215:2 (251-288). DOI: 10.3917/res.214.0251. Online publication date: 24-May-2019
      • (2019) Smart Roles. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2923-2933). DOI: 10.1145/3292500.3330735. Online publication date: 25-Jul-2019
