DOI: 10.1145/1557019.1557124
research-article

Mining social networks for personalized email prioritization

Published: 28 June 2009
  • Abstract

    Email is one of the most prevalent communication tools today, and solving the email overload problem is urgent. A good way to alleviate email overload is to automatically prioritize received messages according to the priorities of each user. However, research on statistical learning methods for fully personalized email prioritization (PEP) has been sparse due to privacy issues, since people are reluctant to share personal messages and importance judgments with the research community. It is therefore important to develop and evaluate PEP methods under the assumption that only limited training examples are available, and that the system has access only to the personal email data of each user during the training and testing of the model for that user. This paper presents the first study (to the best of our knowledge) under such an assumption. Specifically, we focus on analysis of personal social networks to capture user groups and to obtain rich features that represent social roles from the viewpoint of a particular user. We also developed a novel semi-supervised (transductive) learning algorithm that propagates importance labels from training examples to test examples through message and user nodes in a personal email network. Together, these methods yield an enriched vector representation of each new email message, consisting of both standard features of the message (such as words in the title or body and sender and receiver IDs) and the induced social features of its sender and receivers. Using the enriched vector representation as the input to SVM classifiers that predict the importance level of each test message, we obtained significant performance improvement over the baseline system (without induced social features) in our experiments on a multi-user data collection: the relative error reduction in MAE was 31% in micro-averaging and 14% in macro-averaging.
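
    To make the reported metrics concrete, the following is a minimal sketch of how micro-averaged MAE, macro-averaged MAE, and relative error reduction could be computed over a multi-user collection. The function names, user names, and importance scores are hypothetical illustrations, not code or data from the paper; the sketch assumes each message has an integer importance level and that each user contributes a list of (true, predicted) pairs.

```python
def mae(pairs):
    """Mean absolute error over (true, predicted) importance levels."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def micro_mae(per_user):
    """Pool the messages of all users, then average once.

    Users with many messages dominate this score.
    """
    pooled = [pair for pairs in per_user.values() for pair in pairs]
    return mae(pooled)


def macro_mae(per_user):
    """Compute MAE per user, then average the per-user scores.

    Every user counts equally, regardless of mailbox size.
    """
    scores = [mae(pairs) for pairs in per_user.values()]
    return sum(scores) / len(scores)


def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical toy predictions: (true level, predicted level) per message.
baseline = {
    "alice": [(5, 3), (4, 3), (1, 2), (2, 2)],
    "bob": [(3, 1), (5, 4)],
}
enriched = {
    "alice": [(5, 4), (4, 4), (1, 1), (2, 2)],
    "bob": [(3, 2), (5, 4)],
}

for name, average in (("micro", micro_mae), ("macro", macro_mae)):
    reduction = relative_error_reduction(average(baseline), average(enriched))
    print(f"{name}-averaged relative error reduction: {reduction:.1%}")
```

    On this toy data the micro-averaged reduction is larger than the macro-averaged one because most of the improvement falls on the user with more messages. The same effect is one possible reading of a gap between micro- and macro-averaged figures, though the abstract does not say what causes it in the paper's experiments.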



    Published In

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. email prioritization
    2. social network
    3. text mining

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2023) Using Clustering Algorithms to Automatically Identify Phishing Campaigns. IEEE Access 11, 96502-96513. DOI: 10.1109/ACCESS.2023.3310810. Online publication date: 2023
    • (2023) Privacy-preserving spam filtering using homomorphic and functional encryption. Computer Communications 197, 230-241. DOI: 10.1016/j.comcom.2022.11.002. Online publication date: Jan-2023
    • (2022) Interval-Valued Intuitionistic Fuzzy Decision With Graph Pattern in Big Graph. IEEE Transactions on Emerging Topics in Computational Intelligence 6:5, 1057-1067. DOI: 10.1109/TETCI.2022.3141062. Online publication date: Oct-2022
    • (2022) An Intelligent Model of Email Spam Classification. 2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), 1-6. DOI: 10.1109/ICERECT56837.2022.10059620. Online publication date: 26-Dec-2022
    • (2022) A Reproducible Approach for Mining Business Activities from Emails for Process Analytics. Service-Oriented Computing – ICSOC 2021 Workshops, 77-91. DOI: 10.1007/978-3-031-14135-5_6. Online publication date: 24-Aug-2022
    • (2021) Intuitionistic Fuzzy Requirements Aggregation for Graph Pattern Matching with Group Decision Makers. 2021 IEEE International Conference on Big Knowledge (ICBK), 102-109. DOI: 10.1109/ICKG52313.2021.00023. Online publication date: Dec-2021
    • (2021) Mining Organizational Structures from Email Logs: an NLP based approach. Procedia Computer Science 192, 348-356. DOI: 10.1016/j.procs.2021.08.036. Online publication date: 2021
    • (2019) Evaluating User Actions as a Proxy for Email Significance. The World Wide Web Conference, 26-36. DOI: 10.1145/3308558.3313624. Online publication date: 13-May-2019
    • (2019) An efficient method for top-k graph based node matching. World Wide Web 22:3, 945-966. DOI: 10.1007/s11280-018-0577-y. Online publication date: 1-May-2019
    • (2019) Context-aware graph pattern based top-k designated nodes finding in social graphs. World Wide Web 22:2, 751-770. DOI: 10.1007/s11280-017-0513-6. Online publication date: 1-Mar-2019
