DOI: 10.1145/1378773.1378800
Research article

Generating summary keywords for emails using topics

Published: 13 January 2008 Publication History

Abstract

Email summary keywords, used to concisely represent the gist of an email, can help users manage and prioritize large numbers of messages. We develop an unsupervised learning framework for selecting summary keywords from emails using latent representations of the underlying topics in a user's mailbox. This approach selects words that describe each message in the context of existing topics rather than simply selecting keywords based on a single message in isolation. We present and compare four methods for selecting summary keywords based on two well-known models for inferring latent topics: latent semantic analysis and latent Dirichlet allocation. The quality of the summary keywords is assessed by generating summaries for emails from twelve users in the Enron corpus. The summary keywords are then used in place of entire messages in two proxy tasks: automated foldering and recipient prediction. We also evaluate the extent to which summary keywords enhance the information already available in a typical email user interface by repeating the same tasks using email subject lines.
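As a rough illustration of choosing keywords in the context of mailbox topics rather than from a single message in isolation, the sketch below scores each word of a message by a topic-weighted probability. It assumes that per-topic word distributions and a per-message topic mixture have already been inferred (for example, by latent Dirichlet allocation). The function name `summary_keywords`, the toy distributions, and the scoring rule itself are illustrative assumptions; this is not one of the four methods compared in the paper.

```python
def summary_keywords(message_tokens, topic_word_probs, message_topic_probs, k=3):
    """Rank the distinct words of one message by sum over topics of p(w|t) * p(t|d).

    Hypothetical helper: the scoring rule is an assumption for illustration only.
    """
    scores = {}
    for word in set(message_tokens):
        scores[word] = sum(
            p_topic * topic_word_probs[topic].get(word, 0.0)
            for topic, p_topic in message_topic_probs.items()
        )
    # Sort by descending score; break ties alphabetically so output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


# Toy, hand-written values standing in for distributions learned from a mailbox.
topic_word_probs = {
    "energy": {"gas": 0.30, "pipeline": 0.25, "price": 0.20, "meeting": 0.05},
    "scheduling": {"meeting": 0.40, "tomorrow": 0.30, "call": 0.20},
}
message = "please call about the gas pipeline price before the meeting".split()
message_topic_probs = {"energy": 0.8, "scheduling": 0.2}

keywords = summary_keywords(message, topic_word_probs, message_topic_probs, k=3)
print(keywords)  # ['gas', 'pipeline', 'price']
```

Because the message is mostly about the hypothetical "energy" topic, words that are probable under that topic outrank "meeting" and "call", even though each word occurs only once in the message.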


    Published In

    IUI '08: Proceedings of the 13th international conference on Intelligent user interfaces
    January 2008
    458 pages
    ISBN:9781595939876
    DOI:10.1145/1378773

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. email
    2. foldering
    3. keyword generation
    4. recipient prediction
    5. topic modeling

    Acceptance Rates

    Overall Acceptance Rate 746 of 2,811 submissions, 27%
