research-article

Generating summary keywords for emails using topics

Authors:

Hanna M. Wallach,

Fernando Pereira

IUI '08: Proceedings of the 13th international conference on Intelligent user interfaces

Pages 199 - 206

https://doi.org/10.1145/1378773.1378800

Published: 13 January 2008

Abstract

Email summary keywords, used to concisely represent the gist of an email, can help users manage and prioritize large numbers of messages. We develop an unsupervised learning framework for selecting summary keywords from emails using latent representations of the underlying topics in a user's mailbox. This approach selects words that describe each message in the context of existing topics rather than simply selecting keywords based on a single message in isolation. We present and compare four methods for selecting summary keywords based on two well-known models for inferring latent topics: latent semantic analysis and latent Dirichlet allocation. The quality of the summary keywords is assessed by generating summaries for emails from twelve users in the Enron corpus. The summary keywords are then used in place of entire messages in two proxy tasks: automated foldering and recipient prediction. We also evaluate the extent to which summary keywords enhance the information already available in a typical email user interface by repeating the same tasks using email subject lines.
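To make the idea of selecting keywords "in the context of existing topics" concrete, here is a minimal sketch based on latent semantic analysis. It is not one of the four methods the paper compares, which the abstract does not specify. The function names (`build_term_doc`, `lsa_keywords`), the toy mailbox, the whitespace tokenization, and the cosine-similarity scoring rule are all assumptions made for illustration.

```python
import numpy as np

def build_term_doc(docs):
    """Return (vocab, counts), where counts[i, j] is the count of vocab[i] in docs[j]."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            counts[index[w], j] += 1
    return vocab, counts

def lsa_keywords(docs, doc_id, rank=2, top_k=3):
    """Rank the words of docs[doc_id] by cosine similarity to that message
    in a rank-`rank` latent space fitted on the whole mailbox (assumed scoring rule)."""
    vocab, counts = build_term_doc(docs)
    # Truncated SVD of the term-document matrix: the core step of LSA.
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    k = min(rank, len(s))
    word_vecs = U[:, :k] * s[:k]       # one latent vector per vocabulary word
    doc_vec = Vt[:k, doc_id] * s[:k]   # latent vector of the chosen message
    norms = np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(doc_vec)
    scores = np.divide(word_vecs @ doc_vec, norms,
                       out=np.zeros(len(vocab)), where=norms > 0)
    # Only words that actually occur in the message are candidates.
    candidates = [i for i in range(len(vocab)) if counts[i, doc_id] > 0]
    ranked = sorted(candidates, key=lambda i: (-scores[i], vocab[i]))
    return [vocab[i] for i in ranked[:top_k]]

# Hypothetical four-message mailbox, used only for illustration.
mailbox = [
    "meeting schedule for the gas pipeline project",
    "pipeline project budget and gas prices",
    "lunch on friday with the team",
    "team lunch moved to friday",
]
print(lsa_keywords(mailbox, doc_id=0))
```

Because the latent space is fitted on the whole mailbox, a word's score depends on how it co-occurs with other words across all messages, not only on its frequency in the message being summarized. A practical version would also need stop-word removal and term weighting, both omitted here.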

Index Terms

Generating summary keywords for emails using topics
1. Human-centered computing
  1. Human computer interaction (HCI)

Information

Published In

IUI '08: Proceedings of the 13th international conference on Intelligent user interfaces

January 2008

458 pages

ISBN:9781595939876

DOI:10.1145/1378773

General Chair:
Steffen Staab
Universität Koblenz-Landau, Germany

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence
ACM: Association for Computing Machinery
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
AAAI: Association for the Advancement of Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Defense Advanced Research Projects Agency

Conference

IUI08

IUI08: 13th International Conference on Intelligent User Interfaces

January 13 - 16, 2008

Gran Canaria, Spain

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%

Cited By

Warner J, Pavel A, Nguyen T, Agrawala M, Hartmann B (2023). SlideSpecs: Automatic and Interactive Presentation Feedback Collation. Proceedings of the 28th International Conference on Intelligent User Interfaces, pages 695-709. Online publication date: 27-Mar-2023.
https://dl.acm.org/doi/10.1145/3581641.3584035
HimaBindu I, Reddy S, Haragopal V, Sarojamma B (2023). Textual Analytics on ‘Azadi Ka Amrit Mahotsav’: Exploring Indian citizens' ideas for achieving Aatmanirbhar Bharat. 2023 Third International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), pages 1-8. Online publication date: 5-Jan-2023.
https://doi.org/10.1109/ICAECT57570.2023.10118308
Wu J, Ye B, Gong Q, Oksanen A, Li C, Qu J, Tian F, Li X, Chen Y (2022). Characterizing and Understanding Development of Social Computing Through DBLP: A Data-Driven Analysis. Journal of Social Computing, 3:4, pages 287-302. Online publication date: Dec-2022.
https://doi.org/10.23919/JSC.2022.0018
Knittel J, Koch S, Tang T, Chen W, Wu Y, Liu S, Ertl T (2022). Real-Time Visual Analysis of High-Volume Social Media Posts. IEEE Transactions on Visualization and Computer Graphics, 28:1, pages 879-889. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1109/TVCG.2021.3114800
Coskun F, Gezer C, Gungor V (2021). Email Clustering & Generating Email Templates Based on Their Topics. Proceedings of the 2021 5th International Conference on Information System and Data Mining, pages 96-103. Online publication date: 27-May-2021.
https://dl.acm.org/doi/10.1145/3471287.3471298
Dinulescu C, Prybutok V (2021). In authority, or peers we trust? Reviews and recommendations in social commerce. Behaviour & Information Technology, 41:13, pages 2887-2904. Online publication date: 29-Jul-2021.
https://doi.org/10.1080/0144929X.2021.1957016
Kim S, Eun J, Oh C, Suh B, Lee J, Bernhaupt R, Mueller F, Verweij D, Andres J, McGrenere J, Cockburn A, Avellino I, Goguey A, Bjørn P, Zhao S, Samson B, Kocielnik R (2020). Bot in the Bunch: Facilitating Group Chat Discussion by Improving Efficiency and Participation with a Chatbot. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1-13. Online publication date: 21-Apr-2020.
https://dl.acm.org/doi/10.1145/3313831.3376785
Yang B, Liu B, Zhu D, Zhang B, Wang Z, Lei K (2020). Semiautomatic Structural BIM-Model Generation Methodology Using CAD Construction Drawings. Journal of Computing in Civil Engineering, 34:3. Online publication date: May-2020.
https://doi.org/10.1061/(ASCE)CP.1943-5487.0000885
Cano E, Bojar O (2019). Keyphrase Generation: A Multi-Aspect Survey. 2019 25th Conference of Open Innovations Association (FRUCT), pages 85-94. Online publication date: Nov-2019.
https://doi.org/10.23919/FRUCT48121.2019.8981519
Dinulescu C (2019). Relationship Quality in Social Commerce Decision-Making. Online publication date: Aug-2019.
https://doi.org/10.12794/metadc1538714
