DOI: 10.1145/3209978.3210034
research-article

Automatic Ground Truth Expansion for Timeline Evaluation

Published: 27 June 2018

Abstract

Developing automatic systems that produce timeline summaries by filtering high-volume streams of text documents, retaining only those relevant to a particular information need (e.g. a topic or event), remains a very challenging task. To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, based on either information-nugget or cluster-based ground truth representations. These methodologies rely on human assessors manually mapping timeline items (e.g. tweets) to an explicit representation of the information a 'good' summary should contain. However, while these methodologies produce reusable ground truth labels, prior works have reported cases where such labels fail to accurately estimate the performance of new timeline generation systems because the labels are incomplete. In this paper, we first quantify the extent to which timeline summary ground truth labels fail to generalize to new summarization systems, and then propose and evaluate new automatic solutions to this issue. In particular, using a depooling methodology over 21 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems from the pool when labeling. We show that when lower-effectiveness systems are held out, the test collections are robust (the likelihood of systems being mis-ranked is low). However, the risk of systems being mis-ranked increases as the effectiveness of the systems held out from the pool increases. To reduce this risk, we also propose two automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective for increasing the robustness of the TREC-TS test collections, reducing the number of mis-rankings by up to 50% on average among the scenarios tested.
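The depooling idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy runs and labels, the function names `score`, `depool` and `count_misrankings`, and the effectiveness measure (a count of distinct nuggets covered), which is a stand-in and not one of the TREC-TS metrics. The sketch holds one system out of the pool, discards the labels that only that system contributed, re-scores every system, and counts the system pairs whose relative order changes.

```python
from itertools import combinations

# Hypothetical toy data: each system returns a timeline of item ids,
# and each labelled item maps to the set of nuggets it covers.
runs = {
    "A": ["t1", "t2", "t3"],
    "B": ["t1", "t4"],
    "C": ["t5", "t6", "t7"],
}
labels = {
    "t1": {"n1"}, "t2": {"n2"}, "t3": {"n3"}, "t4": {"n2"},
    "t5": {"n1"}, "t6": {"n2"}, "t7": {"n3", "n4"},
}


def score(timeline, item_labels):
    """Toy effectiveness: number of distinct nuggets covered (an assumption)."""
    covered = set()
    for item in timeline:
        covered |= item_labels.get(item, set())
    return len(covered)


def depool(item_labels, all_runs, held_out):
    """Keep only labels for items returned by at least one pooled system."""
    pooled = set()
    for name, timeline in all_runs.items():
        if name != held_out:
            pooled.update(timeline)
    return {item: n for item, n in item_labels.items() if item in pooled}


def sign(x):
    return (x > 0) - (x < 0)


def count_misrankings(full, reduced):
    """Count system pairs whose relative order differs between two score maps."""
    return sum(
        1
        for a, b in combinations(sorted(full), 2)
        if sign(full[a] - full[b]) != sign(reduced[a] - reduced[b])
    )


full_scores = {name: score(t, labels) for name, t in runs.items()}

misrankings = {}
for held_out in runs:
    reduced_labels = depool(labels, runs, held_out)
    reduced_scores = {name: score(t, reduced_labels) for name, t in runs.items()}
    misrankings[held_out] = count_misrankings(full_scores, reduced_scores)
    print(held_out, reduced_scores, misrankings[held_out])
```

In this toy setup, holding out the weakest system (`B`) changes no pairwise order, while holding out the strongest (`C`) flips two pairs, because none of its items remain labelled. This only mirrors the direction of the effect the abstract reports; it says nothing about its size on the real collections.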

Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN:9781450356572
DOI:10.1145/3209978

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ground truth
  2. labelling
  3. pooling
  4. temporal summarization
  5. timeline generation
  6. trec

Qualifiers

  • Research-article

Conference

SIGIR '18

Acceptance Rates

SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%
