Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora

Published: 01 March 2014

Abstract

An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document FrequencyTempo (TF×IDFTempo) and TF×Enhanced-IDFTempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDFTempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDFTempo.
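
The benchmark named above, TF×IDF-weighted hierarchical agglomerative clustering, might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the weighting formula, similarity measure, linkage, or stopping rule, so the log-scaled IDF, cosine similarity, group-average linkage, and the `threshold` parameter are all assumptions, as are the function names and the toy documents. It does not implement the proposed TF×IDFTempo or TF×Enhanced-IDFTempo metrics, whose definitions are not stated here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (assumed tf * log(N / df))."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hac(vectors, threshold):
    """Group-average agglomerative clustering.

    Repeatedly merges the most similar pair of clusters and stops once no
    pair reaches the (assumed) similarity threshold.
    """
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best_sim:
                    best_sim, best_pair = s, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy corpus: two news stories about each of two events.
docs = [
    "earthquake rescue teams arrive".split(),
    "earthquake rescue continues".split(),
    "merger talks announced".split(),
    "merger talks collapse".split(),
]
clusters = hac(tfidf_vectors(docs), threshold=0.1)
print(clusters)
```

On this toy corpus the two earthquake stories and the two merger stories share terms only within their own pair, so each pair should end up in its own cluster. A temporal variant would change how the term weights are computed, not the clustering loop.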

Cited By

  • (2023) A Survey on Event-Based News Narrative Extraction. ACM Computing Surveys 55:14s (1-39). DOI: 10.1145/3584741. Online publication date: 17-Feb-2023
  • (2018) Lifecycle-Based Event Detection from Microblogs. Companion Proceedings of the The Web Conference 2018 (283-290). DOI: 10.1145/3184558.3186338. Online publication date: 23-Apr-2018
  • (2017) News event evolution model based on the reading willingness and modified TF-IDF formula. Journal of High Speed Networks 23:1 (33-47). DOI: 10.3233/JHS-170555. Online publication date: 1-Jan-2017

Published In

Journal of the Association for Information Science and Technology  Volume 65, Issue 3
March 2014
218 pages
ISSN: 2330-1635
EISSN: 2330-1643

Publisher

John Wiley & Sons, Inc.

United States

Author Tags

  1. automatic extracting
  2. text mining

Qualifiers

  • Article
