On-line new event detection, clustering, and tracking (information retrieval, internet)
Publisher:
  • University of Massachusetts Amherst
ISBN:978-0-599-52964-9
Order Number:AAI9950198
Pages:154
Abstract

In this work, we discuss and evaluate solutions to text classification problems associated with the events that are reported in on-line sources of news. We present solutions to three related classification problems: new event detection, event clustering, and event tracking.

The primary focus of this thesis is new event detection, where the goal is to identify news stories that have not previously been reported, in a stream of broadcast news comprising radio, television, and newswire. We present an algorithm for new event detection, and analyze the effects of incorporating domain properties into the classification algorithm. We explore a solution that models the temporal relationship between news stories, and investigate the use of proper noun phrase extraction to capture the who, what, when, and where contained in news. Our results for new event detection suggest that previous approaches to document clustering provide a good basis for an approach to new event detection, and that further improvements to classification accuracy are obtained when the domain properties of broadcast news are modeled.

New event detection is related to the problem of event clustering, where the goal is to group stories that discuss the same event. We investigate on-line clustering as an approach to new event detection, and re-evaluate existing cluster comparison strategies previously used for document retrieval. Our results suggest that these strategies produce different groupings of events, and that the on-line single-link strategy extended with a model for domain properties is faster and more effective than other approaches.
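
As a rough illustration of the on-line single-link idea described above, the following Python sketch processes stories in arrival order, compares each one with every earlier story, and flags it as a new event when its best similarity falls below a threshold. The bag-of-words vectors, cosine similarity, linear time penalty, and the parameter names `threshold` and `decay` are all assumptions made for this sketch; the abstract does not specify the representation, similarity measure, or temporal model actually used in the thesis.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed, simplified representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def detect_new_events(stories, threshold=0.3, decay=0.01):
    """Single pass over the stream; returns (new-event flags, cluster ids).

    threshold and decay are hypothetical parameters chosen for illustration.
    """
    seen = []          # (vector, cluster id) for every earlier story
    flags, labels = [], []
    next_id = 0
    for i, text in enumerate(stories):
        vec = vectorize(text)
        best_sim, best_cluster = 0.0, None
        for j, (old_vec, cid) in enumerate(seen):
            # Single-link comparison: score against each earlier story,
            # discounted by how far back in the stream that story is.
            sim = cosine(vec, old_vec) * max(0.0, 1.0 - decay * (i - j))
            if sim > best_sim:
                best_sim, best_cluster = sim, cid
        is_new = best_cluster is None or best_sim < threshold
        if is_new:
            best_cluster = next_id
            next_id += 1
        seen.append((vec, best_cluster))
        flags.append(is_new)
        labels.append(best_cluster)
    return flags, labels


stories = [
    "earthquake hits city",
    "earthquake hits city again",
    "election results announced",
]
flags, labels = detect_new_events(stories)
```

In this toy stream the second story joins the cluster of the first, while the third shares no terms with earlier stories and is flagged as a new event. The time penalty is one simple way to express that stories far apart in the stream are less likely to discuss the same event.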

In this dissertation, we explore several text representation issues in the context of event tracking, where a classifier for an event is formulated from one or more sample stories. The classifier is used to monitor the subsequent news stream for documents related to the event. We discuss different approaches to classifier formulation, and compare feature selection and weight-learning steps as extensions to a baseline process used for new event detection. In addition, we evaluate an unsupervised adaptive approach to event tracking that captures the property of event evolution in broadcast news.
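
The tracking setup described above can be sketched in a few lines of Python: a profile is built from the sample stories, each later story is scored against it, and confidently matched stories are folded back into the profile without supervision. This is a minimal sketch under assumed choices (term-count profile, cosine scoring, and the hypothetical parameters `track_threshold` and `adapt_threshold`), not the classifier formulation evaluated in the thesis.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed, simplified representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def track_event(samples, stream, track_threshold=0.4, adapt_threshold=0.6):
    """Return one on-event decision per story in the stream.

    Both thresholds are hypothetical values chosen for illustration.
    """
    # Formulate the classifier: sum the term counts of the sample stories.
    profile = Counter()
    for sample in samples:
        profile.update(vectorize(sample))

    decisions = []
    for text in stream:
        vec = vectorize(text)
        score = cosine(profile, vec)
        decisions.append(score >= track_threshold)
        # Unsupervised adaptation: absorb high-confidence matches so the
        # profile can follow the event as its vocabulary evolves.
        if score >= adapt_threshold:
            profile.update(vec)
    return decisions


samples = ["flood river town"]
stream = ["flood river rescue", "rescue boats flood", "stock market falls"]
adaptive = track_event(samples, stream)
static = track_event(samples, stream, adapt_threshold=2.0)  # never adapts
```

With adaptation, the first stream story adds "rescue" to the profile, which lets the second story clear the tracking threshold. With adaptation disabled (a threshold above the maximum cosine of 1.0), the second story is missed.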

The implementations of our approaches to on-line new event detection, clustering, and tracking have been evaluated in comparison to other systems, and we present cross-system comparisons for all three classification problems. In general, the results using our approaches compared favorably to other approaches for each problem.

Cited By

  1. Wei C, Lee Y, Chiang Y, Chen C and Yang C (2014). Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora, Journal of the Association for Information Science and Technology, 65:3, (621-634), Online publication date: 1-Mar-2014.
  2. Wei C, Lee Y, Chiang Y, Chen J and Yang C Discovering event episodes from news corpora Proceedings of the 11th International Conference on Electronic Commerce, (72-80)
  3. Lei Z, Liao J, Li D and Wu L Event Detection and Tracking Based on Improved Incremental K-Means and Transductive SVM Proceedings of the 4th international conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Artificial Intelligence, (872-879)
  4. Araujo L and Merelo J A genetic algorithm for dynamic modelling and prediction of activity in document streams Proceedings of the 9th annual conference on Genetic and evolutionary computation, (1896-1903)
  5. Lei Z, Wu L, Zhang Y and Liu Y A system for detecting and tracking internet news event Proceedings of the 6th Pacific-Rim conference on Advances in Multimedia Information Processing - Volume Part I, (754-764)
  6. Makkonen J, Ahonen-Myka H and Salmenkivi M (2004). Simple Semantics in Topic Detection and Tracking, Information Retrieval, 7:3-4, (347-368), Online publication date: 1-Sep-2004.
  7. Makkonen J Investigations on event evolution in TDT Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Proceedings of the HLT-NAACL 2003 student research workshop - Volume 3, (43-48)
  8. Pons-Porrata A, Berlanga-Llavori R and Ruiz-Shulcloper J Building a hierarchy of events and topics for newspaper digital libraries Proceedings of the 25th European conference on IR research, (588-596)
  9. Allan J, Lavrenko V and Jin H First story detection in TDT is hard Proceedings of the ninth international conference on Information and knowledge management, (374-381)
Contributors
  • University of Massachusetts Amherst
