Abstract: In this paper we propose a new incremental clustering algorithm for event detection, based on the mathematical properties of compact sets. The algorithm also uses the temporal references that appear in document texts to measure the similarity between documents according to the events they describe. To discover the structure of topics and composite events, the clustering algorithm is applied hierarchically to a stream of newspaper articles. At the first level, documents with high temporal-semantic similarity are clustered together into events. At subsequent levels of the hierarchy, these events are successively clustered into more complex events and topics. The evaluation results show that taking the temporal references of documents into account improves the quality of the system-generated clusters, and that the overall performance of the proposed system compares favorably with other on-line detection systems in the literature.