Authors:
Aigerim Mussina 1; Sanzhar Aubakirov 1 and Paulo Trigo 2
Affiliations:
1 Department of Computer Science, Al-Farabi Kazakh National University, Almaty, Kazakhstan; 2 Instituto Superior de Engenharia de Lisboa, Biosystems and Integrative Sciences Institute / Agent and Systems Modeling, Lisbon, Portugal
Keyword(s):
Summarization, Automatic Extraction, Key-words, N-gram, TextRank.
Related Ontology Subjects/Areas/Topics:
Business Analytics; Data Engineering; Data Management and Quality; Text Analytics
Abstract:
This paper presents a comparative perspective on automatic text summarization algorithms. The main contribution is the implementation of well-known algorithms and the comparison of different summarization techniques on corpora of news articles parsed from the web. The work compares three summarization techniques based on the TextRank algorithm, namely General TextRank, BM25, and LongestCommonSubstring. For the experiments, we used corpora of news articles written in Russian and Kazakh. We implemented and experimented with well-known algorithms, but evaluated them differently from previous work on summary evaluation. In this research, we propose a summary evaluation method based on keywords extracted from the corpora. We describe the application of statistical information, show the results of the summarization processes, and compare them.
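As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch ranks sentences with a TextRank-style power iteration and scores a summary by keyword coverage. It is not the authors' implementation: the function names, the damping factor of 0.85, the fixed iteration count, and the `keyword_coverage` metric are all assumptions made for this example. The word-overlap similarity follows the commonly cited TextRank formulation; BM25 or longest-common-substring variants would presumably be obtained by passing a different `similarity` function.

```python
import math
import re


def tokenize(sentence):
    """Lowercase word tokens; the pattern is Unicode-aware, so Cyrillic text works."""
    return re.findall(r"\w+", sentence.lower())


def overlap_similarity(a, b):
    """Word-overlap similarity, normalized by the log lengths of both sentences."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) = 0 would make the denominator degenerate
    common = len(set(a) & set(b))
    return common / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, similarity=overlap_similarity, damping=0.85, iterations=50):
    """Score sentences by power iteration over a weighted similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Return the k top-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def keyword_coverage(summary_sentences, keywords):
    """Fraction of keywords present in the summary (hypothetical metric)."""
    words = {w for s in summary_sentences for w in tokenize(s)}
    keywords = [k.lower() for k in keywords]
    return sum(k in words for k in keywords) / len(keywords) if keywords else 0.0
```

Keeping the similarity function as a parameter means the graph construction and ranking code stay the same while only the edge weights change, which is one plausible way to compare several TextRank variants on equal terms.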