A Comparative Analysis on Hindi and English Extractive Text Summarization

Authors: Pradeepika Verma, Sukomal Pal, Hari Om

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 18, Issue 3

Article No.: 30, Pages 1 - 39

https://doi.org/10.1145/3308754

Published: 09 May 2019

Abstract

Text summarization is the process of transforming a large body of document information into a clear and concise form. In this article, we present a detailed comparative study of extractive methods for automatic text summarization on Hindi and English datasets of news articles. We consider 13 summarization techniques, namely TextRank, LexRank, Luhn, LSA, Edmundson, ChunkRank, TGraph, UniRank, NN-ED, NN-SE, FE-SE, SummaRuNNer, and MMR-SE, and we evaluate their performance using metrics such as precision, recall, F₁, cohesion, non-redundancy, readability, and significance. A thorough analysis, carried out in eight parts, exhibits the strengths and limitations of these methods, the effect of summary length on performance, the impact of the document's language, and other factors. A standard summary evaluation tool (ROUGE) and extensive programmatic evaluation using Python 3.5 in an Anaconda environment are used to evaluate their outcomes.
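To make the precision, recall, and F₁ metrics named in the abstract concrete, the following is a minimal sketch of a unigram-overlap score between a candidate summary and a reference summary. It is not the ROUGE package and not the authors' evaluation code; the function name `unigram_overlap_scores`, the lowercasing, and the whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter


def unigram_overlap_scores(candidate, reference):
    """Return (precision, recall, f1) from clipped unigram overlap.

    Hypothetical helper: tokenization is a plain whitespace split,
    which is an assumption, not the paper's preprocessing.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: a token is matched at most as many times
    # as it occurs in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = unigram_overlap_scores(
        "the cat sat on the mat",
        "the cat lay on a mat",
    )
    print("precision={:.3f} recall={:.3f} f1={:.3f}".format(p, r, f))
```

Because the sketch only splits on whitespace, it can be applied to space-delimited Hindi text as well as English, although a real evaluation would likely need language-specific tokenization and stemming.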

Index Terms

A Comparative Analysis on Hindi and English Extractive Text Summarization
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems

Information & Contributors

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 18, Issue 3

September 2019

386 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3305347

Editor:
Nianwen Xue
Brandeis University, Waltham, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2019

Accepted: 01 January 2019

Revised: 01 October 2018

Received: 01 September 2017

Published in TALLIP Volume 18, Issue 3

Qualifiers

Research-article
Research
Refereed
