Hybrid Algorithm for Multilingual Summarization of Hindi and Punjabi Documents

Gupta, Vishal

doi:10.1007/978-3-319-03844-5_70


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8284)

Abstract

This paper presents a hybrid algorithm for multilingual summarization of Hindi and Punjabi documents. It combines the features of the Hindi summarizer suggested by CDAC Noida with those of the Punjabi summarizer suggested by Gupta and Lehal in 2012, and it also proposes some new features for summarizing multilingual Hindi and Punjabi text. This is the first time a multilingual text summarizer supporting both Hindi and Punjabi text has been proposed. The algorithm uses nine features: 1) key phrase extraction, 2) font feature, 3) noun and verb extraction, 4) position feature, 5) cue-phrase feature, 6) negative keyword extraction, 7) named entity extraction, 8) relative length feature, and 9) numeric data extraction. For each sentence, the score of each feature is calculated, and machine-learning-based mathematical regression is then applied to identify the weights of these nine features. Final sentence scores are calculated from the feature-weight equations. The top-scoring sentences are selected for the final summary and presented in the same order as in the input. The default summary is produced at a 30% compression ratio, at which the algorithm performs well on both intrinsic and extrinsic measures of summary evaluation. The algorithm has been tested on 30 Hindi-Punjabi documents and reports an F-score of 92.56%.
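The scoring and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the feature keys, the function names `sentence_score` and `summarize`, and the uniform example weights are hypothetical, and the sketch assumes that a 30% compression ratio means keeping roughly 30% of the sentences (rounded up). The regression step that learns the weights is not shown; the weights are simply passed in.

```python
import math
from typing import Dict, List, Sequence

# Feature keys mirror the nine features named in the abstract.
# The key spellings are hypothetical labels chosen for this sketch.
FEATURES = [
    "key_phrase", "font", "noun_verb", "position", "cue_phrase",
    "negative_keyword", "named_entity", "relative_length", "numeric_data",
]


def sentence_score(feature_scores: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of the per-feature scores of one sentence."""
    return sum(weights[name] * feature_scores.get(name, 0.0)
               for name in FEATURES)


def summarize(sentences: Sequence[str],
              feature_table: Sequence[Dict[str, float]],
              weights: Dict[str, float],
              ratio: float = 0.30) -> List[str]:
    """Keep the top-scoring sentences, returned in input order.

    Assumption: `ratio` is the fraction of sentences kept, and the
    count is rounded up so that at least one sentence survives.
    """
    if len(sentences) != len(feature_table):
        raise ValueError("one feature dict is required per sentence")
    if not sentences:
        return []
    keep = max(1, math.ceil(ratio * len(sentences)))
    scores = [sentence_score(f, weights) for f in feature_table]
    # Rank sentence indices by final score, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    # Restore the original document order for the selected sentences.
    chosen = sorted(ranked[:keep])
    return [sentences[i] for i in chosen]


# Illustrative placeholder weights, NOT the values learned in the paper.
example_weights = {name: 1.0 for name in FEATURES}
```

Calling `summarize(sentences, feature_table, example_weights)` with one feature dictionary per sentence returns the selected sentences in their original order. In practice the weights would come from the regression step rather than being set uniformly.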

References

Kyoomarsi, F., Khosravi, H., Eslami, E., Dehkordy, P.K.: Optimizing Text Summarization Based on Fuzzy Logic. In: IEEE International Conference on Computer and Information Science, pp. 347–352. University of Shahid Kerman, UK (2008)
Gupta, V., Lehal, G.S.: A Survey of Text Summarization Extractive Techniques. International Journal of Emerging Technologies in Web Intelligence 2, 258–268 (2010)
Lin, J.: Summarization. In: Encyclopedia of Database Systems. Springer, Heidelberg (2009)
Gupta, V., Lehal, G.S.: Automatic Punjabi Text Extractive Summarization System. In: International Conference on Computational Linguistics, COLING 2012, pp. 191–198. IIT Bombay, India (2012)
Fattah, M.A., Ren, F.: Automatic Text Summarization. World Academy of Science Engineering and Technology 27, 192–195 (2008)
Kaikhah, K.: Automatic Text Summarization with Neural Networks. In: IEEE International Conference on Intelligent Systems, Texas, USA, pp. 40–44 (2004)
Neto, J.L., Santos, A.D., Kaestner, C.A.A., Freitas, A.A.: Document Clustering and Text Summarization. In: International Conference on Practical Application of Knowledge Discovery & Data Mining, London, pp. 41–55 (2000)
Gupta, V., Lehal, G.S.: Complete Pre processing Phase of Punjabi Language Text Summarization. In: International Conference on Computational Linguistics, COLING 2012, pp. 199–205. IIT Bombay, India (2012)
Hassel, M.: Evaluation of Automatic Text Summarization. Licentiate Thesis, Stockholm, Sweden, pp. 1–75 (2004)
http://www.cdacnoida.in/snlp/digital_library/text_summ.asp
Gupta, V., Lehal, G.S.: Punjabi Language Stemmer for Nouns and Proper Names. In: Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP) IJCNLP 2011, Chiang Mai, Thailand, pp. 35–39 (2011)
Ramanathan, A., Rao, D.D.: A Lightweight Stemmer for Hindi. In: Proceedings of Workshop on Computational Linguistics for South-Asian Languages. EACL (2003)
http://www.cfilt.iitb.ac.in/wordnet/webhwn/
Sharma, R., Goyal, V.: Name Entity Recognition Systems for Hindi Using CRF Approach. In: Singh, C., Singh Lehal, G., Sengupta, J., Sharma, D.V., Goyal, V. (eds.) ICISIL 2011. CCIS, vol. 139, pp. 31–35. Springer, Heidelberg (2011)
Gupta, V., Lehal, G.S.: Named Entity Recognition for Punjabi Language Text Summarization. International Journal of Computer Applications 33, 28–32 (2011)

Author information

Authors and Affiliations

Computer Science & Engineering, University Institute of Engineering & Technology, Panjab University, Chandigarh, India
Vishal Gupta

Editor information

Editors and Affiliations

Department of Business Information Systems, FSIC, National University of Ireland, University College Cork, O’Rahilly Buildings, Cork, Ireland
Rajendra Prasath
Research Centre in Computer Science, V.H.N.Senthikumara Nadar College (Autonomous), 626 001, Virudhunagar, Tamil Nadu, India
T. Kathirvalavakumar

About this paper

Cite this paper

Gupta, V. (2013). Hybrid Algorithm for Multilingual Summarization of Hindi and Punjabi Documents. In: Prasath, R., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science(), vol 8284. Springer, Cham. https://doi.org/10.1007/978-3-319-03844-5_70

DOI: https://doi.org/10.1007/978-3-319-03844-5_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03843-8
Online ISBN: 978-3-319-03844-5
eBook Packages: Computer Science, Computer Science (R0)
