Abstract
This paper presents a hybrid algorithm for multilingual summarization of Hindi and Punjabi documents. It combines the features of the Hindi summarizer proposed by CDAC Noida with those of the Punjabi summarizer proposed by Gupta and Lehal in 2012, and it also introduces some new features for summarizing multilingual Hindi and Punjabi text. This is the first multilingual text summarizer proposed that supports both Hindi and Punjabi text. The nine features used by the algorithm are: 1) key phrase extraction, 2) font feature, 3) noun and verb extraction, 4) position feature, 5) cue-phrase feature, 6) negative keyword extraction, 7) named entity extraction, 8) relative length feature, and 9) number data extraction. For each sentence, the score of each feature is calculated, and machine-learning-based mathematical regression is then applied to identify the weights of the nine features. Final sentence scores are computed from the feature-weight equations, and the top-scoring sentences are selected for the final summary, kept in the same order as in the input. By default, the summary is generated at a 30% compression ratio, at which the algorithm performs well on both intrinsic and extrinsic measures of summary evaluation. The algorithm has been tested on 30 Hindi-Punjabi documents and reports an F-score of 92.56%.
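The scoring pipeline summarized above (per-sentence feature scores, regression-learned weights, weighted final scores, selection at a 30% compression ratio in input order, F-score evaluation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the nine feature extractors are not shown and feature scores are assumed to be precomputed numbers; the regression is assumed to be an ordinary least-squares fit (no intercept) of the feature scores against hypothetical reference sentence scores, e.g. from human judgments; and the F-score is assumed to be sentence-level overlap with a reference extract. The function names `fit_weights`, `summarize`, and `f_score` are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: feature columns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_weights(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Assumed regression step: least-squares weights w minimising ||F w - y||^2,
    solved through the normal equations (F^T F) w = F^T y."""
    k = len(features[0])
    ftf = [[sum(row[i] * row[j] for row in features) for j in range(k)] for i in range(k)]
    fty = [sum(row[i] * y for row, y in zip(features, targets)) for i in range(k)]
    return solve_linear(ftf, fty)


def summarize(features: Sequence[Sequence[float]], weights: Sequence[float],
              ratio: float = 0.30) -> List[int]:
    """Score each sentence as a weighted sum of its feature scores, keep the top
    `ratio` fraction, and return their indices in original input order."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    keep = max(1, round(ratio * len(scores)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep]
    return sorted(top)  # restore the order in which sentences appear in the input


def f_score(selected: Sequence[int], reference: Sequence[int]) -> float:
    """Assumed evaluation: harmonic mean of precision and recall of the selected
    sentence indices against a reference extract."""
    hits = len(set(selected) & set(reference))
    if hits == 0:
        return 0.0
    precision = hits / len(selected)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)
```

In use, each row of `features` would hold the nine feature scores of one sentence; `fit_weights` is run once on annotated training sentences, and the learned weights are then passed to `summarize` for new documents. Solving the normal equations by hand keeps the sketch dependency-free; a library least-squares routine would be the more robust choice in practice.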
References
Kyoomarsi, F., Khosravi, H., Eslami, E., Dehkordy, P.K.: Optimizing Text Summarization Based on Fuzzy Logic. In: IEEE International Conference on Computer and Information Science, pp. 347–352. University of Shahid Kerman, UK (2008)
Gupta, V., Lehal, G.S.: A Survey of Text Summarization Extractive Techniques. International Journal of Emerging Technologies in Web Intelligence 2, 258–268 (2010)
Lin, J.: Summarization. In: Encyclopedia of Database Systems. Springer, Heidelberg (2009)
Gupta, V., Lehal, G.S.: Automatic Punjabi Text Extractive Summarization System. In: International Conference on Computational Linguistics, COLING 2012, pp. 191–198. IIT Bombay, India (2012)
Fattah, M.A., Ren, F.: Automatic Text Summarization. World Academy of Science, Engineering and Technology 27, 192–195 (2008)
Kaikhah, K.: Automatic Text Summarization with Neural Networks. In: IEEE International Conference on Intelligent Systems, Texas, USA, pp. 40–44 (2004)
Neto, J.L., Santos, A.D., Kaestner, C.A.A., Freitas, A.A.: Document Clustering and Text Summarization. In: International Conference on Practical Application of Knowledge Discovery & Data Mining, London, pp. 41–55 (2000)
Gupta, V., Lehal, G.S.: Complete Pre processing Phase of Punjabi Language Text Summarization. In: International Conference on Computational Linguistics, COLING 2012, pp. 199–205. IIT Bombay, India (2012)
Hassel, M.: Evaluation of Automatic Text Summarization. Licentiate Thesis, Stockholm, Sweden, pp. 1–75 (2004)
Gupta, V., Lehal, G.S.: Punjabi Language Stemmer for Nouns and Proper Names. In: Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP) IJCNLP 2011, Chiang Mai, Thailand, pp. 35–39 (2011)
Ramanathan, A., Rao, D.D.: A Lightweight Stemmer for Hindi. In: Proceedings of Workshop on Computational Linguistics for South-Asian Languages. EACL (2003)
Sharma, R., Goyal, V.: Name Entity Recognition Systems for Hindi Using CRF Approach. In: Singh, C., Singh Lehal, G., Sengupta, J., Sharma, D.V., Goyal, V. (eds.) ICISIL 2011. CCIS, vol. 139, pp. 31–35. Springer, Heidelberg (2011)
Gupta, V., Lehal, G.S.: Named Entity Recognition for Punjabi Language Text Summarization. International Journal of Computer Applications 33, 28–32 (2011)
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Gupta, V. (2013). Hybrid Algorithm for Multilingual Summarization of Hindi and Punjabi Documents. In: Prasath, R., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8284. Springer, Cham. https://doi.org/10.1007/978-3-319-03844-5_70
DOI: https://doi.org/10.1007/978-3-319-03844-5_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03843-8
Online ISBN: 978-3-319-03844-5
eBook Packages: Computer Science, Computer Science (R0)