Hybrid Algorithm for Multilingual Summarization of Hindi and Punjabi Documents

  • Conference paper
Mining Intelligence and Knowledge Exploration

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8284)

Abstract

This paper presents a hybrid algorithm for multilingual summarization of Hindi and Punjabi documents. It combines the features of the Hindi summarizer suggested by CDAC Noida with those of the Punjabi summarizer suggested by Gupta and Lehal in 2012, and it adds some new features for summarizing multilingual Hindi and Punjabi text. This is the first time a multilingual text summarizer supporting both Hindi and Punjabi text has been proposed. The algorithm uses nine features to summarize multilingual Hindi and Punjabi text: 1) key phrase extraction, 2) font feature, 3) noun and verb extraction, 4) position feature, 5) cue-phrase feature, 6) negative keyword extraction, 7) named entity extraction, 8) relative length feature, and 9) extraction of numeric data. For each sentence, a score is calculated for each feature, and machine-learning-based mathematical regression is then applied to identify the weights of the nine features. Final sentence scores are calculated from the feature-weight equations. The top-scoring sentences are selected for the final summary and presented in the same order as in the input. The default summary is produced at a 30% compression ratio, at which the algorithm performs well on both intrinsic and extrinsic measures of summary evaluation. The algorithm has been tested on 30 Hindi-Punjabi documents and reports an F-score of 92.56%.
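The scoring pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the regression variant, the feature extractors, or how the 30% ratio is rounded. The sketch assumes that feature scores are already computed and normalised to [0, 1], that the weights are fitted by plain least-squares regression (here via gradient descent), and that "30% compression" means keeping roughly 30% of the sentences. The names `fit_weights`, `sentence_score`, and `summarize` are hypothetical.

```python
def fit_weights(feature_rows, targets, lr=0.1, epochs=2000):
    """Fit one weight per feature by least-squares regression.

    Assumption: batch gradient descent without an intercept; the
    paper only says "mathematical regression" is used.
    feature_rows: per-sentence feature scores (e.g. nine values each).
    targets: reference relevance of each training sentence.
    """
    n = len(feature_rows)
    weights = [0.0] * len(feature_rows[0])
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for row, y in zip(feature_rows, targets):
            err = sum(w * x for w, x in zip(weights, row)) - y
            for j, x in enumerate(row):
                grads[j] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grads)]
    return weights


def sentence_score(features, weights):
    """Final score of a sentence: weighted sum of its feature scores."""
    return sum(w * x for w, x in zip(weights, features))


def summarize(sentences, feature_rows, weights, ratio=0.30):
    """Keep the top-scoring sentences, returned in input order.

    Assumption: the summary size is round(ratio * sentence count),
    with at least one sentence kept.
    """
    k = max(1, round(ratio * len(sentences)))
    scores = [sentence_score(f, weights) for f in feature_rows]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

In use, `fit_weights` would be run once on sentences with known relevance labels, and the resulting weights passed to `summarize` together with the feature scores of a new document. Sorting the selected indices before returning is what preserves the input order of the summary sentences.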

References

  1. Kyoomarsi, F., Khosravi, H., Eslami, E., Dehkordy, P.K.: Optimizing Text Summarization Based on Fuzzy Logic. In: IEEE International Conference on Computer and Information Science, pp. 347–352. University of Shahid Kerman, UK (2008)

  2. Gupta, V., Lehal, G.S.: A Survey of Text Summarization Extractive Techniques. International Journal of Emerging Technologies in Web Intelligence 2, 258–268 (2010)

  3. Lin, J.: Summarization. In: Encyclopedia of Database Systems. Springer, Heidelberg (2009)

  4. Gupta, V., Lehal, G.S.: Automatic Punjabi Text Extractive Summarization System. In: International Conference on Computational Linguistics, COLING 2012, pp. 191–198. IIT Bombay, India (2012)

  5. Fattah, M.A., Ren, F.: Automatic Text Summarization. World Academy of Science Engineering and Technology 27, 192–195 (2008)

  6. Kaikhah, K.: Automatic Text Summarization with Neural Networks. In: IEEE International Conference on Intelligent Systems, Texas, USA, pp. 40–44 (2004)

  7. Neto, J.L., Santos, A.D., Kaestner, C.A.A., Freitas, A.A.: Document Clustering and Text Summarization. In: International Conference on Practical Application of Knowledge Discovery & Data Mining, London, pp. 41–55 (2000)

  8. Gupta, V., Lehal, G.S.: Complete Pre processing Phase of Punjabi Language Text Summarization. In: International Conference on Computational Linguistics, COLING 2012, pp. 199–205. IIT Bombay, India (2012)

  9. Hassel, M.: Evaluation of Automatic Text Summarization. Licentiate Thesis, Stockholm, Sweden, pp. 1–75 (2004)

  10. http://www.cdacnoida.in/snlp/digital_library/text_summ.asp

  11. Gupta, V., Lehal, G.S.: Punjabi Language Stemmer for Nouns and Proper Names. In: Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP) IJCNLP 2011, Chiang Mai, Thailand, pp. 35–39 (2011)

  12. Ramanathan, A., Rao, D.D.: A Lightweight Stemmer for Hindi. In: Proceedings of Workshop on Computational Linguistics for South-Asian Languages. EACL (2003)

  13. http://www.cfilt.iitb.ac.in/wordnet/webhwn/

  14. Sharma, R., Goyal, V.: Name Entity Recognition Systems for Hindi Using CRF Approach. In: Singh, C., Singh Lehal, G., Sengupta, J., Sharma, D.V., Goyal, V. (eds.) ICISIL 2011. CCIS, vol. 139, pp. 31–35. Springer, Heidelberg (2011)

  15. Gupta, V., Lehal, G.S.: Named Entity Recognition for Punjabi Language Text Summarization. International Journal of Computer Applications 33, 28–32 (2011)


Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Gupta, V. (2013). Hybrid Algorithm for Multilingual Summarization of Hindi and Punjabi Documents. In: Prasath, R., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science, vol 8284. Springer, Cham. https://doi.org/10.1007/978-3-319-03844-5_70

  • DOI: https://doi.org/10.1007/978-3-319-03844-5_70

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03843-8

  • Online ISBN: 978-3-319-03844-5

  • eBook Packages: Computer Science, Computer Science (R0)
