IndicBART Alongside Visual Element: Multimodal Summarization in Diverse Indian Languages

  • Conference paper
Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

With the growing volume of available information, demand for effective summarization techniques has risen, especially in linguistically diverse regions such as India. This paper introduces an approach to multimodal multilingual summarization that combines textual and visual elements. We focus on four prominent Indian languages (Hindi, Bangla, Gujarati, and Marathi) and use abstractive summarization to produce coherent and concise summaries. For text summarization, we use the pre-trained IndicBART model, which is designed for understanding and generating text in Indian languages. To handle the multimodal part of the task, we add an image summarization component based on the Image Pointer model. This component selects images from the input that complement the generated text summary, making the multimodal summary more complete. Our method outperforms other text summarization approaches developed for these languages. We also incorporate a user satisfaction evaluation, which provides a broader framework for assessing summary quality. Together, these contributions advance summarization techniques for diverse Indian languages.
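The abstract describes an image component, based on the Image Pointer model, that picks input images to accompany the generated text summary. The paper's exact architecture is not given here, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the summary and every candidate image have already been encoded into vectors of the same dimension (for example by a text encoder and a CNN image encoder, both hypothetical here), and it uses scaled dot-product attention as the pointing mechanism: the attention distribution over candidate images is treated as a pointer, and its argmax is the selected image. The names `softmax` and `point_to_image` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def point_to_image(summary_vec: Sequence[float],
                   image_vecs: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return (index of the selected image, pointer distribution over images).

    Assumption: summary_vec and each image vector are precomputed embeddings
    of the same dimension. The summary acts as the query and each image as a
    key; scaled dot-product scores are normalised with softmax, and the image
    with the highest probability is the one "pointed to".
    """
    if not image_vecs:
        raise ValueError("need at least one candidate image")
    d = len(summary_vec)
    scores = []
    for img in image_vecs:
        if len(img) != d:
            raise ValueError("summary and image vectors must share a dimension")
        # Scaled dot product between the summary query and this image key.
        scores.append(sum(q * k for q, k in zip(summary_vec, img)) / math.sqrt(d))
    probs = softmax(scores)
    # Pick the first index with maximal probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

For example, `point_to_image([0.9, 0.1, 0.0], [[0.1, 0.8, 0.1], [0.8, 0.2, 0.0], [0.0, 0.0, 1.0]])` selects index 1, the image whose vector is most aligned with the summary. In a trained model the query and key representations would typically be produced by learned projections optimised jointly with the summarizer, rather than fixed vectors as in this sketch.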

R. Kumar and D. Prakash contributed equally to this work.

Notes

  1. https://github.com/Shubh8434/indicBART

Acknowledgements

Raghvendra Kumar sincerely thanks the Prime Minister’s Research Fellows (PMRF) Scheme, which has substantially supported his research. Dr. Sriparna Saha gratefully acknowledges the support of the Technology Innovation Hub (TIH), Vishlesan I-Hub Foundation, IIT Patna. Deepak Prakash and Dr. Sriparna Saha thank the Science and Engineering Research Board (SERB) POWER scheme, Government of India, for funding this research.

Author information

Corresponding author

Correspondence to Raghvendra Kumar.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kumar, R., Prakash, D., Saha, S., Sharma, S. (2024). IndicBART Alongside Visual Element: Multimodal Summarization in Diverse Indian Languages. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14809. Springer, Cham. https://doi.org/10.1007/978-3-031-70552-6_16

  • DOI: https://doi.org/10.1007/978-3-031-70552-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70551-9

  • Online ISBN: 978-3-031-70552-6

  • eBook Packages: Computer Science, Computer Science (R0)
