Abstract
In an age of information overload, demand for advanced summarization techniques has surged, especially in linguistically diverse regions such as India. This paper introduces an approach to multimodal multilingual summarization that combines textual and visual elements. Our research focuses on four prominent Indian languages (Hindi, Bangla, Gujarati, and Marathi) and uses abstractive summarization to produce coherent, concise summaries. For text summarization, we use the pre-trained IndicBART model, which is designed to comprehend and generate text in Indian languages. To handle the multimodal side, we integrate an image summarization component based on the Image Pointer model. This component identifies images from the input that complement the generated summaries, making the multimodal summaries more comprehensive. Our proposed methodology outperforms other text summarization approaches tailored to these Indian languages. We also incorporate a user satisfaction evaluation method, providing a framework for assessing summary quality. Together, these contributions advance summarization techniques, particularly for diverse Indian languages.
R. Kumar and D. Prakash contributed equally to this work.
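The abstract does not say how the image component scores candidate images, so the following is only a rough illustration of the general idea of picking input images that complement a generated summary. It is not the Image Pointer model and not the authors' method: it assumes the summary and each image have already been mapped to vectors in a shared space, and it ranks images by cosine similarity. The function names `cosine` and `select_images`, the file names, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_images(
    summary_vec: Sequence[float],
    image_vecs: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[str]:
    """Rank candidate images by similarity to the summary and keep top_k.

    Assumption: similarity ranking stands in for the learned image
    selection described in the abstract; the paper's model may differ.
    """
    ranked = sorted(
        image_vecs,
        key=lambda name: cosine(summary_vec, image_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy, hand-made vectors (hypothetical). In practice these would come from
# a text encoder for the summary and an image encoder for each picture.
summary_vec = [1.0, 0.0, 1.0]
image_vecs = {
    "img_a.jpg": [1.0, 0.0, 0.9],
    "img_b.jpg": [0.0, 1.0, 0.0],
    "img_c.jpg": [0.5, 0.5, 0.5],
}
selected = select_images(summary_vec, image_vecs, top_k=2)
```

A similarity ranking is used here purely because it is easy to follow; a learned selector would replace `select_images` while the surrounding pipeline (generate the text summary, then choose images for it) stays the same.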
Acknowledgements
Raghvendra Kumar extends his sincere thanks to the Prime Minister’s Research Fellows (PMRF) Scheme, which has significantly aided his research. Dr. Sriparna Saha gratefully acknowledges the support provided by the Technology Innovation Hub (TIH), Vishlesan I-Hub Foundation, IIT Patna. Deepak Prakash and Dr. Sriparna Saha extend sincere thanks to the SERB (Science and Engineering Research Board) POWER scheme, Government of India, for funding this research.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kumar, R., Prakash, D., Saha, S., Sharma, S. (2024). IndicBART Alongside Visual Element: Multimodal Summarization in Diverse Indian Languages. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14809. Springer, Cham. https://doi.org/10.1007/978-3-031-70552-6_16
DOI: https://doi.org/10.1007/978-3-031-70552-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70551-9
Online ISBN: 978-3-031-70552-6
eBook Packages: Computer Science (R0)