
Abstractive Summarization of Text Document in Malayalam Language: Enhancing Attention Model Using POS Tagging Feature

Published: 23 March 2023

Editorial Notes

The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected Version of Record was published on May 18, 2023. For reference purposes, the VoR may still be accessed via the Supplemental Material section on this citation page.

Abstract

Over the past few years, researchers have shown great interest in sentiment analysis and document summarization, primarily because huge volumes of information are available in textual form, and this data has proven useful for real-world applications and challenges. Sentiment analysis of a document helps the user comprehend its emotional intent. Abstractive summarization algorithms generate a condensed version of the text, which can then be analyzed with sentiment analysis to determine the emotion it expresses. Recent research in abstractive summarization concentrates on neural network-based models, rather than conjunction-based approaches, which can improve overall efficiency. Neural network models such as the attention mechanism have been applied to complex tasks with promising results. The proposed work presents a novel framework that incorporates a part-of-speech (POS) tagging feature into the word embedding layer, whose output is then used as the input to the attention mechanism. With the POS feature as part of the input layer, the framework can deal with words carrying contextual and morphological information. POS tagging is relevant here because it relies strongly on the language's syntactic, contextual, and morphological information. The three main elements of the work are pre-processing, the POS tagging feature in the embedding phase, and its incorporation into the attention mechanism. The word embedding provides the semantic concept of a word, while the POS tags indicate how significant the words are in the context of the content, corresponding to the syntactic information. The proposed work was carried out in Malayalam, one of the prominent Indian languages. A widely used and accepted English-language dataset was translated to Malayalam for the experiments. The proposed framework achieves a ROUGE score of 28, outperforming the baseline models.

Supplementary Material

3561819-vor (3561819-vor.pdf)
Version of Record for "Abstractive Summarization of Text Document in Malayalam Language: Enhancing Attention Model Using POS Tagging Feature" by Nambiar et al., ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, No. 2 (TALLIP 22:2).


Cited By

  • Medical Question Summarization with Entity-driven Contrastive Learning. ACM Transactions on Asian and Low-Resource Language Information Processing 23, 4 (2024), 1–19. DOI: 10.1145/3652160. Online publication date: 15 April 2024.
  • Multization: Multi-Modal Summarization Enhanced by Multi-Contextually Relevant and Irrelevant Attention Alignment. ACM Transactions on Asian and Low-Resource Language Information Processing 23, 5 (2024), 1–29. DOI: 10.1145/3651983. Online publication date: 10 May 2024.
  • Social-sum-Mal: A Dataset for Abstractive Text Summarization in Malayalam. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3696107.


    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 2
February 2023
624 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3572719

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 March 2023
    Online AM: 10 September 2022
    Accepted: 29 August 2022
    Revised: 21 July 2022
    Received: 27 April 2022
    Published in TALLIP Volume 22, Issue 2


    Author Tags

1. Abstractive summarization
2. POS tagging
3. Attention mechanism
4. Encoder-decoder
5. Malayalam language

    Qualifiers

    • Research-article


    Article Metrics

• Downloads (last 12 months): 92
• Downloads (last 6 weeks): 2
Reflects downloads up to 03 Oct 2024
