Punjabi news multi-classification using language generation-based optimized long short-term memory networks

Gupta, Varun; Gupta, Ekta

doi:10.1007/s12530-022-09428-2

Original Paper
Published: 08 March 2022

Volume 14, pages 49–58 (2023)
Evolving Systems

Abstract

Text classification assigns a category to each piece of written text. It is one of the fundamental tasks in natural language processing, with applications such as spam detection and sentiment analysis. News classification, one type of text classification, helps readers focus on the news that matches their interests. In this paper, we propose a novel method for the multi-class classification of Punjabi news articles using an optimized and regularized long short-term memory (LSTM) model built on a pretrained language generation model. The proposed method employs the ASGD Weight-Dropped LSTM (AWD-LSTM) model, which applies a recurrent regularization technique known as DropConnect to the hidden-to-hidden weights and trains with a variant of averaged stochastic gradient descent in which averaging is triggered by a non-monotonic condition rather than by a user-tuned setting. The proposed method works in three stages. First, we train a language model on Punjabi text acquired from Wikipedia. Second, we fine-tune the language model on the Punjabi news dataset. Finally, we train a classifier using the pretrained encoder of the language model. The pretrained encoder gives the classifier a linguistic understanding of the text, which improves classification results. The results indicate that the proposed method outperforms direct news classification methods that do not use pretrained language generation models.
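The abstract names two AWD-LSTM ingredients without showing how they work: DropConnect on the hidden-to-hidden weights, and non-monotonically triggered averaged SGD (NT-ASGD). The following is a minimal, dependency-free Python sketch of both, not the authors' implementation. It assumes weights are plain nested lists, that surviving weights are rescaled by 1/(1 - p) as in common inverted-dropout practice, and that the trigger follows our reading of the rule in Merity et al. (2018), with `n` as the non-monotone interval. The names `weight_drop`, `nt_asgd_triggered`, and `averaged_iterates` are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def weight_drop(w_hh: Matrix, p: float, rng: random.Random) -> Matrix:
    """DropConnect on a hidden-to-hidden weight matrix (hypothetical helper).

    Each individual weight, not each activation, is zeroed with probability p.
    Surviving weights are scaled by 1 / (1 - p) so every entry keeps its
    expected value. Intended to be called once per forward pass, with the
    returned matrix reused at every time step of the sequence.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [[w / keep if rng.random() >= p else 0.0 for w in row] for row in w_hh]


def nt_asgd_triggered(history: Sequence[float], current: float, n: int) -> bool:
    """Non-monotone trigger (hypothetical helper, assumed rule).

    history holds earlier validation losses, oldest first; current is the
    newest one. Averaging starts once current is worse than the best loss
    logged more than n checks ago, so the user does not pick the start epoch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(history) <= n:
        return False
    return current > min(history[:-n])


def averaged_iterates(iterates: Sequence[Sequence[float]], start: int) -> List[float]:
    """Mean of the parameter vectors recorded from the trigger step onward."""
    tail = iterates[start:]
    if not tail:
        raise ValueError("no iterates at or after the trigger step")
    return [sum(w[i] for w in tail) / len(tail) for i in range(len(tail[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    w_hh = [[0.5, -0.2], [0.1, 0.3]]
    print(weight_drop(w_hh, p=0.5, rng=rng))  # each entry is 0.0 or doubled

    losses = [5.0, 4.2, 3.9, 3.8, 3.85, 3.9]
    print(nt_asgd_triggered(losses, 3.95, n=2))  # True: 3.95 > min(5.0, 4.2, 3.9, 3.8)
    print(averaged_iterates([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]], start=1))  # [2.0, 3.0]
```

Dropping weights rather than activations leaves the LSTM's recurrent computation untouched, which is why this form of regularization can be applied to an unmodified LSTM. The trigger only compares against losses older than `n` checks, so a short plateau does not start averaging prematurely.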

A Scheme for News Article Classification in a Low-Resource Language

An Approach to Mizo Language News Classification Using Machine Learning

Hybrid Decision Based Chinese News Headline Classification

Explore related subjects: Artificial Intelligence

Notes

https://indicnlp.ai4bharat.org/resources/.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
Varun Gupta & Ekta Gupta

Corresponding author

Correspondence to Varun Gupta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gupta, V., Gupta, E. Punjabi news multi-classification using language generation-based optimized long short-term memory networks. Evolving Systems 14, 49–58 (2023). https://doi.org/10.1007/s12530-022-09428-2

Received: 10 November 2021
Accepted: 07 February 2022
Published: 08 March 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s12530-022-09428-2
