DOI: 10.1145/3708778.3708798
Research article

A Comparative Analysis of Quantized and Non-Quantized BERT Model Performance for the Low-Resource Tagalog Language through Binary Text Classification

Published: 07 February 2025

Abstract

The development of artificial intelligence and its integration into human society is an unstoppable force; its subfields, such as natural language processing (NLP), are therefore expected to evolve continuously alongside other advances in the field. One challenge within this subfield is low-resource languages, which slow the growth of artificial intelligence in countries whose languages are not globally popular and therefore lack the training data and/or tailored NLP technologies needed for machine and deep learning. The state-of-the-art transformer-based language model Bidirectional Encoder Representations from Transformers (BERT) has given NLP research for low-resource languages such as Filipino/Tagalog the opportunity to advance, but its high hardware requirements put it out of reach for many. The researchers therefore wish to contribute to the accessibility of Filipino NLP research by determining whether quantization benefits low-resource BERT models such as those for Tagalog.
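The full methodology is not reproduced on this page, but as a rough illustration of the kind of comparison the abstract describes, the sketch below applies PyTorch post-training dynamic int8 quantization to a pretrained Tagalog BERT classifier and compares model sizes. This is a minimal sketch, not the authors' pipeline: the checkpoint name `jcblaise/bert-tagalog-base-cased` and the sample sentence are assumptions for illustration.

```python
import os
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint: a publicly available Tagalog BERT; the abstract does
# not name the exact model the authors quantized.
MODEL_NAME = "jcblaise/bert-tagalog-base-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

# Post-training dynamic quantization: Linear-layer weights are stored as
# int8 and dequantized on the fly at inference time, shrinking the model
# and typically speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    """Serialized size of a model's weights in megabytes."""
    torch.save(m.state_dict(), "_tmp.pt")
    mb = os.path.getsize("_tmp.pt") / 1e6
    os.remove("_tmp.pt")
    return mb

print(f"fp32 model: {size_mb(model):.1f} MB")
print(f"int8 model: {size_mb(quantized):.1f} MB")

# Binary classification on a sample Tagalog sentence (illustrative input).
inputs = tokenizer("Ang ganda ng serbisyo nila!", return_tensors="pt")
with torch.no_grad():
    pred = quantized(**inputs).logits.argmax(dim=-1).item()
print("predicted class:", pred)  # 0 or 1
```

Dynamic quantization is the least invasive option since it requires no calibration data; whether the int8 model's accuracy holds up on Tagalog text is precisely the question the paper investigates.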


Published In

CIIS '24: Proceedings of the 2024 7th International Conference on Computational Intelligence and Intelligent Systems
November 2024, 183 pages
ISBN: 9798400717437
DOI: 10.1145/3708778

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. BERT
2. Low-resource languages
3. Quantization
4. Sentiment analysis

