Abstract
The emergence of pre-trained language models (PLMs) has brought great success to many Natural Language Processing (NLP) tasks, including text classification. Because these models require minimal to no feature engineering, PLMs are becoming the de facto choice for almost any NLP task. However, for domain-specific corpora (e.g., financial, legal, and industrial), fine-tuning a pre-trained model for a specific task has been shown to improve performance. In this paper, we compare the performance of four different PLMs against a simple linear SVM classifier with TFIDF-vectorized text, on three public domain-free datasets and a real-world dataset containing domain-specific words. The experimental results on the four datasets show that using PLMs, even fine-tuned ones, does not provide a significant gain over the linear SVM classifier. Hence, we recommend that for text classification tasks, a traditional SVM with careful feature engineering can provide cheaper and superior performance compared to PLMs.
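The baseline described above (a linear SVM over TFIDF-vectorized text) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the toy texts and labels are invented for the example and are not the paper's datasets.

```python
# A hypothetical sketch of a TFIDF + linear SVM text classifier,
# the kind of baseline the paper compares against PLMs.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Invented toy corpus with domain-specific vocabulary for illustration.
texts = [
    "interest rates rose sharply",
    "the court dismissed the appeal",
    "pump bearing failure detected",
    "loan application was denied",
]
labels = ["finance", "legal", "industrial", "finance"]

# TfidfVectorizer turns raw text into TFIDF feature vectors;
# LinearSVC fits a linear support vector classifier on top.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["bearing vibration failure"]))
```

In practice, the "careful feature engineering" the paper recommends would include choices such as n-gram range, stop-word handling, and sublinear term-frequency scaling, all of which are parameters of `TfidfVectorizer`.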
Notes
- 1. TFIDF stands for Term Frequency-Inverse Document Frequency, a combination of two metrics:
  1. Term frequency (tf): a measure of how frequently a term t appears in a document d.
  2. Inverse document frequency (idf): a measure of how important a term is. It is computed by dividing the total number of documents in the corpus by the document frequency of each term and then applying logarithmic scaling to the result.
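The two metrics defined in the note above can be computed directly. This is a minimal sketch using the definitions as stated (raw counts for tf, log-scaled ratio for idf); the tiny corpus is invented for illustration, and real vectorizers often add smoothing to avoid division by zero for unseen terms.

```python
import math

def tf(term, doc):
    # Term frequency: how often the term appears in the document
    # (here the document is a list of tokens).
    return doc.count(term)

def idf(term, corpus):
    # Inverse document frequency: total number of documents divided by
    # the document frequency of the term, with logarithmic scaling.
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]

# "the" occurs in every document, so idf("the") = log(3/3) = 0:
# a term carried by all documents contributes no discriminative weight.
print(idf("the", corpus))

# "dog" occurs in only one document, so it is weighted more heavily.
print(tf("dog", corpus[1]) * idf("dog", corpus))
```

The TFIDF weight of a term in a document is the product tf × idf, so ubiquitous terms are suppressed while rare, document-specific terms dominate the feature vector.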
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wahba, Y., Madhavji, N., Steinbacher, J. (2023). A Comparison of SVM Against Pre-trained Language Models (PLMs) for Text Classification Tasks. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25890-9
Online ISBN: 978-3-031-25891-6
eBook Packages: Computer Science, Computer Science (R0)