Abstract
The emergence of pre-trained language models (PLMs) has brought great success to many Natural Language Processing (NLP) tasks, including text classification. Because these models require minimal to no feature engineering, PLMs are becoming the de facto choice for almost any NLP task. However, for domain-specific corpora (e.g., financial, legal, and industrial), fine-tuning a pre-trained model for a specific task has been shown to improve performance. In this paper, we compare the performance of four different PLMs against a simple linear SVM classifier with TFIDF-vectorized text, on three public domain-free datasets and a real-world dataset containing domain-specific words. The experimental results on the four datasets show that using PLMs, even fine-tuned ones, does not provide a significant gain over the linear SVM classifier. Hence, we recommend that for text classification tasks, a traditional SVM with careful feature engineering can provide cheaper and superior performance compared to PLMs.
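The baseline described above (a linear SVM over TFIDF-vectorized text) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the toy texts and labels are invented for the example and are not the paper's datasets.

```python
# A hypothetical sketch of a TFIDF + linear SVM text classifier,
# the kind of baseline the paper compares against PLMs.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Invented toy corpus with domain-specific vocabulary for illustration.
texts = [
    "interest rates rose sharply",
    "the court dismissed the appeal",
    "pump bearing failure detected",
    "loan application was denied",
]
labels = ["finance", "legal", "industrial", "finance"]

# TfidfVectorizer turns raw text into TFIDF feature vectors;
# LinearSVC fits a linear support vector classifier on top.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["bearing vibration failure"]))
```

In practice, the "careful feature engineering" the paper recommends would include choices such as n-gram range, stop-word handling, and sublinear term-frequency scaling, all of which are parameters of `TfidfVectorizer`.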
Notes
- 1. TFIDF stands for Term Frequency-Inverse Document Frequency, a combination of two metrics:
  1. Term frequency (tf): a measure of how frequently a term t appears in a document d.
  2. Inverse document frequency (idf): a measure of how important a term is. It is computed by dividing the total number of documents in the corpus by the document frequency of each term and then applying logarithmic scaling to the result.
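The two metrics defined in the note above can be computed directly. This is a minimal sketch using the definitions as stated (raw counts for tf, log-scaled ratio for idf); the tiny corpus is invented for illustration, and real vectorizers often add smoothing to avoid division by zero for unseen terms.

```python
import math

def tf(term, doc):
    # Term frequency: how often the term appears in the document
    # (here the document is a list of tokens).
    return doc.count(term)

def idf(term, corpus):
    # Inverse document frequency: total number of documents divided by
    # the document frequency of the term, with logarithmic scaling.
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]

# "the" occurs in every document, so idf("the") = log(3/3) = 0:
# a term carried by all documents contributes no discriminative weight.
print(idf("the", corpus))

# "dog" occurs in only one document, so it is weighted more heavily.
print(tf("dog", corpus[1]) * idf("dog", corpus))
```

The TFIDF weight of a term in a document is the product tf × idf, so ubiquitous terms are suppressed while rare, document-specific terms dominate the feature vector.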
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wahba, Y., Madhavji, N., Steinbacher, J. (2023). A Comparison of SVM Against Pre-trained Language Models (PLMs) for Text Classification Tasks. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25890-9
Online ISBN: 978-3-031-25891-6
eBook Packages: Computer Science, Computer Science (R0)