
A Comparison of SVM Against Pre-trained Language Models (PLMs) for Text Classification Tasks

  • Conference paper
  • First Online:
Machine Learning, Optimization, and Data Science (LOD 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13811)

Abstract

The emergence of pre-trained language models (PLMs) has shown great success in many Natural Language Processing (NLP) tasks, including text classification. Because these models require minimal to no feature engineering, PLMs are becoming the de facto choice for any NLP task. However, for domain-specific corpora (e.g., financial, legal, and industrial), fine-tuning a pre-trained model for a specific task has been shown to provide a performance improvement. In this paper, we compare the performance of four different PLMs on three public domain-free datasets and a real-world dataset containing domain-specific words against a simple linear SVM classifier with TFIDF-vectorized text. The experimental results on the four datasets show that using PLMs, even fine-tuned ones, does not provide a significant gain over the linear SVM classifier. Hence, we recommend that, for text classification tasks, a traditional SVM with careful feature engineering can provide cheaper and superior performance compared to PLMs.
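To make the baseline concrete, the following is a minimal sketch of a TFIDF-plus-linear-SVM text classifier of the kind the abstract describes. It assumes a scikit-learn implementation with toy data and illustrative hyperparameters; these choices are assumptions for exposition and do not reproduce the paper's actual experimental setup.

    # Minimal sketch of a TFIDF + linear SVM text-classification baseline.
    # The library (scikit-learn), the toy corpus, and the hyperparameters are
    # illustrative assumptions, not the paper's experimental configuration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # Toy labeled corpus; any text-classification dataset would slot in here.
    train_texts = [
        "stock prices fell after the quarterly earnings report",
        "the pump bearing failed during routine operation",
        "analysts expect revenue growth to continue next quarter",
        "replace the worn seal on the hydraulic valve",
    ]
    train_labels = ["finance", "maintenance", "finance", "maintenance"]

    # TFIDF vectorization feeding a linear SVM classifier.
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("svm", LinearSVC(C=1.0)),
    ])
    model.fit(train_texts, train_labels)

    # Predict on unseen text; with this toy data the expected label is "maintenance".
    print(model.predict(["the bearing seal failed"]))

The appeal of such a baseline is that feature engineering is confined to the vectorizer's settings (n-gram range, vocabulary pruning, weighting), which keeps training and inference far cheaper than fine-tuning a PLM.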


Notes

  1. TFIDF stands for Term Frequency-Inverse Document Frequency, which is a combination of two metrics (a formula sketch follows this note):

     1. Term frequency (tf): a measure of how frequently a term t appears in a document d.

     2. Inverse document frequency (idf): a measure of how important a term is. It is computed by dividing the total number of documents in the corpus by the document frequency of each term and then applying logarithmic scaling to the result.
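
For concreteness, one standard formulation of the metric is shown below; the exact variant used in the paper (e.g., smoothing constants or normalization) is not specified here, so this is an illustrative form only.

    \text{tf-idf}(t, d) = \text{tf}(t, d) \cdot \text{idf}(t),
    \qquad
    \text{idf}(t) = \log \frac{N}{\text{df}(t)}

where N is the total number of documents in the corpus and df(t) is the number of documents that contain term t.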


Author information


Corresponding author

Correspondence to Yasmen Wahba.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wahba, Y., Madhavji, N., Steinbacher, J. (2023). A Comparison of SVM Against Pre-trained Language Models (PLMs) for Text Classification Tasks. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_23


  • DOI: https://doi.org/10.1007/978-3-031-25891-6_23

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25890-9

  • Online ISBN: 978-3-031-25891-6

  • eBook Packages: Computer Science, Computer Science (R0)
