A Comparative Survey of Instance Selection Methods applied to Non-Neural and Transformer-Based Text Classification

Published: 13 July 2023

Abstract

Progress in natural language processing has been dictated by the rule of more: more data, more computing power, more complexity, best exemplified by deep learning Transformers. However, training (or fine-tuning) large dense models for specific applications usually requires significant amounts of computing resources. One way to ameliorate this problem is through data engineering, rather than the algorithmic or hardware perspectives. Our focus here is an under-investigated data engineering technique with enormous potential in the current scenario – Instance Selection (IS) (a.k.a. Selective Sampling, Prototype Selection). The goal of IS is to reduce the training set size by removing noisy or redundant instances while maintaining or improving the effectiveness (accuracy) of the trained models and reducing the cost of the training process. We survey classical and recent state-of-the-art IS techniques and provide a scientifically sound comparison of IS methods applied to an essential natural language processing task—Automatic Text Classification (ATC). IS methods have normally been applied to small tabular datasets and have not been systematically compared in ATC. We consider several neural and non-neural state-of-the-art ATC solutions and many datasets, and answer several research questions based on tradeoffs induced by a tripod: training set reduction, effectiveness, and efficiency. Our answers reveal an enormous unfulfilled potential for IS solutions. Specifically, we show that in 12 out of 19 datasets, specific IS methods—namely, Condensed Nearest Neighbor, Local Set-based Smoother, and Local Set Border Selector—can reduce the size of the training set without effectiveness losses. Furthermore, when fine-tuning Transformer methods, IS reduces the amount of data needed without losing effectiveness and with considerable training-time gains.
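To make the IS idea concrete, the oldest method highlighted in the abstract, Condensed Nearest Neighbor (Hart, 1968), can be sketched in a few lines: starting from a single instance, keep adding to the retained subset any training instance that a 1-NN classifier over the current subset misclassifies, until a full pass adds nothing. The sketch below is illustrative, not the survey's implementation; the function name and the synthetic dataset are our own choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def condensed_nearest_neighbor(X, y, seed=0):
    """Hart's CNN rule: return indices of a subset S of the training
    set such that 1-NN over S classifies every instance in X correctly."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [order[0]]                 # seed the subset with one instance
    changed = True
    while changed:                    # repeat until a pass absorbs nothing
        changed = False
        for i in order:
            knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
            if knn.predict(X[i:i + 1])[0] != y[i]:
                keep.append(i)        # absorb the misclassified instance
                changed = True
    return np.array(keep)

X, y = make_classification(n_samples=300, n_features=5, random_state=42)
idx = condensed_nearest_neighbor(X, y)
print(f"kept {len(idx)} of {len(X)} training instances")
```

By construction the retained subset preserves the 1-NN decision on the original training set while typically being much smaller, which is exactly the reduction-versus-effectiveness tradeoff the survey's tripod evaluates.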




      Published In

      ACM Computing Surveys, Volume 55, Issue 13s
      December 2023, 1367 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3606252

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 July 2023
      Online AM: 24 January 2023
      Accepted: 18 January 2023
      Revised: 04 January 2023
      Received: 10 February 2022
      Published in CSUR Volume 55, Issue 13s


      Author Tags

      1. Instance selection
      2. Text classification
      3. Comparative study

      Qualifiers

      • Survey

      Funding Sources

      • CNPq
      • CAPES
      • FAPEMIG
      • Amazon Web Services
      • NVIDIA
      • Google Research Awards


      Cited By

      • (2024) Análise Comparativa de Métodos de Undersampling em Classificação Automática de Texto Baseada em Transformers. Revista Eletrônica de Iniciação Científica em Computação 2(1), 1–10. DOI: 10.5753/reic.2024.46432. Online: 28 Jun 2024
      • (2024) Pipelining Semantic Expansion and Noise Filtering for Sentiment Analysis of Short Documents – CluSent Method. Journal on Interactive Systems 15(1), 561–575. DOI: 10.5753/jis.2024.4117. Online: 11 Jun 2024
      • (2024) Genre Classification of Books in Russian with Stylometric Features: A Case Study. Information 15(6), 340. DOI: 10.3390/info15060340. Online: 7 Jun 2024
      • (2024) Simultaneous Instance and Attribute Selection for Noise Filtering. Applied Sciences 14(18), 8459. DOI: 10.3390/app14188459. Online: 19 Sep 2024
      • (2024) A selective LVQ algorithm for improving instance reduction techniques and its application for text classification. Journal of Intelligent & Fuzzy Systems, 1–14. DOI: 10.3233/JIFS-235290. Online: 24 Apr 2024
      • (2024) Ant-Based Feature and Instance Selection for Multiclass Imbalanced Data. IEEE Access 12, 133952–133968. DOI: 10.1109/ACCESS.2024.3418669
      • (2024) On Representation Learning-based Methods for Effective, Efficient, and Scalable Code Retrieval. Neurocomputing 600, 128172. DOI: 10.1016/j.neucom.2024.128172. Online: Oct 2024
      • (2024) Pose pattern mining using transformer for motion classification. Applied Intelligence 54(5), 3841–3858. DOI: 10.1007/s10489-024-05325-0. Online: 12 Mar 2024
      • (2024) Will sentiment analysis need subculture? A new data augmentation approach. Journal of the Association for Information Science and Technology 75(6), 655–670. DOI: 10.1002/asi.24872. Online: 18 Jan 2024
      • (2023) An Effective, Efficient, and Scalable Confidence-based Instance Selection Framework for Transformer-Based Text Classification. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 665–674. DOI: 10.1145/3539618.3591638. Online: 19 Jul 2023
