DOI: 10.1145/3616131.3616132
research-article
Open access

Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem

Published: 02 October 2023

Abstract

The quality of a meta-learning recommendation depends on the quality of the meta-features behind it, so a common problem in meta-learning is establishing a good collection of meta-features that best represents the properties of a dataset. Many meta-feature measures and methods have therefore been proposed over the last decade to describe the characteristics of data. However, little attention has been paid to validating whether meta-feature decisions reflect the actual data properties. In particular, it is unclear whether meta-feature analysis is negatively affected by complex data characteristics, such as class overlap caused by the distortion that noisy features impose at the decision boundary between classes, and whether it thereby produces biased meta-learning recommendations that do not match the actual data characteristics (by either overestimating or underestimating the complexity). This issue is crucial to the success of a meta-learning model, since the choice of learning algorithm is based on meta-feature analysis. In this work, we investigate it by assessing the performance of Complexity Measures (global, data-level measures) and Instance Hardness Measures (local, instance-level measures) as meta-features in reflecting the actual data complexity associated with a high degree of class overlap. We focus on the overlapping classes problem because several studies have shown that this data issue, which affects most real-world datasets, contributes significantly to degrading prediction accuracy.
We chose these measures over the other meta-feature methods proposed in the literature because they were designed mainly to estimate data complexity from geometrical descriptions that focus on the class overlap imposed by feature values, which matches the data problem this study investigates.
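
To make the distinction between data-level and instance-level measures concrete, the following is a minimal sketch, not the authors' experimental setup. The abstract does not say which measures the study evaluates; the two used here are assumed as commonly cited examples: N3 (the leave-one-out error rate of a 1-nearest-neighbour classifier, a data-level complexity measure) and kDN (k-Disagreeing Neighbours, an instance hardness measure). The synthetic two-Gaussian data, the helper names, and the parameter values are all hypothetical.

```python
import math
import random


def make_two_classes(n_per_class, separation, seed=0):
    """Two 2-D Gaussian classes whose means lie `separation` apart on the x-axis.

    Synthetic data chosen for illustration only; smaller separation means more overlap.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, cx in ((0, 0.0), (1, separation)):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)))
            y.append(label)
    return X, y


def neighbours(X, i, k):
    """Indices of the k points nearest to X[i] (Euclidean), excluding i itself."""
    order = sorted((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    return order[:k]


def kdn(X, y, k=5):
    """Instance-level: fraction of each point's k neighbours with a different label."""
    return [sum(y[j] != y[i] for j in neighbours(X, i, k)) / k
            for i in range(len(X))]


def n3(X, y):
    """Data-level: leave-one-out error rate of the 1-nearest-neighbour classifier."""
    errors = sum(y[neighbours(X, i, 1)[0]] != y[i] for i in range(len(X)))
    return errors / len(X)


for separation in (6.0, 1.0):  # well separated vs. heavily overlapping
    X, y = make_two_classes(100, separation)
    hardness = kdn(X, y)
    print(f"separation={separation}: N3={n3(X, y):.2f}, "
          f"mean kDN={sum(hardness) / len(hardness):.2f}")
```

N3 summarises the whole dataset in one number, whereas kDN assigns a score to every instance, so points near the class boundary can be identified individually. On this toy data both values should rise as the class means move closer together.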

Cited By

  • (2024) Quantifying the Complexity of Stock Price Prediction Using Regression Complexity Measures. Navigating the Future of Finance in the Age of AI, pp. 200-216. DOI: 10.4018/979-8-3693-4382-1.ch010. Online publication date: 30-Aug-2024
  • (2024) Distance mapping overlap complexity metric for class-imbalance problems. Applied Soft Computing, 163, 111904. DOI: 10.1016/j.asoc.2024.111904. Online publication date: Sep-2024

      Published In

      ICCBDC '23: Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing
      August 2023
      101 pages
      ISBN:9798400707339
      DOI:10.1145/3616131
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Class Overlapping
      2. Data Complexity Measure
      3. Instance Hardness Measures
      4. Meta-Feature
      5. Meta-Learning

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICCBDC 2023
