Abstract
Classification tasks are complicated by several factors, including skewed class proportions and unclear decision regions caused by noise, class overlap, and small disjuncts arising from large within-class variation. These issues make data classification difficult, reduce overall performance, and make it challenging to draw meaningful insights. In this research, an evidence-based adaptive oversampling algorithm (EVA-oversampling) grounded in the Dempster–Shafer theory of evidence is developed for imbalanced classification. The technique assigns each instance a probability of class membership to represent the uncertainty that the data point may carry. Synthetic data points are then generated in regions of high confidence to compensate for the under-representation of minority instances, thereby strengthening the minority class region. The experiments show that the proposed method works effectively even in situations where imbalanced counts and data complexity would normally pose significant obstacles. The approach outperforms the SMOTE, Borderline-SMOTE, ADASYN, MWMOTE, KMeansSMOTE, LoRAS, and SyMProD algorithms in terms of \(F_1\)-measure and G-mean on highly imbalanced data while maintaining overall performance.
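To make the idea of confidence-guided oversampling concrete, the following Python sketch illustrates one possible simplification: the fraction of minority points among each instance's k nearest neighbours stands in for the Dempster–Shafer mass functions described in the paper, and SMOTE-style interpolation is restricted to minority instances whose score exceeds a threshold. The function name, parameters, and scoring rule are assumptions for illustration only; this is not the authors' EVA-oversampling procedure.

```python
# Illustrative sketch only: a neighbourhood-based confidence score replaces the
# evidential mass assignment and combination used in the paper (assumption).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def evidence_guided_oversample(X, y, minority_label, k=5, conf_threshold=0.6, random_state=0):
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))  # points needed to balance classes

    # Crude "evidence" of minority membership: fraction of minority neighbours
    # among the k nearest neighbours in the full data set.
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn_all.kneighbors(X_min)
    belief = (y[idx[:, 1:]] == minority_label).mean(axis=1)

    # Keep only minority instances lying in high-confidence regions.
    safe = X_min[belief >= conf_threshold]
    if len(safe) < 2 or n_needed <= 0:
        return X, y

    # SMOTE-style interpolation between high-confidence minority instances.
    nn_safe = NearestNeighbors(n_neighbors=min(k, len(safe) - 1) + 1).fit(safe)
    _, safe_idx = nn_safe.kneighbors(safe)
    base = rng.integers(0, len(safe), size=n_needed)
    mate = safe_idx[base, rng.integers(1, safe_idx.shape[1], size=n_needed)]
    gap = rng.random((n_needed, 1))
    synthetic = safe[base] + gap * (safe[mate] - safe[base])

    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new
```

In this simplified view, the confidence threshold plays the role of steering synthetic generation away from noisy or overlapping regions, which is the intuition behind oversampling only where class-membership evidence is strong.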
Data Availability
The datasets used in this study are available online at https://archive.ics.uci.edu/ml/index.php.
References
Dal Pozzolo A, Caelen O, Le Borgne Y-A, Waterschoot S, Bontempi G (2014) Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst Appl 41(10):4915–4928
Kelly D, Glavin FG, Barrett E (2022) Dowts–denial-of-wallet test simulator: synthetic data generation for preemptive defence. J Intell Inf Syst, 1–24
Zhang T, Chen J, Li F, Zhang K, Lv H, He S, Xu E (2022) Intelligent fault diagnosis of machines with small & imbalanced data: a state-of-the-art review and possible extensions. ISA Trans 119:152–171
Guo R, Liu H, Xie G, Zhang Y (2021) Weld defect detection from imbalanced radiographic images based on contrast enhancement conditional generative adversarial network and transfer learning. IEEE Sens J 21(9):10844–10853
Hammad M, Alkinani MH, Gupta B, El-Latif A, Ahmed A (2021) Myocardial infarction detection based on deep neural network on imbalanced data. Multimedia Syst, pp 1–13
Azhar NA, Pozi MSM, Din AM, Jatowt A (2022) An investigation of smote based methods for imbalanced datasets with data complexity analysis. IEEE Trans Knowl Data Eng
Santos MS, Abreu PH, Japkowicz N, Fernández A, Santos J (2023) A unifying view of class overlap and imbalance: key concepts, multi-view panorama, and open avenues for research. Information Fusion 89:228–253
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Fernández A, Garcia S, Herrera F, Chawla NV (2018) Smote for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J Artif Intell Res 61:863–905
Han H, Wang W-Y, Mao B-H (2005) Borderline-smote: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing, pp 878–887. Springer
Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C (2009) Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Pacific-Asia conference on knowledge discovery and data mining, pp 475–482. Springer
He H, Bai Y, Garcia EA, Li S (2008) Adasyn: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE World Congress on Computational Intelligence), pp 1322–1328
Douzas G, Bacao F, Last F (2018) Improving imbalanced learning through a heuristic oversampling method based on k-means and smote. Inf Sci 465:1–20
Zhang Y, Li X, Gao L, Wang L, Wen L (2018) Imbalanced data fault diagnosis of rotating machinery using synthetic oversampling and feature learning. J Manuf Syst 48:34–50
Wei J, Huang H, Yao L, Hu Y, Fan Q, Huang D (2020) Ia-suwo: an improving adaptive semi-unsupervised weighted oversampling for imbalanced classification problems. Knowl-Based Syst 203:106116
Napierala K, Stefanowski J (2016) Types of minority class examples and their influence on learning classifiers from imbalanced data. J Intell Inf Syst 46(3):563–597
Laurikkala J (2001) Improving identification of difficult small classes by balancing class distribution. In: Conference on artificial intelligence in medicine in Europe, pp 63–66. Springer
Onan A (2019) Consensus clustering-based undersampling approach to imbalanced learning. Sci Program 2019
Chen B, Xia S, Chen Z, Wang B, Wang G (2021) Rsmote: a self-adaptive robust smote for imbalanced problems with label noise. Inf Sci 553:397–428
Dolo KM, Mnkandla E (2022) Modifying the smote and safe-level smote oversampling method to improve performance. In: 4th International conference on wireless, intelligent and distributed environment for communication: WIDECOM 2021, pp 47–59. Springer
Barua S, Islam MM, Yao X, Murase K (2012) Mwmote-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng 26(2):405–425
Kunakorntum I, Hinthong W, Phunchongharn P (2020) A synthetic minority based on probabilistic distribution (symprod) oversampling for imbalanced datasets. IEEE Access 8:114692–114704
Abdi L, Hashemi S (2015) To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans Knowl Data Eng 28(1):238–251
Bej S, Davtyan N, Wolfien M, Nassar M, Wolkenhauer O (2021) Loras: an oversampling approach for imbalanced datasets. Mach Learn 110:279–301
Agrawal A, Viktor HL, Paquet E (2015) Scut: multi-class imbalanced data classification using smote and cluster-based undersampling. In: 2015 7th international joint conference on knowledge discovery, knowledge engineering and knowledge management (IC3k), vol 1, pp 226–234. IEEE
Alejo R, García V, Pacheco-Sánchez JH (2015) An efficient over-sampling approach based on mean square error back-propagation for dealing with the multi-class imbalance problem. Neural Process Lett 42(3):603–617
Koziarski M, Krawczyk B, Woźniak M (2017) Radial-based approach to imbalanced data oversampling. In: International conference on hybrid artificial intelligence systems, pp 318–327. Springer
Dang XT, Tran DH, Hirose O, Satou K (2015) Spy: a novel resampling method for improving classification performance in imbalanced data. In: 2015 Seventh international conference on knowledge and systems engineering (KSE), pp 280–285. IEEE
Cervantes J, Garcia-Lamont F, Rodriguez L, López A, Castilla JR, Trueba A (2017) Pso-based method for svm classification on skewed data sets. Neurocomputing 228:187–197
Dempster AP (1968) Upper and lower probabilities generated by a random closed interval. Ann Math Stat, pp 957–966
Shafer G (1976) A mathematical theory of evidence, vol 42. Princeton University Press, New Jersey
Chen L, Diao L, Sang J (2019) A novel weighted evidence combination rule based on improved entropy function with a diagnosis application. Int J Distrib Sens Netw 15(1):1550147718823990
Tong Z, Xu P, Denoeux T (2021) An evidential classifier based on Dempster–Shafer theory and deep learning. Neurocomputing 450:275–293
Grina F, Elouedi Z, Lefevre E (2021) Evidential undersampling approach for imbalanced datasets with class-overlapping and noise. In: International conference on modeling decisions for artificial intelligence, pp 181–192. Springer
Grina F, Elouedi Z, Lefevre E (2020) A preprocessing approach for class-imbalanced data using smote and belief function theory. In: Analide C, Novais P, Camacho D, Yin H (eds) Intelligent data engineering and automated learning—IDEAL 2020. Springer, Cham, pp 3–11
Grina F, Elouedi Z, Lefèvre E (2021) Uncertainty-aware resampling method for imbalanced classification using evidence theory. In: Vejnarová J, Wilson N (eds) Symbolic and quantitative approaches to reasoning with uncertainty. Springer, Cham, pp 342–353
Denoeux T (1995) A k-nearest neighbor classification rule based on Dempster–Shafer theory. IEEE Trans Syst Man Cybern 25(5):804–813
Xiao F, Qin B (2018) A weighted combination method for conflicting evidence in multi-sensor data fusion. Sensors 18(5)
Deng Y (2016) Deng entropy. Chaos Solitons Fract 91:549–553
Capó M, Pérez A, Lozano JA (2020) An efficient k-means clustering algorithm for tall data. Data Min Knowl Disc 34:776–811
Acknowledgements
Not applicable.
Funding
This work was supported by the Ministry of Science and Technology, Taiwan under Grant No. MOST 110-2221-E-155-060-MY2.
Author information
Authors and Affiliations
Contributions
Chen-ju Lin was involved in the conceptualization, methodology, reviewing, supervision, and funding acquisition. Florence Leony contributed to the methodology, data curation, formal analysis, visualization, investigation, writing, and editing. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, Cj., Leony, F. Evidence-based adaptive oversampling algorithm for imbalanced classification. Knowl Inf Syst 66, 2209–2233 (2024). https://doi.org/10.1007/s10115-023-01985-5