research-article

High-order polynomial interpolation with CNN: A robust approach for missing data imputation

Published: 01 October 2024

Abstract

Missing data poses a significant challenge in knowledge extraction, where completeness and quality are crucial. In many applications, ignoring records with missing values can adversely affect prediction and introduce significant bias into the results. Because of these consequences, imputing missing data has become essential. In this study, we propose a method that imputes missing values using a high-order polynomial equation combined with a convolutional neural network (CNN). The missing value of each instance is imputed by the high-order polynomial equation. A kernel optimizes the polynomial coefficients by calculating a weighted sum of neighboring elements, and the kernel weights are learned from the given data arranged spatially within the data matrix. We evaluate the method on UCI datasets against other state-of-the-art methods. Its accuracy exceeds 95% on some datasets, such as optdigits (97.96%), letter recognition (97.44%), ozone (96.98%), and pendigits (99.90%). The experimental results show that the proposed approach performs similarly to, or better than, the alternative approaches.
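
The abstract gives no equations, so the following is only a minimal sketch of the general idea it describes: a kernel takes a weighted sum over the observed neighbors of a missing cell, and a polynomial in that sum produces the imputed value. The function names, the normalization by the observed weights, and the hand-picked kernel and coefficients are all assumptions made for illustration. In the paper the kernel weights are learned from the data, and the exact polynomial form may differ.

```python
import numpy as np

def neighborhood_sum(X, i, j, kernel):
    """Kernel-weighted average of the observed neighbors of X[i, j].

    Assumption: missing entries are NaN and are excluded from the sum;
    the result is normalized by the total weight of observed neighbors.
    """
    k = kernel.shape[0]                 # square kernel with odd side length
    r = k // 2
    padded = np.pad(X.astype(float), r, constant_values=np.nan)
    patch = padded[i:i + k, j:j + k]    # window centered on X[i, j]
    weights = kernel * ~np.isnan(patch)
    total = weights.sum()
    if total == 0:
        raise ValueError("no observed neighbors under the kernel")
    return float((weights * np.nan_to_num(patch)).sum() / total)

def impute_entry(X, i, j, kernel, coeffs):
    """Evaluate coeffs[0] + coeffs[1]*s + coeffs[2]*s**2 + ... at the
    neighborhood sum s (hypothetical polynomial form)."""
    s = neighborhood_sum(X, i, j, kernel)
    return float(np.polynomial.polynomial.polyval(s, coeffs))

# Toy data matrix with one missing cell in the middle.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])
kernel = np.ones((3, 3))
kernel[1, 1] = 0.0                      # the center cell never votes for itself
imputed = impute_entry(X, 1, 1, kernel, coeffs=[0.0, 1.0, 0.0])
```

With a uniform kernel and the identity polynomial, this reduces to plain neighbor averaging. A learned kernel and higher-order coefficients are what would let the estimate depart from that baseline.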

Highlights

A robust imputation approach is proposed based on higher-order polynomial equations.
The proposed approach shows improved performance in case of poorly correlated data.
It is designed to create spatial properties using fuzzy clustering.
Experimental results show the significance of the proposed approach.
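
The highlights do not say how fuzzy clustering is used to create the spatial arrangement. For orientation only, the sketch below shows the standard fuzzy C-means membership update, in which each sample receives a degree of membership in every cluster. The function name, the fuzzifier `m = 2`, and the toy data are assumptions and are not taken from the paper.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard fuzzy C-means memberships.

    u[i, k] = d[i, k]**(-2/(m-1)) / sum_j d[i, j]**(-2/(m-1)),
    where d[i, k] is the Euclidean distance from sample i to center k.
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)               # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Three samples on a line, two cluster centers at the ends.
samples = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
u = fcm_memberships(samples, centers)
```

Each row of `u` sums to 1. A sample equidistant from both centers gets a membership of 0.5 in each cluster.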


Information & Contributors

Information

Published In

Computers and Electrical Engineering, Volume 119, Issue PA
Oct 2024
1450 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Missing data imputation
  2. Classification
  3. Higher-order polynomial
  4. Convolutional neural network

Qualifiers

  • Research-article
