DOI: 10.1007/978-3-319-93040-4_21
Article

MIDA: Multiple Imputation Using Denoising Autoencoders

Published: 17 June 2018

Abstract

Missing data is a significant problem impacting all domains. The state-of-the-art framework for minimizing missing-data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions, and distributions. Evaluation on several real-life datasets shows that our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end-of-the-line analytics.
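As a rough illustration of the idea summarized in the abstract, the following NumPy sketch trains a small overcomplete denoising autoencoder on mean-filled data and produces several imputed copies of a dataset. It is not the authors' implementation: the single hidden layer, tanh activation, dropout-style input corruption, seven extra hidden units, plain gradient descent, and the use of different random seeds to obtain multiple imputations are all assumptions made for this example, and the names `fit_dae` and `multiple_impute` are hypothetical.

```python
import numpy as np

def fit_dae(X0, observed, extra_units=7, drop=0.5, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer denoising autoencoder (hypothetical helper).

    X0       : standardized data with missing cells already filled (n x d)
    observed : boolean mask, True where the original value was present
    """
    rng = np.random.default_rng(seed)
    n, d = X0.shape
    h = d + extra_units  # overcomplete: more hidden units than inputs (assumed size)
    W1 = rng.normal(0.0, 0.1, (d, h)); b1 = np.zeros(h)
    W2 = rng.normal(0.0, 0.1, (h, d)); b2 = np.zeros(d)
    n_obs = observed.sum()
    for _ in range(epochs):
        # Corrupt the input by randomly zeroing cells (inverted dropout scaling).
        keep = rng.random((n, d)) >= drop
        X_in = X0 * keep / (1.0 - drop)
        H = np.tanh(X_in @ W1 + b1)
        out = H @ W2 + b2
        # Gradient of the mean squared error, counted on observed cells only.
        g_out = 2.0 * (out - X0) * observed / n_obs
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out); b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X_in.T @ g_H); b1 -= lr * g_H.sum(axis=0)
    # Return a function that reconstructs uncorrupted input.
    return lambda X: np.tanh(X @ W1 + b1) @ W2 + b2

def multiple_impute(X, m=5, **kwargs):
    """Return m imputed copies of X (NaN marks missing), one per random seed."""
    observed = ~np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    sd[sd == 0] = 1.0
    # Standardize and mean-fill: the column mean is 0 after standardization.
    Z0 = np.where(observed, (X - mu) / sd, 0.0)
    results = []
    for k in range(m):
        reconstruct = fit_dae(Z0, observed, seed=k, **kwargs)
        Z_hat = reconstruct(Z0)
        # Keep observed values untouched; fill only the missing cells.
        results.append(np.where(observed, X, Z_hat * sd + mu))
    return results

# Synthetic demonstration data (not from the paper).
rng = np.random.default_rng(42)
X_true = rng.normal(size=(200, 5))
X_true[:, 4] = X_true[:, 0] + 0.1 * rng.normal(size=200)  # one correlated column
X_miss = X_true.copy()
X_miss[rng.random(X_true.shape) < 0.2] = np.nan           # about 20% missing at random
imputations = multiple_impute(X_miss, m=3)
```

Each element of `imputations` is a completed copy of `X_miss`; in a multiple-imputation workflow, the downstream analysis would be run on every copy and the results pooled. The sketch makes no claim about imputation accuracy.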



Published In

Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part III
Jun 2018
851 pages
ISBN:978-3-319-93039-8
DOI:10.1007/978-3-319-93040-4
Editors: Dinh Phung, Vincent S. Tseng, Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi

Publisher

Springer-Verlag

Berlin, Heidelberg

Cited By

  • (2024) Missing Data Imputation with Uncertainty-Driven Network. Proceedings of the ACM on Management of Data 2:3 (1-25). DOI: 10.1145/3654920. Online publication date: 30-May-2024
  • (2024) Do We Really Need Imputation in AutoML Predictive Modeling? ACM Transactions on Knowledge Discovery from Data 18:6 (1-64). DOI: 10.1145/3643643. Online publication date: 12-Apr-2024
  • (2024) Data Imputation from the Perspective of Graph Dirichlet Energy. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (3237-3247). DOI: 10.1145/3627673.3679669. Online publication date: 21-Oct-2024
  • (2024) Improved generative adversarial network with deep metric learning for missing data imputation. Neurocomputing 570:C. DOI: 10.1016/j.neucom.2023.127062. Online publication date: 12-Apr-2024
  • (2024) Graph t-SNE multi-view autoencoder for joint clustering and completion of incomplete multi-view data. Knowledge-Based Systems 284:C. DOI: 10.1016/j.knosys.2023.111324. Online publication date: 25-Jan-2024
  • (2024) Block-wise imputation EM algorithm in multi-source scenario: ADNI case. Pattern Analysis & Applications 27:2. DOI: 10.1007/s10044-024-01268-x. Online publication date: 1-Jun-2024
  • (2023) Spatio-Temporal Denoising Graph Autoencoders with Data Augmentation for Photovoltaic Data Imputation. Proceedings of the ACM on Management of Data 1:1 (1-19). DOI: 10.1145/3588730. Online publication date: 30-May-2023
