Research article. DOI: 10.1145/3318299.3318391

Missing Data Processing Based on Deep Neural Network Enhanced by K-Means

Published: 22 February 2019

Abstract

This paper proposes a K-means-enhanced neural network model for handling missing data. The method first clusters the samples on the attributes that have no missing values, and then trains a separate neural network on each cluster to predict that cluster's missing values. The data are divided into two types, continuous numerical and discrete numerical, and a corresponding neural network model is established for each type. Experiments on the Human Development Index and Its Components dataset show the method to be feasible and superior.
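The two-stage idea in the abstract (cluster on the complete attributes, then fit one network per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no architecture, cluster count, or training details, so the tiny one-hidden-layer network, the farthest-point K-means initialisation, the hyperparameters, the toy rows, and the names `kmeans`, `TinyMLP`, and `impute` are all assumptions made for illustration. Only the continuous numerical case is covered.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points, k, iters=20):
    """Lloyd's algorithm; deterministic farthest-point seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points]


class TinyMLP:
    """One tanh hidden layer, linear output, trained by plain SGD on squared error."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def fit(self, xs, ys, epochs=200, lr=0.05):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
                for j, hj in enumerate(h):
                    # Back-propagate through tanh before w2[j] is changed.
                    grad_h = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_h * xi
                self.b2 -= lr * err
        return self


def impute(rows, target, k=2):
    """Fill None entries in column `target`, using one network per cluster."""
    # Cluster only on the attributes that have no missing values.
    feats = [[v for j, v in enumerate(r) if j != target] for r in rows]
    labels = kmeans(feats, k)
    filled = [list(r) for r in rows]
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        train = [i for i in idx if rows[i][target] is not None]
        missing = [i for i in idx if rows[i][target] is None]
        if not train or not missing:
            continue  # nothing to learn from, or nothing to fill
        model = TinyMLP(len(feats[0])).fit(
            [feats[i] for i in train], [rows[i][target] for i in train])
        for i in missing:
            filled[i][target] = model.predict(feats[i])
    return filled, labels


# Toy rows (made up): two complete attributes and one target with gaps.
rows = [
    [0.10, 0.20, 0.30], [0.15, 0.25, 0.40], [0.20, 0.15, None],
    [0.25, 0.30, 0.55], [0.30, 0.10, 0.40],
    [0.70, 0.80, 0.50], [0.75, 0.90, None], [0.80, 0.70, 0.50],
    [0.85, 0.85, 0.30], [0.90, 0.75, 0.35],
]
filled, labels = impute(rows, target=2, k=2)
print(labels)
print(filled[2][2], filled[6][2])
```

In the toy rows the target follows a different relation in each group, which is the situation where per-cluster models could help over a single global model. A discrete numerical target would need a classification-style network instead, which this sketch does not attempt.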

Cited By

  • A Denoising Scheme-Based Traffic Flow Prediction Model: Combination of Ensemble Empirical Mode Decomposition and Fuzzy C-Means Neural Network. 2020. IEEE Access, 8: 11546--11559. DOI: 10.1109/ACCESS.2020.2964070

Published In

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing
February 2019
563 pages
ISBN: 9781450366007
DOI: 10.1145/3318299

In-Cooperation

  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data missing
  2. continuous numerical type
  3. discrete numerical type
  4. k-means
  5. neural network

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Fundamental Research Funds for the Central Universities
  • National Natural Science Foundation of China

Conference

ICMLC '19
