Missing Data Processing Based on Deep Neural Network Enhanced by K-Means

Authors:

ZhouHua Tang

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing

Pages 151 - 155

https://doi.org/10.1145/3318299.3318391

Published: 22 February 2019

Abstract

This paper proposes a K-means-based neural network model for handling missing data. The method first clusters the samples on the attributes that have no missing values, and then feeds each resulting cluster to its own neural network to predict the missing values. The data are divided into two types, continuous numerical and discrete numerical, and a corresponding neural network model is established for each type. Experiments on the Human Development Index and Its Components dataset show the method to be feasible and superior.
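The cluster-then-predict procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only NumPy, handles a single continuous target column, assumes every other column is fully observed, and substitutes a tiny one-hidden-layer network for whatever architecture the paper uses. The names `kmeans`, `assign`, `train_mlp`, and `impute_column`, the hyperparameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def assign(X, centroids):
    """Label each row with the index of its nearest centroid."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns the centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer tanh regressor, full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        err = h @ W2 + b2 - y
        # Backpropagation through the output layer and the tanh layer.
        gW2, gb2 = h.T @ err / n, err.mean()
        gh = np.outer(err, W2) * (1 - h ** 2)
        gW1, gb1 = X.T @ gh / n, gh.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2


def impute_column(data, target, k=2):
    """Fill NaNs in column `target` using the other, fully observed columns."""
    feats = np.delete(data, target, axis=1)
    # Standardize so that no single attribute dominates the clustering.
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-9)
    missing = np.isnan(data[:, target])
    labels = assign(feats, kmeans(feats, k))
    filled = data.copy()
    for j in range(k):
        train = (labels == j) & ~missing
        todo = (labels == j) & missing
        if not todo.any():
            continue
        if train.sum() < 2:
            # Too few complete rows in this cluster: fall back to the global mean.
            filled[todo, target] = np.nanmean(data[:, target])
            continue
        # Rescale inputs and target within the cluster before training.
        f_mu, f_sd = feats[train].mean(axis=0), feats[train].std(axis=0) + 1e-9
        y = data[train, target]
        y_mu, y_sd = y.mean(), y.std() + 1e-9
        predict = train_mlp((feats[train] - f_mu) / f_sd, (y - y_mu) / y_sd)
        filled[todo, target] = predict((feats[todo] - f_mu) / f_sd) * y_sd + y_mu
    return filled


# Synthetic example: two groups whose target follows a different rule in each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, (40, 2)),
               rng.normal([5.0, 5.0], 0.3, (40, 2))])
y = np.where(np.arange(80) < 40, X[:, 0] + X[:, 1], 10.0 - X[:, 0])
data = np.column_stack([X, y])
data[rng.choice(80, size=10, replace=False), 2] = np.nan  # knock out 10 values
filled = impute_column(data, target=2, k=2)
```

Training one small model per cluster lets each model fit a relationship that may hold only within its group, which is the motivation the abstract gives for clustering first. A discrete target would need a classification output (for example a softmax layer) in place of the regression output used here; that variant is not shown.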

Cited By

Tang J, Gao F, Liu F, Chen X. 2020. A Denoising Scheme-Based Traffic Flow Prediction Model: Combination of Ensemble Empirical Mode Decomposition and Fuzzy C-Means Neural Network. IEEE Access, 8: 11546--11559.
https://doi.org/10.1109/ACCESS.2020.2964070

Index Terms

Missing Data Processing Based on Deep Neural Network Enhanced by K-Means
1. Information systems
  1. Data management systems
    1. Information integration
      1. Data cleaning
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

ICMLC '19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing

February 2019

563 pages

ISBN:9781450366007

DOI:10.1145/3318299

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Fundamental Research Funds for the Central Universities
National Natural Science Foundation of China

Conference

ICMLC '19

ICMLC '19: 2019 11th International Conference on Machine Learning and Computing

February 22 - 24, 2019

Zhuhai, China
