CAGAIN: Column Attention Generative Adversarial Imputation Networks

Kawagoshi, Jun; Dong, Yuyang; Nozawa, Takuma; Xiao, Chuan

doi:10.1007/978-3-031-39821-6_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14147)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Imputation of missing values is a key operation in building data analysis models. In this paper, we target numerical and categorical values in tabular data. While previous studies have demonstrated the effectiveness of state-of-the-art methods, a major limitation is that these methods lack robustness: their performance varies significantly across datasets and missing rates, which imposes considerable overhead for selecting and tuning models in real-world scenarios. To tackle this problem, we propose the Column Attention Generative Adversarial Imputation Network (CAGAIN), an imputation model that employs a generative adversarial network (GAN) and the attention mechanism. The generator of CAGAIN mimics the distribution of the original data and generates imputed samples similar to real ones. The discriminator of CAGAIN distinguishes real samples from generated ones, thereby improving the quality of the imputed data. At the same time, the attention mechanism captures the correlation between attributes and focuses on the most significant attributes that determine the values at the missing positions. By inheriting the advantages of GAN and the attention mechanism, our model is robust to shifting datasets and missing rates, as demonstrated by experiments on 9 real datasets.
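
The idea of letting each missing column attend only to the observed columns can be illustrated with a small sketch. This is not the authors' implementation: the score matrix `scores`, the helper names, and the use of a weighted mix of raw values in place of a trained generator are all assumptions made for illustration, and the discriminator and adversarial training are omitted. The final step, keeping observed entries and filling only the missing ones, follows the combination commonly used in GAIN-style models and is assumed rather than confirmed to carry over to CAGAIN.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def column_attention(target, mask, scores):
    """Attention weights that column `target` places on the other columns.

    mask[j] is 1 if column j is observed and 0 if it is missing.
    scores[i][j] is a hypothetical relevance of column j to column i
    (in a real model this would be learned, here it is given by hand).
    Missing columns and the target itself always receive weight 0.
    """
    usable = [j for j in range(len(mask)) if mask[j] == 1 and j != target]
    weights = [0.0] * len(mask)
    if not usable:
        return weights
    for j, w in zip(usable, softmax([scores[target][j] for j in usable])):
        weights[j] = w
    return weights

def impute_row(row, mask, scores):
    """Toy stand-in for a generator: fill each missing entry with an
    attention-weighted mix of the observed entries, and leave observed
    entries unchanged. Values are assumed to be normalized to [0, 1]."""
    generated = []
    for i in range(len(row)):
        w = column_attention(i, mask, scores)
        generated.append(
            sum(w[j] * row[j] for j in range(len(row)) if mask[j] == 1)
        )
    return [x if m == 1 else g for x, m, g in zip(row, mask, generated)]

# Example: the middle column is missing and attends mostly to column 0.
row = [0.2, None, 0.8]
mask = [1, 0, 1]
scores = [[0.0, 0.0, 0.0],
          [2.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
filled = impute_row(row, mask, scores)
print(filled)
```

In this toy setting the imputed value lands between the two observed values and closer to the column with the higher score, which is the behavior the abstract attributes to the attention mechanism: the most relevant attributes dominate the estimate for a missing position.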

Acknowledgements

This work is mainly supported by NEC Corporation, and partially supported by JSPS Kakenhi 22H03903 and CREST JPMJCR22M2.

Author information

Authors and Affiliations

Osaka University, Suita, Japan
Jun Kawagoshi & Chuan Xiao
NEC Corporation, Tokyo, Japan
Yuyang Dong & Takuma Nozawa
Nagoya University, Nagoya, Japan
Chuan Xiao

Corresponding author

Correspondence to Yuyang Dong.

Editor information

Editors and Affiliations

University of Vienna, Vienna, Austria
Christine Strauss
University of Tsukuba, Ibaraki, Japan
Toshiyuki Amagasa
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

Cite this paper

Kawagoshi, J., Dong, Y., Nozawa, T., Xiao, C. (2023). CAGAIN: Column Attention Generative Adversarial Imputation Networks. In: Strauss, C., Amagasa, T., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2023. Lecture Notes in Computer Science, vol 14147. Springer, Cham. https://doi.org/10.1007/978-3-031-39821-6_21

DOI: https://doi.org/10.1007/978-3-031-39821-6_21
Published: 16 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39820-9
Online ISBN: 978-3-031-39821-6
eBook Packages: Computer Science, Computer Science (R0)
