CAGAIN: Column Attention Generative Adversarial Imputation Networks

  • Conference paper
  • Database and Expert Systems Applications (DEXA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14147)

Abstract

Imputing missing values is a key step in building data analysis models. In this paper, we target numerical and categorical values in tabular data. Although previous studies have demonstrated the effectiveness of state-of-the-art methods, a major limitation is that these methods lack robustness: their performance varies significantly across datasets and missing rates, which imposes considerable overhead for model selection and tuning in real-world scenarios. To tackle this problem, we propose the Column Attention Generative Adversarial Imputation Network (CAGAIN), an imputation model that combines a generative adversarial network (GAN) with the attention mechanism. The generator of CAGAIN mimics the distribution of the original data and generates imputed samples similar to real ones. The discriminator of CAGAIN distinguishes real from generated samples, thereby improving the quality of the imputed data. At the same time, the attention mechanism captures correlations between attributes and focuses on the attributes that most strongly determine the values at the missing positions. By inheriting the advantages of both GANs and attention, our model is robust to changes in dataset and missing rate, as demonstrated by experiments on 9 real datasets.
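
The abstract does not describe CAGAIN's layers, so the Python sketch below only illustrates, under stated assumptions, the two generic ingredients it names. The first is scaled dot-product attention computed across columns, where each attribute is treated as one token with a hypothetical embedding vector. The second is the GAIN-style mask combination, which keeps observed entries and uses generated values only at missing positions. The function names `column_attention` and `combine_with_mask`, the toy embeddings, and the generator output are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def column_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product attention in which each row stands for one column (attribute).

    Hypothetical illustration, not the paper's exact layer. Returns (outputs, weights),
    where weights[i][j] is how strongly column i attends to column j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors of all columns.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


def combine_with_mask(observed: Vector, generated: Vector, mask: List[int]) -> Vector:
    """GAIN-style combination: mask=1 keeps the observed value, mask=0 takes the generated one."""
    return [m * x + (1 - m) * g for x, g, m in zip(observed, generated, mask)]


if __name__ == "__main__":
    # Three attributes; the second one is missing (mask = 0), stored as a 0.0 placeholder.
    row = [1.0, 0.0, 3.0]
    mask = [1, 0, 1]
    generated = [0.9, 2.5, 3.2]  # hypothetical generator output for this row
    print(combine_with_mask(row, generated, mask))  # [1.0, 2.5, 3.0]

    # Toy 2-d embeddings for the three columns (hypothetical values).
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    _, attn = column_attention(emb, emb, emb)
    print([round(w, 3) for w in attn[1]])  # how the missing column weighs each column
```

In this toy setup, `attn[1]` gives more weight to the first column than to the third because its embedding is closer to the missing column's. With learned embeddings, this is one plausible reading of how attention can "focus on the most significant attributes." The mask combination guarantees that observed cells are never overwritten by the generator.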

Acknowledgements

This work is mainly supported by NEC Corporation, and partially supported by JSPS Kakenhi 22H03903 and CREST JPMJCR22M2.

Author information

Correspondence to Yuyang Dong.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kawagoshi, J., Dong, Y., Nozawa, T., Xiao, C. (2023). CAGAIN: Column Attention Generative Adversarial Imputation Networks. In: Strauss, C., Amagasa, T., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2023. Lecture Notes in Computer Science, vol 14147. Springer, Cham. https://doi.org/10.1007/978-3-031-39821-6_21

  • DOI: https://doi.org/10.1007/978-3-031-39821-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39820-9

  • Online ISBN: 978-3-031-39821-6

  • eBook Packages: Computer Science, Computer Science (R0)
