Abstract
Filtering refers to the process of defining, detecting and correcting errors in a given dataset in order to achieve system reliability and minimize the impact of errors on data analysis. Automated and accurate tools for data filtering and healing are therefore crucial. This study investigates statistical and machine-learning-based methodologies for healing data gaps and imputing missing values. Five models are investigated individually: the well-known ARIMA model, Linear Interpolation, Polynomial Interpolation, General Regression and Facebook Prophet. The raw data used to evaluate these methods are simulated, and artificial data gaps are imposed at random within the dataset so that the univariate imputation performance of each model can be measured by Mean Squared Error and Mean Absolute Error. As expected, the evaluation results show that the highly elaborate machine-learning model Facebook Prophet outperforms the simpler statistical ARIMA model, at the expense of time and computational effort. However, for Big Data univariate imputation applications, the findings suggest that a combination of ARIMA and Facebook Prophet, selected according to the data gap size, could balance the required computational resources while maintaining highly accurate imputation results.
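The evaluation protocol described in the abstract (simulate a series, impose an artificial gap at a random position, impute it, then score the imputed values against the held-out truth with MSE and MAE) can be illustrated with a minimal sketch. This is not the authors' code: the sine-plus-noise signal, the gap sizes, and all function names are assumptions made for illustration, and only linear interpolation is shown. The other models named in the abstract would presumably slot in where `linear_impute` is called.

```python
import math
import random


def simulate_series(n, period=24, seed=0):
    """Hypothetical simulated signal: a periodic sine wave plus small Gaussian noise."""
    rng = random.Random(seed)
    return [10 + 5 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 0.2)
            for t in range(n)]


def impose_gap(series, gap_size, seed=0):
    """Blank out one contiguous gap at a random interior position.

    Returns the gapped copy (missing values are None) and the missing indices.
    The first and last samples are always kept so the gap has known neighbours.
    """
    rng = random.Random(seed)
    start = rng.randint(1, len(series) - gap_size - 1)
    indices = list(range(start, start + gap_size))
    gapped = list(series)
    for i in indices:
        gapped[i] = None
    return gapped, indices


def linear_impute(gapped):
    """Fill each run of None values on the straight line between its known neighbours."""
    filled = list(gapped)
    known = [i for i, v in enumerate(gapped) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            filled[i] = (1 - w) * gapped[left] + w * gapped[right]
    return filled


def mse(truth, estimate, indices):
    """Mean Squared Error over the imputed positions only."""
    return sum((truth[i] - estimate[i]) ** 2 for i in indices) / len(indices)


def mae(truth, estimate, indices):
    """Mean Absolute Error over the imputed positions only."""
    return sum(abs(truth[i] - estimate[i]) for i in indices) / len(indices)


if __name__ == "__main__":
    truth = simulate_series(500)
    for gap_size in (3, 12, 48):  # assumed gap sizes, not taken from the paper
        gapped, idx = impose_gap(truth, gap_size, seed=gap_size)
        filled = linear_impute(gapped)
        print(gap_size, mse(truth, filled, idx), mae(truth, filled, idx))
```

Scoring only the positions that were artificially removed keeps the metrics from being diluted by the untouched samples. Repeating the loop over several gap sizes is one way to probe the abstract's point that the preferred model may depend on the size of the gap.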
Acknowledgements
The research leading to these results was partially funded by the European Commission “EEB-07-2017 Integration of energy harvesting at building and district level” - PLUG-N-HARVEST H2020 project (Grant agreement ID: 768735) https://www.plug-n-harvest.eu/, accessed on 22 February 2022; and “LC-SC3-B4E-3-2020 Upgrading smartness of existing buildings through innovations for legacy equipment” - Smart2B H2020 project (Grant agreement ID: 101023666) https://www.smart2b-project.eu/, accessed on 2 March 2022.
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Stefanopoulou, A. et al. (2022). Performance Meta-analysis for Big-Data Univariate Auto-Imputation in the Building Sector. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Cortez, P. (eds) Artificial Intelligence Applications and Innovations. AIAI 2022 IFIP WG 12.5 International Workshops. AIAI 2022. IFIP Advances in Information and Communication Technology, vol 652. Springer, Cham. https://doi.org/10.1007/978-3-031-08341-9_23
DOI: https://doi.org/10.1007/978-3-031-08341-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08340-2
Online ISBN: 978-3-031-08341-9
eBook Packages: Computer Science, Computer Science (R0)