Abstract
Meteorology is a field in which huge amounts of data are generated, mainly by sensors at weather stations that measure different variables. These data have particular characteristics: high volume and dimensionality, frequent missing values at some stations, and high correlation between the collected variables. It is therefore crucial to use Big Data and Data Mining techniques to handle those data and extract useful knowledge from them, for instance to predict weather phenomena. In this paper, we propose a visual big data system designed to handle large amounts of weather-related data and to let the user analyze those data in order to perform predictive tasks over the considered variables (temperature and rainfall). The proposed system collects open data and loads them into a local NoSQL database, fusing them at different levels of temporal and spatial aggregation. It then performs predictive analysis using univariate and multivariate approaches, as well as forecasting based on training data from neighbor stations in cases with high rates of missing values. The system has been assessed in terms of usability and predictive performance, obtaining an overall normalized mean squared error of 0.00013 and an overall directional symmetry of nearly 0.84. A group of experts in the area rated the system positively (all aspects of the system except graphic design were rated 3 or above on a 1–5 scale). These promising preliminary results support the validity of our system and encourage further work in this area.
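The two performance figures quoted above can be illustrated with a short sketch. It assumes common textbook definitions (squared error normalized by the spread of the observed series, and the fraction of time steps where forecast and observation move in the same direction); the exact formulas used in the evaluation are not given in this excerpt, and the function names `nmse` and `directional_symmetry` are hypothetical.

```python
from typing import Sequence


def nmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared errors divided by the total sum of squares of the
    observed series (assumed definition, equivalent to MSE / variance)."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    if sst == 0:
        raise ValueError("observed series is constant; NMSE is undefined")
    return sse / sst


def directional_symmetry(actual: Sequence[float],
                         predicted: Sequence[float]) -> float:
    """Fraction of consecutive steps where the observed and predicted
    changes have the same sign (assumed definition; ties count as hits)."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("series must have the same length (at least 2)")
    hits = 0
    for t in range(1, len(actual)):
        observed_change = actual[t] - actual[t - 1]
        predicted_change = predicted[t] - predicted[t - 1]
        if observed_change * predicted_change >= 0:
            hits += 1
    return hits / (len(actual) - 1)
```

Under these definitions a lower NMSE and a directional symmetry closer to 1 both indicate better forecasts.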
Notes
Check ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt for further information on the M-, Q-, and S-Flags.
All that information has been obtained from https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
Some Spanish words appear in the figure; their meanings are: estación = station; fecha = date; valor_dato = datum_value
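As an illustration of the M-, Q-, and S-Flags mentioned in the first note, the following sketch parses one record of a GHCN-Daily `.dly` file. It assumes the fixed-width layout described in the readme cited above (station ID, year, month, element, then 31 groups of a 5-character value followed by three 1-character flags, with -9999 marking a missing value). Whether the system reads these files directly is not stated in this excerpt, and `parse_dly_line` is a hypothetical helper name.

```python
from typing import List, NamedTuple, Optional


class DailyValue(NamedTuple):
    day: int
    value: Optional[int]  # None when the file holds the -9999 missing marker
    mflag: str            # measurement flag
    qflag: str            # quality flag
    sflag: str            # source flag


class MonthRecord(NamedTuple):
    station: str
    year: int
    month: int
    element: str
    days: List[DailyValue]


def parse_dly_line(line: str) -> MonthRecord:
    """Parse one fixed-width GHCN-Daily record (one station-month-element)."""
    days: List[DailyValue] = []
    for index in range(31):
        start = 21 + index * 8          # each day occupies 8 characters
        chunk = line[start:start + 8]
        if len(chunk) < 8:
            break                        # tolerate truncated lines
        raw = int(chunk[0:5])
        value = None if raw == -9999 else raw
        days.append(DailyValue(index + 1, value, chunk[5], chunk[6], chunk[7]))
    return MonthRecord(
        station=line[0:11],
        year=int(line[11:15]),
        month=int(line[15:17]),
        element=line[17:21],
        days=days,
    )


# Made-up record for illustration only (station ID "XX000000001" is not real).
example = "XX000000001201601PRCP" + "   12  S" + "-9999   " * 30
record = parse_dly_line(example)
```

Rows whose quality flag is not blank would typically be filtered out or treated as missing before aggregation; that policy is a design choice and is not taken from the source.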
Acknowledgments
This paper was drafted as part of Juan A. Lara’s research stay during 2019-2020 at Jordan University of Science and Technology, JUST (Jordan), which partially sponsored this research. The authors would like to thank UDIMA’s and JUST’s students who took part in the design and implementation of the system, particularly Francisco Javier Moreno Hermosilla, Paulina Pyzel and Amnah Al-Abdi; and JUST’s experts for providing their feedback in order to assess this system.
APPENDIX
1.1 I. ARFF files generated by the system
1.2 A. Excerpt of a particular minable view created for “standard” analysis (file .arff)
@relation weather-project
@attribute Date date 'yyyy-MM-dd'
@attribute rainfall numeric
@attribute tmin numeric
@attribute tmax numeric
@data
2016-1-01,40.90490992906111,3.125,13.33111111111111
2016-2-01,34.753053538158774,5.157777777777778,18.84
2016-3-01,48.504665419434346,7.76046511627907,21.05625
2016-4-01,42.176677782541,12.59375,28.476829268292683
2016-5-01,?,14.482608695652175,29.78192771084337
2016-6-01,?,19.125555555555557,36.2038961038961
2016-7-01,?,20.276767676767676,36.09493670886076
2016-8-01,?,21.55056179775281,36.88076923076923
2016-9-01,?,16.78426966292135,32.72894736842105
2016-10-01,48.04021044733257,13.712903225806452,29.78170731707317
2016-11-01,?,7.062637362637362,21.71772151898734
2016-12-01,44.539838581248475,2.833707865168539,13.484210526315788
2017-1-01,32.95836866004329,1.4148936170212765,13.410975609756099
2017-2-01,36.37586159726386,1.1903225806451612,15.182894736842105
2017-3-01,40.60443010546419,6.790425531914893,20.07
2017-4-01,39.17010546939185,12.114285714285714,27.001785714285717
2017-5-01,?,14.576842105263157,31.349
2017-6-01,?,18.812222222222225,34.6
2017-7-01,?,22.743478260869566,38.703947368421055
2017-8-01,?,20.94123711340206,37.10253164556962
2017-9-01,?,18.993269230769233,34.8725
2017-10-01,0,14.519847328244273,27.939772727272725
2017-11-01,34.965075614664805,8.31359223300971,21.822093023255814
2017-12-01,40.16383020752389,5.935294117647059,19.084883720930232
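To show how a minable view like the one above can be consumed outside Weka, here is a minimal sketch of an ARFF reader. It is an illustration rather than the system's own loader: it only handles unquoted attribute names, comma-separated dense rows, and `?` as the missing-value marker, and the name `read_arff` is hypothetical.

```python
from typing import List, Tuple, Union

Value = Union[float, str, None]


def read_arff(text: str) -> Tuple[List[str], List[List[Value]]]:
    """Return (attribute names, rows); '?' becomes None, numerics become float."""
    names: List[str] = []
    kinds: List[str] = []
    rows: List[List[Value]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and ARFF comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                parts = line.split(None, 2)     # keyword, name, type (+ format)
                names.append(parts[1])
                kinds.append(parts[2].split()[0].lower())
            elif lower.startswith("@data"):
                in_data = True
            continue
        cells = [cell.strip() for cell in line.split(",")]
        if len(cells) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(cells)}")
        row: List[Value] = []
        for kind, cell in zip(kinds, cells):
            if cell == "?":
                row.append(None)                # missing value
            elif kind in ("numeric", "real", "integer"):
                row.append(float(cell))
            else:
                row.append(cell)                # dates and nominals kept as text
        rows.append(row)
    return names, rows


SAMPLE = """@relation weather-project
@attribute Date date 'yyyy-MM-dd'
@attribute rainfall numeric
@attribute tmin numeric
@attribute tmax numeric
@data
2016-4-01,42.176677782541,12.59375,28.476829268292683
2016-5-01,?,14.482608695652175,29.78192771084337
"""

names, rows = read_arff(SAMPLE)
```

Keeping missing values as `None` rather than substituting zero preserves the distinction between "no rainfall recorded" and "no measurement available", which matters when deciding whether a station has too many gaps.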
1.3 B. Excerpt of a particular minable view created for “neighbour-based” analysis (file .arff)
@relation weather-project
@attribute year numeric
@attribute month numeric
@attribute rainfall numeric
@attribute latitude numeric
@attribute longitude numeric
@attribute altitude numeric
@data
2016,1,3.258096528021482,32.5390,38.1950,686
2016,2,3.8213641296489095,32.5390,38.1950,686
2016,3,5.299971020274537,32.5390,38.1950,686
2016,4,2.4849066497880004,32.5390,38.1950,686
2016,5,2,32.5390,38.1950,686
2016,6,2,32.5390,38.1950,686
2016,7,2,32.5390,38.1950,686
2016,8,2,32.5390,38.1950,686
2016,9,2,32.5390,38.1950,686
2016,10,?,32.5390,38.1950,686
2016,11,?,32.5390,38.1950,686
2016,12,3.349904087274605,32.5390,38.1950,686
2017,1,?,32.5390,38.1950,686
2017,2,2,32.5390,38.1950,686
2017,3,4.762173924797756,32.5390,38.1950,686
2017,4,4.269697449699962,32.5390,38.1950,686
2017,5,2,32.5390,38.1950,686
2017,6,2,32.5390,38.1950,686
2017,7,2,32.5390,38.1950,686
2017,8,2,32.5390,38.1950,686
2017,9,2,32.5390,38.1950,686
2017,10,?,32.5390,38.1950,686
2017,11,2.4849066497880004,32.5390,38.1950,686
2017,12,2.0149020205422647,32.5390,38.1950,686
2016,1,2.7950615700918397,32.1610,37.1490,677
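Because this view carries latitude, longitude and altitude alongside the target variable, rows from several stations can be pooled into one training set. How neighbour stations are chosen is not described in this excerpt; the sketch below assumes, purely for illustration, that neighbours are the k closest stations by great-circle (haversine) distance. The names `haversine_km` and `nearest_stations` and the station tuples are hypothetical.

```python
import math
from typing import List, Tuple

Station = Tuple[str, float, float]  # (station id, latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_stations(target: Station, candidates: List[Station],
                     k: int = 3) -> List[Station]:
    """Return the k candidates closest to the target, excluding the target."""
    others = [s for s in candidates if s[0] != target[0]]
    others.sort(key=lambda s: haversine_km(target[1], target[2], s[1], s[2]))
    return others[:k]


# Made-up stations for illustration only.
target = ("A", 0.0, 0.0)
stations = [("A", 0.0, 0.0), ("B", 0.0, 2.0), ("C", 0.0, 1.0), ("D", 5.0, 5.0)]
neighbours = nearest_stations(target, stations, k=2)
```

Horizontal distance ignores altitude, which is also present in the view; a weighting that accounts for elevation differences would be a reasonable alternative.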
1.4 C. Excerpt of an ARFF test file
@relation weather-project
@attribute year numeric
@attribute month numeric
@attribute rainfall numeric
@attribute latitude numeric
@attribute longitude numeric
@attribute altitude numeric
@data
2018,1,?,32.5390,38.1950,686
2018,2,?,32.5390,38.1950,686
2018,3,?,32.5390,38.1950,686
2018,4,?,32.5390,38.1950,686
2018,5,?,32.5390,38.1950,686
2018,6,?,32.5390,38.1950,686
2018,7,?,32.5390,38.1950,686
2018,8,?,32.5390,38.1950,686
2018,9,?,32.5390,38.1950,686
2018,10,?,32.5390,38.1950,686
2018,11,?,32.5390,38.1950,686
2018,12,?,32.5390,38.1950,686
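The test file has the same schema as the training view, with the target attribute left as `?` for every month to be predicted. A minimal sketch of how such a file could be generated follows; the function name `build_test_arff` and the coordinates in the usage line are placeholders chosen for illustration, not values taken from the system.

```python
from typing import List


def build_test_arff(year: int, latitude: float, longitude: float,
                    altitude: float, relation: str = "weather-project") -> str:
    """Build a 12-row ARFF test file with the rainfall target left missing."""
    header: List[str] = [
        f"@relation {relation}",
        "@attribute year numeric",
        "@attribute month numeric",
        "@attribute rainfall numeric",
        "@attribute latitude numeric",
        "@attribute longitude numeric",
        "@attribute altitude numeric",
        "@data",
    ]
    # One row per month; '?' marks the value the model is asked to predict.
    rows = [
        f"{year},{month},?,{latitude},{longitude},{altitude}"
        for month in range(1, 13)
    ]
    return "\n".join(header + rows) + "\n"


# Placeholder coordinates for illustration only.
test_file = build_test_arff(2018, 32.539, 38.195, 686)
```

Keeping the attribute order identical to the training view matters, since ARFF-based tools generally expect training and test files to declare compatible headers.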
Cite this article
Aljawarneh, S., Lara, J.A. & Yassein, M.B. A visual big data system for the prediction of weather-related variables: Jordan-Spain case study. Multimed Tools Appl 82, 13103–13139 (2023). https://doi.org/10.1007/s11042-020-09848-9
DOI: https://doi.org/10.1007/s11042-020-09848-9