research-article

A visual big data system for the prediction of weather-related variables: Jordan-Spain case study

Published: 01 October 2020

Abstract

Meteorology is a field in which huge amounts of data are generated, mainly collected by sensors at weather stations, where different variables can be measured. These data have particular characteristics, such as high volume and dimensionality, frequent missing values at some stations, and high correlation between the collected variables. It is therefore crucial to use Big Data and Data Mining techniques to handle these data and extract useful knowledge from them, which can be used, for instance, to predict weather phenomena. In this paper, we propose a visual big data system designed to handle large amounts of weather-related data and to let the user analyze those data to perform predictive tasks over the considered variables (temperature and rainfall). The proposed system collects open data and loads them into a local NoSQL database, fusing them at different levels of temporal and spatial aggregation in order to perform predictive analysis using univariate and multivariate approaches, as well as forecasting based on training data from neighboring stations in cases with high rates of missing values. The system has been assessed in terms of usability and predictive performance, obtaining an overall normalized mean squared error of 0.00013 and an overall directional symmetry of nearly 0.84. It was rated positively by a group of experts in the area (all aspects of the system except graphic design were rated 3 or above on a 1–5 scale). These promising preliminary results support the validity of our system and encourage further work in this area.
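
The two predictive measures quoted in the abstract, normalized mean squared error (NMSE) and directional symmetry (DS), can be illustrated with a short sketch. The abstract does not give the exact formulas the authors used, so the definitions below are assumptions based on common usage: NMSE is taken as the squared error divided by the squared deviation of the observed values from their mean, and DS as the fraction of time steps in which the predicted and observed series move in the same direction. The function names and the sample temperature values are hypothetical.

```python
from typing import Sequence


def nmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared error normalized by the spread of the observed values.

    Assumed definition; the paper may normalize differently.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_actual = sum(actual) / len(actual)
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    spread = sum((a - mean_actual) ** 2 for a in actual)
    if spread == 0:
        raise ValueError("observed series is constant; NMSE is undefined")
    return squared_error / spread


def directional_symmetry(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of steps where predicted and observed changes share a direction.

    Assumed definition: a step counts as a hit when the product of the two
    step-to-step changes is non-negative.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    hits = 0
    for t in range(1, len(actual)):
        actual_change = actual[t] - actual[t - 1]
        predicted_change = predicted[t] - predicted[t - 1]
        if actual_change * predicted_change >= 0:
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical daily temperatures (degrees Celsius), for illustration only.
observed = [20.0, 22.0, 21.0, 23.0, 24.0]
forecast = [20.0, 21.0, 22.0, 22.0, 25.0]

print(nmse(observed, forecast))                  # 0.4
print(directional_symmetry(observed, forecast))  # 0.75
```

Under these definitions, an NMSE close to 0 means the forecast error is small relative to the natural variability of the series, while a DS close to 1 means the forecast usually gets the direction of change right, even if the magnitude is off.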

Published In

Multimedia Tools and Applications, Volume 82, Issue 9
Apr 2023
1545 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 October 2020
Accepted: 09 September 2020
Revision received: 30 July 2020
Received: 07 May 2020

Author Tags

  1. Big data
  2. Weather forecasting
  3. Data mining
  4. Information fusion
  5. MongoDB

Qualifiers

  • Research-article
