Abstract
This paper introduces an innovative methodology for scaling posterior distributions over differently-curated datasets. The methodology is based on Bayesian Neural Networks enhanced with effective sampling algorithms, which yield a model setup well suited to improving the scaling effect. Theoretical results are presented and discussed in detail, together with a modern case study on stock-quotation prediction that confirms the successful application of the proposed methodology to emerging big data analytics settings.
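The methodology pairs Bayesian Neural Networks with effective sampling algorithms for the posterior. As a concrete illustration of this family of techniques, the minimal sketch below shows stochastic gradient Langevin dynamics (SGLD, Welling and Teh 2011) drawing approximate posterior samples for a small Bayesian neural network on synthetic regression data. This is not the paper's implementation: the architecture, dataset, helper names (`forward`, `loglik_grads`), and all hyper-parameters are illustrative assumptions.

```python
# Minimal SGLD sketch (illustrative, not the authors' method): sample the
# posterior of a tiny one-hidden-layer Bayesian neural network.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset standing in for one curated dataset.
N, H, sigma, tau = 512, 16, 0.1, 1.0   # data size, hidden units, noise std, prior std
X = rng.uniform(-2.0, 2.0, size=(N, 1))
y = np.sin(2.0 * X[:, 0]) + sigma * rng.normal(size=N)

# Network parameters with Gaussian priors N(0, tau^2).
W1 = rng.normal(scale=0.5, size=(H, 1)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=H);      b2 = 0.0

def forward(Xb):
    """Hidden activations and network output for a minibatch."""
    h = np.tanh(Xb @ W1.T + b1)          # (n, H)
    return h, h @ W2 + b2                # (n,)

def loglik_grads(Xb, yb):
    """Gradients of the minibatch log-likelihood w.r.t. all parameters."""
    h, f = forward(Xb)
    r = (yb - f) / sigma**2              # residuals scaled by noise variance
    gW2 = h.T @ r
    gb2 = r.sum()
    gh = np.outer(r, W2) * (1.0 - h**2)  # backprop through tanh
    gW1 = gh.T @ Xb
    gb1 = gh.sum(axis=0)
    return gW1, gb1, gW2, gb2

# SGLD update: theta += eps/2 * (grad log prior + (N/n) * grad log lik) + N(0, eps)
eps, n_batch, n_steps = 1e-4, 32, 5000
samples = []
for t in range(n_steps):
    idx = rng.choice(N, size=n_batch, replace=False)
    grads = loglik_grads(X[idx], y[idx])
    scale = N / n_batch                  # rescale minibatch gradient to full data
    params = [W1, b1, W2, np.array([b2])]
    for p, g in zip(params, grads):
        g = np.reshape(g, np.shape(p))
        drift = 0.5 * eps * (-p / tau**2 + scale * g)
        p += drift + np.sqrt(eps) * rng.normal(size=np.shape(p))
    b2 = float(params[3][0])
    if t % 50 == 0:
        samples.append((W1.copy(), b1.copy(), W2.copy(), b2))

print(f"collected {len(samples)} approximate posterior samples")
```

The injected Gaussian noise with variance equal to the step size is what distinguishes SGLD from plain stochastic gradient ascent on the log posterior, and the N/n rescaling keeps the minibatch gradient an unbiased estimate of the full-data log-likelihood gradient.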
Cite this paper
Cuzzocrea, A., Soufargi, S., Baldo, A., Fadda, E. (2022). Scaling Posterior Distributions over Differently-Curated Datasets: A Bayesian-Neural-Networks Methodology. In: Ceci, M., Flesca, S., Masciari, E., Manco, G., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2022. Lecture Notes in Computer Science, vol. 13515. Springer, Cham. https://doi.org/10.1007/978-3-031-16564-1_19