Abstract
In this paper, we propose a hybrid framework for query processing and data analytics over large-scale data on Spark that supports multi-paradigm processing (including SQL, OLAP, data mining, and machine learning) in distributed environments. The framework features a three-layer data processing module and a workflow module that controls it. We demonstrate the strength of our framework by applying it to real-world traffic scenarios.
Acknowledgements
We would like to thank CAR Inc. for providing datasets for scientific research. This work is supported by the Key Technology R&D Program of Tianjin (16YFZCGX00210), the National Key R&D Program of China (2016YFB1000603, 2017YFC0908401), and the National Natural Science Foundation of China (61672377).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, H., Zhang, X., Zhang, J., Feng, Z. (2018). A Hybrid Framework for Query Processing and Data Analytics on Spark. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_15
DOI: https://doi.org/10.1007/978-3-030-01298-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01297-7
Online ISBN: 978-3-030-01298-4
eBook Packages: Computer Science, Computer Science (R0)