Abstract
Big data requirements have reshaped database technology, producing many new and revamped DBMSs that target either transactional (OLTP) workloads or demanding query workloads (cubes, exploration, pre-processing). Parallel and main-memory processing have become important features for exploiting new hardware and coping with data volume. With this landscape in mind, we present a survey comparing modern row and columnar DBMSs, contrasting their ability to write data (storage mechanisms, transaction processing, batch loading, enforcing ACID) and their ability to read data (query processing, physical operators, sequential vs. parallel processing). We provide a unifying view of the alternative storage mechanisms, database algorithms and query optimizations used across diverse DBMSs, and we contrast the architecture and processing of a parallel DBMS with those of an HPC system. We cover the full spectrum of subsystems, from storage to query processing. We also consider parallel processing and the impact of much larger RAM, which has brought back main-memory databases. We then discuss important parallel aspects, including speedup, sequential bottlenecks, data redistribution, high-speed networks, main-memory processing with larger RAM, and fault tolerance at query processing time. Finally, we outline an agenda for future research.
C. Ordonez—Work partially conducted while the first author was visiting MIT.
Acknowledgments
The first author thanks Michael Stonebraker for his guidance on understanding query processing based on columnar storage, arrays of unlimited size to support mathematical analytics, and lock-free transaction processing in main memory.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ordonez, C., Bellatreche, L. (2018). A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_1
DOI: https://doi.org/10.1007/978-3-319-99133-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99132-0
Online ISBN: 978-3-319-99133-7
eBook Packages: Computer Science (R0)