A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns

Ordonez, Carlos; Bellatreche, Ladjel

doi:10.1007/978-3-319-99133-7_1

Carlos Ordonez¹⁵ &
Ladjel Bellatreche¹⁶

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 903))

Included in the following conference series:

International Conference on Database and Expert Systems Applications

691 Accesses
2 Citations

Abstract

Big data requirements have revolutionized database technology, bringing many innovative and revamped DBMSs to process transactional (OLTP) or demanding query workloads (cubes, exploration, pre-processing). Parallel and main memory processing have become important features to exploit new hardware and cope with data volume. With such landscape in mind, we present a survey comparing modern row and columnar DBMSs, contrasting their ability to write data (storage mechanisms, transaction processing, batch loading, enforcing ACID) and their ability to read data (query processing, physical operators, sequential vs parallel). We provide a unifying view of alternative storage mechanisms, database algorithms and query optimizations used across diverse DBMSs. We contrast the architecture and processing of a parallel DBMS with an HPC system. We cover the full spectrum of subsystems going from storage to query processing. We consider parallel processing and the impact of much larger RAM, which brings back main-memory databases. We then discuss important parallel aspects including speedup, sequential bottlenecks, data redistribution, high speed networks, main memory processing with larger RAM and fault-tolerance at query processing time. We outline an agenda for future research.

C. Ordonez—Work partially conducted while the first author was visiting MIT.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

GPU-Accelerated Database Systems: Survey and Open Challenges

Apara: Workload-Aware Data Partition and Replication for Parallel Databases

Big Data Query Engines

References

Abadi, D.J., Madden, S., Hachem, N.: Column-stores vs. row-stores: how different are they really? In: Proceedings of ACM SIGMOD Conference, pp. 967–980 (2008)
Google Scholar
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases: The Logical Level, Facsimile edn. Pearson Education POD, London (1994)
Google Scholar
Abouzied, A., Bajda-Pawlikowski, K., Huang, J., Abadi, D.J., Silberschatz, A.: HadoopDB in action: building real world applications. In: Proceedings of ACM SIGMOD Conference, pp. 1111–1114. ACM (2010)
Google Scholar
Bancilhon, F., Ramakrishnan, R.: An Amateur’s introduction to recursive query processing strategies. In: Proceedings of ACM SIGMOD Conference, pp. 16–52 (1986)
Google Scholar
Baumann, P., Dumitru, A.M., Merticariu, V.: The array database that is not a database: file based array query answering in Rasdaman. In: Nascimento, M.A., et al. (eds.) SSTD 2013. LNCS, vol. 8098, pp. 478–483. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40235-7_32
Chapter MATH Google Scholar
Bellatreche, L., Benkrid, S., Ghazal, A., Crolotte, A., Cuzzocrea, A.: Verification of partitioning and allocation techniques on teradata DBMS. In: Xiang, Y., Cuzzocrea, A., Hobbs, M., Zhou, W. (eds.) ICA3PP 2011. LNCS, vol. 7016, pp. 158–169. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24650-0_14
Chapter Google Scholar
Ceri, S., Della Valle, E., Pedreschi, D., Trasarti, R.: Mega-modeling for big data analytics. In: Atzeni, P., Cheung, D., Ram, S. (eds.) ER 2012. LNCS, vol. 7532, pp. 1–15. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34002-4_1
Chapter Google Scholar
Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J., Welton, C.: MAD skills: new analysis practices for big data. In: Proceeidngs of VLDB Conference, pp. 1481–1492 (2009)
Google Scholar
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Article Google Scholar
DeWitt, D., Gray, J.: Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)
Article Google Scholar
Dongarra, J., Duff, I.S., Sorensen, D.C., van der Vost, H.A.: Numerical Linear Algebra for High-Performance Computers. SIAM (1998)
Google Scholar
Färber, F., et al.: The SAP HANA database: an architecture overview. IEEE Data Eng. Bull. 35(1), 28–33 (2012)
Google Scholar
Garcia-Molina, H., Ullman, J.D., Widom, J.: Database Systems: The Complete Book, 2nd edn. Prentice Hall, Upper Saddle River (2008)
Google Scholar
Ghazal, A., et al.: BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of ACM SIGMOD Conference, pp. 1197–1208. ACM (2013)
Google Scholar
Hameurlain, A., Morvan, F.: Parallel relational database systems: why, how and beyond. In: Wagner, R.R., Thoma, H. (eds.) DEXA 1996. LNCS, vol. 1134, pp. 302–312. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0034690
Chapter Google Scholar
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)
MATH Google Scholar
Hellerstein, J., et al.: The MADlib analytics library or MAD skills, the SQL. Proc. VLDB 5(12), 1700–1711 (2012)
Article Google Scholar
Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L.: MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012)
Google Scholar
Idreos, S., Kersten, M.L., Manegold, S.: Self-organizing tuple reconstruction in column stores. In: Proceedings of ACM SIGMOD Conference, pp. 297–308 (2009)
Google Scholar
Jacobs, A.: The pathologies of big data. Commun. ACM 52(8), 36–44 (2009)
Article Google Scholar
Jemal, D., Faiz, R., Boukorca, A., Bellatreche, L.: MapReduce-DBMS: an integration model for big data management and optimization. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds.) DEXA 2015. LNCS, vol. 9262, pp. 430–439. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22852-5_36
Chapter Google Scholar
Lamb, A., et al.: The Vertica analytic database: C-store 7 years later. PVLDB 5(12), 1790–1801 (2012)
MathSciNet Google Scholar
Larson, P.A., Hanson, E.N., Price, S.L.: Columnar storage in SQL server 2012. IEEE Data Eng. Bull. 35(1), 15–20 (2012)
Google Scholar
MacNicol, R., French, B.: Sybase IQ multiplex - designed for analytics. In: Proceedings of VLDB Conference, pp. 1227–1230 (2004)
Google Scholar
Manegold, S., Boncz, P.A., Kersten, M.L.: Optimizing main-memory join on modern hardware. IEEE Trans. Knowl. Data Eng. (TKDE) 14(4), 709–730 (2002)
Article Google Scholar
Ordonez, C.: Optimization of linear recursive queries in SQL. IEEE Trans. Knowl. Data Eng. (TKDE) 22(2), 264–277 (2010)
Article MathSciNet Google Scholar
Ordonez, C.: Statistical model computation with UDFs. IEEE Trans. Knowl. Data Eng. (TKDE) 22(12), 1752–1765 (2010)
Article Google Scholar
Ordonez, C., Chen, Z.: Horizontal aggregations in SQL to prepare data sets for data mining analysis. IEEE Trans. Knowl. Data Eng. (TKDE) 24(4), 678–691 (2012)
Article Google Scholar
Sismanis, Y., Deligiannakis, A., Roussopoulos, N., Kotidis, Y.: Dwarf: shrinking the petacube. In: ACM SIGMOD Conference, pp. 464–475 (2002)
Google Scholar
Stonebraker, M., et al.: MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1), 64–71 (2010)
Article Google Scholar
Stonebraker, M., et al.: C-Store: a column-oriented DBMS. In: Proceedings of VLDB Conference, pp. 553–564 (2005)
Google Scholar
Stonebraker, M., Brown, P., Zhang, D., Becla, J.: SciDB: a database management system for applications with complex analytics. Comput. Sci. Eng. 15(3), 54–62 (2013)
Article Google Scholar
Stonebraker, M., Madden, S., Abadi, D.J., Harizopoulos, S., Hachem, N., Helland, P.: The end of an architectural era: (it’s time for a complete rewrite). In: VLDB, pp. 1150–1160 (2007)
Google Scholar
Tran, N., Bodagala, S., Dave, J.: Designing query optimizers for big data problems of the future. PVLDB 11(6), 1168–1169 (2013)
Google Scholar
Xin, R.S., Rosen, J., Zaharia, M., Franklin, M.J., Shenker, S., Stoica, I.: Shark: SQL and rich analytics at scale. In: Proceedings of ACM SIGMOD Conference, pp. 13–24 (2013)
Google Scholar
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: HotCloud USENIX Workshop (2010)
Google Scholar
Zukowski, M., Boncz, P.: Vectorwise: beyond column stores. IEEE Data Eng. Bull. 35(1), 21–27 (2012)
Google Scholar

Download references

Acknowledgments

The first author thanks the guidance from Michael Stonebraker to understand query processing based on columnar storage, arrays of unlimited size to support mathematical analytics and lock-free transaction processing in main memory.

Author information

Authors and Affiliations

University of Houston, Houston, USA
Carlos Ordonez
LIAS/ISAE-ENSMA, Poitiers, France
Ladjel Bellatreche

Authors

Carlos Ordonez
View author publications
You can also search for this author in PubMed Google Scholar
Ladjel Bellatreche
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Carlos Ordonez .

Editor information

Editors and Affiliations

University of Tunis, Tunis, Tunisia
Mourad Elloumi
MiCS, Media Computer Science, University of Passau, Passau, Bayern, Germany
Michael Granitzer
IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
University of Twente, Enschede, Overijssel, The Netherlands
Christin Seifert
Fak. Medien, Bauhaus Universität Weimar, Weimar, Thüringen, Germany
Benno Stein
Inst. für Softwaretechnik, Vienna University of Technology, Vienna, Austria
A Min Tjoa
FAW, Johannes Kepler University of Linz, Linz, Austria
Roland Wagner

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ordonez, C., Bellatreche, L. (2018). A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_1

Download citation

DOI: https://doi.org/10.1007/978-3-319-99133-7_1
Published: 07 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99132-0
Online ISBN: 978-3-319-99133-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

GPU-Accelerated Database Systems: Survey and Open Challenges

Apara: Workload-Aware Data Partition and Replication for Parallel Databases

Big Data Query Engines

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

GPU-Accelerated Database Systems: Survey and Open Challenges

Apara: Workload-Aware Data Partition and Replication for Parallel Databases

Big Data Query Engines

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation