A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns

  • Conference paper
Database and Expert Systems Applications (DEXA 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 903)

Abstract

Big data requirements have revolutionized database technology, bringing many innovative and revamped DBMSs to process transactional (OLTP) or demanding query workloads (cubes, exploration, pre-processing). Parallel and main-memory processing have become important features to exploit new hardware and cope with data volume. With this landscape in mind, we present a survey comparing modern row and columnar DBMSs, contrasting their ability to write data (storage mechanisms, transaction processing, batch loading, enforcing ACID) and their ability to read data (query processing, physical operators, sequential vs. parallel). We provide a unifying view of alternative storage mechanisms, database algorithms and query optimizations used across diverse DBMSs. We contrast the architecture and processing of a parallel DBMS with an HPC system. We cover the full spectrum of subsystems, from storage to query processing. We consider parallel processing and the impact of much larger RAM, which brings back main-memory databases. We then discuss important parallel aspects including speedup, sequential bottlenecks, data redistribution, high-speed networks, main-memory processing with larger RAM and fault tolerance at query processing time. We outline an agenda for future research.

C. Ordonez—Work partially conducted while the first author was visiting MIT.

Acknowledgments

The first author thanks Michael Stonebraker for his guidance in understanding query processing based on columnar storage, arrays of unlimited size to support mathematical analytics, and lock-free transaction processing in main memory.

Author information

Corresponding author

Correspondence to Carlos Ordonez.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ordonez, C., Bellatreche, L. (2018). A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_1

  • DOI: https://doi.org/10.1007/978-3-319-99133-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99132-0

  • Online ISBN: 978-3-319-99133-7

  • eBook Packages: Computer Science, Computer Science (R0)
