
MapReduce and parallel DBMSs: friends or foes?

Published: 01 January 2010

Abstract

MapReduce complements DBMSs since databases are not designed for extract-transform-load tasks, a MapReduce specialty.


Published In

Communications of the ACM, Volume 53, Issue 1
Amir Pnueli: Ahead of His Time
January 2010
142 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/1629175
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Popular
  • Refereed
