Distributed data management using MapReduce

Published: 01 January 2014

Abstract

MapReduce is a framework for processing and managing large-scale datasets in a distributed cluster. It has been used for applications such as generating search indexes, document clustering, access log analysis, and various other forms of data analytics. MapReduce adopts a flexible computation model with a simple interface consisting of map and reduce functions, whose implementations can be customized by application developers. Since its introduction, a substantial amount of research effort has been directed toward making it more usable and efficient for supporting database-centric operations. In this article, we aim to provide a comprehensive review of a wide range of proposals and systems that focus fundamentally on supporting distributed data management and processing using the MapReduce framework.
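To make the map and reduce interface concrete, the following is a minimal single-process sketch of a word count, assuming a toy in-memory driver. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical and are not taken from the article or from any MapReduce implementation; a real framework would distribute the map, shuffle, and reduce phases across a cluster.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """User-defined map: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Iterator[tuple[str, int]]:
    """User-defined reduce: sum all counts that share the same word."""
    yield word, sum(counts)


def run_mapreduce(
    records: Iterable[tuple[str, str]],
    mapper: Callable[[str, str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Iterator[tuple[str, int]]],
) -> dict[str, int]:
    """Toy driver (an assumption for illustration): map, group by key, reduce."""
    # Map phase, followed by an in-memory stand-in for the shuffle:
    # intermediate values are grouped by their intermediate key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: one reducer call per distinct intermediate key.
    results: dict[str, int] = {}
    for out_key in sorted(groups):
        for final_key, final_value in reducer(out_key, groups[out_key]):
            results[final_key] = final_value
    return results


docs = [("d1", "map reduce map"), ("d2", "reduce shuffle")]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

Only `map_fn` and `reduce_fn` are application-specific here; the driver stays the same for other jobs, which is the customization point the abstract refers to.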

Published In

ACM Computing Surveys, Volume 46, Issue 3
January 2014
507 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2578702
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2014
Accepted: 26 June 2013
Revised: 21 February 2013
Received: 15 September 2012
Published in CSUR Volume 46, Issue 3

Author Tags

  1. Hadoop
  2. MapReduce
  3. scalability

Qualifiers

  • Research-article
  • Research
  • Refereed
