research-article

Distributed data management using MapReduce

Authors: M. Tamer Özsu, Sai Wu

ACM Computing Surveys (CSUR), Volume 46, Issue 3

Article No.: 31, Pages 1 - 42

https://doi.org/10.1145/2503009

Published: 01 January 2014

Abstract

MapReduce is a framework for processing and managing large-scale datasets in a distributed cluster. It has been used for applications such as generating search indexes, document clustering, access log analysis, and various other forms of data analytics. MapReduce adopts a flexible computation model with a simple interface consisting of map and reduce functions, whose implementations can be customized by application developers. Since its introduction, a substantial amount of research effort has been directed toward making it more usable and efficient for supporting database-centric operations. In this article, we aim to provide a comprehensive review of a wide range of proposals and systems that focus fundamentally on the support of distributed data management and processing using the MapReduce framework.
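To make the map/reduce interface mentioned in the abstract concrete, the following is a minimal single-process sketch in Python of access log analysis, one of the applications the abstract lists. It is an illustration only, not code from the article or from any MapReduce implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce`, and the two-field log format (`client_ip url`), are assumptions made for this example. A real framework would distribute the map, shuffle, and reduce phases across a cluster.

```python
from collections import defaultdict


def map_fn(_key, line):
    """User-defined map: emit (url, 1) for each log line.

    Assumes a hypothetical log format of 'client_ip url'.
    Lines with fewer than two fields are skipped.
    """
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def reduce_fn(url, counts):
    """User-defined reduce: sum all counts emitted for one url."""
    yield url, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver standing in for the framework.

    1. Map: apply the mapper to every (key, value) input record.
    2. Shuffle: group intermediate values by intermediate key.
    3. Reduce: apply the reducer to each key and its list of values.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    results = []
    for out_key in sorted(groups):  # sorted only to make output deterministic
        results.extend(reducer(out_key, groups[out_key]))
    return results


# Toy input: (line offset, log line) pairs, invented for this sketch.
LOG = [
    (0, "10.0.0.1 /index.html"),
    (1, "10.0.0.2 /about.html"),
    (2, "10.0.0.1 /index.html"),
]

hits = run_mapreduce(LOG, map_fn, reduce_fn)
print(hits)  # [('/about.html', 1), ('/index.html', 2)]
```

Only `map_fn` and `reduce_fn` are application-specific here; the driver is unchanged when the analysis changes, which is the sense in which the interface is customizable by application developers.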



Published In

ACM Computing Surveys Volume 46, Issue 3

January 2014

507 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/2578702

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2014

Accepted: 26 June 2013

Revised: 21 February 2013

Received: 15 September 2012

Published in CSUR Volume 46, Issue 3

Qualifiers

Research-article
Research
Refereed
