DOI: 10.1145/2938503.2939571
research-article

Computing Marginals Using MapReduce: Keynote talk paper

Published: 11 July 2016

Abstract

We consider the problem of computing the data-cube marginals of a fixed order k (i.e., all marginals that aggregate over k dimensions) using a single round of MapReduce. The focus is on the relationship between the reducer size (the number of inputs allowed at a single reducer) and the replication rate (the number of reducers to which an input is sent). We show that the replication rate is minimized when each reducer receives all the inputs necessary to compute one marginal of higher order. That observation lets us view the problem as one of covering sets of k dimensions with sets of a larger size m, a problem that has been studied under the name "covering numbers." We offer a number of constructions that, for different values of k and m, meet or come close to yielding the minimum possible replication rate for a given reducer size.
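
The covering view can be illustrated with a small Python sketch. It rests on one reading of the abstract, which is an assumption here: a reducer holding the inputs for an order-m marginal over a set S of dimensions can also produce every order-k marginal whose aggregated dimensions form a subset of S, so the number of m-sets chosen stands in for the replication rate. The greedy selection below is a generic heuristic, not one of the paper's constructions, and the names greedy_cover and is_cover are hypothetical.

```python
from itertools import combinations


def greedy_cover(d, m, k):
    """Choose m-subsets of range(d) until every k-subset lies inside one.

    Generic greedy heuristic (assumption: requires k <= m <= d).
    It is NOT one of the constructions from the paper.
    """
    uncovered = set(combinations(range(d), k))
    candidates = list(combinations(range(d), m))
    cover = []
    while uncovered:
        # Take the m-set containing the most still-uncovered k-sets.
        best = max(
            candidates,
            key=lambda s: sum(1 for t in combinations(s, k) if t in uncovered),
        )
        cover.append(best)
        uncovered -= set(combinations(best, k))
    return cover


def is_cover(cover, d, k):
    """Check that every k-subset of range(d) is contained in some chosen set."""
    return all(
        any(set(t) <= set(s) for s in cover)
        for t in combinations(range(d), k)
    )


if __name__ == "__main__":
    d, m, k = 6, 3, 2  # illustrative values only
    cover = greedy_cover(d, m, k)
    print("m-sets chosen:", cover)
    print("number of m-sets (proxy for replication rate):", len(cover))
    print("covers all k-sets:", is_cover(cover, d, k))
```

Under the same reading, and assuming every dimension has the same extent n, each such reducer would receive n^m inputs, so a larger m raises the reducer size while allowing fewer m-sets and hence a lower replication rate.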

Information & Contributors

Information

Published In

IDEAS '16: Proceedings of the 20th International Database Engineering & Applications Symposium
July 2016
420 pages
ISBN: 9781450341189
DOI: 10.1145/2938503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • Keio University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IDEAS '16

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 86
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 14 Oct 2024
