DOI: 10.1145/2379436.2379440

Workload diversity and dynamics in big data analytics: implications to system designers

Published: 09 June 2012

Abstract

The emergence of big data analytics and the need for cost- and energy-efficient IT infrastructure motivate a new focus on data-centric designs. In this paper, we aim to better understand the design implications of data analytics systems by quantifying workload requirements and runtime dynamics. We examine four workloads representing big data analytics trends for fast decisions, total integration, deep analysis and fresh insights: an archive store, a columnar database enhanced with table compression, an analytics engine with distributed R, and a transaction/analytics hybrid system. These applications demonstrate diverse resource requirements both within and across workloads as well as load imbalance due to data skew. Our observations suggest several directions to design balanced data analytics systems, including tight integration of heterogeneous, active data stores, support for efficient communication and data-centric load balancing.

Published In

ASBD '12: Proceedings of the 2nd Workshop on Architectures and Systems for Big Data
June 2012
26 pages
ISBN: 9781450314442
DOI: 10.1145/2379436
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. analytics system
  2. balanced system designs
  3. big data
  4. system architectures
  5. workload diversity

Qualifiers

  • Research-article

Conference

ASBD '12
