DOI: 10.1109/Trustcom.2015.567
Article

Hadoop Characterization

Published: 20 August 2015

Abstract

In the last decade, Warehouse Scale Computers (WSC) have grown in number and capacity, while Hadoop has become the de facto standard framework for Big data processing. Despite the existence of several benchmark suites, sizing guides, and characterization studies, there are few concrete guidelines for WSC designers and engineers who need to know how real Hadoop workloads will stress the different hardware subsystems of their servers. Available studies have reported execution statistics of Hadoop benchmarks but have not been able to extract meaningful and reusable results. Moreover, existing sizing guides provide hardware acquisition lists without considering the workloads. In this study, we propose a simple Big data workload differentiation, deliver general and specific conclusions about how demanding the different types of Hadoop workloads are for several hardware subsystems, and show how power consumption is influenced in each case. The HiBench and Big-Bench suites were used to capture real-time memory traces and CPU, disk, and power consumption statistics of Hadoop. Our results show that CPU-intensive and disk-intensive workloads behave differently: CPU-intensive workloads consume more power and memory bandwidth, while disk-intensive workloads usually require more memory. These and other conclusions presented in the paper are expected to help WSC designers choose the hardware characteristics of their Hadoop systems and better understand the behavior of Big data workloads in Hadoop.

Cited By

  • (2018) "A comprehensive memory analysis of data intensive workloads on server class architecture." Proceedings of the International Symposium on Memory Systems, pp. 19-30. DOI: 10.1145/3240302.3240320. Online publication date: 1 Oct 2018.
  • (2018) "Main-memory requirements of big data applications on commodity server platform." Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 653-660. DOI: 10.1109/CCGRID.2018.00097. Online publication date: 1 May 2018.

Published In

TRUSTCOM-BIGDATASE-ISPA '15: Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 02
August 2015
494 pages
ISBN:9781467379526

Publisher

IEEE Computer Society

United States

Author Tags

  1. benchmarks
  2. big data
  3. big-bench
  4. characterization
  5. hadoop
  6. hibench
  7. power consumption
  8. workloads
