Exploring Performance Models of Hadoop Applications on Cloud Architecture

Published: 04 May 2015


Hadoop is an open-source implementation of the MapReduce programming model; it provides the runtime infrastructure for the map and reduce functions programmed in individual applications. Commercial cloud services such as Amazon Elastic MapReduce provide the Hadoop architecture with IaaS support. In this architecture, the map and reduce functions are major determinants of end-to-end application latency, along with the framework components responsible for data access and exchange. In this paper, we explore modeling methods that capture both the performance characteristics and the semantics of a Hadoop architecture. We present early results from using Layered Queueing Networks (LQN) to model the performance of a Hadoop application given the design of its map and reduce functions. We build two different LQN models to represent the data-parallel computation of these functions and calibrate both models using monitored performance data. Both models produce converging results that are within ~10% of the observed performance. Drawing on this modeling experience, we further discuss the general issues of modeling Hadoop architectures with LQN and describe our future work.
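To illustrate the kind of calculation a queueing model performs, here is a minimal sketch of exact Mean Value Analysis (MVA) for a closed, single-class, product-form queueing network. This is not either of the authors' LQN models. An LQN additionally layers software servers (for example, map and reduce tasks calling data-access components) on top of hardware resources, and layered solvers commonly rely on MVA-style approximations internally. The function name `exact_mva`, the station names, and the service demands below are hypothetical, chosen only for illustration; in practice such demands would come from calibration against monitored performance data.

```python
from typing import Dict, Tuple


def exact_mva(
    demands: Dict[str, float], n_customers: int, think_time: float = 0.0
) -> Tuple[float, float, Dict[str, float]]:
    """Exact MVA for a closed, single-class network of queueing stations.

    demands: hypothetical total service demand (seconds per job) at each station.
    n_customers: number of concurrent jobs circulating in the network.
    think_time: delay (seconds) a job spends outside the queueing stations.
    Returns (throughput, response_time, mean queue length per station).
    """
    if n_customers < 1:
        raise ValueError("n_customers must be >= 1")
    queue = {k: 0.0 for k in demands}  # mean queue lengths with 0 customers
    throughput = 0.0
    response = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving job sees the queue of the (n-1)-job network.
        residence = {k: d * (1.0 + queue[k]) for k, d in demands.items()}
        response = sum(residence.values())
        # Response-time law for a closed system.
        throughput = n / (think_time + response)
        # Little's law applied at each station.
        queue = {k: throughput * r for k, r in residence.items()}
    return throughput, response, queue


if __name__ == "__main__":
    # Hypothetical per-job demands for two resources used by a task.
    demands = {"cpu": 2.0, "disk": 1.0}
    x, r, q = exact_mva(demands, 2)
    # With these demands: r = 14/3 s and x = 3/7 jobs/s.
    print(f"throughput={x:.4f} jobs/s, response={r:.4f} s, queues={q}")
```

As the number of concurrent jobs grows, the throughput approaches the bottleneck bound 1/max(D) (here 0.5 jobs/s, set by the hypothetical `cpu` station), which is the kind of saturation behavior a calibrated model is meant to predict.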


Author Tags

  1. layered queueing network
  2. mapreduce
  3. performance modeling


