Supporting MapReduce on large-scale asymmetric multi-core clusters

MM Rafique, B Rose, AR Butt… - ACM SIGOPS Operating …, 2009 - dl.acm.org
ACM SIGOPS Operating Systems Review, 2009
Asymmetric multi-core processors (AMPs), with general-purpose and specialized cores packaged on the same chip, are emerging as a leading paradigm for high-end computing. A large body of existing research explores the use of standalone AMPs in computationally challenging and data-intensive applications. AMPs are also being rapidly deployed as high-performance accelerators on clusters. In these settings, scheduling, communication, and I/O are managed by general-purpose processors (GPPs), while computation is off-loaded to AMPs. Design space exploration for the configuration and software stack of hybrid clusters of AMPs and GPPs is an open problem. In this paper, we explore this design space in an implementation of the popular MapReduce programming model. Our contributions are: an exploration of various design alternatives for hybrid asymmetric clusters of AMPs and GPPs; the adoption of a streaming approach to supporting MapReduce computations on clusters with asymmetric components; and adaptive schedulers that take into account individual component capabilities in asymmetric clusters. Throughout our design, we remove I/O bottlenecks using double-buffering and asynchronous I/O. We present an evaluation of the design choices through experiments on a real cluster with MapReduce workloads of varying degrees of computation intensity. We find that in clusters with resource-constrained and well-provisioned AMP accelerators, a streaming approach achieves 50.5% and 73.1% better performance, respectively, than the non-streaming approach, and scales almost linearly with an increasing number of compute nodes. We also show that our dynamic scheduling mechanisms effectively adapt the parameters of the scheduling policies between applications with different computation density.
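The abstract names double-buffering and a streaming approach without detailing them. As a rough illustration only, and not the authors' implementation, the following Python sketch shows the general idea: a loader thread (standing in for a GPP handling I/O) prefetches the next chunk while the main thread (standing in for an accelerator) runs map and reduce on the current one. All names here (`stream_chunks`, `double_buffered_map_reduce`, the two-slot semaphore) are hypothetical and chosen for this example.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel marking the end of the input stream


def stream_chunks(data, chunk_size):
    """Yield fixed-size chunks, standing in for reads from disk or network."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def double_buffered_map_reduce(data, chunk_size, map_fn, reduce_fn, initial):
    """Stream chunks through map_fn/reduce_fn with at most two chunks in flight.

    Assumption: two buffer slots, so one chunk can be loaded while the
    other is being processed.
    """
    free_slots = threading.Semaphore(2)  # the two buffers
    filled = queue.Queue()               # chunks ready for processing
    chunks = stream_chunks(data, chunk_size)

    def loader():
        while True:
            free_slots.acquire()         # wait until a buffer is free
            chunk = next(chunks, _DONE)  # "read" the next chunk
            filled.put(chunk)
            if chunk is _DONE:
                return

    thread = threading.Thread(target=loader, daemon=True)
    thread.start()

    result = initial
    while True:
        chunk = filled.get()
        if chunk is _DONE:
            break
        result = reduce_fn(result, map_fn(chunk))  # compute on current buffer
        free_slots.release()                       # hand the buffer back
    thread.join()
    return result


def word_count(words, chunk_size=2):
    """Example workload: count words chunk by chunk and merge the counts."""
    return double_buffered_map_reduce(
        words,
        chunk_size,
        map_fn=Counter,                         # map: per-chunk counts
        reduce_fn=lambda acc, part: acc + part,  # reduce: merge counts
        initial=Counter(),
    )
```

For instance, `word_count("a b a c b a".split())` merges the per-chunk counts into a single `Counter`. In a real hybrid cluster the loader would issue asynchronous reads and the processing step would run on the accelerator; the two-slot structure is the part this sketch is meant to convey.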