HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters

02/28/11. Xiao Qin, Department of Computer Science and Software Engineering, Auburn University. http://www.eng.auburn.edu/~xqin Slides 2-20 are adapted from notes by Subbarao Kambhampati (ASU), Dan Weld (U. Washington), and Jeff Dean and Sanjay Ghemawat (Google, Inc.).

Motivation: Large-Scale Data Processing. Want to use 1000s of CPUs, but don't want the hassle of managing things. MapReduce provides: automatic parallelization & distribution; fault tolerance; I/O scheduling; monitoring & status updates.

Map/Reduce. Programming model from Lisp (and other functional languages). Many problems can be phrased this way. Easy to distribute across nodes. Nice retry/failure semantics.

Distributed Grep. [Figure: very big data is split into pieces; grep runs on each piece in parallel and produces matches; cat concatenates them into all matches.]

Distributed Word Count. [Figure: very big data is split into pieces; count runs on each piece in parallel; merge combines the partial counts into the merged count.]

Map Reduce. Map: accepts an input key/value pair; emits an intermediate key/value pair. Reduce: accepts an intermediate key/value* pair (a key with the list of its values); emits an output key/value pair. [Figure: very big data flows through MAP, a partitioning function, and REDUCE to produce the result.]

Map in Lisp (Scheme). (map f list [list2 list3 …]). With a unary operator: (map square '(1 2 3 4)) gives (1 4 9 16). With a binary operator: (reduce + '(1 4 9 16)) expands to (+ 16 (+ 9 (+ 4 1))) and gives 30. Combined: (reduce + (map square (map - l1 l2)))
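The same three expressions can be sketched in Python for readers who do not use Scheme. This is only an illustration: `square`, `l1`, and `l2` are hypothetical names and sample values, not part of the original slides.

```python
from functools import reduce
from operator import add, sub


def square(x):
    """Unary operator applied to each element by map."""
    return x * x


# (map square '(1 2 3 4))
squares = list(map(square, [1, 2, 3, 4]))

# (reduce + '(1 4 9 16)): a binary operator folds the list into one value
total = reduce(add, squares)

# (reduce + (map square (map - l1 l2))): map over two lists in step,
# then square each difference and sum the results.
l1, l2 = [5, 7, 9], [1, 2, 3]  # hypothetical sample lists
sum_sq_diff = reduce(add, map(square, map(sub, l1, l2)))
```

Here `squares` is `[1, 4, 9, 16]` and `total` is 30, matching the Scheme results on the slide.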

Map/Reduce à la Google. map(key, val) is run on each item in the set and emits new-key / new-val pairs. reduce(key, vals) is run for each unique key emitted by map() and emits the final output.

Count words in docs. Input consists of (url, contents) pairs. map(key=url, val=contents): for each word w in contents, emit (w, "1"). reduce(key=word, values=uniq_counts): sum all "1"s in the values list and emit the result "(word, sum)".

Count, Illustrated. Input documents: "see bob throw" and "see spot run". map output: (see, 1), (bob, 1), (throw, 1), (see, 1), (spot, 1), (run, 1). reduce output: (bob, 1), (run, 1), (see, 2), (spot, 1), (throw, 1).
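The word-count example can be simulated on a single machine. The sketch below is a minimal in-memory illustration, not Hadoop or Google code: the function names and the document keys `doc1` and `doc2` are assumptions, and integer counts stand in for the slide's "1" strings.

```python
from collections import defaultdict


def map_fn(url, contents):
    """Emit (word, 1) for every word in one document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(docs):
    # Shuffle step: group the intermediate values by key.
    groups = defaultdict(list)
    for url, contents in docs:
        for word, one in map_fn(url, contents):
            groups[word].append(one)
    # Run reduce once per unique key.
    return dict(reduce_fn(w, c) for w, c in sorted(groups.items()))


docs = [("doc1", "see bob throw"), ("doc2", "see spot run")]  # hypothetical keys
result = word_count(docs)
```

With the two sample documents from the slide, `result` maps `see` to 2 and every other word to 1.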

Grep. Input consists of (url+offset, single line) pairs. map(key=url+offset, val=line): if line matches regexp, emit (line, "1"). reduce(key=line, values=uniq_counts): don't do anything; just emit line.
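A minimal single-process sketch of the grep job follows. It assumes Python's `re` module for the regular expression and uses a hypothetical file name and offsets. Grouping by key is skipped because the reduce step is the identity, so duplicate lines are kept as they are.

```python
import re


def grep_map(key, line, pattern):
    """key is (url, offset); emit (line, 1) when the line matches."""
    if re.search(pattern, line):
        yield line, 1


def grep_reduce(line, counts):
    """Identity reduce: pass the matching line through unchanged."""
    return line


def distributed_grep(records, pattern):
    matches = []
    for key, line in records:
        for out_line, count in grep_map(key, line, pattern):
            matches.append(grep_reduce(out_line, [count]))
    return matches


records = [  # hypothetical input: ((url, offset), line)
    (("log.txt", 0), "error: disk full"),
    (("log.txt", 17), "ok"),
    (("log.txt", 20), "error: timeout"),
]
hits = distributed_grep(records, r"^error")
```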

Reverse Web-Link Graph. Map: for each URL linking to target, … output <target, source> pairs. Reduce: concatenate the list of all source URLs; outputs <target, list(source)> pairs.
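The link-reversal job can be sketched in a few lines. The page names below are hypothetical, and sorting the source list is a choice made here only to keep the output deterministic.

```python
from collections import defaultdict


def map_links(source, targets):
    """Emit a (target, source) pair for every outgoing link of one page."""
    for target in targets:
        yield target, source


def reverse_links(pages):
    inverted = defaultdict(list)
    for source, targets in pages.items():
        for target, src in map_links(source, targets):
            inverted[target].append(src)
    # Reduce: concatenate all sources that point at each target.
    return {target: sorted(sources) for target, sources in inverted.items()}


pages = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}  # hypothetical
graph = reverse_links(pages)
```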

Model is Widely Applicable. [Figure: MapReduce programs in the Google source tree.] Example uses: distributed grep, distributed sort, web link-graph reversal, term-vector / host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation, ...

Implementation Overview. Typical cluster: 100s/1000s of 2-CPU x86 machines with 2-4 GB of memory; limited bisection bandwidth; storage is on local IDE disks. GFS: distributed file system manages data (SOSP'03). Job scheduling system: jobs are made up of tasks, and the scheduler assigns tasks to machines. The implementation is a C++ library linked into user programs.

Execution. How is this distributed? Partition the input key/value pairs into chunks and run map() tasks in parallel. After all map()s are complete, consolidate all emitted values for each unique emitted key. Now partition the space of output map keys and run reduce() in parallel. If map() or reduce() fails, re-execute!
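The execution steps above can be sketched as a small sequential engine. This is an illustration under stated assumptions, not the Google or Hadoop implementation: tasks run one after another instead of in parallel, `zlib.crc32` is an arbitrary stand-in for the partitioning function, and the retry limit of three attempts is hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Deterministic hash so a key always lands in the same reduce partition."""
    return zlib.crc32(str(key).encode()) % num_reducers


def run_with_retry(task, *args, attempts=3):
    """Re-execute a failed task, giving up after a fixed number of attempts."""
    for attempt in range(attempts):
        try:
            return task(*args)
        except Exception:
            if attempt == attempts - 1:
                raise


def run_job(records, map_fn, reduce_fn, num_reducers=2):
    # Map phase: each record is one map task; its output is shuffled
    # into per-reducer groups keyed by the emitted key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in records:
        emitted = run_with_retry(lambda: list(map_fn(key, value)))
        for k, v in emitted:
            partitions[partition(k, num_reducers)][k].append(v)
    # Reduce phase: one reduce call per unique key in each partition.
    output = {}
    for part in partitions:
        for k, values in part.items():
            output[k] = run_with_retry(reduce_fn, k, values)
    return output


def count_map(key, value):
    return [(word, 1) for word in value.split()]


def count_reduce(key, values):
    return sum(values)


counts = run_job([("d1", "a b a"), ("d2", "b c")], count_map, count_reduce)
```

Because every key is hashed to exactly one partition, each reduce call sees all values for its key regardless of which map task emitted them.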

Job Processing. [Figure: a JobTracker and TaskTrackers 0-5 running a "grep" job.] 1. Client submits the "grep" job, indicating code and input files. 2. JobTracker breaks the input file into k chunks (in this case 6) and assigns work to the TaskTrackers. 3. After map(), TaskTrackers exchange map output to build the reduce() keyspace. 4. JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work. 5. reduce() output may go to NDFS.

Task Granularity & Pipelining. Fine-granularity tasks: map tasks >> machines. Minimizes time for fault recovery; can pipeline shuffling with map execution; better dynamic load balancing. Often use 200,000 map & 5000 reduce tasks running on 2000 machines.

MapReduce outside Google. Hadoop (Java) emulates MapReduce and GFS. The architecture of Hadoop MapReduce and DFS is master/slave:

            Master      Slave
MapReduce   jobtracker  tasktracker
DFS         namenode    datanode

Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters. Download software at: http://www.eng.auburn.edu/~xqin/software/hdfs-hc This HDFS-HC tool was described in our paper "Improving MapReduce Performance via Data Placement in Heterogeneous Hadoop Clusters" by J. Xie, S. Yin, X.-J. Ruan, Z.-Y. Ding, Y. Tian, J. Majors, and X. Qin, published in Proc. 19th Int'l Heterogeneity in Computing Workshop, Atlanta, Georgia, April 2010.

Hadoop Overview (J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI '04, pages 137–150, 2004)

One-time setup: set hadoop-site.xml and slaves; initiate the namenode. Run Hadoop MapReduce and DFS. Upload your data to DFS. Run your process… Download your data from DFS.

Hadoop Distributed File System (http://lucene.apache.org/hadoop)

Motivational Example. [Figure: timeline in minutes for Node A (fast, 1 task/min), Node B (slow, 2x slower), and Node C (slowest, 3x slower).]

The Native Strategy. [Figure: timeline in minutes showing the loading, transferring, and processing phases on Node A, Node B, and Node C; task labels: 3 tasks, 2 tasks, 6 tasks.]

Our Solution: Reducing data transfer time. [Figure: timeline in minutes showing the loading, transferring, and processing phases on Node A', Node B', and Node C'; task labels: 3 tasks, 2 tasks, 6 tasks.]

Preliminary Results Impact of data placement on performance of grep

Challenges. Does the computing ratio depend on the application? Initial data distribution; data skew problem; new data arrival; data deletion; new joining node; data updating.

Measure Computing Ratios. Computing ratio: fast machines process large data sets. [Figure: timeline for Node A (1 task/min), Node B (2x slower), and Node C (3x slower).]

Steps to Measure Computing Ratios. 1. Run the application on each node with the same size of data and collect each node's response time individually. 2. Set the ratio of the node with the shortest response time to 1, and set the ratios of the other nodes accordingly. 3. Calculate the least common multiple of these ratios. 4. Compute the portion of each node (the least common multiple divided by the node's ratio).

Node    Response time (s)  Ratio  # of file fragments  Speed
Node A  10                 1      6                    Fastest
Node B  20                 2      3                    Average
Node C  30                 3      2                    Slowest
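The four steps can be checked against the table with a short calculation. The sketch below is an illustration, not the HDFS-HC code: the function names are hypothetical, and it assumes every response time is an integer multiple of the fastest one, as in the table.

```python
from functools import reduce
from math import gcd


def computing_ratios(response_times):
    """Step 2: the fastest node gets ratio 1, the others scale with their time."""
    fastest = min(response_times.values())
    ratios = {}
    for node, seconds in response_times.items():
        if seconds % fastest != 0:
            raise ValueError("this sketch assumes integer ratios")
        ratios[node] = seconds // fastest
    return ratios


def least_common_multiple(values):
    """Step 3: least common multiple of all ratios."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)


def fragments_per_node(ratios):
    """Step 4: a node's portion is the LCM divided by its ratio."""
    common = least_common_multiple(ratios.values())
    return {node: common // ratio for node, ratio in ratios.items()}


times = {"Node A": 10, "Node B": 20, "Node C": 30}  # seconds, from the table
ratios = computing_ratios(times)
fragments = fragments_per_node(ratios)
```

For the table's response times this yields ratios 1, 2, 3 and fragment counts 6, 3, 2, so the fastest node holds the most data.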

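Initial Data Distribution. Input files are split into 64MB blocks. A round-robin data distribution algorithm places the blocks on the datanodes according to their portions. [Figure: the Namenode distributes the 12 blocks of File1 (1-9, a-c) across Datanodes A, B, and C with portion 3:2:1.]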
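One way to read "round-robin with portions" is a weighted round-robin in which each node receives as many consecutive blocks per round as its portion. The sketch below follows that reading. It is an assumption about the algorithm, not the HDFS-HC source, and it also assumes that Node A holds the largest portion.

```python
def distribute_blocks(blocks, portions):
    """Assign blocks to nodes in repeated rounds sized by each node's portion."""
    # One round of the cycle: every node appears 'portion' times.
    cycle = [node for node, share in portions.items() for _ in range(share)]
    placement = {node: [] for node in portions}
    for index, block in enumerate(blocks):
        placement[cycle[index % len(cycle)]].append(block)
    return placement


blocks = list("123456789abc")  # the 12 block labels from the slide
placement = distribute_blocks(blocks, {"A": 3, "B": 2, "C": 1})
```

With 12 blocks and portions 3:2:1, the nodes end up with 6, 4, and 2 blocks respectively.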

Data Redistribution. 1. Get the network topology, the ratio, and the utilization. 2. Build and sort two lists: the under-utilized node list (L1) and the over-utilized node list (L2). 3. Select the source and destination nodes from the lists. 4. Transfer data. 5. Repeat steps 3 and 4 until the list is empty. [Figure: the Namenode moves blocks (1-9, a-c) among Datanodes A, B, and C using lists L1 and L2; portion 3:2:1.]
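The redistribution loop can be sketched as follows. This is a simplified illustration with hypothetical names: it ignores network topology, measures utilization as block count relative to a target derived from the portions, and moves one block per iteration from the most over-utilized node to the most under-utilized node.

```python
def redistribute(placement, portions):
    """Move blocks until no node is both available as source and destination."""
    total_blocks = sum(len(blocks) for blocks in placement.values())
    total_share = sum(portions.values())
    target = {n: total_blocks * portions[n] // total_share for n in placement}
    # surplus > 0 means over-utilized, surplus < 0 means under-utilized
    surplus = {n: len(placement[n]) - target[n] for n in placement}
    under = sorted((n for n in surplus if surplus[n] < 0), key=lambda n: surplus[n])
    over = sorted((n for n in surplus if surplus[n] > 0), key=lambda n: -surplus[n])
    moves = []
    while over and under:
        src, dst = over[0], under[0]
        block = placement[src].pop()
        placement[dst].append(block)
        moves.append((block, src, dst))
        surplus[src] -= 1
        surplus[dst] += 1
        if surplus[src] == 0:
            over.pop(0)
        if surplus[dst] == 0:
            under.pop(0)
    return moves


# Hypothetical starting point: 12 blocks spread evenly, target portion 3:2:1.
placement = {
    "A": ["1", "2", "3", "4"],
    "B": ["5", "6", "7", "8"],
    "C": ["9", "a", "b", "c"],
}
moves = redistribute(placement, {"A": 3, "B": 2, "C": 1})
```

In this example two blocks move from Node C to Node A, leaving the nodes with 6, 4, and 2 blocks.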

Sharing Files among Multiple Applications. The computing ratio depends on the data-intensive application. Redistribution. Redundancy.

Experimental Environment. Five nodes in a heterogeneous Hadoop cluster.

Node    CPU Model         CPU (Hz)  L1 Cache (KB)
Node A  Intel Core 2 Duo  2*1G=2G   204
Node B  Intel Celeron     2.8G      256
Node C  Intel Pentium 3   1.2G      256
Node D  Intel Pentium 3   1.2G      256
Node E  Intel Pentium 3   1.2G      256

Grep and WordCount. Grep is a tool that searches for a regular expression in a text file. WordCount is a program that counts words in a text file.

Computing ratio for two applications. Computing ratios of the five nodes with respect to the Grep and WordCount applications.

Computing Node  Ratios for Grep  Ratios for WordCount
Node A          1                1
Node B          2                2
Node C          3.3              5
Node D          3.3              5
Node E          3.3              5
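When ratios are not integers, as with 3.3 for Grep, a least common multiple is awkward. One possible reading, offered here only as an assumption, is to give each node a share of the data proportional to the inverse of its ratio. The helper `data_shares` is hypothetical and not taken from HDFS-HC.

```python
def data_shares(ratios):
    """Fraction of the data per node, weighting each node by 1 / ratio."""
    weights = {node: 1.0 / ratio for node, ratio in ratios.items()}
    total = sum(weights.values())
    return {node: weight / total for node, weight in weights.items()}


grep_ratios = {"A": 1, "B": 2, "C": 3.3, "D": 3.3, "E": 3.3}
wordcount_ratios = {"A": 1, "B": 2, "C": 5, "D": 5, "E": 5}

grep_shares = data_shares(grep_ratios)
wordcount_shares = data_shares(wordcount_ratios)
```

Under this reading Node A always holds twice the share of Node B, and the three slowest nodes hold a smaller share for WordCount than for Grep because their WordCount ratios are higher.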

Response time of Grep and WordCount on each node. Application dependence; data size independence.

Impact of data placement on performance of Grep

Impact of data placement on performance of WordCount

Conclusion. Identified the performance degradation caused by heterogeneity. Designed and implemented a data placement mechanism in HDFS.

Future Work. Data redundancy issue; dynamic data distribution mechanism; prefetching.

Fellowship Program Samuel Ginn College of Engineering at Auburn University Dean's Fellowship: $32,000 per year plus tuition fellowship College Fellowship: $24,000 per year plus tuition fellowship Departmental Fellowship: $20,000 per year plus tuition fellowship. Tuition Fellowships: Tuition Fellowships provide a full tuition waiver for a student with a 25 percent or greater full-time-equivalent (FTE) assignment. Both graduate research assistants (GRAs) and graduate teaching assistants (GTAs) are eligible.

http://www.eng.auburn.edu/programs/grad-school/fellowship-program/

Download the presentation slides http://www.slideshare.net/xqin74 Google: slideshare Xiao Qin
