HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters 02/28/11 Xiao Qin Department of Computer Science and Software Engineering Auburn University http://www.eng.auburn.edu/~xqin [email_address] Slides 2-20 are adapted from notes by Subbarao Kambhampati (ASU), Dan Weld (U. Washington), Jeff Dean, Sanjay Ghemawat, (Google, Inc.)
Motivation Large-Scale Data Processing Want to use 1000s of CPUs But don't want hassle of managing things MapReduce provides Automatic parallelization & distribution Fault tolerance I/O scheduling Monitoring & status updates
Map/Reduce Map/Reduce  Programming model from Lisp  (and other functional languages) Many problems can be phrased this way Easy to distribute across nodes Nice retry/failure semantics
Distributed Grep Very  big data Split data Split data Split data Split data grep grep grep grep matches matches matches matches cat All matches
Distributed Word Count Very  big data Split data Split data Split data Split data count count count count count count count count merge merged count
Map Reduce Map: Accepts input key/value pair Emits intermediate key/value pair Reduce: Accepts intermediate key/value* pair Emits output key/value pair [Diagram: very big data -> MAP -> partitioning function -> REDUCE -> result]
Map in Lisp (Scheme) (map f list [list2 list3 ...]) (map square '(1 2 3 4)) => (1 4 9 16) (reduce + '(1 4 9 16)) => (+ 16 (+ 9 (+ 4 1))) => 30 (reduce + (map square (map - l1 l2))) Unary operator: square Binary operator: +
Map/Reduce ala Google map(key, val)  is run on each item in set emits new-key / new-val pairs reduce(key, vals)   is run for each unique key emitted by  map() emits final output
count words in docs Input consists of (url, contents) pairs map(key=url, val=contents): For each word w in contents, emit (w, "1") reduce(key=word, values=uniq_counts): Sum all "1"s in values list Emit result "(word, sum)"
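To make the pseudocode above concrete, here is a minimal Java sketch of the two functions written against the standard Hadoop MapReduce API (org.apache.hadoop.mapreduce). One caveat: Hadoop's default text input supplies (byte offset, line) pairs rather than (url, contents), but the logic is the same; the job driver is omitted here.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// map(key=offset, val=line): for each word w in the line, emit (w, 1)
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, ONE);                 // emit (w, "1")
    }
  }
}

// reduce(key=word, values=counts): sum the counts and emit (word, sum)
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));   // emit (word, sum)
  }
}
```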
Count, Illustrated map(key=url, val=contents): For each word w in contents, emit (w, "1") reduce(key=word, values=uniq_counts): Sum all "1"s in values list Emit result "(word, sum)" Input: "see bob throw" and "see spot run" Map output: see 1, bob 1, run 1, see 1, spot 1, throw 1 Reduce output: bob 1, run 1, see 2, spot 1, throw 1
Grep Input consists of (url+offset, single line) map(key=url+offset, val=line): If contents matches regexp, emit (line, "1") reduce(key=line, values=uniq_counts): Don't do anything; just emit line
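A matching sketch of the grep mapper in the same Hadoop style. The configuration key grep.pattern is an illustrative choice that a job driver would have to set; it is not a fixed Hadoop property.

```java
import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// map(key=offset, val=line): if the line matches the regexp, emit (line, 1).
// The reducer can simply pass lines through (or sum the 1s to count matches).
class GrepMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private Pattern pattern;

  @Override
  protected void setup(Context context) {
    // "grep.pattern" is a hypothetical key supplied via the job configuration.
    pattern = Pattern.compile(context.getConfiguration().get("grep.pattern", ".*"));
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (pattern.matcher(value.toString()).find()) {
      context.write(value, ONE);                // emit (matching line, "1")
    }
  }
}
```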
Reverse Web-Link Graph Map For each URL linking to target,  … Output <target, source> pairs  Reduce Concatenate list of all source URLs Outputs: <target,  list  (source)> pairs
Model is Widely Applicable MapReduce Programs In Google Source Tree  Example uses:  distributed grep   distributed sort    web link-graph reversal  term-vector / host web access log stats  inverted index construction  document clustering  machine learning  statistical machine translation  ...  ...  ...
Implementation Overview Typical cluster:   100s/1000s of 2-CPU x86 machines, 2-4 GB of memory  Limited bisection bandwidth  Storage is on local IDE disks  GFS: distributed file system manages data (SOSP'03)  Job scheduling system: jobs made up of tasks,     scheduler assigns tasks to machines  Implementation is a C++ library linked into user programs
Execution How is this distributed? Partition input key/value pairs into chunks, run map() tasks in parallel After all map()s are complete, consolidate all emitted values for each unique emitted key Now partition space of output map keys, and run reduce() in parallel If map() or reduce() fails, reexecute!
Job Processing JobTracker TaskTracker 0 TaskTracker 1 TaskTracker 2 TaskTracker 3 TaskTracker 4 TaskTracker 5 Client submits "grep" job, indicating code and input files JobTracker breaks input file into k chunks (in this case 6) and assigns work to tasktrackers After map(), tasktrackers exchange map output to build the reduce() keyspace JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work reduce() output may go to NDFS
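For completeness, a hedged sketch of how a client submits such a job with the standard Hadoop Job API, reusing the illustrative GrepMapper and WordCountReducer from the earlier sketches (assumed to be in the same package). The slide's JobTracker/TaskTracker terms describe the classic MapReduce runtime; this submission API runs on top of it as well.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GrepDriver {
  public static void main(String[] args) throws Exception {
    // args: <input dir in DFS> <output dir in DFS> <regexp>
    Configuration conf = new Configuration();
    conf.set("grep.pattern", args[2]);            // hypothetical key read by GrepMapper

    Job job = Job.getInstance(conf, "grep");
    job.setJarByClass(GrepDriver.class);
    job.setMapperClass(GrepMapper.class);
    job.setReducerClass(WordCountReducer.class);  // sums the 1s to count matches per line
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // The framework splits the input, schedules map tasks, shuffles map output
    // into the reduce keyspace, runs the reducers, and writes results back to DFS.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```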
Execution
Parallel Execution
Task Granularity & Pipelining Fine granularity tasks:  map tasks >> machines Minimizes time for fault recovery Can pipeline shuffling with map execution Better dynamic load balancing  Often use 200,000 map  &  5000 reduce tasks  Running on 2000 machines
MapReduce outside Google Hadoop (Java) Emulates MapReduce and GFS The architecture of Hadoop MapReduce and DFS is master/slave: MapReduce master = jobtracker, slave = tasktracker; DFS master = namenode, slave = datanode
Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters Download Software at:   http://www.eng.auburn.edu/~xqin/software/hdfs-hc This HDFS-HC tool was described in our paper  -  Improving  MapReduce  Performance via Data Placement in Heterogeneous  Hadoop  Clusters  - by J. Xie, S. Yin, X.-J. Ruan, Z.-Y. Ding, Y. Tian, J. Majors, and X. Qin, published in Proc. 19th Int'l Heterogeneity in Computing Workshop, Atlanta, Georgia, April 2010.
Hadoop Overview (J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters.  OSDI ’04, pages 137–150, 2008)
One time setup set  hadoop-site.xml  and  slaves Initiate namenode Run Hadoop MapReduce and DFS Upload your data to DFS Run your process… Download your data from DFS
Hadoop Distributed File System (http://lucene.apache.org/hadoop)
Motivational Example [Figure: task timeline in minutes; Node A (fast, 1 task/min), Node B (slow, 2x slower), Node C (slowest, 3x slower)]
The Native Strategy [Figure: timeline (min) for Nodes A, B, C with segments for loading, transferring, and processing; labels: 3 tasks, 2 tasks, 6 tasks]
Our Solution -- Reducing Data Transfer Time [Figure: timeline (min) for Nodes A', B', C' with segments for loading, transferring, and processing; labels: 3 tasks, 2 tasks, 6 tasks]
Preliminary Results Impact of data placement on performance of grep
Challenges Does the computing ratio depend on the application? Initial data distribution Data skew problem: new data arrival, data deletion, newly joining nodes, data updating
Measure Computing Ratios Computing ratio: fast machines process large data sets [Figure: timeline (min) for Node A (1 task/min), Node B (2x slower), Node C (3x slower)]
Steps to Measure Computing Ratios 1. Run the application on each node with the same size of data and collect each node's response time individually 2. Set the ratio of the node with the shortest response time to 1 and set the other nodes' ratios accordingly 3. Calculate the least common multiple of these ratios 4. Compute the portion (number of file fragments) for each node. Example: Node A: response time 10 s, ratio 1, 6 file fragments (fastest); Node B: 20 s, ratio 2, 3 fragments (average); Node C: 30 s, ratio 3, 2 fragments (slowest)
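The arithmetic in steps 2-4 is small enough to sketch in plain Java (this is just the calculation from the table above, not the HDFS-HC code, and it assumes the ratios come out as small integers as in the Node A/B/C example; measured ratios such as 3.3 would first need rounding or scaling):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class ComputingRatios {
  static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }
  static long lcm(long a, long b) { return a / gcd(a, b) * b; }

  public static void main(String[] args) {
    // Step 1: measured response times (seconds) for the same-size input on each node.
    Map<String, Long> responseTime = new LinkedHashMap<>();
    responseTime.put("Node A", 10L);
    responseTime.put("Node B", 20L);
    responseTime.put("Node C", 30L);

    long fastest = Collections.min(responseTime.values());

    // Step 2: ratio = response time / fastest response time (1, 2, 3 here).
    // Step 3: least common multiple of the ratios (6 here).
    Map<String, Long> ratio = new LinkedHashMap<>();
    long lcmOfRatios = 1;
    for (Map.Entry<String, Long> e : responseTime.entrySet()) {
      long r = e.getValue() / fastest;
      ratio.put(e.getKey(), r);
      lcmOfRatios = lcm(lcmOfRatios, r);
    }

    // Step 4: fragments per node = LCM / ratio (6, 3, 2): faster nodes hold more blocks.
    for (Map.Entry<String, Long> e : ratio.entrySet()) {
      System.out.println(e.getKey() + ": ratio " + e.getValue()
          + ", file fragments " + (lcmOfRatios / e.getValue()));
    }
  }
}
```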
Initial Data Distribution Input files are split into 64MB blocks The namenode assigns blocks to datanodes with a round-robin data distribution algorithm in proportion to the computing ratios (portion 3:2:1 for datanodes A, B, C) [Diagram: File1's blocks 1-9 and a-c distributed from the namenode across datanodes A, B, C]
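A minimal sketch of a portion-weighted round-robin assignment consistent with the 3:2:1 example above (illustrative only; the actual mechanism is implemented inside HDFS block placement and works on real block and datanode objects):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assign the blocks of a file to datanodes in proportion to their portions.
// With portions A:B:C = 3:2:1 and 12 blocks, A gets 6, B gets 4, C gets 2.
public class WeightedRoundRobin {
  public static Map<String, List<Integer>> place(int numBlocks, Map<String, Integer> portion) {
    Map<String, List<Integer>> assignment = new LinkedHashMap<>();
    portion.keySet().forEach(n -> assignment.put(n, new ArrayList<>()));

    int block = 0;
    while (block < numBlocks) {
      // One round: each node takes up to `portion` blocks before the next node's turn.
      for (Map.Entry<String, Integer> e : portion.entrySet()) {
        for (int i = 0; i < e.getValue() && block < numBlocks; i++) {
          assignment.get(e.getKey()).add(block++);
        }
      }
    }
    return assignment;
  }

  public static void main(String[] args) {
    Map<String, Integer> portion = new LinkedHashMap<>();
    portion.put("A", 3);
    portion.put("B", 2);
    portion.put("C", 1);
    System.out.println(place(12, portion));   // prints each node's block list
  }
}
```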
Data Redistribution 1. Get the network topology, the computing ratios, and disk utilization 2. Build and sort two lists: an under-utilized node list L1 and an over-utilized node list L2 3. Select a source node from L2 and a destination node from L1 4. Transfer data from the source to the destination 5. Repeat steps 3 and 4 until both lists are empty [Diagram: namenode directing block moves among datanodes A, B, C; portion 3:2:1]
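A hedged sketch of the loop in steps 2-5, using a toy NodeLoad model in place of live HDFS utilization reports; sorting the two lists by utilization is omitted for brevity:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal model of a datanode: its target block count (derived from the
// computing ratio) and the number of blocks it currently holds.
class NodeLoad {
  final String name;
  int target;
  int current;
  NodeLoad(String name, int target, int current) {
    this.name = name; this.target = target; this.current = current;
  }
}

public class Redistributor {
  // Steps 2-5: build the under-/over-utilized lists, then repeatedly pair a
  // source from L2 with a destination from L1 and transfer blocks until
  // both lists are empty.
  public static void redistribute(List<NodeLoad> nodes) {
    Deque<NodeLoad> l1 = new ArrayDeque<>();   // under-utilized
    Deque<NodeLoad> l2 = new ArrayDeque<>();   // over-utilized
    for (NodeLoad n : nodes) {
      if (n.current < n.target) l1.add(n);
      else if (n.current > n.target) l2.add(n);
    }

    while (!l1.isEmpty() && !l2.isEmpty()) {
      NodeLoad dst = l1.peek();
      NodeLoad src = l2.peek();
      int move = Math.min(src.current - src.target, dst.target - dst.current);
      System.out.println("move " + move + " block(s) from " + src.name + " to " + dst.name);
      src.current -= move;
      dst.current += move;
      if (src.current == src.target) l2.poll();
      if (dst.current == dst.target) l1.poll();
    }
  }

  public static void main(String[] args) {
    // Example: 12 blocks, targets 6/4/2, but currently spread evenly as 4/4/4.
    redistribute(List.of(new NodeLoad("A", 6, 4),
                         new NodeLoad("B", 4, 4),
                         new NodeLoad("C", 2, 4)));
  }
}
```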
Sharing Files among Multiple Applications The computing ratio depends on the data-intensive application Two options when several applications share the same input file: redistribution or redundancy
Experimental Environment Five nodes in a heterogeneous Hadoop cluster:
Node A: Intel Core 2 Duo, 2 x 1 GHz = 2 GHz, 204 KB L1 cache
Node B: Intel Celeron, 2.8 GHz, 256 KB L1 cache
Node C: Intel Pentium 3, 1.2 GHz, 256 KB L1 cache
Node D: Intel Pentium 3, 1.2 GHz, 256 KB L1 cache
Node E: Intel Pentium 3, 1.2 GHz, 256 KB L1 cache
Grep and WordCount Grep is a tool that searches for a regular expression in a text file WordCount is a program that counts the words in a text file
Computing ratio for two applications Computing ratios of the five nodes with respect to the Grep and WordCount applications:
Node A: Grep 1, WordCount 1
Node B: Grep 2, WordCount 2
Node C: Grep 3.3, WordCount 5
Node D: Grep 3.3, WordCount 5
Node E: Grep 3.3, WordCount 5
Response time of Grep and WordCount on each node The computing ratio is application dependent but independent of data size
Six Data Placement Decisions
Impact of data placement on performance of Grep
Impact of data placement on performance of WordCount
Conclusion Identified the performance degradation caused by heterogeneity Designed and implemented a data placement mechanism in HDFS
Future Work Data redundancy issue Dynamic data distribution mechanism Prefetching
Fellowship Program  Samuel Ginn College of Engineering at Auburn University Dean's Fellowship: $32,000 per year plus tuition fellowship College Fellowship: $24,000 per year plus tuition fellowship Departmental Fellowship: $20,000 per year plus tuition fellowship. Tuition Fellowships: Tuition Fellowships provide a full tuition waiver for a student with a 25 percent or greater full-time-equivalent (FTE) assignment. Both graduate research assistants (GRAs) and graduate teaching assistants (GTAs) are eligible.
http://www.eng.auburn.edu/programs/grad-school/fellowship-program/
http://www.eng.auburn.edu
Download the presentation slides http://www.slideshare.net/xqin74 Google:  slideshare Xiao Qin
Questions

Editor's Notes

  1. Cite here: What is Hadoop? http://www.cloudera.com/what-is-hadoop/ Hadoop is an open-source project administered by the Apache Software Foundation. Hadoop's contributors work for some of the world's biggest technology companies. That diverse, motivated community has produced a genuinely innovative platform for consolidating, combining and understanding data. Technically, Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using a technique called MapReduce. Hadoop runs on a collection of commodity, shared-nothing servers. You can add or remove servers in a Hadoop cluster at will; the system detects and compensates for hardware or system problems on any server. Hadoop, in other words, is self-healing. It can deliver data, and run large-scale, high-performance processing jobs, in spite of system changes or failures.
  2. Slow down, add animation of speculative task helping Note: Mention that we tested this on a heterogeneous Hadoop cluster
  3. Copy page 7 to here Note: Please add legend on this diagram. E.g., black bars represent ******, red bars represent ***** Show data movement (migration)
  4. Real results for the aforementioned motivational example.
  5. Note: Explain what is the definition of computing ratio
  6. Note: Name the over-utilized node list and the under-utilized node list on the diagram. a->A, b->B
  7. The heterogeneity measurement of a cluster depends on data-intensive applications. If multiple MapReduce applications must process the same input file, the data placement mechanism may need to distribute the input file's fragments in several ways - one for each MapReduce application. In the case where multiple applications are similar in terms of data processing speed, one data placement decision may fit the needs of all the applications.
  8. Note: improve the resolution of the figures
  9. Title: Application dependence; Independence of data size Note 1: improve the quality of the figures. Large figures and large legend Note 2:
  10. number