Map Reduce Algorithm

The MapReduce algorithm allows for distributed processing of large datasets across clusters of computers. It works in two phases:
1. The map phase, where the input data is processed key-value pair by key-value pair, possibly converting or filtering the values, to generate a set of intermediate key-value pairs.
2. The reduce phase, where all intermediate values with the same key are grouped together and passed to the reduce function to produce the final output, which is stored back in the distributed file system.
The MapReduce framework implemented in Hadoop provides a scalable solution for processing vast amounts of structured and unstructured data stored in HDFS in a parallel and distributed manner.

Uploaded by Leela Rallapudi
Copyright © All Rights Reserved


Map Reduce:
Hadoop MapReduce is the core Hadoop ecosystem component that provides
data processing. MapReduce is a software framework for easily writing
applications that process vast amounts of structured and unstructured data
stored in the Hadoop Distributed File System.
The MapReduce framework works on data that is stored in:
1. Hadoop Distributed File System (HDFS)
2. Google File System (GFS)

Map Reduce Analogy:

- Consider the problem of counting the number of occurrences of each word in a large collection of documents.
- How would you do it in parallel?
Solution:
- Divide the documents among workers.
- Each worker parses its documents to find all words and outputs (word, count) pairs.
- Partition the (word, count) pairs across workers based on the word.
- For each word at a worker, locally add up the counts.
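The steps above can be sketched in Python; the function names and the in-memory shuffle step are illustrative stand-ins for what the framework does behind the scenes, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reducer: sum the partial counts for one word."""
    return (word, sum(counts))

docs = {"d1": "deer bear river", "d2": "car car river"}
intermediate = []
for doc_id, text in docs.items():
    intermediate.extend(map_phase(doc_id, text))
result = dict(reduce_phase(w, c) for w, c in shuffle(intermediate).items())
# result == {"deer": 1, "bear": 1, "river": 2, "car": 2}
```

In a real cluster the shuffle is what partitions pairs across workers by word, so each reducer sees all counts for its keys.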
How does MapReduce do it?
- 100 files with daily temperatures in two cities. Each file has 10,000 entries.
- For example, one file may have (Toronto 20), (New York 30), ...
- Our goal is to compute the maximum temperature in the two cities.
- Assign the task to 100 Map processors, each working on one file. Each processor outputs a list of key-value pairs, e.g., (Toronto 30), (New York 65), ...
- Now we have 100 lists, each with two elements. We give these lists to two reducers: one for Toronto and another for New York.
- The reducers produce the final answer: (Toronto 55), (New York 65).
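A minimal sketch of this temperature example, assuming two small in-memory "files" in place of the 100 HDFS files (all names and values here are illustrative):

```python
from collections import defaultdict

# One "file" per mapper: a list of (city, temperature) records.
file1 = [("Toronto", 20), ("New York", 30), ("Toronto", 30)]
file2 = [("Toronto", 55), ("New York", 65), ("New York", 12)]

def map_max(records):
    """Mapper: emit the per-file maximum temperature for each city."""
    local_max = {}
    for city, temp in records:
        local_max[city] = max(temp, local_max.get(city, temp))
    return list(local_max.items())

def reduce_max(city, temps):
    """Reducer: take the global maximum over all per-file maxima."""
    return (city, max(temps))

# Shuffle: route each city's per-file maxima to that city's reducer.
grouped = defaultdict(list)
for pairs in (map_max(file1), map_max(file2)):
    for city, temp in pairs:
        grouped[city].append(temp)
final = dict(reduce_max(c, t) for c, t in grouped.items())
# final == {"Toronto": 55, "New York": 65}
```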
Working of MapReduce:
MapReduce works by breaking the data processing into two phases:
1. Map phase
2. Reduce phase
Map Phase − The map or mapper's job is to process the input data. Generally the input data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.
Reduce Phase − The reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
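The line-by-line input to the mapper can be illustrated with a word-count mapper in the style of Hadoop Streaming, which pipes each line of an input split to the mapper's standard input and expects tab-separated key-value pairs on standard output; this script is a sketch, not part of the Hadoop Java API:

```python
import sys

def mapper(lines):
    """Emit one tab-separated (word, 1) pair for each word in each input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

# Hadoop Streaming would feed the input split via stdin, i.e. mapper(sys.stdin);
# here we simulate two input lines with a small list.
pairs = list(mapper(["deer bear river", "car car river"]))
# pairs == ["deer\t1", "bear\t1", "river\t1", "car\t1", "car\t1", "river\t1"]
```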

Keys and Values:

- The programmer in MapReduce has to specify two functions, the map function and the reduce function, that implement the Mapper and the Reducer in a MapReduce program.
- In MapReduce, data elements are always structured as key-value (i.e., (K, V)) pairs.
- The map and reduce functions receive and emit (K, V) pairs.

Input Splits → Map Function → Intermediate Outputs → Reduce Function → Final Outputs
(K, V) pairs → (K', V') pairs → (K'', V'') pairs

Anatomy of MapReduce:

          Input              Output
Map       <k1, v1>           list(<k2, v2>)
Reduce    <k2, list(v2)>     list(<k3, v3>)
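These signatures can be illustrated with a toy inverted index, where k1 is a document id, v1 its text, k2 a word, and v3 the sorted list of documents containing that word (all names here are illustrative):

```python
from collections import defaultdict

def map_fn(k1, v1):
    """map: <k1, v1> -> list(<k2, v2>); here (doc_id, text) -> (word, doc_id) pairs."""
    return [(word, k1) for word in v1.split()]

def reduce_fn(k2, values):
    """reduce: <k2, list(v2)> -> list(<k3, v3>); here (word, doc_ids) -> sorted unique ids."""
    return [(k2, sorted(set(values)))]

docs = {"d1": "hadoop mapreduce", "d2": "hadoop hdfs"}
grouped = defaultdict(list)
for k1, v1 in docs.items():
    for k2, v2 in map_fn(k1, v1):
        grouped[k2].append(v2)
index = {}
for k2, vs in grouped.items():
    for k3, v3 in reduce_fn(k2, vs):
        index[k3] = v3
# index == {"hadoop": ["d1", "d2"], "mapreduce": ["d1"], "hdfs": ["d2"]}
```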
How MapReduce works:
The complete execution process (execution of both Map and Reduce tasks) is controlled by two types of entities:
JobTracker: acts like a master (responsible for complete execution of the submitted job).
Multiple TaskTrackers: act like slaves, each of them performing the job.
For every job submitted for execution in the system, there is one JobTracker that resides on the NameNode, and there are multiple TaskTrackers which reside on DataNodes.
Examples Of Map Reduce:
