Map Reduce Algorithm
Map Reduce:
Hadoop MapReduce is the core Hadoop ecosystem component that provides
data processing. MapReduce is a software framework for easily writing
applications that process vast amounts of structured and unstructured data
stored in the Hadoop Distributed File System.
The MapReduce framework works on data stored in:
1. Hadoop Distributed File System (HDFS)
Figure: MapReduce data flow: (K, V) pairs → Map function → (K’, V’) pairs → Reduce function → (K’’, V’’) pairs
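The flow of key-value pairs through the Map and Reduce functions can be sketched in a few lines of Python. This is a minimal, single-process illustration using word count, not Hadoop code: the names map_fn, reduce_fn and run are hypothetical, and the input key is assumed to be a line number (Hadoop's own input formats may use a different key).

```python
from collections import defaultdict

def map_fn(key, value):
    # (K, V) = (line number, line of text); key is unused here.
    # Emits intermediate (K', V') pairs of the form (word, 1).
    return [(word, 1) for word in value.split()]

def reduce_fn(key, values):
    # Receives one K' and every V' emitted for it; emits (K'', V'') = (word, total).
    return [(key, sum(values))]

def run(records):
    # Map phase: each (K, V) pair produces a list of (K', V') pairs.
    intermediate = []
    for k, v in records:
        intermediate.extend(map_fn(k, v))
    # Group every V' that shares the same K' (the framework's shuffle step).
    groups = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)
    # Reduce phase: each (K', list of V') produces (K'', V'') pairs.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output

lines = [(0, "deer bear river"), (1, "car car river"), (2, "deer car bear")]
print(run(lines))  # [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

Sorting the grouped keys before reducing only makes the printed output deterministic; it mirrors, loosely, the fact that a reducer sees its keys in sorted order.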
Anatomy of MapReduce:
Phase     Input              Output
Map       <k1, v1>           list(<k2, v2>)
Reduce    <k2, list(v2)>     list(<k3, v3>)
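The input and output types of each phase can be written down as a generic Python driver whose type parameters follow the k1/v1, k2/v2 and k3/v3 names in the table. This is a sketch under assumptions: map_reduce, max_temp_mapper and max_temp_reducer are hypothetical names, and the "year,temperature" record format is invented for the example.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2", bound=Hashable)  # k2 must be hashable so values can be grouped by it
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")

def map_reduce(
    records: Iterable[tuple[K1, V1]],
    mapper: Callable[[K1, V1], list[tuple[K2, V2]]],
    reducer: Callable[[K2, list[V2]], list[tuple[K3, V3]]],
) -> list[tuple[K3, V3]]:
    grouped: dict[K2, list[V2]] = defaultdict(list)
    for k1, v1 in records:
        # Map: <k1, v1> -> list(<k2, v2>), then group each v2 under its k2.
        for k2, v2 in mapper(k1, v1):
            grouped[k2].append(v2)
    result: list[tuple[K3, V3]] = []
    for k2, values in grouped.items():
        # Reduce: <k2, list(v2)> -> list(<k3, v3>)
        result.extend(reducer(k2, values))
    return result

# Hypothetical job: maximum temperature per year from "year,temperature" lines.
def max_temp_mapper(line_no: int, line: str) -> list[tuple[str, int]]:
    year, temp = line.split(",")
    return [(year, int(temp))]

def max_temp_reducer(year: str, temps: list[int]) -> list[tuple[str, int]]:
    return [(year, max(temps))]

readings = [(0, "1950,22"), (1, "1950,31"), (2, "1951,18"), (3, "1951,25")]
print(sorted(map_reduce(readings, max_temp_mapper, max_temp_reducer)))
# [('1950', 31), ('1951', 25)]
```

Here k2 and k3 happen to be the same type (the year), but the table allows them to differ, which is why the driver keeps separate type parameters.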
How MapReduce works:
The complete execution process (execution of both Map and Reduce tasks) is
controlled by two types of entities:
JobTracker: acts as the master (responsible for the complete execution of a
submitted job).
Multiple TaskTrackers: act as slaves, each of them executing the map and reduce
tasks assigned to it by the JobTracker.
Every job submitted for execution in the system is handled by one JobTracker,
which resides on the NameNode, and by multiple TaskTrackers, which reside on the
DataNodes.
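The master/slave split between the JobTracker and the TaskTrackers can be imitated with a toy, single-process simulation. This is only an illustration of the roles described above: the JobTracker and TaskTracker classes, the run_map, run_reduce and submit methods, the round-robin assignment and the split_size parameter are all hypothetical, and threads merely stand in for TaskTracker nodes. Real Hadoop daemons communicate over the network and schedule tasks differently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

class TaskTracker:
    """Hypothetical slave: executes whatever map or reduce task it is handed."""

    def __init__(self, name):
        self.name = name  # illustrative DataNode name; not used in the computation

    def run_map(self, split):
        # Word-count map task over one input split (a list of lines).
        return [(word, 1) for line in split for word in line.split()]

    def run_reduce(self, key, values):
        # Word-count reduce task for a single key.
        return key, sum(values)

class JobTracker:
    """Hypothetical master: splits a job into tasks and assigns them to TaskTrackers."""

    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, lines, split_size=2):
        splits = [lines[i:i + split_size] for i in range(0, len(lines), split_size)]
        n = len(self.trackers)
        with ThreadPoolExecutor(max_workers=n) as pool:
            # Assign map tasks to TaskTrackers round-robin.
            map_futures = [
                pool.submit(self.trackers[i % n].run_map, split)
                for i, split in enumerate(splits)
            ]
            # Collect map output and group values by key.
            grouped = defaultdict(list)
            for future in map_futures:
                for key, value in future.result():
                    grouped[key].append(value)
            # Assign one reduce task per key, again round-robin.
            reduce_futures = [
                pool.submit(self.trackers[i % n].run_reduce, key, values)
                for i, (key, values) in enumerate(sorted(grouped.items()))
            ]
            return [f.result() for f in reduce_futures]

jt = JobTracker([TaskTracker("datanode1"), TaskTracker("datanode2")])
print(jt.submit(["deer bear river", "car car river", "deer car bear"]))
# [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

The TaskTrackers keep no mutable state, so the same tracker can safely run two tasks at once on different threads. Results are read back in submission order, which keeps the output deterministic regardless of which thread finishes first.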
Examples of MapReduce: