Map Reduce Algorithm

The MapReduce algorithm allows for distributed processing of large datasets across clusters of computers. It works in two phases:
1. The map phase, where the input data is processed key-value pair by key-value pair, possibly converting or filtering the values, to generate a set of intermediate key-value pairs.
2. The reduce phase, where all intermediate values with the same key are grouped together and passed to the reduce function to produce the final output, which is stored back in the distributed file system.
The MapReduce framework implemented in Hadoop provides a scalable solution for processing vast amounts of structured and unstructured data stored in HDFS in a parallel and distributed manner.

Uploaded by Leela Rallapudi
Copyright © All Rights Reserved


Map Reduce:
Hadoop MapReduce is the core Hadoop ecosystem component that provides
data processing. MapReduce is a software framework for easily writing
applications that process vast amounts of structured and unstructured data
stored in the Hadoop Distributed File System.
The MapReduce framework works on data that is stored in:
1. Hadoop Distributed File System (HDFS)
2. Google File System (GFS)

Map Reduce Analogy:

- Consider the problem of counting the number of occurrences of each word in a large collection of documents.
- How would you do it in parallel?
Solution:
- Divide the documents among workers.
- Each worker parses its documents to find all words and outputs (word, count) pairs.
- Partition the (word, count) pairs across workers based on the word.
- For each word at a worker, locally add up the counts.
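The steps above can be sketched in Python; the function names and the in-memory shuffle step are illustrative stand-ins for what the framework does behind the scenes, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reducer: sum the partial counts for one word."""
    return (word, sum(counts))

docs = {"d1": "deer bear river", "d2": "car car river"}
intermediate = []
for doc_id, text in docs.items():
    intermediate.extend(map_phase(doc_id, text))
result = dict(reduce_phase(w, c) for w, c in shuffle(intermediate).items())
# result == {"deer": 1, "bear": 1, "river": 2, "car": 2}
```

In a real cluster the shuffle is what partitions pairs across workers by word, so each reducer sees all counts for its keys.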
How does MapReduce do it?
- 100 files with daily temperatures in two cities. Each file has 10,000 entries.
- For example, one file may have (Toronto 20), (New York 30), ...
- Our goal is to compute the maximum temperature in the two cities.
- Assign the task to 100 Map processors, each working on one file. Each processor outputs a list of key-value pairs, e.g., (Toronto 30), (New York 65), ...
- Now we have 100 lists, each with two elements. We give these lists to two reducers: one for Toronto and another for New York.
- The reducers produce the final answer: (Toronto 55), (New York 65).
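A minimal sketch of this temperature example, assuming two small in-memory "files" in place of the 100 HDFS files (all names and values here are illustrative):

```python
from collections import defaultdict

# One "file" per mapper: a list of (city, temperature) records.
file1 = [("Toronto", 20), ("New York", 30), ("Toronto", 30)]
file2 = [("Toronto", 55), ("New York", 65), ("New York", 12)]

def map_max(records):
    """Mapper: emit the per-file maximum temperature for each city."""
    local_max = {}
    for city, temp in records:
        local_max[city] = max(temp, local_max.get(city, temp))
    return list(local_max.items())

def reduce_max(city, temps):
    """Reducer: take the global maximum over all per-file maxima."""
    return (city, max(temps))

# Shuffle: route each city's per-file maxima to that city's reducer.
grouped = defaultdict(list)
for pairs in (map_max(file1), map_max(file2)):
    for city, temp in pairs:
        grouped[city].append(temp)
final = dict(reduce_max(c, t) for c, t in grouped.items())
# final == {"Toronto": 55, "New York": 65}
```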
Working of MapReduce:
MapReduce works by breaking the data processing into two phases:
1. Map phase
2. Reduce phase
Map Phase − The map or mapper's job is to process the input data. Generally the input data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.
Reduce Phase − The reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
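The line-by-line input to the mapper can be illustrated with a word-count mapper in the style of Hadoop Streaming, which pipes each line of an input split to the mapper's standard input and expects tab-separated key-value pairs on standard output; this script is a sketch, not part of the Hadoop Java API:

```python
import sys

def mapper(lines):
    """Emit one tab-separated (word, 1) pair for each word in each input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

# Hadoop Streaming would feed the input split via stdin, i.e. mapper(sys.stdin);
# here we simulate two input lines with a small list.
pairs = list(mapper(["deer bear river", "car car river"]))
# pairs == ["deer\t1", "bear\t1", "river\t1", "car\t1", "car\t1", "river\t1"]
```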

Keys and Values:

- The programmer in MapReduce has to specify two functions, the map function and the reduce function, that implement the Mapper and the Reducer in a MapReduce program.
- In MapReduce, data elements are always structured as key-value (i.e., (K, V)) pairs.
- The map and reduce functions receive and emit (K, V) pairs.

Input Splits → Map Function → Intermediate Outputs → Reduce Function → Final Outputs
(K, V) pairs → (K', V') pairs → (K'', V'') pairs

Anatomy of MapReduce:

          Input              Output
Map       <k1, v1>           list(<k2, v2>)
Reduce    <k2, list(v2)>     list(<k3, v3>)
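These signatures can be illustrated with a toy inverted index, where k1 is a document id, v1 its text, k2 a word, and v3 the sorted list of documents containing that word (all names here are illustrative):

```python
from collections import defaultdict

def map_fn(k1, v1):
    """map: <k1, v1> -> list(<k2, v2>); here (doc_id, text) -> (word, doc_id) pairs."""
    return [(word, k1) for word in v1.split()]

def reduce_fn(k2, values):
    """reduce: <k2, list(v2)> -> list(<k3, v3>); here (word, doc_ids) -> sorted unique ids."""
    return [(k2, sorted(set(values)))]

docs = {"d1": "hadoop mapreduce", "d2": "hadoop hdfs"}
grouped = defaultdict(list)
for k1, v1 in docs.items():
    for k2, v2 in map_fn(k1, v1):
        grouped[k2].append(v2)
index = {}
for k2, vs in grouped.items():
    for k3, v3 in reduce_fn(k2, vs):
        index[k3] = v3
# index == {"hadoop": ["d1", "d2"], "mapreduce": ["d1"], "hdfs": ["d2"]}
```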
How MapReduce works:
The complete execution process (execution of both Map and Reduce tasks) is controlled by two types of entities:
JobTracker: acts like a master (responsible for complete execution of the submitted job).
Multiple TaskTrackers: act like slaves, each of them performing the job.
For every job submitted for execution in the system, there is one JobTracker that resides on the NameNode, and there are multiple TaskTrackers which reside on DataNodes.
Examples Of Map Reduce:
