3 MapReduce program ex code
3 MapReduce program ex code
Driver Code
Mapper code:
public static class Map extends
Mapper<LongWritable,Text,Text,IntWritable> {
public void map(LongWritable key, Text value, Context context)
throws IOException,InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
value.set(tokenizer.nextToken());
context.write(value, new IntWritable(1));
}
We have created a class Map that extends the class Mapper which is
already defined in the MapReduce Framework.
We define the data types of input and output key/value pair after the
class declaration using angle brackets.
Both the input and output of the Mapper is a key/value pair.
Input:
The key is nothing but the offset of each line in the text file: LongWritable
The value is each individual line (as shown in the figure at the right): Text
Output:
The key is the tokenized words: Text
We have the hardcoded value in our case which is 1: IntWritable
Example – Dear 1, Bear 1, etc.
We have written a java code where we have tokenized each word and
assigned them a hardcoded value equal to 1.
Reducer Code:
public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>
{
public void reduce(Text key, Iterable<IntWritable> values,Context context)
throws IOException,InterruptedException {
int sum=0;
for(IntWritable x: values)
{
sum+=x.get();
}
context.write(key, new IntWritable(sum));
}
}
We have created a class Reduce which extends class Reducer like that of Mapper.
We define the data types of input and output key/value pair after the class
declaration using angle brackets as done for Mapper.
Both the input and the output of the Reducer is a key-value pair.
Input:
The key nothing but those unique words which have been generated after the sorting
and shuffling phase: Text
The value is a list of integers corresponding to each key: IntWritable
Example – Bear, [1, 1], etc.
Output:
The key is all the unique words present in the input text file: Text
The value is the number of occurrences of each of the unique words: IntWritable
Example – Bear, 2; Car, 3, etc.
We have aggregated the values present in each of the list corresponding to each
key and produced the final answer.
In general, a single reducer is created for each of the unique words, but, you can
specify the number of reducer in mapred-site.xml.
Driver Code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;
https://www.edureka.co/blog/mapreduce-tutorial/
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm
https://www.geeksforgeeks.org/mapreduce-understanding-with-real-life-e
xample/