
Lab 4 - Installation of Hadoop and MapReduce WordCount Example

The document provides steps to install Hadoop and configure the environment. It then describes how to run a sample WordCount MapReduce program on the Hadoop cluster.

CMPN451 Big Data Analytics Lab 4 - MapReduce with Hadoop

Steps to install Hadoop:

1. Make sure Java is installed.
java -version

If Java is not installed, type the following commands:

sudo apt-get update
sudo apt-get install default-jdk
Then make sure Java is now installed.
java -version

2. Install the SSH server.

sudo apt-get install openssh-server
Generate a public/private RSA key pair with an empty passphrase.
ssh-keygen -t rsa -P ""
When prompted for the file name to save the key, press Enter (leave it blank).

Type the following commands:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


ssh localhost
exit
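
The Hadoop start scripts used later log in to localhost over SSH, so it can help to confirm that the key-based login works without a password prompt. The following is a minimal sketch, not part of the original lab; the function name can_ssh_without_password is hypothetical. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of asking for a password. Run it only after the interactive "ssh localhost" above, so that the host key has already been accepted.

```shell
# Hypothetical helper (not part of the lab): succeeds only if key-based login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_ssh_without_password() {
  local host="${1:-localhost}"
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

# Example use:
# can_ssh_without_password localhost && echo "passwordless ssh works"
```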

3. Download Hadoop: navigate to the following link and download the tar.gz file
for Hadoop version 3.3.0 (or a later version if you wish). The archive is about 478 MB.
https://hadoop.apache.org/release/3.3.0.html

4. Once downloaded, open the terminal, cd to the directory where the file was
downloaded (the Desktop, for example) and extract it as follows:
cd Desktop
sudo tar -xvzf hadoop-3.3.0.tar.gz

You can now check that there is an extracted directory named hadoop-3.3.0 by typing
the command “ls” or by visually inspecting the files.
5. Now, we move the extracted directory to the location /usr/local/hadoop
sudo mv hadoop-3.3.0 /usr/local/hadoop
6. Let’s configure the Hadoop environment.
Type the following command:
gedit ~/.bashrc
At the end of the file, add the following lines. (Note: Replace the Java version with the
version you have installed. You can navigate to the directory /usr/lib/jvm and check for the
directory named java-xx-openjdk-amd64.)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
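
If you are unsure which directory to use for JAVA_HOME, it can usually be derived from the java binary itself. The sketch below is an illustration rather than part of the lab; java_home_for is a hypothetical helper name, and it assumes GNU readlink (as on Ubuntu) and a JDK laid out as <JAVA_HOME>/bin/java.

```shell
# Hypothetical helper: print the JDK directory that a java binary belongs to.
# readlink -f follows symlinks such as /usr/bin/java -> /etc/alternatives/java -> the real JDK.
java_home_for() {
  local real
  real="$(readlink -f "$1")" || return 1
  # Strip the trailing /bin/java to get the JDK root.
  dirname "$(dirname "$real")"
}

# Example use:
# java_home_for "$(command -v java)"
```

The printed path is the value to compare against the JAVA_HOME line above.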


7. Save the file and close it.

8. Now from the terminal, type the following command:
source ~/.bashrc
9. We start configuring Hadoop by opening hadoop-env.sh as follows:
sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Search for the line starting with export JAVA_HOME= and replace it with the
following line.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

Save the file by clicking on “Save” or (Ctrl+S)


10. Open core-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S).
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>


11. Open hdfs-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S).
<property>
<name>dfs.replication</name>
<value>1</value>
</property>


<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_space/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_space/hdfs/datanode</value>
</property>
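
After editing the configuration files, a quick way to double-check what was saved is to read a property back from the file. The following is a rough sketch only: get_property is a hypothetical helper, and it assumes each <name> and <value> element sits on a single line, as in the snippets of this lab. It is not a real XML parser and ignores comments.

```shell
# Hypothetical helper: print the value of one property from a Hadoop *-site.xml file.
# Usage: get_property FILE PROPERTY_NAME
get_property() {
  local file="$1" name="$2"
  awk -v want="$name" '
    # Remember the most recent <name>...</name> seen.
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    # When a <value> follows the wanted name, print it and stop.
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$file"
}

# Example use:
# get_property /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.replication
```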


12. Open yarn-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S)
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>


</property>

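13. Open mapred-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S).
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>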
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

14. Now, run the following commands in the terminal to create the directories for
the Hadoop space, the NameNode and the DataNode.

sudo mkdir -p /usr/local/hadoop_space

sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_space/hdfs/datanode
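
Because these directories are created with sudo, they belong to root, and the Hadoop files extracted with sudo may not belong to your user either. If the daemons are later started as your normal user, they may fail to write their data and logs. The sketch below is a suggestion that is not part of the original lab: unwritable_dirs is a hypothetical helper that lists the directories the current user cannot write to, and the commented chown line is one possible fix.

```shell
# Hypothetical helper: print every argument that is not a directory writable by the current user.
unwritable_dirs() {
  local d
  for d in "$@"; do
    [ -d "$d" ] && [ -w "$d" ] || echo "$d"
  done
}

# Example use (not run here): hand ownership to the current user, then re-check.
# sudo chown -R "$USER" /usr/local/hadoop /usr/local/hadoop_space
# unwritable_dirs /usr/local/hadoop /usr/local/hadoop_space/hdfs/namenode /usr/local/hadoop_space/hdfs/datanode
```

An empty output means all listed directories are writable.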

Now we have successfully installed Hadoop.


15. Format the namenode as follows:

hdfs namenode -format

This step should end with a message that the NameNode is shutting down (a line starting with SHUTDOWN_MSG).
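
Formatting is meant to be done once. Formatting an existing NameNode again erases the file system metadata and assigns a new cluster ID, after which a DataNode holding old data may refuse to start. A formatted NameNode directory contains the file current/VERSION, so a small guard like the following sketch can avoid an accidental second format. The helper name namenode_is_formatted is hypothetical; the path in the example is the one configured in hdfs-site.xml above, without the file: prefix.

```shell
# Hypothetical helper: succeed if the given NameNode directory has already been formatted.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Example use (not run here): format only when needed.
# if ! namenode_is_formatted /usr/local/hadoop_space/hdfs/namenode; then
#   hdfs namenode -format
# fi
```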

16. Before starting the Hadoop Distributed File System (HDFS), we need to
make sure that the rcmd type is “ssh”, not “rsh”, in the output of the following
command:
pdsh -q -w localhost


17. If the rcmd type is “rsh”, type the following commands:
export PDSH_RCMD_TYPE=ssh
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Run Step 16 again to check that the rcmd type is now “ssh”.
If the rcmd type was already “ssh”, skip this step.
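
Note that an export typed in the terminal only lasts for that terminal session. If you want the setting to survive new terminals, one option is to add the line to ~/.bashrc. The sketch below shows a way to do that without adding the line twice; append_once is a hypothetical helper, not something provided by Hadoop or pdsh.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
# Usage: append_once LINE FILE
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string rather than a regex.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example use:
# append_once 'export PDSH_RCMD_TYPE=ssh' "$HOME/.bashrc"
```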

18. Start HDFS using the command:

start-dfs.sh

19. Start YARN using the command:

start-yarn.sh

20. Type the following command to list the running Java processes.
jps

Make sure these processes are listed: ResourceManager, NameNode, NodeManager,
SecondaryNameNode, DataNode and Jps.
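
The check above can also be done automatically. The following is a small sketch, assuming the usual jps output format of one "PID Name" pair per line; missing_daemons is a hypothetical helper that reads jps output and prints the expected daemons that are absent.

```shell
# Hypothetical helper: read jps output on stdin and print each expected daemon that is missing.
missing_daemons() {
  local output d
  output="$(cat)"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare against the second column exactly, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$output" | awk '{ print $2 }' | grep -qx "$d"; then
      echo "$d"
    fi
  done
}

# Example use:
# jps | missing_daemons
```

An empty output means all five daemons are running.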

21. Go to localhost:9870 in the browser. You should see the NameNode web interface.

Steps to run WordCount Program on Hadoop:


1. Make sure Hadoop and Java are installed properly

hadoop version
javac -version

2. Create a directory on the Desktop named Lab and inside it create two folders;
one called “Input” and the other called “tutorial_classes”.
[You can do this step using GUI normally or through terminal commands]
cd Desktop
mkdir Lab
mkdir Lab/Input
mkdir Lab/tutorial_classes

3. Add the file attached with this document “WordCount.java” in the directory Lab

4. Add the file attached with this document “input.txt” in the directory Lab/Input.

5. Type the following command to export the hadoop classpath into bash.
export HADOOP_CLASSPATH=$(hadoop classpath)
Make sure it is now exported.
echo $HADOOP_CLASSPATH
6. It is time to create these directories on HDFS rather than locally. Type the
following commands.
hadoop fs -mkdir /WordCountTutorial
hadoop fs -mkdir /WordCountTutorial/Input
hadoop fs -put Lab/Input/input.txt /WordCountTutorial/Input
7. Go to localhost:9870 from the browser, Open “Utilities → Browse File System”
and you should see the directories and files we placed in the file system.
8. Then, go back to the local machine, where we will compile the WordCount.java file.
Assuming we are currently in the Desktop directory:
cd Lab
javac -classpath $HADOOP_CLASSPATH -d tutorial_classes WordCount.java
Put the output files in one jar file (there is a dot at the end):
jar -cvf WordCount.jar -C tutorial_classes .

9. Now, we run the jar file on Hadoop.

hadoop jar WordCount.jar WordCount /WordCountTutorial/Input /WordCountTutorial/Output
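
A MapReduce job refuses to start if its output directory already exists, so running the command a second time fails until /WordCountTutorial/Output is removed. The sketch below wraps the run in a hypothetical helper, run_wordcount, that deletes an existing output directory first. It uses only the standard "hadoop fs -test -d" and "hadoop fs -rm -r" commands and assumes WordCount.jar is in the current directory, as in step 8.

```shell
# Hypothetical helper: remove a previous output directory (if any), then run the job.
# Usage: run_wordcount HDFS_INPUT_DIR HDFS_OUTPUT_DIR
run_wordcount() {
  local input="$1" output="$2"
  # "hadoop fs -test -d" returns 0 when the path exists and is a directory.
  if hadoop fs -test -d "$output"; then
    hadoop fs -rm -r "$output"
  fi
  hadoop jar WordCount.jar WordCount "$input" "$output"
}

# Example use:
# run_wordcount /WordCountTutorial/Input /WordCountTutorial/Output
```

Be careful with the second argument: whatever directory is given there is deleted from HDFS before the job runs.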


10. Output the result:

hadoop fs -cat /WordCountTutorial/Output/*
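
To sanity-check the result, the same counts can be computed locally with standard tools. This is only a sketch, under the assumption that the attached WordCount.java is the usual example that splits lines on whitespace and counts each token; local_wordcount is a hypothetical helper. The pipeline mirrors the three phases: splitting into words (map), sorting (shuffle), and counting equal neighbours (reduce). It prints "word<TAB>count" lines sorted by word, which is the layout Hadoop's text output uses.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin.
local_wordcount() {
  tr -s '[:space:]' '\n' |          # "map": one word per line
    grep -v '^$' |                  # drop empty lines caused by leading whitespace
    LC_ALL=C sort |                 # "shuffle": bring equal words together, in byte order
    uniq -c |                       # "reduce": count each run of equal words
    awk '{ print $2 "\t" $1 }'      # reorder to word<TAB>count
}

# Example use:
# local_wordcount < Lab/Input/input.txt
```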

Requirement:


Vodafone Egypt is launching a marketing campaign in Ramadan to promote their
sales and increase their profit from selling the prepaid recharge cards. These cards
are worth 5, 10, 15, 50, and 100 EGP.

The data science team at Vodafone are analyzing the customers’ data which include
the customer personal information, the prepaid card they purchased, the timestamp
they registered the prepaid amount on their Vodafone accounts, among other
information.

The details of the customers are omitted, and you are only provided with a file “in.csv”
which includes two columns.
1. Customer ID. (Each ID maps to a certain customer, whose data is hidden for
confidentiality).
2. Prepaid Card Amount.

Your task is to generate a report using MapReduce (similar to the WordCount
program) showing the total amount of the prepaid cards that each customer has
purchased. For example, if the customer with ID 300 purchased 5 cards worth 10, 15,
15, 10 and 100 EGP, then the report should state that customer ID 300 bought cards
with a total amount of 150.
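
To verify the MapReduce report on a small sample, the expected totals can be computed locally. The sketch below is only a cross-check and not a substitute for the MapReduce program. It assumes that “in.csv” is comma-separated with the customer ID in the first column and the amount in the second, and that it has no header row; local_totals is a hypothetical helper name.

```shell
# Hypothetical helper: sum the second column per value of the first column (CSV on stdin).
local_totals() {
  awk -F',' 'NF >= 2 { total[$1] += $2 }
             END { for (id in total) print id "\t" total[id] }' |
    LC_ALL=C sort   # awk iterates in no fixed order, so sort for a stable report
}

# Example use:
# local_totals < in.csv
```

For the example above, the five lines 300,10 / 300,15 / 300,15 / 300,10 / 300,100 give a single line with 300 and 150.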

Disclaimer: Thanks to Vodafone DS team who provided us with this real customer
data.
