Lab 4 - Installation of Hadoop and MapReduce WordCount Example

CMPN451 Big Data Analytics Lab 4 - MapReduce with Hadoop

Steps to install Hadoop:

1. Make sure Java is installed.
java -version

If Java is not installed, type the following commands:

sudo apt-get update
sudo apt-get install default-jdk
Now make sure Java is installed.
java -version

2. Install ssh server

sudo apt-get install openssh-server
Generate public/private RSA key pair.
ssh-keygen -t rsa
When prompted for the file name to save the key, press Enter (leave it blank).
Type the following commands:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
exit

3. Download Hadoop by navigating to the following link and downloading the tar.gz file
for Hadoop version 3.3.0 (or a later version if you wish). (478 MB)
https://hadoop.apache.org/release/3.3.0.html

4. Once downloaded, open the terminal, cd to the directory where the file was
downloaded, and extract it as follows:
cd Downloads
sudo tar -xvzf hadoop-3.3.0.tar.gz

You can now check that there is an extracted directory named hadoop-3.3.0 by typing
the command “ls” or by visually inspecting the files.
5. Now, we move the extracted directory to the location /usr/local/hadoop

sudo mv hadoop-3.3.0 /usr/local/hadoop

6. Let’s configure the hadoop system.

Type the following command:

gedit ~/.bashrc

At the end of the file, add the following lines: (Note: Replace the Java version with the version
number you already have. You can navigate to the directory /usr/lib/jvm and check the directory
name java-xx-openjdk-amd64)

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PDSH_RCMD_TYPE=ssh

7. Save the file and close it.

8. Now from the terminal, type the following command:

source ~/.bashrc
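
After sourcing the file, it can help to confirm that the new variables took effect in the current terminal. The following is a minimal sketch, not part of the lab handout: check_hadoop_env is a hypothetical helper name, and it only checks that JAVA_HOME and HADOOP_HOME are set and point at existing directories.

```shell
# Hypothetical helper (not part of Hadoop): report whether the variables
# added to ~/.bashrc are set and point at existing directories.
check_hadoop_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name = $value"
        fi
    done
    return $status
}
```

Calling check_hadoop_env in the same terminal prints one line per variable and returns a non-zero status if either one is missing or wrong.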

9. We start configuring Hadoop by opening hadoop-env.sh as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Search for the line containing export JAVA_HOME= (it may be commented out with a
leading #) and replace it with the following line.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

Save the file by clicking on “Save” or (Ctrl+S)
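
The JDK directory used for JAVA_HOME can also be derived from the java launcher instead of browsing /usr/lib/jvm by hand. The following is a rough sketch under two assumptions: the launcher lives at <jdk>/bin/java (the usual layout for OpenJDK 11 packages on Ubuntu), and readlink -f is available (GNU coreutils). The name guess_java_home is hypothetical.

```shell
# Hypothetical helper: derive the JDK directory from the java launcher by
# resolving symlinks and dropping the trailing /bin/java.
# Assumes a layout like /usr/lib/jvm/<jdk>/bin/java.
guess_java_home() {
    # Use the path given as $1, or fall back to the java found on PATH.
    launcher=$(readlink -f "${1:-$(command -v java)}")
    dirname "$(dirname "$launcher")"
}
```

Running guess_java_home with no argument prints a candidate value for JAVA_HOME; compare it with the directory names under /usr/lib/jvm before using it.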

10. Open core-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S).
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>

11. Open hdfs-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S).
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hadoop_space/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hadoop_space/hdfs/datanode</value>
</property>

12. Open yarn-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S)
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

13. Open mapred-site.xml as follows:

sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml

Add the following lines between the tags <configuration> and </configuration> and
save it (Ctrl+S)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

14. Now, run the following commands on the terminal to create a directory for
hadoop space, name node and data node.

sudo mkdir -p /usr/local/hadoop/hadoop_space

sudo mkdir -p /usr/local/hadoop/hadoop_space/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hadoop_space/hdfs/datanode
Now we have successfully installed Hadoop.
15. Format the namenode as follows:
hdfs namenode -format

This step should end with a message indicating that the NameNode is shutting down.

16. Before starting the Hadoop Distributed File System (HDFS), we need to
make sure that the rcmd type is “ssh”, not “rsh”. Type the following
command and check the rcmd type reported in its output:
pdsh -q -w localhost

17. If the rcmd type is “rsh”, type the following commands. If it is
already “ssh”, skip this step.
export PDSH_RCMD_TYPE=ssh
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Then run Step 16 again to check that the rcmd type is now ssh.

18. Start the HDFS System using the command.

start-dfs.sh

19. Start YARN using the command:

start-yarn.sh

20. Type the following command to list the running Java processes:
jps

Make sure these processes are listed: ResourceManager, NameNode,
NodeManager, SecondaryNameNode, DataNode, and Jps itself.
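
Scanning the jps listing by eye is easy to get wrong, so the check can be scripted. The following is a minimal sketch, assuming jps prints one "<pid> <name>" pair per line; missing_daemons is a hypothetical helper name, and the daemon list is the one given in this step.

```shell
# Hypothetical helper: read jps output on stdin and print every expected
# Hadoop daemon that is NOT listed. Prints nothing if all are running.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field, so that "NameNode"
        # does not match a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Typical usage is jps | missing_daemons. An empty result means all five daemons are up; any name printed points to a daemon whose log under /usr/local/hadoop/logs is worth checking.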

21. Go to localhost:9870 from the browser. You should see the NameNode web interface.

Steps to run WordCount Program on Hadoop:

1. Make sure Hadoop and Java are installed properly

hadoop version
javac -version

2. Create a directory on the Desktop named Lab and inside it create two folders:
one called “Input” and the other called “classes”.
[You can do this step using GUI normally or through terminal commands]
cd Desktop
mkdir Lab
mkdir Lab/Input
mkdir Lab/classes

3. Add the file “WordCount.java”, attached with this document, to the directory Lab.

4. Add the file “input.txt”, attached with this document, to the directory Lab/Input.

5. Type the following command to export the hadoop classpath into bash.
export HADOOP_CLASSPATH=$(hadoop classpath)
Make sure it is now exported.
echo $HADOOP_CLASSPATH
6. It is time to create these directories on HDFS rather than locally. Type the
following commands.
hadoop fs -mkdir /WordCount
hadoop fs -mkdir /WordCount/Input
hadoop fs -put Lab/Input/input.txt /WordCount/Input
7. Go to localhost:9870 from the browser and open “Utilities → Browse File
System”. You should see the directories and files we placed in the
file system.
8. Then, go back to the local machine, where we will compile the WordCount.java file.
We assume we are currently in the Desktop directory.
cd Lab
javac -classpath $HADOOP_CLASSPATH -d classes WordCount.java

Put the compiled class files in one jar file by executing the command below
(note the dot at the end):

jar -cvf WordCount.jar -C classes .

9. Now, we run the jar file on Hadoop.

hadoop jar WordCount.jar WordCount /WordCount/Input /WordCount/Output

10. Output the result:

hadoop fs -cat /WordCount/Output/*
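
To judge whether the job output is plausible, the same counts can be computed locally on the input file. This is only a cross-check sketch, not a MapReduce program. It assumes the attached WordCount.java behaves like the standard Hadoop example (case-sensitive words split on whitespace, output lines of the form word, tab, count), which this handout does not confirm; local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: count whitespace-separated words in a local file
# and print "word<TAB>count" lines sorted by word.
local_wordcount() {
    # One word per line, then tally with awk and sort in byte order.
    tr -s '[:space:]' '\n' < "$1" \
        | awk 'NF { count[$0]++ } END { for (w in count) printf "%s\t%d\n", w, count[w] }' \
        | LC_ALL=C sort
}
```

From the Desktop directory, local_wordcount Lab/Input/input.txt prints the local counts, which can then be compared with the output of the Hadoop job.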

Requirement:

Vodafone Egypt is launching a marketing campaign in Ramadan to promote sales
and increase profit from selling prepaid recharge cards. These cards are worth 5, 10,
15, 50, and 100 EGP.

The data science team at Vodafone is analyzing the customers’ data which includes
the customer's personal information, the prepaid card they purchased, and the
timestamp they registered the prepaid amount on their Vodafone accounts, among
other information.

The details of the customers are omitted, and you are only provided with a file “in.csv”
which includes two columns.
1. Customer ID. (Each ID maps to a certain customer, whose data is hidden for
confidentiality).
2. Prepaid Card Amount.

Your task is to generate a report using MapReduce (similar to the WordCount
program) showing the total amount of the prepaid cards that each customer has
purchased. For example, if the customer with ID 300 purchased 5 cards worth 10, 15,
15, 10, and 100 EGP, then the report should state that customer ID 300 bought cards
with a total amount of 150 EGP.
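
The expected totals can be computed locally to check the MapReduce report against. The sketch below is a cross-check only and does not replace the required MapReduce solution. It assumes “in.csv” is comma-separated with the customer ID in the first column, the card amount in the second, and no header row; the handout does not specify these details, and customer_totals is a hypothetical helper name.

```shell
# Hypothetical helper: sum the second CSV column (card amount) for each
# value of the first column (customer ID) and print "id<TAB>total",
# sorted numerically by ID.
# Assumes comma-separated fields and no header row.
customer_totals() {
    awk -F',' 'NF >= 2 { total[$1] += $2 } END { for (id in total) printf "%s\t%d\n", id, total[id] }' "$1" \
        | sort -n
}
```

Running customer_totals in.csv gives one line per customer. For the example above, the five purchases of customer 300 add up to a single line with a total of 150.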

Disclaimer: Thanks to the Vodafone DS team who provided us with this real customer
data.
