
Hadoop Installation

Step 1: Installing Java on Ubuntu


To install Java on Ubuntu, all you have to do is execute the following command:

sudo apt install default-jdk default-jre -y


To verify the installation, check the Java version on your system:

java -version

Step 2: Create a user for Hadoop and configure SSH


First, create a new user named hadoop:

sudo adduser hadoop


To grant superuser privileges to the new user, add it to the sudo group:

sudo usermod -aG sudo hadoop


Once done, switch to the user hadoop:

sudo su - hadoop
Next, install the OpenSSH server and client:

sudo apt install openssh-server openssh-client -y


Now, use the following command to generate private and public keys:

ssh-keygen -t rsa
Here, it will ask you for:

- Where to save the key (hit Enter to save it inside your home directory)
- A passphrase for the key (leave blank for no passphrase)
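If you would rather skip the prompts, the same key can be generated non-interactively. A minimal sketch, using a scratch directory purely for illustration (in the actual setup the key path would be ~/.ssh/id_rsa):

```shell
# Generate an RSA keypair without prompts: -f sets the key file,
# -N "" sets an empty passphrase, -q suppresses the banner output.
keydir=$(mktemp -d)
ssh-keygen -t rsa -b 4096 -f "$keydir/id_rsa" -N "" -q
ls "$keydir"   # id_rsa (private key) and id_rsa.pub (public key)
```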

Now, add the public key to authorized_keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys


Use the chmod command to change the file permissions of authorized_keys:

sudo chmod 640 ~/.ssh/authorized_keys
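Mode 640 gives the owner read and write access, the group read-only access, and everyone else no access, so only the hadoop user can modify the file. A quick scratch-file demonstration of what the mode means:

```shell
# Illustrate mode 640 on a throwaway file (not part of the actual setup)
f=$(mktemp)
chmod 640 "$f"
mode=$(stat -c '%a' "$f")   # GNU stat prints the octal permission bits
echo "$mode"                # 640
rm -f "$f"
```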


Finally, verify the SSH configuration:
ssh localhost
If this is your first time connecting, all you have to do is type yes and hit Enter to accept the host key. If you added a passphrase for the keys, it will ask you to enter it here:

Step 3: Download and install Apache Hadoop on Ubuntu


At the time of writing, the latest stable Hadoop version is 3.3.6, so we will be using the wget command to
download this release:

wget https://downloads.apache.org/hadoop/common/stable/hadoop-3.3.6.tar.gz
Once you are done with the download, extract the file using the following command:

tar -xvzf hadoop-3.3.6.tar.gz


Next, move the extracted file to the /usr/local/hadoop using the following command:

sudo mv hadoop-3.3.6 /usr/local/hadoop


Now, create a directory using mkdir command to store logs:

sudo mkdir /usr/local/hadoop/logs


Finally, change the ownership of the /usr/local/hadoop to the user hadoop:

sudo chown -R hadoop:hadoop /usr/local/hadoop

Step 4: Configure Hadoop on Ubuntu


Here, we will walk you through the configuration of the Hadoop environment variables.

First, open the .bashrc file using the following command:

sudo nano ~/.bashrc


Jump to the end of the file in the nano text editor by pressing Alt + / and paste the following
lines:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Save changes and exit from the nano text editor.

To enable the changes, source the .bashrc file:

source ~/.bashrc
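To sanity-check that the variables resolve as intended, the two key lines can be replayed in any shell. This is just an illustrative check, not part of the original guide:

```shell
# Re-apply the two most important lines from .bashrc and confirm them
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
echo "$HADOOP_HOME"                              # /usr/local/hadoop
echo "$PATH" | grep -o '/usr/local/hadoop/bin'   # the bin dir is now on PATH
```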

Step 5: Configure Java environment variables


Hadoop's core components, which include YARN, HDFS, and MapReduce, need to know where
Java is installed.

To do that, you will have to define the Java environment variables in the hadoop-env.sh file.

Edit the hadoop-env.sh file


First, open the hadoop-env.sh file:

sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh


Press Alt + / to jump to the end of the file and paste the following lines in the file to add the path
of Java:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"

Note: if default-jdk installed a different Java version on your system, adjust JAVA_HOME accordingly; readlink -f $(which java) prints the actual java binary location, and JAVA_HOME is the directory above its bin/.



Save changes and exit from the text editor.

Next, change your current working directory to /usr/local/hadoop/lib:

cd /usr/local/hadoop/lib

Here, download the javax.activation jar (the jcenter.bintray.com host used by older guides has
been retired, so fetch it from Maven Central instead):

sudo wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar

Once done, check the Hadoop version in Ubuntu:

hadoop version



Next, you will have to edit the core-site.xml file to specify the URL for the name node.

Edit the core-site.xml file


First, open the core-site.xml file using the following command:

sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml


And add the following lines in between <configuration> </configuration>:

<property>
  <name>fs.default.name</name>
  <value>hdfs://0.0.0.0:9000</value>
  <description>The default file system URI</description>
</property>



Save the changes (Ctrl+S) and exit (Ctrl+X) from the text editor.

Next, create a directory to store node metadata using the following command:

sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}

And change the ownership of the created directory to the hadoop user:

sudo chown -R hadoop:hadoop /home/hadoop/hdfs

Edit the hdfs-site.xml configuration file


By configuring the hdfs-site.xml file, you will define the locations for storing node metadata
and the fsimage file.

So first open the configuration file:

sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml


And paste the following line in between <configuration> ... </configuration>:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>file:///home/hadoop/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>file:///home/hadoop/hdfs/datanode</value>
</property>

Save changes and exit from the hdfs-site.xml file.

Edit the mapred-site.xml file


By editing the mapred-site.xml file, you can define the MapReduce values.

To do that, first, open the configuration file using the following command:

sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml


And paste the following line in between <configuration> ... </configuration>:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Save and exit from the nano text editor.

Edit the yarn-site.xml file


This is the last configuration file that needs to be edited to use the Hadoop service.

The purpose of editing this file is to define the YARN settings.

First, open the configuration file:

sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml


Paste the following in between <configuration> ... </configuration>:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
Save changes and exit from the config file.

Finally, use the following command to format the HDFS NameNode; this validates the
configuration, initializes the metadata directory, and only needs to be done once, before the first start:

hdfs namenode -format


Step 6: Start the Hadoop cluster

To start the Hadoop cluster, you will have to start the previously configured daemons.

So let's start with the NameNode and DataNode:

start-dfs.sh



Next, start the YARN daemons, the ResourceManager and NodeManager:

start-yarn.sh

To verify whether the services are running as intended, use the following command:

jps
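On a healthy single-node setup, jps should list all five Hadoop daemons along with its own process; the output will look something like this (process IDs will differ):

```shell
jps
# Typical output on a working single-node cluster:
#   2101 NameNode
#   2245 DataNode
#   2439 SecondaryNameNode
#   2625 ResourceManager
#   2762 NodeManager
#   2903 Jps
```

If any daemon is missing, check its log file under /usr/local/hadoop/logs.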


Step 7: Access the Hadoop web interface


To access the Hadoop web interface, you will have to know your server's IP (or use localhost) and
append the port number 9870 in your address bar:

http://server-IP:9870
OR
http://localhost:9870
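On a headless server, reachability can also be checked from the terminal; a small sketch, assuming curl is installed and the daemons from Step 6 are running:

```shell
# Ask the NameNode web UI for its HTTP status code; 200 means it is up
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9870
```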

For any doubts regarding these instructions, please refer to:

https://learnubuntu.com/install-hadoop/

Note: That guide used Hadoop 3.3.4, but as of today the version has been updated to 3.3.6.
