
BDA record


REG NO : 411622243038

Ex.No: 01  Downloading and installing Hadoop; Understanding different Hadoop modes. Start-up scripts, Configuration files.
Date:

AIM:
To Install Apache Hadoop.
Hadoop software can be installed in three modes: standalone (local) mode, pseudo-distributed mode, and fully distributed mode.
Hadoop is a Java-based programming framework that supports the processing and
storage of extremely large datasets on a cluster of inexpensive machines. It was
the first major open source project in the big data playing field and is sponsored
by the Apache Software Foundation.
Hadoop-2.7.3 comprises four main layers:
➢ Hadoop Common is the collection of utilities and libraries that support other
Hadoop modules.
➢ HDFS, which stands for Hadoop Distributed File System, is responsible for
persisting data to disk.
➢ YARN, short for Yet Another Resource Negotiator, is the "operating system"
for HDFS.
➢ Map Reduce is the original processing model for Hadoop clusters. It
distributes work within the cluster or map, then organizes and reduces the
results from the nodes into a response to a query. Many other processing
models are available for the 2.x version of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-
alone mode which is suitable for learning about Hadoop, performing simple
operations, and debugging.
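
The exercise title also covers configuration files. As an illustrative sketch only
(the file locations assume the usual %HADOOP_HOME%\etc\hadoop layout and the
values are typical defaults, not taken from this record), a minimal
pseudo-distributed configuration touches core-site.xml and hdfs-site.xml:

<!-- core-site.xml: default filesystem URI (illustrative value) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication factor of 1 for a single-node cluster -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

In stand-alone mode these files are left at their defaults and Hadoop runs as a
single local Java process.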


ALGORITHM:
1. Install Apache Hadoop 2.2.0 in Microsoft Windows OS
If Apache Hadoop 2.2.0 is not already installed then follow the post Build, Install,
Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS.
2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and
Node Manager).

PROGRAM:

Run the following commands in Command Prompt:


C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>sbin\start-dfs
c:\hadoop>sbin\start-yarn
starting yarn daemons
Namenode, Datanode, Resource Manager and Node Manager will start in a few
minutes, and the Single Node (pseudo-distributed mode) cluster will then be ready
to execute Hadoop MapReduce jobs.
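
To confirm that the daemons have started, the jps command from the JDK can be
used (an illustrative check, not part of the original procedure):

C:\hadoop>jps

The listing should include the NameNode, DataNode, ResourceManager and
NodeManager processes.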


Run wordcount MapReduce job


Now we'll run wordcount MapReduce job available in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-
examples-2.2.0.jar
Create a text file (say C:\file1.txt) with some content. We'll pass this file as input
to the wordcount MapReduce job for counting words. The contents of C:\file1.txt are:
Install Hadoop
Run Hadoop Wordcount Mapreduce Example
Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to
be used for counting words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input
Copy the text file(say 'file1.txt') from local disk to the newly created 'input'
directory in HDFS.
C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input

Check content of the copied file.


C:\hadoop>hdfs dfs -ls input


Found 1 items
-rw-r--r-- 1 ABHIJITG supergroup 55 2014-02-03 13:19 input/file1.txt
C:\hadoop>bin\hdfs dfs -cat input/file1.txt
Install Hadoop
Run Hadoop Wordcount Mapreduce Example
Run the wordcount MapReduce job provided in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-
2.2.0.jar
C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-
2.2.0.jar wordcount input output
14/02/03 13:22:02 INFO client.RMProxy: Connecting to ResourceManager at
/0.0.0.0:8032
14/02/03 13:22:03 INFO input.FileInputFormat: Total input paths to process : 1
14/02/03 13:22:03 INFO mapreduce.JobSubmitter: number of splits:1
:
:
14/02/03 13:22:04 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1391412385921_0002
14/02/03 13:22:04 INFO impl.YarnClientImpl: Submitted application
application_1391412385921_0002 to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:04 INFO mapreduce.Job: The url to track the job:
http://ABHIJITG:8088/proxy/application_1391412385921_0002/
14/02/03 13:22:04 INFO mapreduce.Job: Running job:
job_1391412385921_0002
14/02/03 13:22:14 INFO mapreduce.Job: Job job_1391412385921_0002 running
in uber mode : false
14/02/03 13:22:14 INFO mapreduce.Job: map 0% reduce 0%
14/02/03 13:22:22 INFO mapreduce.Job: map 100% reduce 0%
14/02/03 13:22:30 INFO mapreduce.Job: map 100% reduce 100%


14/02/03 13:22:30 INFO mapreduce.Job: Job job_1391412385921_0002


completed successfully
14/02/03 13:22:31 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=89
FILE: Number of bytes written=160142
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0

HDFS: Number of bytes read=171


HDFS: Number of bytes written=59
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5657
Total time spent by all reduces in occupied slots (ms)=6128
Map-Reduce Framework
Map input records=2
Map output records=7
Map output bytes=82
Map output materialized bytes=89
Input split bytes=116
Combine input records=7
Combine output records=6


Reduce input groups=6


Reduce shuffle bytes=89
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=145
CPU time spent (ms)=1418
Physical memory (bytes) snapshot=368246784
Virtual memory (bytes) snapshot=513716224
Total committed heap usage (bytes)=307757056
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=55
File Output Format Counters

Bytes Written=59


OUTPUT:

RESULT:
We've installed Hadoop in stand-alone mode and verified it by running an
example program it provided.


Ex.No: 02  Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving files and Deleting files
Date:

AIM:
To implement file management tasks in Hadoop, such as adding files and
directories, retrieving files, and deleting files.

PROCEDURE:

Using Hadoop CLI:


Hadoop CLI provides several commands for file management

1. Adding files and Directories

This command copies files or directories from the local filesystem to HDFS (see
the example below). Replace <local_path> with the path of the file or directory
on your local machine, and <hdfs_path> with the desired destination path
in HDFS.
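
A typical form of this command (paths shown are placeholders) is:

hdfs dfs -mkdir <hdfs_path>
hdfs dfs -put <local_path> <hdfs_path>

-copyFromLocal can be used in place of -put, exactly as in Ex.No: 01.
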
2. Retrieving Files:

This command copies files or directories from HDFS to the local filesystem (see
the example below). Replace <hdfs_path> with the path of the file or directory in
HDFS, and <local_path> with the desired destination path on your local machine.
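
A typical form of this command (paths shown are placeholders) is:

hdfs dfs -get <hdfs_path> <local_path>

-copyToLocal can be used in place of -get.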


3. Deleting Files
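
The hdfs dfs -rm command removes a file from HDFS, and -rm -r removes a
directory recursively (paths shown are placeholders):

hdfs dfs -rm <hdfs_path>
hdfs dfs -rm -r <hdfs_directory_path>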

Using Hadoop JAVA APIs:
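
The same tasks can be performed programmatically through the Hadoop FileSystem
Java API. The following is a minimal sketch (the class name and paths are
illustrative assumptions, and the Configuration is expected to pick up the
cluster settings from the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileManagement {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/user/input"));                      // add a directory
        fs.copyFromLocalFile(new Path("c:/file1.txt"),           // add a file
                new Path("/user/input/file1.txt"));
        fs.copyToLocalFile(new Path("/user/input/file1.txt"),    // retrieve a file
                new Path("c:/retrieved_file1.txt"));
        fs.delete(new Path("/user/input/file1.txt"), false);     // delete a file (non-recursive)

        fs.close();
    }
}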

RESULT:
Thus, the file management tasks of adding files and directories, retrieving
files, and deleting files have been completed successfully.

Ex.No: 03  Implementation of Matrix Multiplication with Hadoop Map Reduce
Date:

AIM:
To Develop a Map Reduce program to implement Matrix Multiplication.
Matrix multiplication or matrix product is a binary operation in mathematics that
creates a matrix from two matrices. It is influenced by linear equations and vector
transformations, which have applications in applied mathematics, physics, and
engineering. For example, if A is an n × m matrix and B is an m × p matrix, their
matrix product AB is an n × p matrix. The matrix product represents the
composition of two linear transformations represented by matrices.


ALGORITHM:

Algorithm for Map Function:

a. For each element m_ij of M, produce (key, value) pairs as ((i,k), (M, j, m_ij)),
for k = 1, 2, 3, ... up to the number of columns of N.
b. For each element n_jk of N, produce (key, value) pairs as ((i,k), (N, j, n_jk)),
for i = 1, 2, 3, ... up to the number of rows of M.
c. Return the set of (key, value) pairs such that each key (i,k) has a list with
values (M, j, m_ij) and (N, j, n_jk) for all possible values of j.

Algorithm for Reduce Function:

d. For each key (i,k), do:
e. Sort the values beginning with M by j in listM and the values beginning with N
by j in listN; multiply m_ij and n_jk for the j-th value of each list.
f. Sum up the products m_ij x n_jk and return ((i,k), Σ_j m_ij x n_jk).
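
In other words, for an n x m matrix M and an m x p matrix N, the reduce step
computes each entry of the product as

c_{ik} = \sum_{j=1}^{m} m_{ij} \, n_{jk}, \qquad 1 \le i \le n, \; 1 \le k \le p.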

Step 1. Download the hadoop jar files with these links.


Download Hadoop Common Jar files: https://goo.gl/G4MyHp
$ wget https://goo.gl/G4MyHp -O hadoop-common-2.2.0.jar
Download Hadoop Mapreduce Jar File: https://goo.gl/KT8yfB
$ wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-2.7.1.jar

Step 2. Creating Mapper file for Matrix Multiplication.


import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;


import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;
class Element implements Writable {
int tag;
int index;
double value;
Element() {
tag = 0;
index = 0;
value = 0.0;
}
Element(int tag, int index, double value) {
this.tag = tag;
this.index = index;
this.value = value;
}
@Override
public void readFields(DataInput input) throws IOException {
tag = input.readInt();
index = input.readInt();


value = input.readDouble();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(tag);
output.writeInt(index);
output.writeDouble(value);
}
}
class Pair implements WritableComparable<Pair> {
int i;
int j;
Pair() {
i = 0;
j = 0;
}
Pair(int i, int j) {
this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException {
i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(i);
output.writeInt(j);


}
@Override
public int compareTo(Pair compare) {
if (i > compare.i) {
return 1;
} else if ( i < compare.i) {
return -1;
} else {
if(j > compare.j) {
return 1;
} else if (j < compare.j) {
return -1;
}
}
return 0;
}
public String toString() {
return i + " " + j + " ";
}
}
public class Multiply
{
public static class MatriceMapperM extends
Mapper<Object,Text,IntWritable,Element>
{
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String readLine = value.toString();


String[] stringTokens = readLine.split(",");


int index = Integer.parseInt(stringTokens[0]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(0, index, elementValue);
IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[1]));
context.write(keyValue, e);
}
}
public static class MatriceMapperN extends
Mapper<Object,Text,IntWritable,Element> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String readLine = value.toString();
String[] stringTokens = readLine.split(",");
int index = Integer.parseInt(stringTokens[1]);
double elementValue = Double.parseDouble(stringTokens[2]);
Element e = new Element(1,index, elementValue);
IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[0]));
context.write(keyValue, e);
}
}
public static void main(String[] args) throws Exception {
Job job = Job.getInstance();
job.setJobName("MapIntermediate");
job.setJarByClass(Multiply.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class,
MatriceMapperM.class);


MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class,


MatriceMapperN.class);
job.setReducerClass(ReducerMxN.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(Element.class);
job.setOutputKeyClass(Pair.class);
job.setOutputValueClass(DoubleWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true);
Job job2 = Job.getInstance();
job2.setJobName("MapFinalOutput");
job2.setJarByClass(Multiply.class);
job2.setMapperClass(MapMxN.class);
job2.setReducerClass(ReduceMxN.class);
job2.setMapOutputKeyClass(Pair.class);
job2.setMapOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(Pair.class);
job2.setOutputValueClass(DoubleWritable.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job2, new Path(args[2]));
FileOutputFormat.setOutputPath(job2, new Path(args[3]));
job2.waitForCompletion(true);
}
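// NOTE (added sketch): main() above references ReducerMxN, MapMxN and ReduceMxN, whose
// listings (the omitted Steps 3 and 4) are missing from this record. The nested classes
// below are an illustrative reconstruction based only on the key/value types configured
// above; they assume the default single reducer and are not the original source code.
public static class ReducerMxN extends Reducer<IntWritable, Element, Pair, DoubleWritable> {
@Override
public void reduce(IntWritable key, Iterable<Element> values, Context context)
throws IOException, InterruptedException {
ArrayList<Element> M = new ArrayList<Element>();
ArrayList<Element> N = new ArrayList<Element>();
Configuration conf = context.getConfiguration();
for (Element element : values) {
// Hadoop reuses the Writable instance, so copy each element before storing it.
Element temp = ReflectionUtils.newInstance(Element.class, conf);
ReflectionUtils.copy(conf, element, temp);
if (temp.tag == 0) { M.add(temp); } else { N.add(temp); }
}
// Emit the partial products m_ij * n_jk keyed by the output cell (i,k).
for (Element m : M) {
for (Element n : N) {
context.write(new Pair(m.index, n.index), new DoubleWritable(m.value * n.value));
}
}
}
}
public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable> {
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// Intermediate lines look like "i j<TAB>value" (Pair.toString() plus TextOutputFormat).
String[] tokens = value.toString().trim().split("\\s+");
Pair p = new Pair(Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]));
context.write(p, new DoubleWritable(Double.parseDouble(tokens[2])));
}
}
public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair, DoubleWritable> {
@Override
public void reduce(Pair key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException {
double sum = 0.0;
for (DoubleWritable value : values) {
sum += value.get();
}
context.write(key, new DoubleWritable(sum));
}
}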
}
Step 5. Compiling the program (in a folder named 'operation'):
#!/bin/bash
rm -rf multiply.jar classes


module load hadoop/2.6.0


mkdir -p classes
javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop classpath`
Multiply.java
jar cf multiply.jar -C classes .
echo "end"

Step 6. Running the program (from the folder named 'operation'):


export HADOOP_CONF_DIR=/home/$USER/cometcluster
module load hadoop/2.6.0
myhadoop-configure.sh
start-dfs.sh
start-yarn.sh
hdfs dfs -mkdir -p /user/$USER
hdfs dfs -put M-matrix-large.txt /user/$USER/M-matrix-large.txt
hdfs dfs -put N-matrix-large.txt /user/$USER/N-matrix-large.txt
hadoop jar multiply.jar edu.uta.cse6331.Multiply /user/$USER/M-matrix-
large.txt /user/$USER/N-matrix-large.txt /user/$USER/intermediate
/user/$USER/output
rm -rf output-distr
mkdir output-distr
hdfs dfs -get /user/$USER/output/part* output-distr
stop-yarn.sh
stop-dfs.sh
myhadoop-cleanup.sh


OUTPUT:

module load hadoop/2.6.0


rm -rf output intermediate
hadoop --config $HOME jar multiply.jar edu.uta.cse6331.Multiply M-matrix-
small.txt N-matrix-small.txt intermediate output

RESULT:
Thus, the program to implement matrix multiplication with Hadoop Map
Reduce has been executed successfully.


Ex.No: 04  Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm
Date:

AIM:
Run a basic Word Count MapReduce program to understand MapReduce
paradigm: Count words in a given file. View the output file. Calculate the
execution time
About MapReduce
MapReduce is a processing technique and a programming model for distributed
computing based on Java. The MapReduce algorithm contains two important
tasks, namely Map and Reduce. Map takes a set of data and converts it into
another set of data, where individual elements are broken down into tuples
(key/value pairs). The Reduce task takes the output from a map as an input and
combines those data tuples into a smaller set of tuples. As the sequence of the
name MapReduce implies, the reduce task is always performed after the map job.
The major advantage of MapReduce is that it is easy to scale data processing over
multiple computing nodes. Under the MapReduce model, the data processing
primitives are called mappers and reducers. Decomposing a data processing
application into mappers and reducers is sometimes nontrivial. But, once we write
an application in the MapReduce form, scaling the application to run over
hundreds, thousands, or even tens of thousands of machines in a cluster is merely
a configuration change. This simple scalability is what has attracted many
programmers to use the MapReduce model.

Below are the steps for MapReduce data flow:

Step 1: One block is processed by one mapper at a time. In the mapper, a


developer can specify his own business logic as per the requirements. In this
manner, Map runs on all the nodes of the cluster and processes the data blocks in
parallel.

Step 2: Output of Mapper also known as intermediate output is written to the


local disk. The output of the mapper is not stored on HDFS, as this is temporary
data and writing it on HDFS would create unnecessarily many copies.

Step 3: Output of the mapper is shuffled to the reducer node (a normal slave
node on which the reduce phase runs, hence called the reducer node). The
shuffling/copying is a physical movement of data which is done over the network.

Step 4: Once all the mappers are finished and their output is shuffled to the
reducer nodes, this intermediate output is merged and sorted, and then provided
as input to the reduce phase.

Step 5: Reduce is the second phase of processing where the user can specify his
own custom business logic as per the requirements. An input to a reducer is
provided from all the mappers. An output of reducer is the final output, which is
written on HDFS.


PROGRAM:
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class wordCount {


public static class Map extends Mapper<LongWritable, Text, Text,
IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

public void map(LongWritable key, Text value, Context context)


throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}


public static class Reduce extends Reducer<Text, IntWritable, Text,


IntWritable> {

public void reduce(Text key, Iterable <IntWritable> values, Context context)


throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();

Job job = new Job(conf, "wordcount");


job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
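
To build the jar and run the job (the jar name and the HDFS paths below are
illustrative assumptions), the following commands can be used:

javac -cp `hadoop classpath` -d wc_classes wordCount.java
jar cf wordcount.jar -C wc_classes .
hdfs dfs -mkdir -p input
hdfs dfs -put file1.txt input
hadoop jar wordcount.jar wordCount input output
hdfs dfs -cat output/part-r-00000

The execution time can be calculated from the timestamps printed in the job's
console log (the difference between the "Running job" and "completed
successfully" lines), or on Linux by prefixing the hadoop jar command with the
shell's time utility.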


OUTPUT:

RESULT:
Thus, the basic Word Count MapReduce program to understand the
MapReduce paradigm has been executed successfully.


Ex.No: 05(a) To install and run Hive


Date:

AIM:
To install and run the Apache Hive
PROCEDURE:
1. Downloading Apache Hive binaries
In order to download Apache Hive binaries, you should go to the following
website: https://downloads.apache.org/hive/hive-3.1.2/. Then, download the
apache-hive-3.1.2-bin.tar.gz file.

When the download is complete, we should extract (twice, as mentioned above) the
apache-hive-3.1.2-bin.tar.gz archive into the “E:\hadoop-env\apache-hive-3.1.2”
directory (since we decided to use “E:\hadoop-env\” as the installation directory
for all technologies used in the previous guide).

2. Setting environment variables

After extracting Derby and Hive archives, we should go to Control Panel


> System and Security > System. Then Click on “Advanced system settings”.


In the advanced system settings dialog, click on “Environment variables” button.


Now we should add the following user variables:

• HIVE_HOME: “E:\hadoop-env\apache-hive-3.1.2\”

• DERBY_HOME: “E:\hadoop-env\db-derby-10.14.2.0\”

• HIVE_LIB: “%HIVE_HOME%\lib”

• HIVE_BIN: “%HIVE_HOME%\bin”

• HADOOP_USER_CLASSPATH_FIRST: “true”


Besides, we should add the following system variable:


• HADOOP_USER_CLASSPATH_FIRST: “true”
Now, we should edit the Path user variable to add the following paths:
• %HIVE_BIN%
• %DERBY_HOME%\bin

3. Configuring Hive
3.1. Copy Derby libraries
Now, we should go to the Derby libraries directory (E:\hadoop-env\db-derby-
10.14.2.0\lib) and copy all *.jar files.


Then, we should paste them within the Hive libraries directory (E:\hadoop-
env\apache-hive-3.1.2\lib).


3.2. Configuring hive-site.xml

Now, we should go to the Apache Hive configuration directory


(E:\hadoop-env\apache-hive-3.1.2\conf) and create a new file “hive-site.xml”. We
should paste the following XML code within this file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <description>Enable user impersonation for HiveServer2</description>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
    <description>Client authentication types. NONE: no authentication check; LDAP: LDAP/AD based authentication; KERBEROS: Kerberos/GSSAPI authentication; CUSTOM: Custom authentication provider (use with property hive.server2.custom.authentication.class)</description>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>True</value>
  </property>
</configuration>

4. Starting Services
4.1. Hadoop Services
To start Apache Hive, open the command prompt utility as administrator.
Then, start the Hadoop services using start-dfs and start-yarn commands (as
illustrated in the Hadoop installation guide).

4.2. Derby Network Server


Then, we should start the Derby network server on the localhost using the
following command:
E:\hadoop-env\db-derby-10.14.2.0\bin\StartNetworkServer -h 0.0.0.0

5. Starting Apache Hive


Now, let us open a command prompt, go to the Hive binaries
directory (E:\hadoop-env\apache-hive-3.1.2\bin), and execute the following
command:
hive
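
Once the Hive prompt appears, a quick sanity check (illustrative, not part of the
original procedure) is to list the existing databases; the built-in default
database should be shown:

hive> SHOW DATABASES;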

RESULT:
Thus, Apache Hive has been installed and run successfully.


Ex.No: 05(b) Hive Operations


Date:

AIM:
To perform Hive operations.
ALGORITHM:

Step 1: Create a Database (if not exists)


Database name (`userdb`).

Step 2: Create a Table (if not exists)


Table name (`employee`), columns (`eid`, `name`, `salary`, `designation`),
delimiters (`'\t'`, `'\n'`), and storage location (`'/user/input'`).

Step 3: Load Data into the Table


Path to the local data file (`inputdata.txt`).

Step 4: Create a View


Input: View name (`writer_editor`), condition (`designation='Writer' OR
designation='Editor'`).

Step 5: Create an Index (with Deferred Rebuild)


Index name (`index_salary`), indexed column (`salary`), index handler
(`'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'`).

Step 6: Query Data


Retrieve and display all records from the `employee` table.
Retrieve and display records from the `writer_editor` view.


PROGRAM:

CREATE DATABASE IF NOT EXISTS userdb;

CREATE TABLE IF NOT EXISTS employee ( eid int, name String, salary
String, designation String) COMMENT 'Employee details' ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/input';

LOAD DATA LOCAL INPATH 'inputdata.txt' OVERWRITE INTO TABLE


employee;

CREATE VIEW writer_editor AS SELECT * FROM employee WHERE


designation='Writer' or designation='Editor';

CREATE INDEX index_salary ON TABLE employee(salary) AS


'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH
DEFERRED REBUILD;

SELECT * from employee;

SELECT * from writer_editor;
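
A small sample inputdata.txt matching the table definition (tab-separated fields;
the records are purely illustrative) could be:

1201	Gopal	45000	Technical manager
1202	Manisha	45000	Writer
1203	Kiran	40000	Editor
1204	Kranthi	30000	Op Admin

With such data, the writer_editor view would return only the second and third
records.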


OUTPUT:

RESULT:
Thus, the Hive operations have been performed successfully.


Ex.No: 06  Installation of HBase, Installing thrift along with Practice examples
Date:

AIM:
To install HBase on Windows and install Thrift, along with practice examples.
PROCEDURE:
Step-1: (Extraction of files)
Extract all the files in C drive

Step-2:(Creating Folder)
Create folders named "hbase" and "zookeeper."


Step-3: (Deleting line in HBase.cmd)


Open hbase.cmd in any text editor.
Search for line %HEAP_SETTINGS% and remove it.

Step-4: (Add lines in hbase-env.cmd)


Now open hbase-env.cmd, which is in the conf folder, in any text editor and add the following lines:
set JAVA_HOME=%JAVA_HOME%
set HBASE_CLASSPATH=%HBASE_HOME%\lib\client-facing-thirdparty\*
set HBASE_HEAPSIZE=8000
set HBASE_OPTS="-XX:+UseConcMarkSweepGC" "-Djava.net.preferIPv4Stack=true"
set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-
XX:+PrintGCDateStamps" %HBASE_GC_OPTS%
set HBASE_USE_GC_LOGFILE=true

set HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false" "-


Dcom.sun.management.jmxremote.authenticate=false"


set HBASE_MASTER_OPTS=%HBASE_JMX_BASE% "-


Dcom.sun.management.jmxremote.port=10101"
set HBASE_REGIONSERVER_OPTS=%HBASE_JMX_BASE% "-
Dcom.sun.management.jmxremote.port=10102"
set HBASE_THRIFT_OPTS=%HBASE_JMX_BASE% "-
Dcom.sun.management.jmxremote.port=10103"
set HBASE_ZOOKEEPER_OPTS=%HBASE_JMX_BASE% "-
Dcom.sun.management.jmxremote.port=10104"
set HBASE_REGIONSERVERS=%HBASE_HOME%\conf\regionservers
set HBASE_LOG_DIR=%HBASE_HOME%\logs
set HBASE_IDENT_STRING=%USERNAME%
set HBASE_MANAGES_ZK=true

Step-6: (Setting Environment Variables)


Now set up the environment variables.
Search "System environment variables."


Now click on " Environment Variables."

Then click on "New."


Variable name: HBASE_HOME


Variable Value: Put the path of the Hbase folder.
We have completed the HBase Setup on Windows procedure.

Step 7: Install Apache Thrift

Download Thrift:
Visit the Apache Thrift website: https://thrift.apache.org/download.
Download and extract Thrift.
Build and Install Thrift:
./configure
make
sudo make install
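
Before running the Java example in the next step, the HBase Thrift server must be
running. Assuming HBase's bin directory is on the PATH, it can typically be
started (listening on the default port 9090) with:

hbase thrift start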


Step 8: Practice Examples (Using Java with HBase and Thrift)


Below is a simple Java example demonstrating how to use Apache Thrift to
interact with HBase:
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.hadoop.hbase.thrift.generated.Hbase;

public class HBaseThriftExample {

public static void main(String[] args) {


TTransport transport = new TSocket("localhost", 9090);
try {
transport.open();

// Create Thrift client


Hbase.Client client = new Hbase.Client(new
TBinaryProtocol(transport));

// Perform operations
// ... add your HBase Thrift operations here ...

// Close the transport


transport.close();
} catch (TException e) {
e.printStackTrace();
}
}
}


Ensure that your HBase Thrift server is running and accessible at the specified
host and port. Also, make sure the necessary HBase Thrift libraries are included
in your Java project's classpath.

The provided Java code connects to an HBase Thrift server, performs unspecified
operations (indicated by comments), and handles exceptions. Since the actual
operations are not specified in the code, the output would depend on what
operations you perform within the try block.

If everything runs successfully (meaning the HBase Thrift server is running and
reachable, and your operations execute without errors), the program will
terminate without any output.
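
As a purely illustrative operation (assuming the thrift1 Hbase client generated
from Hbase.thrift), the try block could, for example, list the existing table
names:

// Illustrative only: print the names of all tables known to HBase
for (java.nio.ByteBuffer name : client.getTableNames()) {
    System.out.println(java.nio.charset.StandardCharsets.UTF_8
            .decode(name.duplicate()).toString());
}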

RESULT:
Thus, the installation of HBase and Thrift, along with the practice examples, has been completed successfully.


Ex.No: 07  Practice importing and exporting data from various databases.
Date:

AIM:
To perform importing and exporting of data across various systems such as
HDFS, Apache Hive, and Apache Spark.

PROCEDURE:

Importing Data:

1. Hadoop Distributed File System (HDFS):

• Use the Hadoop hdfs dfs command-line tool or Hadoop File System API to
copy data from a local file system or another location to HDFS. For
example:

$ hdfs dfs -put local_file.txt /hdfs/path

• This command uploads the local_file.txt from the local file system to the
HDFS path /hdfs/path.

2. Apache Hive:

• Hive supports data import from various sources, including local files,
HDFS, and databases. You can use the LOAD DATA statement to import
data into Hive tables. For example:

LOAD DATA INPATH '/hdfs/path/data.txt' INTO TABLE my_table;

• This statement loads data from the HDFS path /hdfs/path/data.txt into the


Hive table my_table.

3. Apache Spark:

• Spark provides rich APIs for data ingestion. You can use the
DataFrameReader or SparkSession APIs to read data from different sources
such as CSV files, databases, or streaming systems. For example:

val df = spark.read.format("csv").load("/path/to/data.csv")

• This code reads data from the CSV file located at /path/to/data.csv into a
DataFrame in Spark.

Exporting Data:

1. Hadoop Distributed File System (HDFS):

• Use the Hadoop hdfs dfs command-line tool or Hadoop File System API to
copy data from HDFS to a local file system or another location. For
example:

$ hdfs dfs -get /hdfs/path/file.txt local_file.txt

• This command downloads the file /hdfs/path/file.txt from HDFS and saves
it as local_file.txt in the local file system.

2. Apache Hive:

• Exporting data from Hive can be done in various ways, depending on the
desired output format. You can use the INSERT OVERWRITE statement
to export data from Hive tables to files or other Hive tables. For example:

INSERT OVERWRITE LOCAL DIRECTORY '/path/to/output'
SELECT * FROM my_table;

• This statement exports the data from the Hive table my_table to the local
directory /path/to/output.

3. Apache Spark:

• Spark provides flexible options for data export. You can use the
DataFrameWriter API (obtained via df.write) to write data to different file
formats, databases, or streaming systems. For example:

df.write.format("parquet").save("/path/to/output")

• This code saves the DataFrame df in Parquet format to the specified output
directory.

RESULT:
Thus, importing and exporting data across various systems has been performed successfully.


Ex.No: 08  MapReduce to find the maximum electrical consumption in each year
Date:

AIM:

To Develop a MapReduce to find the maximum electrical consumption in


each year given electrical consumption for each month in each year.
PROCEDURE:

Given below is the data regarding the electrical consumption of an
organization. It contains the monthly electrical consumption and the annual
average for various years.

If the above data is given as input, we have to write applications to process it and
produce results such as finding the year of maximum usage, the year of minimum
usage, and so on. This is straightforward for programmers when the number of
records is finite: they simply write the logic to produce the required output and
pass the data to the application. But think of data representing the electrical
consumption of all the large-scale industries of a particular state since its
formation.

When we write applications to process such bulk data,


• They will take a lot of time to execute.
• There will be a heavy network traffic when we move data from source to
network server and so on.
To solve these problems, we have the MapReduce framework


Input Data
The above data is saved as sample.txt and given as input. The input file looks as
shown below.
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45

PROGRAM:
import java.util.*;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class ProcessUnits
{
//Mapper class
public static class E_EMapper extends MapReduceBase implements
Mapper<LongWritable, /*Input key Type */ Text, /*Input value Type*/
Text, /*Output key Type*/ IntWritable> /*Output value Type*/
{
//Map function
public void map(LongWritable key, Text value, OutputCollector<Text,
IntWritable> output, Reporter reporter) throws IOException
{


String line = value.toString(); String lasttoken = null;


StringTokenizer s = new StringTokenizer(line,"\t");
String year = s.nextToken();
while(s.hasMoreTokens())
{
lasttoken=s.nextToken();
}
int avgprice = Integer.parseInt(lasttoken);
output.collect(new Text(year), new IntWritable(avgprice));
}
}
//Reducer class
public static class E_EReduce extends MapReduceBase implements
Reducer<Text, IntWritable, Text, IntWritable>
{
//Reduce function
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter) throws
IOException
{
int maxavg=30;
int val=Integer.MIN_VALUE;
while (values.hasNext())
{
if((val=values.next().get())>maxavg)
{
output.collect(key, new IntWritable(val));
}
}


}}
//Main function
public static void main(String args[])throws Exception
{
JobConf conf = new JobConf(ProcessUnits.class);
conf.setJobName("max_eletricityunits");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(E_EMapper.class);
conf.setCombinerClass(E_EReduce.class);
conf.setReducerClass(E_EReduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
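
A typical way to compile and run this job (the jar name and HDFS paths are
illustrative assumptions; sample.txt is the input file described above) is:

javac -cp `hadoop classpath` -d units_classes ProcessUnits.java
jar cf units.jar -C units_classes .
hdfs dfs -mkdir -p /user/$USER/eleunits
hdfs dfs -put sample.txt /user/$USER/eleunits
hadoop jar units.jar ProcessUnits /user/$USER/eleunits /user/$USER/eleunits_out
hdfs dfs -cat /user/$USER/eleunits_out/part-00000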


OUTPUT:
Input:
Kolkata,56
Jaipur,45
Delhi,43
Mumbai,34
Goa,45
Kolkata,35
Jaipur,34
Delhi,32
Output:
Kolkata 56
Jaipur 45
Delhi 43
Mumbai 34

RESULT:
Thus, the MapReduce program to find the maximum electrical consumption
in each year has been executed successfully.


Ex.No: 09 MapReduce program to analyze Uber data set


Date:

AIM:
To Develop a MapReduce program to analyze Uber data set to find the
days on which each basement has more trips using the following dataset.
PROCEDURE:

Problem Statement 1: In this problem statement, we will find the days on


which each basement has more trips.
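
The mapper below assumes comma-separated input records of the form
dispatching_base_number,date,active_vehicles,trips (the field names are an
assumption about the data set; the lines below are illustrative):

B02512,1/1/2015,190,1132
B02765,1/1/2015,225,1765
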
PROGRAM:
Mapper Class:

public static class TokenizerMapper


extends Mapper<Object, Text, Text, IntWritable>{
java.text.SimpleDateFormat format = new
java.text.SimpleDateFormat("MM/dd/yyyy");
String[] days ={"Sun","Mon","Tue","Wed","Thu","Fri","Sat"};
private Text basement = new Text();
Date date = null;
private int trips;
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
String line = value.toString();
String[] splits = line.split(",");
basement.set(splits[0]);
try {
date = format.parse(splits[1]);
} catch (ParseException e) {

// TODO Auto-generated catch block


e.printStackTrace();
}
trips = new Integer(splits[3]);
String keys = basement.toString()+ " "+days[date.getDay()];
context.write(new Text(keys), new IntWritable(trips));
}
}
Reducer Class:
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}


Whole Source Code:

import java.io.IOException;
import java.text.ParseException;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Uber1 {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
java.text.SimpleDateFormat format = new
java.text.SimpleDateFormat("MM/dd/yyyy");
String[] days ={"Sun","Mon","Tue","Wed","Thu","Fri","Sat"};
private Text basement = new Text();
Date date = null;
private int trips;
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String[] splits = line.split(",");
basement.set(splits[0]);
try {


date = format.parse(splits[1]);
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
trips = new Integer(splits[3]);
String keys = basement.toString()+ " "+days[date.getDay()];
context.write(new Text(keys), new IntWritable(trips));
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Uber1");
job.setJarByClass(Uber1.class);
job.setMapperClass(TokenizerMapper.class);


job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Running the Program:

First, we need to build a jar file for the above program and we need to run it as a
normal Hadoop program by passing the input dataset and the output file path as
shown below.
hadoop jar uber1.jar /uber /user/output1
In the output directory, a part file is created, containing the output shown below.


OUTPUT:

B02512 Sat 15026


B02512 Sun 10487
B02512 Thu 15809
B02512 Tue 12041
B02512 Wed 12691
B02598 Fri 93126
B02598 Mon 60882
B02598 Sat 94588
B02598 Sun 66477
B02598 Thu 90333
B02598 Tue 63429
B02598 Wed 71956
B02617 Fri 125067
B02617 Mon 80591
B02617 Sat 127902
B02617 Sun 91722
B02617 Thu 118254
B02617 Tue 86602
B02617 Wed 94887
B02682 Fri 114662
B02682 Mon 74939
B02682 Sat 120283
B02682 Sun 82825
B02682 Thu 106643
B02682 Tue 76905
B02682 Wed 86252
B02764 Fri 326968


B02764 Mon 214116


B02764 Sat 356789
B02764 Sun 249896
B02764 Thu 304200
B02764 Tue 221343
B02764 Wed 241137
B02765 Fri 34934
B02765 Mon 21974
B02765 Sat 36737

RESULT:
Thus, the MapReduce program to analyze the Uber data set has been
executed successfully.


Ex.No: 10  MapReduce program to find the grades of students
Date:

AIM:
To develop a MapReduce program to find the grades of students.
ALGORITHM:

Step 1: Input Marks and Calculate Average


1. Initialize an array `marks[]` of size 6 and variables `i` for iteration and
`total` to store the total marks.
2. Create a `Scanner` object `scanner` to read user input.
3. Use a loop to iterate from `i=0` to `i<6`:
- Prompt the user for marks of Subject `(i+1)`.
- Read the input marks and store them in `marks[i]`.
- Add `marks[i]` to `total`.
4. Calculate the average marks by dividing `total` by 6:
float avg = total / 6;
Step 2: Determine Grade
1. Use `if-else if-else` statements to determine the grade based on `avg`.
- If `avg` is >= 80, print "A".
- Else if `avg` is >= 60, print "B".
- Else if `avg` is >= 40, print "C".
- Else, print "D".
Step 3: Output the Result and Close Scanner
1. Print the student's grade calculated in Step 2.
2. Close the `Scanner` object using `scanner.close()` to release resources.
Step 4: End
1. End the algorithm.


PROGRAM:

import java.util.Scanner;
public class JavaExample
{
public static void main(String args[])
{
int marks[] = new int[6];
int i;
float total=0, avg;
Scanner scanner = new Scanner(System.in);
for(i=0; i<6; i++) {
System.out.print("Enter Marks of Subject"+(i+1)+":");
marks[i] = scanner.nextInt();
total = total + marks[i];
}
scanner.close();
//Calculating average here
avg = total/6;
System.out.print("The student Grade is: ");
if(avg>=80)
{
System.out.print("A");
}
else if(avg>=60 && avg<80)
{
System.out.print("B");
}
else if(avg>=40 && avg<60)
{
System.out.print("C");
}
else
{
System.out.print("D");
}
}
}


OUTPUT:
Enter Marks of Subject1:40
Enter Marks of Subject2:80
Enter Marks of Subject3:80
Enter Marks of Subject4:40
Enter Marks of Subject5:60
Enter Marks of Subject6:60
The student Grade is: B

RESULT:
Thus, the MapReduce program to find the grades of students has been
executed successfully.