BDA record
AIM:
To Install Apache Hadoop.
Hadoop software can be installed in three modes of operation: standalone (local), pseudo-distributed, and fully distributed.
Hadoop is a Java-based programming framework that supports the processing and
storage of extremely large datasets on a cluster of inexpensive machines. It was
the first major open source project in the big data playing field and is sponsored
by the Apache Software Foundation.
Hadoop-2.7.3 is comprised of four main layers:
➢ Hadoop Common is the collection of utilities and libraries that support other
Hadoop modules.
➢ HDFS, which stands for Hadoop Distributed File System, is responsible for
persisting data to disk.
➢ YARN, short for Yet Another Resource Negotiator, is the "operating system"
for HDFS.
➢ Map Reduce is the original processing model for Hadoop clusters. It
distributes work within the cluster or map, then organizes and reduces the
results from the nodes into a response to a query. Many other processing
models are available for the 2.x version of Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone
mode which is suitable for learning about Hadoop, performing simple operations,
and debugging.
ALGORITHM:
1. Install Apache Hadoop 2.2.0 in Microsoft Windows OS
If Apache Hadoop 2.2.0 is not already installed then follow the post Build, Install,
Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS.
2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and
Node Manager).
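On Windows, these daemons are typically started from the Hadoop sbin directory; a minimal sketch of the commands, assuming Hadoop is installed in C:\hadoop, is:
C:\hadoop>sbin\start-dfs.cmd
C:\hadoop>sbin\start-yarn.cmd
The jps command can then be used to verify that the NameNode, DataNode, ResourceManager and NodeManager processes are running.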
PROGRAM:
Found 1 items
-rw-r--r-- 1 ABHIJITG supergroup 55 2014-02-03 13:19 input/file1.txt
C:\hadoop>bin\hdfs dfs -cat input/file1.txt
Install Hadoop
Run Hadoop Wordcount Mapreduce Example
Run the wordcount MapReduce job provided in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-
2.2.0.jar
C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-
2.2.0.jar wordcount input output
14/02/03 13:22:02 INFO client.RMProxy: Connecting to ResourceManager at
/0.0.0.0:8032
14/02/03 13:22:03 INFO input.FileInputFormat: Total input paths to process : 1
14/02/03 13:22:03 INFO mapreduce.JobSubmitter: number of splits:1
:
:
14/02/03 13:22:04 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1391412385921_0002
14/02/03 13:22:04 INFO impl.YarnClientImpl: Submitted application
application_1391412385921_0002 to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:04 INFO mapreduce.Job: The url to track the job:
http://ABHIJITG:8088/proxy/application_1391412385921_0002/
14/02/03 13:22:04 INFO mapreduce.Job: Running job:
job_1391412385921_0002
14/02/03 13:22:14 INFO mapreduce.Job: Job job_1391412385921_0002 running
in uber mode : false
14/02/03 13:22:14 INFO mapreduce.Job: map 0% reduce 0%
14/02/03 13:22:22 INFO mapreduce.Job: map 100% reduce 0%
14/02/03 13:22:30 INFO mapreduce.Job: map 100% reduce 100%
Bytes Written=59
OUTPUT:
RESULT:
We've installed Hadoop in stand-alone mode and verified it by running one of the
example programs it provides.
AIM:
To create and Implementation of file management tasks, such as Adding
files and directories, retrieving files and Deleting files
PROCEDURE:
1. Adding Files and Directories:
This command copies files or directories from the local filesystem to HDFS.
Replace <local_path> with the path of the file or directory on your local machine,
and <hdfs_path> with the desired destination path in HDFS.
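For example, a typical form of the command (the placeholder names are as described above) is:
hdfs dfs -put <local_path> <hdfs_path>
The equivalent -copyFromLocal option may also be used.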
2. Retrieving Files:
This command copies files or directories from HDFS to the local filesystem.
Replace <hdfs_path> with the path of the file or directory in HDFS, and <local_path>
with the desired destination path on your local machine.
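For example, a typical form of the command (the placeholder names are as described above) is:
hdfs dfs -get <hdfs_path> <local_path>
The equivalent -copyToLocal option may also be used.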
3. Deleting Files
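This command removes a file from HDFS; the -r option is required for directories. A typical form, with the placeholder paths standing for the file or directory to delete, is:
hdfs dfs -rm <hdfs_path>
hdfs dfs -rm -r <hdfs_directory_path>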
RESULT:
Thus, the program for adding files and directories, retrieving files, and deleting
files was completed successfully.
AIM:
To Develop a Map Reduce program to implement Matrix Multiplication.
Matrix multiplication or matrix product is a binary operation in mathematics that
creates a matrix from two matrices. It is influenced by linear equations and vector
transformations, which have applications in applied mathematics, physics, and
engineering. For example, if A is an n × m matrix and B is an m × p matrix, their
matrix product AB is an n × p matrix. The matrix product represents the
composition of two linear transformations represented by matrices.
ALGORITHM:
a. For each element mij of M, produce (key, value) pairs ((i,k), (M, j, mij)) for
k = 1, 2, 3, ... up to the number of columns of N.
b. For each element njk of N, produce (key, value) pairs ((i,k), (N, j, njk)) for
i = 1, 2, 3, ... up to the number of rows of M.
c. Return the set of (key, value) pairs so that each key (i,k) has a list with values (M, j, mij)
and (N, j, njk) for all possible values of j.
d. Reduce: for each key (i,k), multiply the mij and njk values that share the same j
and sum the products to obtain element (i,k) of the result matrix.
PROGRAM:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;
class Element implements Writable {
int tag;
int index;
double value;
Element() {
tag = 0;
index = 0;
value = 0.0;
}
Element(int tag, int index, double value) {
this.tag = tag;
this.index = index;
this.value = value;
}
@Override
public void readFields(DataInput input) throws IOException {
tag = input.readInt();
index = input.readInt();
value = input.readDouble();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(tag);
output.writeInt(index);
output.writeDouble(value);
}
}
class Pair implements WritableComparable<Pair> {
int i;
int j;
Pair() {
i = 0;
j = 0;
}
Pair(int i, int j) {
this.i = i;
this.j = j;
}
@Override
public void readFields(DataInput input) throws IOException {
i = input.readInt();
j = input.readInt();
}
@Override
public void write(DataOutput output) throws IOException {
output.writeInt(i);
output.writeInt(j);
}
@Override
public int compareTo(Pair compare) {
if (i > compare.i) {
return 1;
} else if ( i < compare.i) {
return -1;
} else {
if(j > compare.j) {
return 1;
} else if (j < compare.j) {
return -1;
}
}
return 0;
}
public String toString() {
return i + " " + j + " ";
}
}
public class Multiply
{
public static class MatriceMapperM extends
Mapper<Object,Text,IntWritable,Element>
{
@Override
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String readLine = value.toString();
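The remainder of the listing is not reproduced in this record. A minimal sketch of the remaining logic, assuming each input line of the M matrix has the form i,j,value and that the mapper key is the join index j, is given below; the reducer pairs up M and N elements that share the same j and emits their partial products, and a second, simpler MapReduce pass would then sum the products for each (i,k) position.
// continuation of MatriceMapperM.map(); assumes each line of the M input is "i,j,value"
String[] tokens = readLine.split(",");
int i = Integer.parseInt(tokens[0]);
int j = Integer.parseInt(tokens[1]);
double v = Double.parseDouble(tokens[2]);
// tag 0 marks an element of M; the key is the column index j used for the join
context.write(new IntWritable(j), new Element(0, i, v));
}
}
public static class ReducerMxN extends Reducer<IntWritable, Element, Pair, DoubleWritable> {
@Override
public void reduce(IntWritable key, Iterable<Element> values, Context context)
throws IOException, InterruptedException {
java.util.List<Element> M = new java.util.ArrayList<Element>();
java.util.List<Element> N = new java.util.ArrayList<Element>();
for (Element e : values) {
// Hadoop reuses the value object, so each element must be copied before storing
Element copy = new Element(e.tag, e.index, e.value);
if (copy.tag == 0) { M.add(copy); } else { N.add(copy); }
}
for (Element m : M) {
for (Element n : N) {
// partial product m(i,j) * n(j,k), keyed by the (i,k) cell of the result matrix
context.write(new Pair(m.index, n.index), new DoubleWritable(m.value * n.value));
}
}
}
}
// a MatriceMapperN class for the N matrix, the summation job and the main() driver
// complete the full program and are omitted from this sketch
}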
OUTPUT:
Result:
Thus, the program implement matrix multiplication with Hadoop Map
Reduce.
AIM:
Run a basic Word Count MapReduce program to understand MapReduce
paradigm: Count words in a given file. View the output file. Calculate the
execution time
About MapReduce
MapReduce is a processing technique and a programming model for distributed
computing based on Java. The MapReduce algorithm contains two important
tasks, namely Map and Reduce. Map takes a set of data and converts it into
another set of data, where individual elements are broken down into tuples
(key/value pairs). The Reduce task takes the output from a map as input and
combines those data tuples into a smaller set of tuples. As the sequence of the
name MapReduce implies, the reduce task is always performed after the map job.
The major advantage of MapReduce is that it is easy to scale data processing over
multiple computing nodes. Under the MapReduce model, the data processing
primitives are called mappers and reducers. Decomposing a data processing
application into mappers and reducers is sometimes nontrivial. But, once we write
an application in the MapReduce form, scaling the application to run over
hundreds, thousands, or even tens of thousands of machines in a cluster is merely
a configuration change. This simple scalability is what has attracted many
programmers to use the MapReduce model.
In this manner, Map runs on all the nodes of the cluster and processes the data blocks in
parallel.
Step 4: Once all the mappers have finished and their output has been shuffled to the
reducer nodes, this intermediate output is merged and sorted, and then provided
as input to the reduce phase.
Step 5: Reduce is the second phase of processing, where the user can specify
custom business logic as per the requirements. The input to a reducer is
provided from all the mappers, and the output of the reducer is the final output, which is
written to HDFS.
PROGRAM:
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
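The rest of the listing is not reproduced here; a minimal sketch of a typical WordCount mapper, reducer and driver that uses only the classes imported above is:
public class WordCount {
  public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);        // emit (word, 1) for every token
      }
    }
  }
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();   // add up the counts for this word
      context.write(key, new IntWritable(sum));
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}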
OUTPUT:
RESULT:
Thus, the basic Word Count MapReduce program to understand the
MapReduce paradigm has been executed successfully.
AIM:
To install and run the Apache Hive
PROCEDURE:
1. Downloading Apache Hive binaries
In order to download Apache Hive binaries, you should go to the following
website: https://downloads.apache.org/hive/hive-3.1.2/. Then, download the
apache-hive-3.1.2-bin.tar.gz file.
When the file download is complete, we should extract twice (as mentioned
above) the apache-hive-3.1.2-bin.tar.gz archive into the “E:\hadoop-env\apache-
hive-3.1.2” directory (since we decided to use “E:\hadoop-env\” as the
installation directory for all technologies used in the previous guide).
2. Setting Environment Variables
• HIVE_HOME: “E:\hadoop-env\apache-hive-3.1.2\”
• DERBY_HOME: “E:\hadoop-env\db-derby-10.14.2.0\”
• HIVE_LIB: “%HIVE_HOME%\lib”
• HIVE_BIN: “%HIVE_HOME%\bin”
• HADOOP_USER_CLASSPATH_FIRST: “true”
3. Configuring Hive
3.1. Copy Derby libraries
Now, we should go to the Derby libraries directory (E:\hadoop-env\db-derby-
10.14.2.0\lib) and copy all *.jar files.
Then, we should paste them within the Hive libraries directory (E:\hadoop-
env\apache-hive-3.1.2\lib).
4. Starting Services
4.1. Hadoop Services
To start Apache Hive, open the command prompt utility as administrator.
Then, start the Hadoop services using start-dfs and start-yarn commands (as
illustrated in the Hadoop installation guide).
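A minimal sketch of the commands (the Derby server and Hive shell steps assume the environment variables set earlier) is:
start-dfs.cmd
start-yarn.cmd
StartNetworkServer -h 0.0.0.0
hive
The first two commands start HDFS and YARN, the third starts the Derby network server that backs the Hive metastore, and the last opens the Hive shell.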
RESULT:
Thus, we installed and ran Apache Hive successfully.
AIM:
To perform the Hive operation
ALGORITHM:
PROGRAM:
CREATE TABLE IF NOT EXISTS employee (
  eid int,
  name String,
  salary String,
  designation String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/input';
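As a typical follow-up operation (the file path is a placeholder, and the file is assumed to be tab-delimited to match the table definition):
LOAD DATA LOCAL INPATH '/home/user/employee.txt' OVERWRITE INTO TABLE employee;
SELECT * FROM employee;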
OUTPUT:
RESULT:
Thus, we successfully performed the Hive operations.
AIM:
To install HBase in Windows.
PROCEDURE:
Step-1: (Extraction of files)
Extract all the files in C drive
Step-2:(Creating Folder)
Create folders named "hbase" and "zookeeper."
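After the configuration files are updated to point at these folders (not reproduced here), HBase can be started and verified from the command prompt; a typical session, with illustrative paths, table and column names, is:
C:\hbase\bin>start-hbase.cmd
C:\hbase\bin>hbase shell
hbase> create 'test', 'cf'
hbase> put 'test', 'row1', 'cf:a', 'value1'
hbase> scan 'test'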
Download Thrift:
Visit the Apache Thrift website: https://thrift.apache.org/download.
Download and extract Thrift.
Build and Install Thrift:
./configure
make
sudo make install
// Perform operations
// ... add your HBase Thrift operations here ...
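For reference, a minimal sketch of such a client is shown below; the host, port and operations are assumptions (a Thrift server started with hbase thrift start listens on port 9090 by default), and the Hbase classes are the Thrift-generated bindings shipped with HBase.
import org.apache.hadoop.hbase.thrift.generated.Hbase;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
public class HBaseThriftExample {
    public static void main(String[] args) {
        TTransport transport = null;
        try {
            // connect to the HBase Thrift server (host and port are placeholders)
            transport = new TSocket("localhost", 9090);
            transport.open();
            TProtocol protocol = new TBinaryProtocol(transport);
            Hbase.Client client = new Hbase.Client(protocol);
            // Perform operations
            // ... add your HBase Thrift operations here ...
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (transport != null) {
                transport.close();
            }
        }
    }
}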
Ensure that your HBase Thrift server is running and accessible at the specified
host and port. Also, make sure the necessary HBase Thrift libraries are included
in your Java project's classpath.
The provided Java code connects to an HBase Thrift server, performs unspecified
operations (indicated by comments), and handles exceptions. Since the actual
operations are not specified in the code, the output would depend on what
operations you perform within the try block.
If everything runs successfully (meaning the HBase Thrift server is running and
reachable, and your operations execute without errors), the program will
terminate without any output.
RESULT:
Thus, we installed HBase and Thrift, along with practice examples.
AIM:
To perform importing and exporting of data with various systems such as
HDFS, Apache Hive and Apache Spark.
PROCEDURE:
Importing Data:
1. HDFS:
• Use the Hadoop hdfs dfs command-line tool or Hadoop File System API to
copy data from a local file system or another location to HDFS. For
example:
• This command uploads the local_file.txt from the local file system to the
HDFS path /hdfs/path.
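In command form, using the example names above, this is:
hdfs dfs -put local_file.txt /hdfs/path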
2. Apache Hive:
• Hive supports data import from various sources, including local files,
HDFS, and databases. You can use the LOAD DATA statement to import
data into Hive tables. For example:
• This statement loads data from the HDFS path /hdfs/path/data.txt into the
specified Hive table.
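In statement form (the table name sample_table is a placeholder):
LOAD DATA INPATH '/hdfs/path/data.txt' INTO TABLE sample_table;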
3. Apache Spark:
• Spark provides rich APIs for data ingestion. You can use the
DataFrameReader or SparkSession APIs to read data from different sources
such as CSV files, databases, or streaming systems. For example:
val df = spark.read.format("csv").load("/path/to/data.csv")
• This code reads data from the CSV file located at /path/to/data.csv into a
DataFrame in Spark.
Exporting Data:
1. HDFS:
• Use the Hadoop hdfs dfs command-line tool or Hadoop File System API to
copy data from HDFS to a local file system or another location. For
example:
• This command downloads the file /hdfs/path/file.txt from HDFS and saves
it as local_file.txt in the local file system.
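In command form, using the example names above, this is:
hdfs dfs -get /hdfs/path/file.txt local_file.txt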
2. Apache Hive:
• Exporting data from Hive can be done in various ways, depending on the
desired output format. You can use the INSERT OVERWRITE statement
to export data from Hive tables to files or other Hive tables. For example:
• This statement exports the data from the Hive table to the local
directory /path/to/output.
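In statement form (the table name sample_table is a placeholder):
INSERT OVERWRITE LOCAL DIRECTORY '/path/to/output' SELECT * FROM sample_table;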
3. Apache Spark:
• Spark provides flexible options for data export. You can use the
DataFrameWriter or Dataset writer APIs to write data to different file formats,
databases, or streaming systems. For example:
df.write.format("parquet").save("/path/to/output")
• This code saves the DataFrame df in Parquet format to the specified output
directory.
RESULT:
Thus, we perform importing and exporting data from various databases.
AIM:
To write a MapReduce application that processes the given electrical consumption
data and produces results such as the year of maximum usage, the year of minimum
usage, and so on. With a finite number of records this is a walkover for programmers:
they simply write the logic to produce the required output and pass
the data to the application.
But think of the data representing the electrical consumption of all the large-scale
industries of a particular state since its formation.
Input Data
The above data is saved as sample.txt and given as input. The input file looks as
shown below.
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
PROGRAM:
import java.util.*;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class ProcessUnits
{
//Mapper class
public static class E_EMapper extends MapReduceBase implements
Mapper<LongWritable ,/*Input key Type */ Text, /*Input value Type*/
Text, /*Output key Type*/ IntWritable> /*Output value Type*/
{
//Map function
public void map(LongWritable key, Text value, OutputCollector<Text,
IntWritable> output, Reporter reporter) throws IOException
{
// map body and reducer below follow the standard ProcessUnits example:
// emit (year, yearly average consumption) and keep only values above a threshold
String line = value.toString();
String lasttoken = null;
StringTokenizer s = new StringTokenizer(line);
String year = s.nextToken();
while (s.hasMoreTokens()) {
lasttoken = s.nextToken();
}
int avgprice = Integer.parseInt(lasttoken);
output.collect(new Text(year), new IntWritable(avgprice));
}
}
//Reducer class
public static class E_EReduce extends MapReduceBase implements
Reducer<Text, IntWritable, Text, IntWritable>
{
//Reduce function
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
{
int maxavg = 30;
int val = Integer.MIN_VALUE;
while (values.hasNext()) {
if ((val = values.next().get()) > maxavg) {
output.collect(key, new IntWritable(val));
}
}
}
}
//Main function
public static void main(String args[])throws Exception
{
JobConf conf = new JobConf(ProcessUnits.class);
conf.setJobName("max_eletricityunits");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(E_EMapper.class);
conf.setCombinerClass(E_EReduce.class);
conf.setReducerClass(E_EReduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
OUTPUT:
Input:
Kolkata,56
Jaipur,45
Delhi,43
Mumbai,34
Goa,45
Kolkata,35
Jaipur,34
Delhi,32
Output:
Kolkata 56
Jaipur 45
Delhi 43
Mumbai 34
RESULT:
Thus, the MapReduce program to find the maximum electrical consumption
in each year has been executed successfully.
AIM:
To Develop a MapReduce program to analyze Uber data set to find the
days on which each basement has more trips using the following dataset.
PROCEDURE:
PROGRAM:
import java.io.IOException;
import java.text.ParseException;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Uber1 {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
java.text.SimpleDateFormat format = new
java.text.SimpleDateFormat("MM/dd/yyyy");
String[] days ={"Sun","Mon","Tue","Wed","Thu","Fri","Sat"};
private Text basement = new Text();
Date date = null;
private int trips;
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String[] splits = line.split(",");
basement.set(splits[0]);
try {
date = format.parse(splits[1]);
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
trips = new Integer(splits[3]);
String keys = basement.toString()+ " "+days[date.getDay()];
context.write(new Text(keys), new IntWritable(trips));
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Uber1");
job.setJarByClass(Uber1.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
First, we need to build a jar file for the above program and we need to run it as a
normal Hadoop program by passing the input dataset and the output file path as
shown below.
hadoop jar uber1.jar /uber /user/output1
In the output directory, a part file is created that contains the output shown below.
OUTPUT:
RESULT:
Thus, the MapReduce program to analyze the Uber data set has been
executed successfully.
AIM:
To develop a MapReduce program to find the grades of students.
ALGORITHM:
PROGRAM:
import java.util.Scanner;
public class JavaExample
{
public static void main(String args[])
{
int marks[] = new int[6];
int i;
float total=0, avg;
Scanner scanner = new Scanner(System.in);
for(i=0; i<6; i++) {
System.out.print("Enter Marks of Subject"+(i+1)+":");
marks[i] = scanner.nextInt();
total = total + marks[i];
}
scanner.close();
//Calculating average here
avg = total/6;
System.out.print("The student Grade is: ");
if(avg>=80)
{
System.out.print("A");
}
else if(avg>=60 && avg<80)
{
System.out.print("B");
}
else if(avg>=40 && avg<60)
{
System.out.print("C");
}
else
{
System.out.print("D");
}
}
}
OUTPUT:
Enter Marks of Subject1:40
Enter Marks of Subject2:80
Enter Marks of Subject3:80
Enter Marks of Subject4:40
Enter Marks of Subject5:60
Enter Marks of Subject6:60
The student Grade is: B
RESULT:
Thus, the MapReduce program to find the grades of students has
been executed successfully.