An Experimental Approach Towards Big Data For Analyzing Memory Utilization On A Hadoop Cluster Using HDFS and MapReduce
Sanjay Agrawal
978-1-4799-3486-7/14/$31.00 © 2014 IEEE

I. INTRODUCTION
Big data poses a major challenge for data analysts. Its different
aspects make it difficult to manage, and it must be processed at
speed. Such volumes call for fast information retrieval techniques
that can locate the required data within them. Several tools are
available for handling big data. Most of them keep the data in
distributed storage and process it with parallel computing
techniques. Hadoop provides one such solution. The Hadoop
Distributed File System (HDFS) is an efficient system for storing
big data, and MapReduce, the programming model introduced by
Google, provides the framework for processing it. Apache Hadoop
uses both HDFS and MapReduce.

HDFS stores the data on different nodes in the form of blocks; the
default block size is 64 MB. A Hadoop system consists of the
NameNode, Secondary NameNode, DataNode, JobTracker, and
TaskTracker. The NameNode works as the centralized node in the big
data setup: every request for the retrieval of information passes
through it. Hadoop can be set up in two ways, as a single-node
setup or as a multi-node setup. In the first, all Hadoop components
run on the same node; in the second, the components can run on
different nodes.

The paper is divided into five sections. The first section is the
introduction, and the second covers the related work done in this
area. The third section describes the experimental setup used for
our experiments. The fourth section presents the results obtained
from those experiments. The fifth gives the conclusion and the
recommendations to take into account when establishing a Hadoop
cluster.
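As a rough illustration of the block-based storage described in the introduction, the following sketch computes how many HDFS blocks a file would occupy at the 64 MB default block size. The helper name `hdfs_block_count` and the 200 MB example file are assumptions made for illustration; they are not taken from our experiments.

```python
# Default HDFS block size stated in the introduction (64 MB).
DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024


def hdfs_block_count(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Return the number of blocks needed for a file (hypothetical helper)."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if block_size <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division: the last block may be only partly filled.
    return (file_size_bytes + block_size - 1) // block_size


if __name__ == "__main__":
    # Assumed example: a 200 MB file spans three full blocks and one partial block.
    size = 200 * 1024 * 1024
    print(f"{size} bytes occupy {hdfs_block_count(size)} blocks")
```

A file that is exactly one block long occupies a single block, while one extra byte forces a second block, which is why the ceiling is needed.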
II. RELATED WORK
We began our experiments with the NameNode, the Secondary
NameNode, and a DataNode all on the same system. In each
subsequent experiment we added one more DataNode to the cluster.
EXPERIMENTAL SETUP
System                  Dell
RAM                     4 GB
Disk                    30 GB
Processor               CPU 64 bit
Operating System        Ubuntu 12.04
Installation Process    Wubi installer
Hadoop                  Hadoop-1.2.1-bin.tar [8]
Java                    Java OpenJDK 6
IP addresses            Class B address
RESULTS
SLOTS_MILLIS_MAPS
Experiment No    Value
EXP1             117015
EXP2             159016
EXP3             206981
EXP4             185059
EXP5             195332
SLOTS_MILLIS_REDUCES
Experiment No    Value
EXP1             720142
EXP2             521457
EXP3             482919
EXP4             442648
EXP5             452687
Table 6 gives the time, in milliseconds, spent by the reducers in
slots for each experiment. The behavior is shown graphically in
Graph 6.
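To make the trend in the SLOTS_MILLIS_REDUCES values easier to read, the following minimal sketch converts them to seconds and reports the percentage change of each experiment relative to EXP1. The values are copied from the table above; the helper name `relative_change` and the choice of EXP1 as the baseline are assumptions made for illustration.

```python
# SLOTS_MILLIS_REDUCES values in milliseconds, copied from the table above.
SLOTS_MILLIS_REDUCES = {
    "EXP1": 720142,
    "EXP2": 521457,
    "EXP3": 482919,
    "EXP4": 442648,
    "EXP5": 452687,
}


def relative_change(values, baseline="EXP1"):
    """Return the percentage change of each entry against the baseline entry."""
    base = values[baseline]
    return {name: 100.0 * (value - base) / base for name, value in values.items()}


if __name__ == "__main__":
    changes = relative_change(SLOTS_MILLIS_REDUCES)
    for name, millis in SLOTS_MILLIS_REDUCES.items():
        print(f"{name}: {millis / 1000:.1f} s ({changes[name]:+.1f}% vs EXP1)")
```

With these values the reduce slot time falls from EXP1 through EXP4 and rises slightly in EXP5.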
Graph 4. Virtual Memory snapshot
SLOTS_MILLIS_MAPS
Experiment No    Value
EXP1             486850
EXP2             528451
EXP3             535697
EXP4             557874
EXP5             467145
Graph 5 shows the behaviour of SLOTS_MILLIS_MAPS as the number of
nodes in the cluster increases. This output is for the processing
of the data.
12229713920
17475268608
18773729280
20090236928
21404524544
Map tasks run in the cluster to locate the exact block on which
the data is actually stored. In our next analysis, we examined the
number of map tasks launched and the number of data-local map
tasks used for processing the data.
Table 11. Map tasks
Table 11 shows the map tasks that ran on local data and the map
tasks that were launched on the cluster. The graphical output is
shown in Graph 10.
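One way to relate the two quantities discussed here is the fraction of launched map tasks that were data-local. The sketch below shows that calculation under stated assumptions: the helper name `data_local_ratio` is hypothetical, and the counter values in the example are invented for illustration and are not the values of Table 11.

```python
def data_local_ratio(launched_maps, data_local_maps):
    """Fraction of launched map tasks that ran on a node holding their input block."""
    if launched_maps <= 0:
        raise ValueError("launched_maps must be positive")
    if not 0 <= data_local_maps <= launched_maps:
        raise ValueError("data_local_maps must lie between 0 and launched_maps")
    return data_local_maps / launched_maps


if __name__ == "__main__":
    # Hypothetical counter values for illustration only; NOT taken from Table 11.
    launched, local = 10, 8
    ratio = data_local_ratio(launched, local)
    print(f"{local} of {launched} map tasks were data-local ({ratio:.0%})")
```

A ratio close to 1 means that most map tasks read their input from the node they ran on, so less data has to cross the network.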
V.
[1] http://en.wikipedia.org/wiki/Data
[2] Changqing Ji, Yu Li, Wenming Qiu, Uchechukwu Awada, and Keqiu Li, "Big Data Processing in Cloud Computing Environments", 2012 International Symposium on Pervasive Systems, Algorithms and Networks.
[3] Wei Tan, M. Brian Blake, Iman Saleh, and Schahram Dustdar, "Social-Network-Sourced Big Data Analytics", IEEE Computer Society, 1089-7801/13, 2013 IEEE.
[4] Aditya B. Patel, Manashvi Birla, and Ushma Nair, "Addressing Big Data Problem Using Hadoop and Map Reduce", NUiCONE-2012, 06-08 December 2012.
[5] Zibin Zheng, Jieming Zhu, and Michael R. Lyu, "Service-generated Big Data and Big Data-as-a-Service: An Overview", 978-0-7695-5006-0/13, 2013 IEEE.
[6] Yang Song, Gabriel Alatorre, Nagapramod Mandagere, and Aameek Singh, "Storage Mining: Where IT Management Meets Big Data Analytics", IEEE International Congress on Big Data, 2013.
[7] Parameters, http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/0.23.1/org/apache/hadoop/mapreduce/JobCounter.properties
[8] http://hadoop.apache.org/releases.html