BIG DATA

UNIT 1

Lecture 4
Introducing Apache Hadoop

Prepared By
Mrs.J.Gokulapriya
Assistant Professor- CS
Department of Computer Science
Rathinam College of Arts and Science

22MCS3CB- Big Data Analytics - Lecture 4 | Page 1

INTRODUCTION

• Hadoop is developed by the Apache Software Foundation, and its
co-founders are Doug Cutting and Mike Cafarella. Doug Cutting named it
after his son’s toy elephant. In October 2003, Google published the
Google File System paper. In January 2006, MapReduce development
started on Apache Nutch, with around 6,000 lines of code for MapReduce
and around 5,000 lines of code for HDFS. In April 2006, Hadoop 0.1.0 was
released.

• What is Hadoop?
• Hadoop is an open-source software framework for storing large amounts
of data and performing computation on them. The framework is written
mainly in Java, with some native code in C and shell scripts.
• Hadoop is used for storing and processing large amounts of data
in a distributed computing environment. It is designed to handle big
data and is based on the MapReduce programming model, which allows
large datasets to be processed in parallel.
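To make the MapReduce model concrete, here is a minimal single-process sketch in Python of the classic word-count job. It only simulates the map, shuffle/sort, and reduce phases on one machine; it does not use Hadoop's actual Java API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all the counts collected for one word."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    # In Hadoop, each input split would be handled by its own map task in parallel;
    # here the phases simply run one after another in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores big data", "Hadoop processes big data"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

The point of the split is that map_phase and reduce_phase each see only a small piece of the data, which is what lets a framework like Hadoop spread them across many machines.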

• Hadoop has two main components:
• HDFS (Hadoop Distributed File System): This is the
storage component of Hadoop, which allows for the
storage of large amounts of data across multiple
machines. It is designed to work with commodity
hardware, which makes it cost-effective.
• YARN (Yet Another Resource Negotiator): This is the
resource management component of Hadoop, which
manages the allocation of resources (such as CPU and
memory) for processing the data stored in HDFS.
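HDFS stores a file by splitting it into fixed-size blocks and keeping several copies of each block on different DataNodes. The sketch below assumes the default 128 MB block size and replication factor of 3 used in Hadoop 2.x and later. The round-robin placement is a deliberate simplification of HDFS's rack-aware placement policy, and the helper names and DataNode names (dn1, dn2, ...) are hypothetical.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3      # default dfs.replication


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    remainder = file_size_mb % block_size_mb
    if remainder:
        sizes[-1] = remainder
    return sizes


def place_replicas(n_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct DataNodes using round-robin.
    Simplification: real HDFS uses a rack-aware placement policy."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement: Dict[int, List[str]] = {}
    for block_id in range(n_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + i) % len(datanodes)]
                               for i in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # [128, 128, 44]
    print(blocks)
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

For a 300 MB file this gives three blocks, and in HDFS the final 44 MB block does not take up a full 128 MB of disk space. With three replicas per block, the cluster stores roughly 900 MB of raw data for that file.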

• Hadoop is also used with several related projects that provide
additional functionality, such as Hive (a data warehouse system with a
SQL-like query language), Pig (a high-level platform for creating
MapReduce programs), and HBase (a non-relational, distributed database).
• Hadoop is commonly used in big data scenarios such as data
warehousing, business intelligence, and machine learning. It is also
used for data processing, data analysis, and data mining. It enables the
distributed processing of large data sets across clusters of computers
using a simple programming model.
• Features of Hadoop:
• 1. It is fault tolerant.
• 2. It is highly available.
• 3. Its programming model is easy to use.
• 4. It has huge, flexible storage.
• 5. It is low cost.
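Much of Hadoop's fault tolerance and availability comes from HDFS block replication. If a DataNode fails, a reader can fall back to another replica, and blocks that now have too few live copies can be re-replicated. This toy sketch assumes a block-to-DataNode mapping like the metadata the NameNode keeps. read_block and blocks_to_re_replicate are hypothetical helpers, not Hadoop APIs.

```python
from typing import Dict, List, Set


def read_block(block_id: int, placement: Dict[int, List[str]], failed: Set[str]) -> str:
    """Return the first live DataNode that holds a replica of the block."""
    for node in placement[block_id]:
        if node not in failed:
            return node
    raise OSError(f"block {block_id} is unavailable: every replica is on a failed node")


def blocks_to_re_replicate(placement: Dict[int, List[str]], failed: Set[str],
                           replication: int = 3) -> List[int]:
    """Blocks whose number of live replicas has dropped below the target."""
    return [block_id for block_id, nodes in placement.items()
            if sum(node not in failed for node in nodes) < replication]


if __name__ == "__main__":
    placement = {0: ["dn1", "dn2", "dn3"], 1: ["dn2", "dn3", "dn4"], 2: ["dn3", "dn4", "dn1"]}
    failed = {"dn2"}
    print(read_block(0, placement, failed))           # dn1
    print(read_block(1, placement, failed))           # dn3
    print(blocks_to_re_replicate(placement, failed))  # [0, 1]
```

Losing dn2 makes no block unreadable here. Data would only be lost if every node holding a given block failed before it could be copied again.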

Hadoop Distributed File System

• Advantages of HDFS: It is inexpensive, immutable in nature (files are
written once and read many times), stores data reliably, tolerates
faults, is scalable and block-structured, and can process large amounts
of data simultaneously.
• Disadvantages of HDFS: Its biggest disadvantage is that it is not suited
to small quantities of data. It also has potential stability issues and
can be restrictive and rough to work with.
• Hadoop also supports a wide range of software packages, such as Apache
Flume, Apache Oozie, Apache HBase, Apache Sqoop, Apache Spark, Apache
Storm, Apache Pig, Apache Hive, Apache Phoenix, and Cloudera Impala.
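One reason HDFS handles small data poorly is that the NameNode keeps metadata for every file, directory, and block in memory. The sketch below uses the commonly quoted rule of thumb of roughly 150 bytes of NameNode memory per namespace object. That figure is an approximation, not an exact value. It also assumes a 128 MiB block size and compares storing about 1 GiB as one file versus as 8,192 small files. The function names are hypothetical.

```python
import math

BYTES_PER_OBJECT = 150          # rule-of-thumb NameNode memory per file or block object
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB block size


def namenode_objects(n_files: int, file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Count namespace objects: one per file plus one per block (directories ignored)."""
    blocks_per_file = math.ceil(file_size_bytes / block_size)
    return n_files * (1 + blocks_per_file)


def namenode_memory_bytes(n_files: int, file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Estimated NameNode memory for n_files files of the given size."""
    return namenode_objects(n_files, file_size_bytes, block_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    GiB, KiB = 1024 ** 3, 1024
    print(namenode_memory_bytes(1, GiB))           # 1 file + 8 blocks = 9 objects -> 1350
    print(namenode_memory_bytes(8192, 128 * KiB))  # 8192 files + 8192 blocks -> 2457600
```

Under this estimate, the same gigabyte of data costs the NameNode roughly 1,800 times more memory when it is stored as many small files.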
• The Hadoop framework is made up of the following modules:
• Hadoop MapReduce - an implementation of the MapReduce programming
model for processing large data sets.
• Hadoop Distributed File System (HDFS) - a file system that distributes
files across the nodes of a cluster.
• Hadoop YARN - a platform that manages the cluster's computing
resources.
• Hadoop Common - the shared packages and libraries used by the other
modules.
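In YARN, the ResourceManager grants containers, which are bundles of memory and CPU vcores, on NodeManagers that have spare capacity. The following sketch assumes a simple first-fit policy. Real YARN schedulers such as the Capacity Scheduler and the Fair Scheduler are considerably more sophisticated, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeManager:
    name: str
    free_mem_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    mem_mb: int
    vcores: int


def allocate(request: ContainerRequest, nodes: List[NodeManager]) -> Optional[str]:
    """First-fit: grant the container on the first node with enough free memory
    and vcores, deduct those resources, and return the node's name."""
    for node in nodes:
        if node.free_mem_mb >= request.mem_mb and node.free_vcores >= request.vcores:
            node.free_mem_mb -= request.mem_mb
            node.free_vcores -= request.vcores
            return node.name
    return None  # no node has room; in YARN the request would wait for capacity


if __name__ == "__main__":
    cluster = [NodeManager("nm1", 4096, 2), NodeManager("nm2", 8192, 4)]
    for _ in range(3):
        print(allocate(ContainerRequest(2048, 1), cluster))  # nm1, nm1, nm2
    print(allocate(ContainerRequest(8192, 1), cluster))      # None
```

The design choice illustrated here is YARN's separation of concerns. HDFS only stores data, and YARN decides where the computation on that data runs.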

• Advantages:
• Ability to store a large amount of data.
• High flexibility.
• Cost-effective.
• High computational power.
• Tasks are independent.
• Linear scaling.
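Independent tasks are what make parallel execution and scaling out possible. Each task reads only its own input split, so tasks can run in any order or on any number of workers without changing the result. The sketch below assumes toy "year,temperature" records. It uses Python threads to stand in for cluster nodes. Hadoop actually runs tasks as separate JVM processes across machines, and CPython threads do not give true CPU parallelism. The names map_task and run_job are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def map_task(split: List[str]) -> Dict[str, int]:
    """Independent task: max temperature per year, looking only at its own split."""
    best: Dict[str, int] = {}
    for record in split:
        year, temp_text = record.split(",")
        temp = int(temp_text)
        best[year] = max(best.get(year, temp), temp)
    return best


def run_job(splits: List[List[str]], workers: int) -> Dict[str, int]:
    """Run one task per split on `workers` threads, then merge the partial results."""
    result: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(map_task, splits):
            for year, temp in partial.items():
                result[year] = max(result.get(year, temp), temp)
    return result


if __name__ == "__main__":
    splits = [["1950,0", "1950,22"], ["1949,111", "1950,-11"], ["1949,78"]]
    # Same answer whether one worker or three run the tasks.
    print(run_job(splits, workers=1))  # {'1950': 22, '1949': 111}
    print(run_job(splits, workers=3))
```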

• Disadvantages:
• Not very effective for small data.
• Cluster management is hard.
• Has stability issues.
• Security concerns.

Thank You
