Introduction to Hadoop
Dr. C.V. Suresh Babu
(CentreforKnowledgeTransfer)
institute
Discussion Topics
• What is Hadoop?
• Need for Hadoop
• History of Hadoop
• Hadoop Overview
• Advantages and Disadvantages of Hadoop
• Hadoop Distributed File System
• Comparing: RDBMS vs. Hadoop
• Advantages and Disadvantages of HDFS
• Hadoop frameworks
• Modules of Hadoop frameworks
• Features of 'Hadoop'
• Hadoop Analytics Tools
What is Hadoop?
• Hadoop is an open-source software framework for storing large amounts of
data and running computations on that data.
• The framework is written mainly in Java, with some native code in C and
shell scripts.
• Hadoop is used for advanced analytics, including machine learning and
data mining.
Need for Hadoop
• Redundant, Fault-tolerant data storage
• Parallel computation framework
• Job coordination
Programmers no longer need to worry about:
Q: Where is the file located?
Q: How to handle failures and data loss?
Q: How to divide computation?
Q: How to program for scaling?
History of Hadoop
• Hadoop is developed under the Apache Software Foundation; its
co-founders are Doug Cutting and Mike Cafarella.
• Doug Cutting named it after his son's toy elephant.
• In October 2003, Google published the Google File System paper, the
first of the papers that inspired Hadoop.
• Development began inside Apache Nutch and moved to the new Hadoop
subproject in January 2006. The initial code consisted of around 6,000
lines for MapReduce and around 5,000 lines for HDFS.
• In April 2006, Hadoop 0.1.0 was released.
Advantages and Disadvantages of Hadoop
Advantages:
• Ability to store a large amount of data.
• High flexibility.
• Cost effective.
• High computational power.
• Tasks are independent.
• Linear scaling.
Disadvantages:
• Not very effective for small data.
• Hard cluster management.
• Has stability issues.
• Security concerns.
Hadoop Distributed File System
• Hadoop has a distributed file system known as HDFS. HDFS splits files
into blocks and distributes them across the nodes of large clusters.
• In case of a node failure, the system keeps operating: HDFS handles
the data transfer between the remaining nodes.
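As a rough illustration of the block splitting described above, the following Python sketch cuts a byte string into fixed-size blocks and assigns each block to several nodes in round-robin order. The block size, replication factor, node names, and the helpers `split_into_blocks` and `place_blocks` are all hypothetical and chosen for readability. Real HDFS uses far larger blocks and its own placement policy, neither of which is modelled here.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block index to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


# Hypothetical tiny example: a 10-byte "file", 4-byte blocks, 3 nodes, 2 replicas.
blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_blocks(len(blocks), ["node1", "node2", "node3"], replication=2)

for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Concatenating the blocks in order gives back the original bytes, which is all a reader needs in order to reassemble the file.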
Comparing: RDBMS vs. Hadoop
                     Traditional RDBMS         Hadoop / MapReduce
Data Size            Gigabytes (Terabytes)     Petabytes (Exabytes)
Access               Interactive and Batch     Batch (NOT Interactive)
Updates              Read / Write many times   Write once, Read many times
Structure            Static Schema             Dynamic Schema
Integrity            High (ACID)               Low
Scaling              Nonlinear                 Linear
Query Response Time  Can be near immediate     Has latency (due to batch processing)
Advantages of HDFS:
• It is inexpensive, stores data immutably and reliably, tolerates faults,
scales well, is block structured, and can process large amounts of data
in parallel.
Disadvantages of HDFS:
• Its biggest disadvantage is that it is not a good fit for small quantities
of data. It also has potential stability issues and can be restrictive and
rough to work with.
Some common frameworks of Hadoop
• Hive- It uses HiveQL for structuring data and for expressing complicated
MapReduce jobs over data in HDFS.
• Drill- It supports user-defined functions and is used for data exploration.
• Storm- It allows real-time processing and streaming of data.
• Spark- It contains a machine learning library (MLlib) and is widely used
for data processing. It supports Java, Python, and Scala.
• Pig- It has Pig Latin, a SQL-like language, and performs data transformation
of unstructured data.
• Tez- It reduces the complexity of Hive and Pig and helps their code run
faster.
Modules of Hadoop frameworks
The Hadoop framework is made up of the following modules:
1. Hadoop MapReduce- a MapReduce programming model for handling and
processing large data.
2. Hadoop Distributed File System- distributes files among the nodes of a cluster.
3. Hadoop YARN- a platform which manages computing resources.
4. Hadoop Common- it contains packages and libraries which are used by the other
modules.
Features of 'Hadoop'
Suitable for Big Data Analysis
• As Big Data tends to be distributed and unstructured in nature, Hadoop
clusters are well suited for analysis of Big Data.
• Since it is the processing logic (not the actual data) that flows to the
computing nodes, less network bandwidth is consumed.
• This concept is called data locality, and it helps increase the
efficiency of Hadoop-based applications.
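The data locality idea can be sketched in a few lines of Python: given which nodes hold each block, a scheduler prefers to run the task for a block on a node that already stores it, and only falls back to another node when no local slot is free. The function `schedule_tasks`, the block and node names, and the one-slot-per-node setup are hypothetical; this is a toy under those assumptions, not how YARN or any Hadoop scheduler is implemented.

```python
from typing import Dict, List


def schedule_tasks(block_locations: Dict[str, List[str]],
                   free_slots: Dict[str, int]) -> Dict[str, str]:
    """Assign each block's task to a node, preferring nodes that hold the block."""
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, holders in block_locations.items():
        # First choice: a node that already stores the block (data-local).
        local = [n for n in holders if slots.get(n, 0) > 0]
        # Fallback: any node with a free slot (the block must cross the network).
        candidates = local or [n for n, s in slots.items() if s > 0]
        if not candidates:
            raise RuntimeError(f"no free slot for block {block}")
        chosen = candidates[0]
        slots[chosen] -= 1
        assignment[block] = chosen
    return assignment


# Hypothetical cluster: three blocks, each stored on two of three nodes.
block_locations = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node1", "node3"],
}
assignment = schedule_tasks(block_locations, {"node1": 1, "node2": 1, "node3": 1})
print(assignment)
```

In this example every task lands on a node that already holds its block, so only the small task description travels over the network.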
Scalability
• Hadoop clusters can easily be scaled by adding cluster nodes, which
allows them to keep up with the growth of Big Data.
• Scaling does not require modifications to application logic.
Fault Tolerance
• The Hadoop ecosystem has a provision to replicate the input data onto other
cluster nodes.
• That way, in the event of a cluster node failure, data processing can still
proceed by using data stored on another cluster node.
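A small Python sketch can make the replication argument concrete: if every block has replicas on more than one node, losing a single node still leaves at least one copy of each block readable. The placement table and the helpers `surviving_replicas` and `lost_blocks` are hypothetical and only illustrate the reasoning; they do not model how Hadoop detects failures or re-replicates data.

```python
from typing import Dict, List, Set


def surviving_replicas(placement: Dict[str, List[str]],
                       failed: Set[str]) -> Dict[str, List[str]]:
    """For each block, list the replica nodes that are still alive."""
    return {
        block: [node for node in nodes if node not in failed]
        for block, nodes in placement.items()
    }


def lost_blocks(placement: Dict[str, List[str]], failed: Set[str]) -> List[str]:
    """Blocks that have no surviving replica after the given failures."""
    alive = surviving_replicas(placement, failed)
    return [block for block, nodes in alive.items() if not nodes]


# Hypothetical placement with two replicas per block.
placement = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node3", "node1"],
}

print(lost_blocks(placement, {"node2"}))           # one node down
print(lost_blocks(placement, {"node1", "node2"}))  # two nodes down
```

With two replicas per block, one failure loses nothing, while two simultaneous failures can lose a block. That is the trade-off a replication factor controls.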
• Scales to petabytes or more easily
• Parallel data processing
• Suited for particular types of big data problems
Hadoop Analytics Tools
• A wide range of analytical tools is available to help Hadoop deal
efficiently with very large volumes of data.
• Let us discuss some of the best-known and most widely used tools one by one.
Below are the top 10 Hadoop analytics tools for big data.
Apache Spark
• Apache Spark is an open-source processing engine that is designed for ease of
analytics operations.
• It is a cluster computing platform that is designed to be fast and general
purpose.
• Spark is designed to cover batch applications, machine learning, streaming
data processing, and interactive queries.
Features of Spark:
• In-memory processing
• Tight integration of components
• Easy and inexpensive
• A powerful processing engine that makes it fast
• Spark Streaming provides a high-level library for stream processing
MapReduce
• MapReduce is a programming model and processing engine that runs on the
YARN framework.
• The primary feature of MapReduce is that it performs distributed processing in
parallel across a Hadoop cluster. This is what makes Hadoop fast, because
serial processing is no longer practical when dealing with Big Data.
Features of MapReduce:
• Scalable
• Fault Tolerance
• Parallel Processing
• Tunable Replication
• Load Balancing
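The MapReduce model is easiest to see on the classic word-count problem. The sketch below runs the three phases (map, shuffle, reduce) serially in one Python process; in a Hadoop cluster the map and reduce calls would run in parallel on different nodes. The function names and the sample input are hypothetical and this is not the Hadoop MapReduce API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Because each `map_phase` call looks at one line only and each `reduce_phase` call looks at one key only, the calls are independent of each other, which is what allows a cluster to run them in parallel.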
Apache Hive
• Apache Hive is a data warehousing tool built on top of Hadoop. Data
warehousing means storing data gathered from various sources in one fixed
location.
• Hive is one of the best tools for data analysis on Hadoop.
• Anyone who knows SQL can comfortably use Apache Hive.
• The query language of Hive is known as HQL or HiveQL.
Features of Hive:
• Queries are similar to SQL queries.
• Hive supports different storage types: HBase, ORC, plain text, etc.
• Hive has built-in functions for data mining and other tasks.
• Hive operates on compressed data stored inside the Hadoop ecosystem.
Apache Impala
• Apache Impala is an open-source SQL engine designed for Hadoop.
• Impala overcomes the speed-related issues of Apache Hive with faster processing.
• Apache Impala uses the same kind of SQL syntax, ODBC driver, and user interface as
Apache Hive.
• Apache Impala can easily be integrated with Hadoop for data analytics purposes.
Features of Impala:
• Easy integration
• Scalability
• Security
• In-memory data processing
Apache Mahout
• The name Mahout is taken from the Hindi word Mahavat, which means
elephant rider.
• Apache Mahout runs its algorithms on top of Hadoop, hence the name Mahout.
• Mahout is mainly used for implementing machine learning algorithms on
Hadoop, such as classification, collaborative filtering, and recommendation.
• Apache Mahout can also run machine learning algorithms without integration with
Hadoop.
Features of Mahout:
• Used for machine learning applications
• Mahout has vector and matrix libraries
• Ability to analyze large datasets quickly
Apache Pig
• Pig was initially developed by Yahoo to make programming easier.
• Apache Pig can process extensive datasets because it works on top of Hadoop.
• Apache Pig is used for analyzing massive datasets by representing them as data flows.
• Apache Pig also raises the level of abstraction for processing enormous datasets.
• Pig Latin is the scripting language that developers use to work with the Pig framework;
scripts run on the Pig runtime.
Features of Pig:
• Easy to program
• Rich set of operators
• Ability to handle various kinds of data
• Extensibility
HBase
• HBase is a non-relational, distributed, column-oriented NoSQL
database. HBase consists of tables, and each table has multiple
data rows.
• Each row has multiple column families, and each column family
has columns that contain key-value pairs.
• HBase works on top of HDFS (Hadoop Distributed File System).
• We use HBase for looking up small pieces of data within massive datasets.
Features of HBase:
• HBase has linear and modular scalability
• The Java API can easily be used for client access
• Block cache for real-time data queries
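The table, row, column family, and column layout described above can be pictured as nested dictionaries. The Python sketch below is only a mental model of that layout. The `put` and `get` helpers, the table name, and the sample values are hypothetical and are not the HBase client API; real HBase also versions cells by timestamp and persists data on HDFS, which is omitted here.

```python
from typing import Dict, Optional

# row key -> column family -> column qualifier -> value
Table = Dict[str, Dict[str, Dict[str, str]]]


def put(table: Table, row_key: str, family: str, qualifier: str, value: str) -> None:
    """Store one cell, creating the row and column family on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get(table: Table, row_key: str, family: str, qualifier: str) -> Optional[str]:
    """Look up one cell by row key, column family, and qualifier."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


# Hypothetical "users" table with two column families: info and stats.
users: Table = {}
put(users, "user#42", "info", "name", "Asha")
put(users, "user#42", "info", "city", "Chennai")
put(users, "user#42", "stats", "logins", "17")

print(get(users, "user#42", "info", "city"))
print(get(users, "user#99", "info", "name"))
```

A lookup goes straight to one row key rather than scanning the table, which matches the use case of fetching a small piece of data from a very large dataset.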
Apache Sqoop
• Sqoop is a command-line tool developed by Apache.
• The primary purpose of Apache Sqoop is to import structured data from an
RDBMS (relational database management system) such as MySQL, SQL Server, or
Oracle into HDFS (Hadoop Distributed File System).
• Sqoop can also export data from HDFS to an RDBMS.
Features of Sqoop:
• Sqoop can import data to Hive or HBase
• Connecting to a database server
• Controlling parallelism
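Controlling parallelism in an import generally means dividing the rows among several parallel tasks, for example by slicing the range of a numeric key column. The Python sketch below shows only that range-splitting idea under stated assumptions: the function `split_ranges`, the integer key, and the sample bounds are hypothetical, and this is not Sqoop's own code.

```python
from typing import List, Tuple


def split_ranges(min_id: int, max_id: int, num_tasks: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [min_id, max_id] into contiguous slices,
    one per task, with slice sizes differing by at most one."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")
    total = max_id - min_id + 1
    num_tasks = min(num_tasks, total)  # never create empty slices
    base, extra = divmod(total, num_tasks)
    ranges = []
    start = min_id
    for i in range(num_tasks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Hypothetical table with keys 1..10 imported by 4 parallel tasks.
ranges = split_ranges(1, 10, 4)
print(ranges)
```

Each task would then read only the rows whose key falls in its slice, so more tasks mean more parallel reads against the source database.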
Tableau
• Tableau is data visualization software that can be used for data analytics and
business intelligence.
• It provides a variety of interactive visualizations to showcase insights from the data,
can translate queries into visualizations, and can import data of all ranges and
sizes.
• Tableau offers rapid analysis and processing, and generates useful
charts on interactive dashboards and worksheets.
Features of Tableau:
• Tableau supports bar charts, histograms, pie charts, motion charts, bullet charts, Gantt
charts, and many more
• Secure and robust
• Interactive dashboards and worksheets
Apache Storm
• Apache Storm is a free, open-source, distributed real-time computation system built using
programming languages such as Clojure and Java.
• It can be used with many programming languages.
• Apache Storm is used for stream processing, which it performs very quickly.
• Apache Storm relies on daemons such as Nimbus, ZooKeeper, and Supervisor.
• Apache Storm can be used for real-time processing, online machine learning, and
more. Companies such as Yahoo, Spotify, and Twitter use Apache Storm.
Features of Storm:
• Easy to operate
• Each node can process millions of tuples per second
• Scalable and fault tolerant
Companies Using Hadoop
Common Hadoop Distributions
• Open Source
  ◦ Apache
• Commercial
  ◦ Cloudera
  ◦ Hortonworks
  ◦ MapR
  ◦ Amazon Elastic MapReduce (EMR)
  ◦ Microsoft Azure HDInsight (Beta)

DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 

Introduction to Hadoop

  • 1. Introduction to Hadoop, Dr. C.V. Suresh Babu (Centre for Knowledge Transfer) institute
  • 2. Discussion Topics • What is Hadoop? • Need for Hadoop • History of Hadoop • Hadoop Overview • Advantages and Disadvantages of Hadoop • Hadoop Distributed File System • Comparing: RDBMS vs. Hadoop • Advantages and Disadvantages of HDFS • Hadoop frameworks • Modules of Hadoop frameworks • Features of 'Hadoop' • Hadoop Analytics Tools
  • 3. What is Hadoop? • Hadoop is an open-source software framework for storing large amounts of data and running computations on it. • The framework is written mainly in Java, with some native code in C and shell scripts. • Hadoop is used for advanced analytics, including machine learning and data mining.
  • 4. Need for Hadoop • Redundant, fault-tolerant data storage • Parallel computation framework • Job coordination Programmers no longer need to worry about: Q: Where is the file located? Q: How to handle failures and data loss? Q: How to divide the computation? Q: How to program for scaling?
  • 5. History of Hadoop • Hadoop is developed under the Apache Software Foundation; its co-founders are Doug Cutting and Mike Cafarella. • Doug Cutting named it after his son's toy elephant. In October 2003, the first related paper, on the Google File System, was released. • In January 2006, MapReduce development started on Apache Nutch, with around 6000 lines of code for MapReduce and around 5000 lines of code for HDFS. • In April 2006, Hadoop 0.1.0 was released.
  • 7. Advantages and Disadvantages of Hadoop Advantages: • Ability to store a large amount of data. • High flexibility. • Cost effective. • High computational power. • Tasks are independent. • Linear scaling. Disadvantages: • Not very effective for small data. • Hard cluster management. • Has stability issues. • Security concerns.
  • 8. Hadoop Distributed File System • Hadoop has a distributed file system known as HDFS, which splits files into blocks and distributes them across the nodes of a large cluster. • If a node fails, the system keeps operating, and HDFS handles the data transfer between the remaining nodes.
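The block-splitting idea on this slide can be sketched in a few lines of Python. This is a toy illustration, not HDFS code: the function names, the node names, the tiny block size, and the round-robin placement are all assumptions made for the example. Real HDFS typically uses much larger blocks (128 MB by default in Hadoop 2 and later) and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes in round-robin order.

    Hypothetical placement policy, chosen only because it is easy to follow.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# A 10-byte "file" cut into 4-byte blocks and spread over three made-up nodes.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(len(blocks), ["node1", "node2", "node3"], replication=2)

for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on more than one node, losing a single node does not lose any block.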
  • 9. Comparing: RDBMS vs. Hadoop (each row reads: Traditional RDBMS | Hadoop / MapReduce) • Data Size: Gigabytes (Terabytes) | Petabytes (Exabytes) • Access: Interactive and Batch | Batch, NOT Interactive • Updates: Read / Write many times | Write once, Read many times • Structure: Static Schema | Dynamic Schema • Integrity: High (ACID) | Low • Scaling: Nonlinear | Linear • Query Response Time: Can be near immediate | Has latency (due to batch processing)
  • 10. Advantages of HDFS: • It is inexpensive, immutable in nature, reliable, fault tolerant, scalable, and block structured, and it can process a large amount of data simultaneously. Disadvantages of HDFS: • Its biggest disadvantage is that it is not a good fit for small quantities of data. It also has potential stability issues and is restrictive and rough in nature.
  • 11. Some common frameworks of Hadoop • Hive: uses HiveQL for structuring data and for expressing complex MapReduce jobs over data in HDFS. • Drill: supports user-defined functions and is used for data exploration. • Storm: allows real-time processing and streaming of data. • Spark: contains a Machine Learning Library (MLlib) for machine learning and is widely used for data processing. It supports Java, Python, and Scala. • Pig: provides Pig Latin, a SQL-like language, and performs data transformation of unstructured data. • Tez: reduces the complexity of Hive and Pig and helps their code run faster.
  • 12. Modules of Hadoop frameworks The Hadoop framework is made up of the following modules: 1. Hadoop MapReduce: a MapReduce programming model for handling and processing large data. 2. Hadoop Distributed File System: distributes files across the nodes of a cluster. 3. Hadoop YARN: a platform that manages computing resources. 4. Hadoop Common: contains the packages and libraries used by the other modules.
  • 13. Features of 'Hadoop': Suitable for Big Data Analysis • As Big Data tends to be distributed and unstructured in nature, Hadoop clusters are well suited to analyzing it. • Since it is the processing logic (not the actual data) that flows to the computing nodes, less network bandwidth is consumed. • This is called the data locality concept, and it helps increase the efficiency of Hadoop-based applications.
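Data locality can be pictured as a scheduling preference: run the task on a node that already stores the block, and move data over the network only when no such node is free. The sketch below is a simplified illustration under that assumption; `pick_node`, the node names, and the placement map are hypothetical and do not reflect the actual YARN scheduler.

```python
def pick_node(block_id: int, placement: dict[int, list[str]], free_nodes: set[str]) -> tuple[str, bool]:
    """Return (node, is_local): prefer a free node that already holds the block."""
    if not free_nodes:
        raise RuntimeError("no free nodes available")
    for node in placement[block_id]:
        if node in free_nodes:
            return node, True       # the code moves to the data
    return min(free_nodes), False   # fallback: the data must move to the code


# Made-up placement: block 0 is stored on node1 and node2.
placement = {0: ["node1", "node2"], 1: ["node2", "node3"]}

local_choice = pick_node(0, placement, {"node2", "node3"})
remote_choice = pick_node(0, placement, {"node3"})
print(local_choice, remote_choice)
```

Only the second call requires shipping the block across the network, which is the case the data locality concept tries to avoid.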
  • 14. Scalability • Hadoop clusters can be scaled out by adding cluster nodes, which allows for the growth of Big Data. • Scaling does not require modifications to application logic.
  • 15. • The Hadoop ecosystem has a provision to replicate the input data onto other cluster nodes. • That way, if a cluster node fails, data processing can still proceed using the data stored on another cluster node.
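The effect of replication on reads can be shown with a small failover sketch. It assumes a made-up in-memory layout (`placement`, `storage`, and the node names are illustrative only) and simply tries each replica in turn until it finds one on a live node.

```python
def read_block(block_id: int, placement: dict, live_nodes: set, storage: dict) -> bytes:
    """Return the block from the first replica that sits on a live node."""
    for node in placement[block_id]:
        if node in live_nodes:
            return storage[node][block_id]
    raise IOError(f"all replicas of block {block_id} are unavailable")


# Two blocks, each replicated on two of three hypothetical nodes.
placement = {0: ["node1", "node2"], 1: ["node2", "node3"]}
storage = {
    "node1": {0: b"abcd"},
    "node2": {0: b"abcd", 1: b"efgh"},
    "node3": {1: b"efgh"},
}

live_nodes = {"node1", "node3"}  # node2 has failed
data = b"".join(read_block(b, placement, live_nodes, storage) for b in sorted(placement))
print(data)
```

Even with node2 down, both blocks are still readable because each has a second copy elsewhere.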
  • 16. • Scales easily to petabytes or more • Parallel data processing • Suited for particular types of big data problems
  • 17. Hadoop Analytics Tools • A wide range of analytical tools is available that help Hadoop deal efficiently with very large data. • Let us discuss some of the most widely used tools one by one. Below are the top 10 Hadoop analytics tools for big data.
  • 18. • Apache Spark is an open-source processing engine designed for ease of analytics operations. • It is a cluster computing platform designed to be fast and general purpose. • Spark is designed to cover batch applications, machine learning, streaming data processing, and interactive queries. Features of Spark: • In-memory processing • Tight integration of components • Easy and inexpensive • A powerful processing engine that makes it fast • Spark Streaming provides a high-level library for stream processing
  • 19. • MapReduce is a programming model that runs on the YARN framework. • Its primary feature is to perform distributed processing in parallel across a Hadoop cluster, which is what makes Hadoop fast: when dealing with Big Data, serial processing is no longer practical. Features of MapReduce: • Scalable • Fault tolerant • Parallel processing • Tunable replication • Load balancing
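The MapReduce model can be illustrated with the classic word count, written here as a single-process Python sketch. It is not the Hadoop API: the function names are hypothetical, and a real job would run the map and reduce phases in parallel on different cluster nodes, with the framework performing the shuffle between them.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, which is why both phases can be spread across many nodes.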
  • 20. • Apache Hive is a data warehousing tool built on top of Hadoop; data warehousing means storing data generated from various sources at a fixed location. • Hive is one of the best tools for data analysis on Hadoop. • Anyone with knowledge of SQL can comfortably use Apache Hive. • The query language of Hive is known as HQL or HiveQL. Features of Hive: • Queries are similar to SQL queries. • Hive supports different storage types, such as HBase, ORC, and plain text. • Hive has built-in functions for data mining and other tasks. • Hive operates on compressed data stored in the Hadoop ecosystem.
  • 21. • Apache Impala is an open-source SQL engine designed for Hadoop. • Impala overcomes the speed-related issues of Apache Hive with its faster processing. • Apache Impala uses a similar SQL syntax, ODBC driver, and user interface to Apache Hive. • Apache Impala can easily be integrated with Hadoop for data analytics. Features of Impala: • Easy integration • Scalability • Security • In-memory data processing
  • 22. • The name Mahout is taken from the Hindi word Mahavat, which means elephant rider. • Apache Mahout runs its algorithms on top of Hadoop, hence the name. • Mahout is mainly used for implementing Machine Learning algorithms on Hadoop, such as classification, collaborative filtering, and recommendation. • Apache Mahout can also implement Machine Learning algorithms without Hadoop integration. Features of Mahout: • Used for Machine Learning applications • Mahout has vector and matrix libraries • Ability to analyze large datasets quickly
  • 23. • Pig was initially developed by Yahoo to make programming easier. • Apache Pig can process extensive datasets because it works on top of Hadoop. • Apache Pig is used for analyzing massive datasets by representing them as data flows. • Apache Pig also raises the level of abstraction for processing enormous datasets. • Pig Latin is the scripting language that developers use on the Pig framework, which runs on the Pig runtime. Features of Pig: • Easy to program • Rich set of operators • Ability to handle various kinds of data • Extensibility
  • 24. • HBase is a non-relational, distributed, column-oriented NoSQL database. HBase consists of tables, where each table has multiple rows of data. • These rows have multiple column families, and each column family has columns that contain key-value pairs. • HBase works on top of HDFS (Hadoop Distributed File System). • We use HBase for looking up small amounts of data within massive datasets. Features of HBase: • Linear and modular scalability • A Java API that can easily be used for client access • Block cache for real-time data queries
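The table, row, column family, and column structure described on this slide maps naturally onto nested dictionaries. The sketch below is only a mental model, not the HBase client API: the row keys, family names, and values are invented for illustration, and it leaves out details of real HBase such as cell versioning by timestamp.

```python
# row key -> column family -> column qualifier -> value (all names are made up)
table = {
    "user#1001": {
        "info": {"name": "Asha", "city": "Chennai"},
        "stats": {"logins": "42"},
    },
    "user#1002": {
        "info": {"name": "Ravi"},  # rows need not share the same columns
        "stats": {"logins": "7"},
    },
}


def get_cell(table: dict, row_key: str, family: str, qualifier: str):
    """Look up one cell; return None if the row, family, or column is absent."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


city = get_cell(table, "user#1001", "info", "city")
missing = get_cell(table, "user#1002", "info", "city")
print(city, missing)
```

The lookup goes straight to one row key, which mirrors the slide's point about fetching a small piece of data out of a very large dataset.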
  • 25. • Sqoop is a command-line tool developed by Apache. • The primary purpose of Apache Sqoop is to import structured data from an RDBMS (Relational Database Management System) such as MySQL, SQL Server, or Oracle into HDFS (Hadoop Distributed File System). • Sqoop can also export data from HDFS to an RDBMS. Features of Sqoop: • Can import data into Hive or HBase • Connects to database servers • Controls parallelism
  • 26. • Tableau is data visualization software that can be used for data analytics and business intelligence. • It provides a variety of interactive visualizations to showcase insights in the data, can translate queries into visualizations, and can import data of all ranges and sizes. • Tableau offers rapid analysis and processing, generating useful charts on interactive dashboards and worksheets. Features of Tableau: • Supports bar charts, histograms, pie charts, motion charts, bullet charts, Gantt charts, and many more • Secure and robust • Interactive dashboards and worksheets
  • 27. • Apache Storm is a free, open-source, distributed real-time computation system built using programming languages such as Clojure and Java. • It can be used with many programming languages. • Apache Storm is used for stream processing, which is very fast. • Apache Storm uses daemons such as Nimbus, ZooKeeper, and Supervisor. • Apache Storm can be used for real-time processing, online machine learning, and more. Companies such as Yahoo, Spotify, and Twitter use Apache Storm. Features of Storm: • Easy to operate • Each node can process millions of tuples per second • Scalable and fault tolerant
  • 29. Common Hadoop Distributions • Open Source: Apache • Commercial: Cloudera, Hortonworks, MapR, AWS MapReduce, Microsoft Azure HDInsight (Beta)