The document discusses Hadoop and big data. It defines Hadoop as an open-source, scalable, fault-tolerant platform for storing and processing large amounts of unstructured data distributed across machines. It describes Hadoop's core components, HDFS for data storage and MapReduce/YARN for data processing, and covers how Hadoop fits into big data scenarios, the Hadoop landscape, applying Hadoop to save money, the concept of a data lake, Hadoop in the cloud, and big data analytics with Hadoop.
8. Hadoop is an open-source (Java-based), scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.
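To make the "storage distributed across machines" part concrete, here is a minimal, illustrative sketch (not from the deck) of writing and reading a file through Hadoop's Java FileSystem API; the NameNode URI and file path are placeholders.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; in a real cluster this comes from core-site.xml.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

    Path path = new Path("/data/raw/events.log");

    // HDFS stores the bytes as-is: no schema is imposed at write time.
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.writeBytes("any kind of data, structured or not\n");
    }

    // Read it back; the client never needs to know which DataNodes hold the blocks.
    try (FSDataInputStream in = fs.open(path)) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }
    fs.close();
  }
}
```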
9. Flexibility: a single repository for storing and analyzing any kind of data, not bounded by a schema.
Scalability: a scale-out architecture divides the workload across multiple nodes using a flexible distributed file system.
Low cost: deployed on commodity hardware and an open-source platform.
Fault tolerant: keeps working even if node(s) go down.
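The scalability and fault-tolerance points above come down to HDFS splitting files into replicated blocks spread over many DataNodes. Below is a hedged sketch, assuming a default-configured client and an illustrative file path, that asks HDFS where a file's blocks live and raises its replication factor.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/data/raw/events.log");

    // Scale-out: a large file is split into blocks spread over many DataNodes.
    FileStatus status = fs.getFileStatus(path);
    for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(), String.join(",", block.getHosts()));
    }

    // Fault tolerance: each block is replicated (3 copies by default), so the
    // cluster keeps working even if a node holding one copy goes down.
    fs.setReplication(path, (short) 3);
    fs.close();
  }
}
```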
10. A system that moves the computation to where the data is.
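This is the data-locality idea behind MapReduce: YARN schedules map tasks on (or near) the nodes that already hold the input blocks, so code travels to the data rather than the data to the code. The canonical word-count job illustrates it; this is a generic sketch rather than code taken from the deck.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map tasks run close to the HDFS blocks they read and emit (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Only the small intermediate counts cross the network to the reducers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```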
11. Agenda (this slide recurs between sections of the deck):
Let's start and define Big Data
How Hadoop fits in this scenario
Hadoop landscape
Hadoop core components
Applying Hadoop to save $$
Concept of a Data Lake
Hadoop in the Cloud
Big Data analytics with Hadoop
41. Amazon vs. HDInsight
Data storage: S3 (Amazon), Azure Blobs (HDInsight). The compute machines get direct access to the storage for very fast data delivery.
Processing: EC2 (Amazon), Azure Compute (HDInsight). Dedicated machines ready to run with a specific version of the Hadoop runtime.
Processing libraries: Java-based or any other language supported through Hadoop Streaming (Amazon), .NET-based code (HDInsight). Users upload their own processing binaries/libraries.
Results: S3 (Amazon), Azure Blobs (HDInsight). Once the job is completed, the results are stored back to the same data storage that was used as the source.
Visualization: custom on both. Third-party applications can connect to the storage to perform visualization.
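As a rough illustration of the "results are stored back to the source storage" row, the sketch below configures a map-only pass-through job whose input and output URIs point straight at cloud object storage. The bucket/container names are placeholders, and the exact URI scheme depends on the service (s3a:// or s3:// on Amazon, wasb:// or abfs:// on HDInsight).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CloudStorageJob {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "cloud storage pass-through");
    job.setJarByClass(CloudStorageJob.class);

    // No mapper/reducer set: the default identity Mapper is used, and with zero
    // reduce tasks the input records are written straight back out.
    job.setNumReduceTasks(0);

    // Placeholder paths: on Amazon these would be S3 URIs (e.g. s3a://my-bucket/input),
    // on HDInsight Azure Blob URIs (e.g. wasb://container@account.blob.core.windows.net/input).
    FileInputFormat.addInputPath(job, new Path("s3a://my-bucket/input"));
    FileOutputFormat.setOutputPath(job, new Path("s3a://my-bucket/output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```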