© 2014 MapR Technologies, confidential
TREND 1

Hadoop is Providing Value Across Organizations

ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration

MARKETING ANALYTICS
• Recommendation engines & targeting
• Ad optimization
• Pricing analysis
• Lead scoring

RISK ANALYTICS
• Network security monitoring
• Security information & event management
• Fraudulent-behavior analysis

OPERATIONS INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Sensor analysis

Sellers Cloud
Advertising Automation Cloud
Buyers Cloud

90B AD AUCTIONS per day
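To put 90B auctions per day in per-second terms, here is a back-of-the-envelope sketch. It assumes auctions are spread evenly across 24 hours; real ad traffic is peaky, so peak rates would be higher than this average. The function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(events_per_day: float) -> float:
    """Average event rate, assuming events are uniform over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_per_second(90e9)  # 90B ad auctions per day, from the slide
    print(f"{rate:,.0f} auctions per second")  # 1,041,667 auctions per second
```

That is roughly one million auctions every second, sustained around the clock.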

TREND 2

Organizations Have Many Workload-specific Systems

ENTERPRISE USERS

OPERATIONAL SYSTEMS
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery

ANALYTICAL SYSTEMS
• Interactive SQL
• Rich analytics
• Mixed workload management
• Data governance
• Security
• Backup and recovery

REALITY

Hadoop Can Relieve the Pressure from Enterprise Systems

ENTERPRISE USERS

OPERATIONAL SYSTEMS

Keys for Production Success
• Data protection and recovery
• Interoperability
• Read-write performance
• Support for both operations and analytics

ANALYTICAL SYSTEMS
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions

Fortune 100 Financial Services Company

104M CARD MEMBERS

REALITY 2

Most Hadoop Projects are Still Science Experiments

Chart: Number of Companies by Cluster Size
• Development/Testing (Focus: Educ/Svc)
• 1st Production Use Case: 1 – 10 Nodes
• Wide-scale Production: 10 – 2000 Nodes

Largest Biometric Database in the World

1.2B PEOPLE

REALITY 3

Going Big Requires a Rock-Solid Architecture

Enterprise-grade
Multi-tenancy
High Performance
Open Standards for Interoperability
Data Protection
Operational & Analytical

FOUNDATION

MapR Distribution for Hadoop

APACHE HADOOP ECOSYSTEM
Hive/Stinger/Tez, Drill, Impala, Shark, Hue, ..., Flume, Mahout, Cascading, Solr, Spark, Storm, Sentry, ZooKeeper, Management, Sqoop, Whirr, Pig, YARN, MapReduce, Oozie, HBase

MapR Data Platform
MAPR-FS (FILES, patent pending) and MAPR-DB (TABLES)

Enterprise-grade
• High availability
• Data protection
• Disaster recovery

Interoperability
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support

Performance
• Performance 2X-5X

Multi-tenancy
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators

Data Protection
• Enterprise security authorization
• Wire-level authentication
• Data governance

Operational & Analytical
• Ability to support predictive analytics, real-time database operations, and high-arrival-rate data
• Unit-of-work framework to provide transactional integrity

Apache Hadoop NameNode High Availability (HA)

Diagram: NameNode configurations in HDFS-based distributions: HDFS HA (a primary and a standby NameNode with a NAS appliance) and HDFS Federation (multiple NameNodes), each in front of a pool of DataNodes.

• Single point of failure
• Only one active NameNode
• Multiple single points of failure without HA
• Limited to 50-200 million files
• Needs 20 NameNodes for 1 billion files
• Metadata must fit in memory
• Double the block reports
• Commercial NAS (possibly) needed
• Performance bottleneck

No NameNode Architecture

Diagram: file metadata (A-F) distributed across the DataNodes, with no separate NameNode.

• No special config to enable HA
• Automatic failover & re-replication
• Up to 1T files (> 5000x advantage)
• Metadata is persisted to disk
• Significantly less hardware & OpEx
• Higher performance
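The file-count figures on the two NameNode slides are simple ratios. Below is a minimal sketch of that arithmetic. It takes the slides' numbers at face value and assumes files divide evenly across federated NameNodes. The helper names are illustrative and not part of any Hadoop or MapR API.

```python
# File-count arithmetic behind the NameNode slides.
# Assumption: files divide evenly across federated NameNodes.
HDFS_LIMIT_LOW = 50_000_000          # files per NameNode, low end quoted on the slide
HDFS_LIMIT_HIGH = 200_000_000        # files per NameNode, high end quoted on the slide
MAPR_FILE_LIMIT = 1_000_000_000_000  # "Up to 1T files"


def namenodes_needed(total_files: int, files_per_namenode: int) -> int:
    """Smallest number of NameNodes whose combined limit covers total_files."""
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_files // files_per_namenode)


def file_limit_ratio(limit: int, baseline: int) -> int:
    """How many times larger `limit` is than `baseline` (floor division)."""
    return limit // baseline


if __name__ == "__main__":
    one_billion = 1_000_000_000
    print(namenodes_needed(one_billion, HDFS_LIMIT_LOW))        # 20
    print(namenodes_needed(one_billion, HDFS_LIMIT_HIGH))       # 5
    print(file_limit_ratio(MAPR_FILE_LIMIT, HDFS_LIMIT_HIGH))   # 5000
    print(file_limit_ratio(MAPR_FILE_LIMIT, HDFS_LIMIT_LOW))    # 20000
```

The "20 NameNodes for 1 billion files" figure follows from the 50 million low end. The 5000x advantage is measured against the 200 million high end; against the low end the ratio would be 20,000x.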

Comparative Study of Hadoop Distributions: I/O Performance
Read and Write Throughput Benchmarks

Chart: DFSIO Read Throughput and DFSIO Write Throughput (MB per second) for IDH 2.4.1, HDP 1.3, CDH 4.3, and MapR M5 2.1.3

Source: Flux7 Labs Study, October 2013

World Record Performance
NEW MINUTESORT WORLD RECORD
With a Fraction of the Hardware

1.65 TB IN 1 MINUTE, 298 NODES
PREVIOUS RECORD: 1.6 TB with 2200 nodes
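The "fraction of the hardware" claim can be made concrete by normalizing both records per node. This rough sketch uses only the slide's figures. It assumes decimal terabytes and that every node contributes equally; the function name is illustrative.

```python
GB_PER_TB = 1000  # decimal units assumed; the ratio is unaffected either way


def gb_per_node(total_tb: float, nodes: int) -> float:
    """Data sorted per node within the one-minute window."""
    return total_tb * GB_PER_TB / nodes


if __name__ == "__main__":
    new_record = gb_per_node(1.65, 298)   # new record from the slide
    old_record = gb_per_node(1.6, 2200)   # previous record from the slide
    print(f"new: {new_record:.2f} GB/node")          # new: 5.54 GB/node
    print(f"previous: {old_record:.2f} GB/node")     # previous: 0.73 GB/node
    print(f"ratio: {new_record / old_record:.1f}x")  # ratio: 7.6x
```

By this measure each node sorted about 7.6x more data, on about 7.4x fewer nodes (2200 / 298).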

HBase Apps: High Performance with Consistent Low Latency

Chart: M7 Read Latency vs. Others Read Latency

MapR M7: The Best In-Hadoop Database

• NoSQL Columnar Store
• Apache HBase API
• In-Hadoop database

Diagram: in Other Distros, HBase (JVM) runs on HDFS (JVM), which runs on ext3/ext4 over the disks; in MapR M7, Tables/Files sit directly on the disks. A BigData Application sits on top, using the HBase Interface and the HDFS Interface.

The most scalable, enterprise-grade NoSQL database that supports online applications and analytics

Opportunity to Revolutionize Enterprise Data Architecture
From Redundant Processing Silos and Data Science Experiments…

The Production Enterprise BigData Platform
… to Consolidated Operational and Analytical Workloads

Q&A

Engage with us!

@allenday, @mapr
linkedin.com/in/allenday
allenday@mapr.com
tsheng@mapr.com
mdarling@mapr.com


20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
