Big dataappliance hadoopworld_final
2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remain at the sole discretion of Oracle.
3. Case: On-line Ads and Content
[Architecture diagram] A low-latency, real-time path looks up the user profile in the NoSQL DB (adding the user if not present) to determine the best ad to place on the page for this user. Web logs of the ads actually served and the user's browsing flow into HDFS, where high-scale batch data reductions feed BI, analytics, and billing. The resulting predictions and profiles are written back to the NoSQL DB as expert input into the system.
4. Agenda
• Big Data Technology
• Oracle Big Data Appliance
• Big Data Applications
• Summary
• Q&A
6. Big Data: Infrastructure Requirements
• Acquire
  • Low, predictable latency
  • High transaction volume
  • Flexible data structures
• Organize
  • High throughput
  • In-place preparation
  • All data sources/structures
• Analyze
  • Deep analytics
  • Agile development
  • Massive scalability
  • Real-time results
12. Oracle Big Data Appliance Hardware
• 18 Sun X4270 M2 servers
  – 48 GB memory per node = 864 GB memory
  – 12 Intel cores per node = 216 cores
  – 24 TB storage per node = 432 TB storage
• 40 Gb/sec InfiniBand
• 10 Gb/sec Ethernet
13. Big Data Appliance
Cluster of industry-standard servers for Hadoop and NoSQL Database
• Focus on scalability and availability at low cost
InfiniBand Network
• Redundant 40 Gb/s switches
• IB connectivity to Exadata
10GigE Network
• 8 10GigE ports
• Datacenter connectivity
Compute and Storage
• 18 high-performance, low-cost servers acting as Hadoop nodes
• 24 TB capacity per node
• 2 six-core CPUs per node
• Hadoop triple replication
• NoSQL Database triple replication
14. Scale Out to Infinity
Scale out by connecting racks to each other using InfiniBand
• Expand up to eight racks without additional switches
• Scale beyond eight racks by adding an additional switch
15. Oracle Big Data Appliance Software
• Oracle Linux 5.6
• Java HotSpot VM
• Apache Hadoop Distribution v0.20.x
• R Distribution
• Oracle NoSQL Database Enterprise Edition
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle Loader for Hadoop
16. Why Open-Source Apache Hadoop?
• Fast evolution in critical features
  • Built by the Hadoop experts in the community
  • Practical instead of esoteric
  • Focus on what is needed for large clusters
• Proven at very large scale
  • In production at all the large consumers of Hadoop
  • Extremely stable in those environments
  • Well-understood by practitioners
17. Software Layout
• Node 1:
  • M: Name Node, Balancer & HBase Master
  • S: HDFS Data Node, NoSQL DB Storage Node
• Node 2:
  • M: Secondary Name Node, Management, Zookeeper, MySQL Slave
  • S: HDFS Data Node, NoSQL DB Storage Node
• Node 3:
  • M: JobTracker, MySQL Master, ODI Agent, Hive Server
  • S: HDFS Data Node, NoSQL DB Storage Node
• Nodes 4–18:
  • S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
  • Your MapReduce runs here!
18. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
  • Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
  • Analyze all your data
• Easy to Deploy
  • Risk-free, quick installation and setup
• Single Vendor Support
  • Full Oracle support for the entire system and software set
20. Key-Value Store Workloads
• Large, dynamic, schema-based data repositories
• Data capture
  • Web applications
  • Online retail
  • Sensor/statistics/network capture/mobile devices
• Data services
  • Scalable authentication
  • Real-time communication (MMS, SMS, routing)
  • Personalization/localization
  • Social networks
21. Oracle NoSQL DB
A distributed, scalable key-value database
• Simple Data Model
  • Key-value pair with major+sub-key paradigm
  • Read/insert/update/delete operations
• Scalability
  • Dynamic data partitioning and distribution
  • Optimized data access via intelligent driver
• High availability
  • One or more replicas
  • Disaster recovery through location of replicas
  • Resilient to partition master failures
  • No single point of failure
• Transparent load balancing
  • Reads from master or replicas
  • Driver is network topology & latency aware
[Diagram: applications go through the NoSQL DB driver to storage nodes replicated across Data Center A and Data Center B]
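To make the major+sub-key data model concrete, below is a minimal sketch of a put and a get using the oracle.kv Java driver; the store name, helper host:port, and key paths are placeholder assumptions, not values from this deck.

```java
import java.util.Arrays;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class MajorSubKeyExample {
    public static void main(String[] args) {
        // Connect through the intelligent driver; "kvstore" and
        // "node01:5000" are placeholders for a real deployment.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "node01:5000"));

        // The major path identifies the record owner (a user); the
        // minor (sub) path selects one component of that record.
        Key key = Key.createKey(
                Arrays.asList("user", "42"),   // major path
                Arrays.asList("profile"));     // minor path

        // Insert or update the value stored under this key.
        store.put(key, Value.createValue("segment=sports".getBytes()));

        // Read it back; the driver routes the request to a master
        // or replica according to the consistency policy in effect.
        ValueVersion vv = store.get(key);
        if (vv != null) {
            System.out.println(new String(vv.getValue().getValue()));
        }
        store.close();
    }
}
```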
22. Resolving a Request
A client request carries Operation + Key[M,m] + Value + Transaction Policy. The driver resolves it as follows:
• Hash the major key to determine the partition id
• Use the Partition Map to map the partition id to a rep group
• Use the State Table to determine the eligible Storage Node(s) within the rep group
• Use the Load Balancer to select the best eligible rep node
• Contact the rep node directly
The response carries the operation result plus refreshed routing state: a new Partition Map and rep node / storage table information.
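The routing steps above can be pictured with a small schematic sketch; every type and method below is invented for illustration and does not correspond to the actual oracle.kv driver internals.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of the driver-side routing steps (all names hypothetical).
public class RequestRouter {

    private final int numPartitions;
    private final Map<Integer, List<String>> partitionMap; // partition id -> rep group nodes
    private final Map<String, Boolean> stateTable;         // node -> currently eligible?

    public RequestRouter(int numPartitions,
                         Map<Integer, List<String>> partitionMap,
                         Map<String, Boolean> stateTable) {
        this.numPartitions = numPartitions;
        this.partitionMap = partitionMap;
        this.stateTable = stateTable;
    }

    public String route(String majorKey) {
        // 1. Hash the major key to a partition id.
        int partitionId = Math.floorMod(majorKey.hashCode(), numPartitions);

        // 2. Partition Map: partition id -> replication group.
        List<String> repGroup = partitionMap.get(partitionId);

        // 3. State Table: keep only the eligible nodes in the group.
        List<String> eligible = repGroup.stream()
                .filter(node -> stateTable.getOrDefault(node, false))
                .collect(Collectors.toList());

        // 4. Load balancer: pick the "best" eligible node (trivially,
        //    the first here); the client then contacts it directly.
        return eligible.get(0);
    }
}
```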
23. ACID Transactions
Transaction policies are configurable per operation, and the application can set defaults.
Write Durability
• Write transaction durability consists of both:
  a) Sync policy (on master and replica)
    • Sync – force to disk
    • Write No Sync – force to OS buffer
    • No Sync – write to local log buffer, flush when convenient
  b) Replica acknowledgement policy
    • All
    • Simple Majority
    • None
Read Consistency
• Read consistency is specified as Absolute, Time-based, Version, or None
  • Absolute – read from the master
  • Time-based – read from any replica that is within <time-interval> of the master or better
  • Version – read from any replica that is current with <transaction-token> or higher
  • None – read from any replica
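As a sketch of how these per-operation policies might be set with the oracle.kv Java driver (store name, host:port, key paths, and timeout values are placeholder assumptions):

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import oracle.kv.Consistency;
import oracle.kv.Durability;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class PolicyExample {
    public static void main(String[] args) {
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "node01:5000")); // placeholders

        Key key = Key.createKey(Arrays.asList("user", "42"),
                                Arrays.asList("profile"));

        // Write durability: Sync on the master, No Sync on replicas,
        // acknowledged by a simple majority of replicas.
        Durability durability = new Durability(
                Durability.SyncPolicy.SYNC,      // master sync policy
                Durability.SyncPolicy.NO_SYNC,   // replica sync policy
                Durability.ReplicaAckPolicy.SIMPLE_MAJORITY);

        store.put(key, Value.createValue("v1".getBytes()),
                  null,                 // don't return the previous value
                  durability,
                  5, TimeUnit.SECONDS); // request timeout

        // Time-based read consistency: any replica within 2 seconds
        // of the master is acceptable for this read.
        Consistency consistency = new Consistency.Time(
                2, TimeUnit.SECONDS,    // permissible replica lag
                5, TimeUnit.SECONDS);   // timeout

        ValueVersion vv = store.get(key, consistency, 5, TimeUnit.SECONDS);
        System.out.println(vv != null ? "read ok" : "not found");
        store.close();
    }
}
```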
24. Oracle NoSQL DB Differentiation
• Commercial-grade software and support
  • General purpose
  • Reliable – based on proven Berkeley DB JE HA
  • Easy to install and configure
  • Scalable throughput, bounded latency
• Simple programming and operational model
  • Simple major+sub-key and value data structure
  • ACID transactions
  • Configurable consistency & durability
• Easy management
  • Web-based console, API accessible
  • Manages and monitors: topology, load, performance, events, alerts
• Completes Oracle's large-scale data storage offerings
25. Try NoSQL Database on OTN
Oracle NoSQL Database:
• Community Edition is available as a software-only distribution
• Enterprise Edition is available as a separately licensable product or as part of the Big Data Appliance
38. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
  • Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
  • Analyze all your data
• Easy to Deploy
  • Risk-free, quick installation and setup
• Single Vendor Support
  • Full Oracle support for the entire system and software set
39. Big Data Appliance and Exadata
Big Data for the Enterprise
[Diagram: NoSQL DB and HDFS/Hadoop on the Big Data Appliance feeding the RDBMS on Exadata]
Benefits for Online Mode: no need to write to disk after the Hadoop job, and simpler management for use cases with lots of nodes generating output files. Benefits for Offline Mode (DP files): the import operation can be parallelized in the database, and it is the fastest option for external tables.
Direct HDFS: access data on HDFS through the external table mechanism. Benefits: data on HDFS can be queried from the database and imported into the database as needed.
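Both loader modes and the external-table mechanism ultimately stream files out of HDFS; for comparison, here is a minimal sketch of reading an HDFS file directly with the standard Hadoop FileSystem API (the namenode URI and file path are placeholder assumptions):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Namenode URI and file path are placeholders.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:8020"), new Configuration());

        Path path = new Path("/data/weblogs/part-00000");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // one record per line
            }
        }
        fs.close();
    }
}
```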