1) The document discusses big data strategies and technologies, including Oracle's big data solutions. It describes the Oracle Big Data Appliance, an integrated hardware and software platform for running Apache Hadoop.
2) Key technologies that enable deeper analytics on big data are discussed, including advanced analytics, data mining, text mining and Oracle R. Use cases are provided for industries such as insurance, travel and gaming.
3) An example "smart mall" use case is described, in which customer profiles and purchase data are analyzed in real time to deliver personalized offers. The technology pattern for implementing such a use case with Oracle Real-Time Decisions and the Oracle big data platform is outlined.
4. Big Data
React to an Event; Proactively Change Outcomes
"Technology presents the opportunity to transform business"*
Mark Hurd, President, Oracle
* Oracle Profit Magazine, Volume 17, Number 1
5. Big Data’s Key Ingredient
"Improvement merely lets you hit the numbers. Creativity is what transforms."*
Ron Johnson, CEO, JCPenney
* Fortune Magazine, Vol. 165, No. 4
(Survey chart: "Big Data transforms our business" 5%; "Big Data improves our business" 20%; "What is Big Data?" 75%)
6. Big Data Extends the Breadth and Speed of Data
Information architectures today: decisions based on database data (transactions)
Big Data: decisions based on all your data – transactions plus video and images, documents, social data, and machine-generated data
7. Big Data Extends the Depth of Analytics
• Query and Reporting
• Statistics
• Data Mining
• Text Analytics
• Spatial Analytics
• Graph Analytics
8. Big Data Defined
Big Data: techniques and technologies that enable enterprises to effectively and economically analyze all of their data
12. Oracle Big Data Strategy
(Diagram of the Oracle big data stack)
• BI Tools and Data Discovery Tools
• CEP and RTD
• Semantic, Text, Graph, Spatial and Advanced Analytics
• Data Management
• Management Infrastructure
Approach: Build, Acquire, Adopt, Engineer
14. Big Data Appliance
Hardware:
• 288 CPU cores with 1152 GB RAM
• 648 TB of raw disk storage
• 40 Gb/s InfiniBand
Integrated Software:
• Oracle Linux
• Oracle Java VM
• Cloudera Distribution of Apache Hadoop (CDH)
• Cloudera Manager
• Open-source distribution of R
• Oracle NoSQL Database Community Edition
All integrated software (except NoSQL DB CE) is supported as part of Premier Support for Systems and Premier Support for Operating Systems.
15. Oracle Big Data Appliance
CDH software stack on the appliance:
• File System Mount: FUSE-DFS
• UI Framework / SDK: HUE, HUE SDK
• Workflow / Scheduling: APACHE OOZIE
• Metadata: APACHE HIVE
• Languages / Compilers: APACHE PIG, APACHE HIVE, APACHE MAHOUT
• Data Integration: APACHE FLUME, APACHE SQOOP
• Fast Read/Write Access: APACHE HBASE
• Storage and Processing: HDFS, MAPREDUCE
• Coordination: APACHE ZOOKEEPER
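To make the HDFS layer in the stack above concrete, here is a minimal sketch that writes and reads a file in HDFS using the standard Hadoop FileSystem Java API. The NameNode URI, path, and file contents are placeholder values, not details from the slide; on a real appliance the connection settings would come from the cluster's configuration files.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode URI; normally picked up from core-site.xml on the cluster.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://bda-node01:8020");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/demo/sample.txt");

        // Write a small file into HDFS.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello from the appliance\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}
```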
16. Why Cloudera?
• Includes Open Source Apache Hadoop
– Fast evolution in critical features
– Proven at very large scale
• Managed Distribution
– Components certified to work together in regular updates
– Cloudera Manager provides Management GUI
• Most popular distribution in the market
17. Oracle and Cloudera
• All Cloudera software pre-installed and pre-configured on BDA
– Engineered with Cloudera
• All Cloudera assets included
– Single Oracle Product SKU for HW & SW
– Single Oracle Support SKU for HW & SW (life of the machine)
• Oracle is the single point of contact for the solution
18. Price comparison
Oracle Big Data Appliance
                          Year 1      Year 2     Year 3     Total
  BDA cost                $450,000
  Support cost            $54,000     $54,000    $54,000
  Installation            $14,150
  Total                   $518,150    $54,000    $54,000    $626,150

"Build-Your-Own" – HP hardware and Cloudera
                          Year 1      Year 2     Year 3     Total
  Servers and switches    $428,220
  Support cost            $136,233    $72,000    $72,000
  On-site installation & configuration not included
  Total                   $564,453    $72,000    $72,000    $708,453
Full details at https://blogs.oracle.com/datawarehousing/entry/price_comparison_for_big_data
19. Oracle NoSQL Database
A distributed, scalable key-value database
• Simple data model
  – Key-value pairs with a major+sub-key paradigm
  – Read/insert/update/delete operations
• Scalability
  – Dynamic data partitioning and distribution
  – Optimized data access via an intelligent driver
• High availability
  – One or more replicas
  – Disaster recovery through placement of replicas
  – Resilient to partition master failures
  – No single point of failure
• Transparent load balancing
  – Reads from master or replicas
  – Driver is network topology and latency aware
(Diagram: applications with NoSQL DB drivers accessing storage nodes in Data Center A and Data Center B)
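A minimal sketch of the key-value model described above, using the Oracle NoSQL Database Java driver (the oracle.kv package). The store name, helper host:port, and key components ("customer", "1234", "profile") are placeholder values chosen for illustration; the actual configuration depends on the deployment.

```java
import java.util.Arrays;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class KvExample {
    public static void main(String[] args) {
        // Placeholder store name and helper host supplied by the NoSQL DB deployment.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "bda-node01:5000"));

        // Major path identifies the record ("customer", id); minor path names a sub-key.
        Key key = Key.createKey(Arrays.asList("customer", "1234"),
                                Arrays.asList("profile"));

        // Insert or update the value for this key.
        store.put(key, Value.createValue("gold-tier".getBytes()));

        // Read it back; the driver may serve this from the master or a replica.
        ValueVersion vv = store.get(key);
        System.out.println(new String(vv.getValue().getValue()));

        store.close();
    }
}
```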
20. Big Data Connectors
Optimized integration of Hadoop with Oracle Database and Oracle Exadata
• Oracle Loader for Hadoop
• Oracle Direct Connector for Hadoop Distributed File System (HDFS)
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle R Connector for Hadoop
• Does not require Big Data Appliance – can be licensed for Hadoop running on non-Oracle hardware
21. Oracle Loader for Hadoop
Use the cluster: Oracle Loader for Hadoop runs as the last stage in the MapReduce workflow
(Diagram: map tasks feeding shuffle/sort and reduce tasks, with the loader running in the reducers)
• Partitioned and non-partitioned tables
• Online and offline loads
22. Oracle Direct Connector for HDFS
Direct access from Oracle Database
• SQL access to HDFS
• External table view over HDFS files
• Data query or import
(Diagram: SQL queries against an external table in Oracle Database reaching HDFS over InfiniBand through the Direct Connector)
23. Oracle Data Integrator
Simplifying MapReduce
• Automatically generates MapReduce code
• Manages the process
• Loads into the data warehouse via Oracle Loader for Hadoop
24. What is Data Discovery?
Quickly explore all relevant data
• Simplified: no pre-defined model required; relationships may be undefined or unknown; rapid, iterative change
• Capabilities: advanced search, faceted navigation, analytics
• Data: structured, semi-structured, unstructured and messy data – beyond the data warehouse
25. Business Intelligence and Data Discovery
Complementary solutions, integrated business processes
• Business Intelligence: known and clearly defined questions (who, what, when?); modeled data that conforms to a single model; proven answers to known questions
• Data Discovery: uncertain or open-ended questions (why, how, what else?); un-modeled data with diverse and changing models and KPIs; fast answers to new questions
• The two reinforce each other: insights from discovery yield mature models, and new questions require new data and exploration
26. Oracle Endeca Information Discovery
A platform for data discovery applications across the enterprise
Endeca Information Discovery (EID) helps organizations quickly explore all relevant data
• Combine structured and unstructured data from disparate systems
• Rapidly assemble easy-to-use analysis applications
• Automatically organize information for search, discovery and analysis
Once your data is in Hadoop, you may want either to access it from Oracle Database by issuing SQL against HDFS files or to move it into Oracle tables. Let's start with the latter: moving the data into Oracle tables. Oracle Loader for Hadoop (OLH) is a high-performance loader for fast movement of data from any Hadoop cluster into Oracle Database tables. Like all other parts of the Big Data Connectors, it is available for any cluster based on Apache Hadoop, in addition to the Big Data Appliance. If you want to take the results and perform additional analysis using advanced BI and data warehousing technologies, or incorporate them into other applications, OLH is both fast and reduces the processing load on the database server. It runs as a MapReduce job and uses the Hadoop cluster's processing resources to sample, sort and pre-partition the data based on the target database metadata. It can take input automatically from delimited text files (CSV) or Hive tables, or you can write your own input format. OLH can either load the results directly into the database using the parallel direct path load interface or JDBC, or create Oracle-formatted Data Pump files. OLH has load balancing across the reducer nodes built in, which prevents performance from degrading due to unbalanced loads.
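The notes above describe OLH as a job that runs inside the MapReduce workflow, with the load happening in the reducers. The sketch below is not OLH itself; it is a generic Hadoop MapReduce job (word count) included only to illustrate the map, shuffle/sort, and reduce stages that such a loader plugs into as its last stage. All class names and paths are illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map stage: emit (word, 1) for every token in the input split.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce stage: after shuffle/sort, sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```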
Oracle Direct Connector for HDFS makes it possible to access data on the Hadoop cluster in HDFS from Oracle Database using SQL. It provides a virtual table view of the HDFS files and allows parallel query access to the data through the standard Oracle Database external table mechanism. If you are using BDA and Exadata, the connectivity occurs over the InfiniBand network fabric, so database access to HDFS, in the very scientific words of the development manager, "flies". If you need to import the data in HDFS into Oracle, the Direct Connector does not require a file copy and does not use Linux FUSE; instead it uses the native Oracle loader interface.
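A minimal sketch of the access pattern the notes describe: once an external table has been defined over HDFS files by the Direct Connector, an application queries it with ordinary SQL. The JDBC URL, credentials, and the table name SALES_HDFS_EXT are placeholders; the external table definition itself is created by DBA tooling and is not shown here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryHdfsExternalTable {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL and credentials for the Oracle Database instance.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "demo", "demo");
             Statement stmt = conn.createStatement();
             // SALES_HDFS_EXT is a hypothetical external table backed by HDFS files.
             ResultSet rs = stmt.executeQuery(
                 "SELECT product_id, SUM(amount) FROM sales_hdfs_ext GROUP BY product_id")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getBigDecimal(2));
            }
        }
    }
}
```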
If you already use Oracle Data Integrator (or are familiar with this kind of tool and want to use ODI), it can simplify the MapReduce process. As long as you can describe the transformation that you need to perform on the data, ODI can generate the MapReduce code for you and run that process. It can even invoke Oracle Loader for Hadoop at the end of the cycle. So even if you are not an expert in Java, parallel algorithms and the Hadoop framework, there is still a way to use it all and keep your code organized. Note: ODI generates SQL code which is then passed into Hive (a component of many Hadoop distributions), which in turn generates the actual Java MapReduce code. You need the Big Data Connectors, specifically the ODI Application Adapter for Hadoop, to make all this work.
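The note says ODI works by handing SQL to Hive, which compiles it into MapReduce jobs. The sketch below shows that underlying mechanism directly, using the Hive JDBC driver rather than ODI; the driver class and URL vary with the Hive version, and the HiveServer host and the weblogs table are placeholder names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; placeholder class name and URL, version dependent.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://bda-node01:10000/default", "demo", "");
             Statement stmt = conn.createStatement();
             // Hive compiles this SQL into one or more MapReduce jobs on the cluster.
             ResultSet rs = stmt.executeQuery(
                 "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString("page") + " -> " + rs.getLong("hits"));
            }
        }
    }
}
```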
Our view of the BI landscape is that there are fundamentally two dominant types of problems. On one hand there are questions where we can define up front both the process and the data required to answer them. What are sales forecasts by region? What is my performance relative to expectation? On the other hand are questions where either the process or the data cannot be defined ahead of time; these questions are open-ended by nature. What customers should I target? Why are my sales going down? It is also worth pointing out that these questions are far more transient than the other type, which follows from their open-ended nature: each question leads to new questions. The interaction model for the former is more like "looking it up"; it is a report or dashboard. When you don't know exactly what you need or how to ask for it, the necessary interaction model is exploration and discovery, a dialog with the data.

It also follows that, as a matter of practice, some data is modeled and other data is not. We take "modeled" to mean that there is a single, overarching semantic model. Of course, modeling costs time and money, so we generally only make the investment where the expected return is large enough to justify the effort. The cost of storing un-modeled data has continued to drop and, importantly, with the popularization of Hadoop the promise of deriving value from un-modeled data is rising rapidly. The result is an explosion in the capture of un-modeled data. Through this view of the BI landscape we can see how traditional Business Intelligence and Data Discovery fit in.

Traditional Business Intelligence is purpose-built and very strong for known questions and modeled data. Friction arises when organizations attempt to use these products for new and unpredictable questions, which require similarly new and unpredictable data models.

In the other space is the emerging market category of data discovery, where the goal is to provide everyday business users with fast answers to new questions so they can make better, more informed business decisions. Data discovery tools follow several key market trends. First, the growth in data volume, diversity, and complexity: organizations today are beginning to understand the value inherent in this information, are looking for tools that can unlock that value for competitive advantage, and have more and more users who need to access and understand it. Second, the consumerization of business software: when IT is unable to deliver, business users are increasingly willing to go outside of IT to meet their own needs, and with their choice of tools and expectations formed in the consumer world, their expectations for the user experience have never been higher.
How do we do it? Endeca Information Discovery provides a full-featured platform for creating discovery applications that provide access to all kinds of information. Drilling into the architecture, we accomplish this with three tiers.
Notes: This slide is a logical representation of the scope of a Big Data solution. It provides the basis for describing data flows in each stage of the Big Data process in the following slides. The scope of a Big Data solution includes taking actions and decisions on the results of analysis, hence the integration with applications. Real-time event detection can be part of a Big Data solution. This is an important point to draw out because IBM claims its Streams capability is a USP; see the book Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data.