Oracle Data Integrator For Big Data: Alex Kotopoulis
Alex Kotopoulis
Senior Principal Product Manager
Oracle Data Integrator for Big Data: Hands-on Lab
The following lessons walk through the steps needed to create the Oracle Data Integrator
mappings, packages, and Oracle GoldenGate processes required to load and transform the data.
Architecture Overview
This Hands-on lab is based on a fictional movie streaming company that provides online access
to movie media. The goal of this lab is to load customer activity data that includes movie rating
actions as well as a movie database sourced from a MySQL DB into Hadoop Hive, aggregate
and join average ratings per movie, and load this data into an Oracle DB target.
[Architecture diagram: Flume streams activity logs into an HDFS file exposed as a Hive external table (Activity). Task 1: review the Hive topology and models. Task 2: a Sqoop map loads the MySQL movie table into the Hive movie table. Task 3: a Hive map aggregates average movie ratings into the Hive movie_rating table. Task 4: an OLH map loads movie_rating into the Oracle MOVIE_RATING table. Task 5: an ODI package orchestrates the loads. Task 6: OGG replicates movie inserts from MySQL to Hive.]
1. Review the prepared ODI topology and models connecting to MySQL, Hadoop, and
Oracle DB.
2. Create a mapping that uses Apache Sqoop to load movie data from MySQL to Hive
tables.
3. Create a mapping that joins data from customer activity with movie data and
aggregates average movie ratings into a target Hive table.
4. Load the movie rating information from Hive to Oracle DB using Oracle Loader for
Hadoop.
5. Create a package workflow that orchestrates the mappings of tasks 2, 3, and 4 in one
end-to-end load.
6. Create Oracle GoldenGate processes that will detect inserts in the MySQL movie
database and add them to the Hive movie table in real time.
Overview
Time to Complete
Perform all 6 tasks – 60 Minutes
Prerequisites
Before you begin this tutorial, you should
2. In the Start/Stop Services window, scroll down with arrow keys to ORCL Oracle
Database 12c and select it. Press OK.
Note: The ORCL option is not initially visible; you need to scroll down.
1. Start ODI Studio: On the toolbar, single-click (do not double-click!) the ODI Studio icon.
4. Within the Physical Architecture accordion on the left, expand the Technologies folder
Note: For this HOL the setting “Hide Unused Technologies” has been set to hide all
technologies without a configured data server.
5. For this HOL the connectivity information has already been set up for the Hive, MySQL,
and Oracle sources and targets. Please expand these technologies to see the configured
data servers.
Info: A technology is a type of datasource that can be used by ODI as source, target, or other
connection. A data server is an individual server of a given technology, for example a database
server. A data server can have multiple schemas. ODI uses a concept of logical and physical
schemas to allow execution of the same mapping on different environments, for example on
development, QA, and production environments.
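Conceptually, logical-to-physical schema resolution works like a per-context lookup table. The sketch below is an illustration of the idea only, not ODI's actual API; the context names and server URLs are made up for this example.

```python
# Conceptual sketch of ODI's logical/physical schema resolution.
# Context names and server URLs below are illustrative, not from the lab.
topology = {
    ("HiveMovie", "Development"): "hive://dev-cluster:10000/moviedemo",
    ("HiveMovie", "Production"):  "hive://prod-cluster:10000/moviedemo",
}

def resolve(logical_schema: str, context: str) -> str:
    """Return the physical schema a mapping runs against in a given context."""
    return topology[(logical_schema, context)]

# The same mapping, executed in two contexts, targets two different servers:
print(resolve("HiveMovie", "Development"))  # hive://dev-cluster:10000/moviedemo
print(resolve("HiveMovie", "Production"))   # hive://prod-cluster:10000/moviedemo
```

This is why a mapping designed once can run unchanged against development, QA, or production: only the context changes, and the topology supplies the physical connection.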
7. Click on the JDBC tab on the left to view Hive connection information.
8. Switch to the Designer navigator and open the Models accordion. Expand all models.
Info: A model is a set of metadata definitions regarding a source, such as a database schema or
a set of files. A model can contain multiple datastores, which follow the relational concept of
columns and rows and can be database tables, structured files, or XML elements within an
XML document.
1. The first mapping to be created will load the MySQL MOVIE table into the Hive movie
table.
2. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, right-click on Mappings, and click New Mapping.
Info: A mapping is a data flow to move and transform data from sources into targets. It
contains declarative and graphical rules about data joining and transformation.
3. In the New Mapping dialog change the name to A - Sqoop Movie Load and press OK.
4. For this mapping we will load the table MOVIE from model MySQL to the table movie
within the model HiveMovie.
5. Drag the datastore MOVIE from model MySQL as a source and the datastore movie
from Model HiveMovie as a target onto the mapping diagram panel.
6. Drag from the output port of the source MOVIE to the input port of the target movie.
7. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
8. The logical flow has now been set up. To set the physical implementation, click on the
Physical tab of the editor.
9. The physical tab shows the actual systems involved in the transformation, in this case
the MySQL source and the Hive target.
In the physical tab users can choose the Load Knowledge Module (LKM) that controls
data movement between systems as well as the Integration Knowledge Module (IKM)
that controls transformation of data.
Select the access point MOVIE_AP to select an LKM.
Note: The KMs that will be used have already been imported into the project.
Info: A knowledge module (KM) is a template that represents best practices to perform an
action in an interface, such as loading from/to a certain technology (Load knowledge module
or LKM), integrating data into the target (Integration Knowledge Module or IKM), checking
data constraints (Check Knowledge Module or CKM), and others. Knowledge modules can be
customized by the user.
10. Go to the Properties Editor underneath the Mapping editor. There is a section Loading
Knowledge Module; you might have to scroll down to see it. Open this section and pick
the LKM SQL Multi-Connect.GLOBAL. This LKM allows the IKM to perform loading
activities.
Note: If the Property Editor is not visible in the UI, go to the menu Window > Properties
to open it.
Depending on the available size of the Property Editor, the sections within the editor
(such as “General”) might be shown as titles or tabs on the left.
12. In the Property Editor open section Integration Knowledge Module and pick IKM SQL
to Hive-HBase-File (SQOOP).GLOBAL.
Note: If this IKM is not visible in the list, make sure that you performed the previous
tutorial step and chose the LKM SQL Multi-Connect.
13. Review the list of IKM Options for this KM. These options are used to configure and
tune the Sqoop process to load data. Change the option TRUNCATE to true.
14. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
15. Click OK for the run dialog. We will use all defaults and run this mapping on the local
agent that is embedded in the ODI Studio UI. After a moment a Session started dialog
will appear, press OK there as well.
16. To review execution, go to the Operator navigator and expand the All Executions node
to see the current execution. If the execution has not yet finished, it will show the icon
for an ongoing task. Use the refresh buttons to refresh the view once or to refresh
automatically every 5 seconds.
17. Once the load is complete, the warning icon will be displayed. A warning icon is ok for
this run and still means the load was successful. You can expand the Execution tree to
see the individual tasks of the execution.
18. Go to Designer navigator and Models and right-click HiveMovie.movie. Select View
Data from the menu to see the loaded rows.
19. A Data editor appears with all rows of the movie table in Hive.
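Under the covers, the IKM SQL to Hive-HBase-File (SQOOP) generates and runs a sqoop import command. The Python sketch below assembles the kind of command line involved; the connection details and the exact option set are illustrative assumptions, not what ODI literally emits.

```python
# Sketch of the kind of sqoop import command the Sqoop IKM generates.
# The JDBC URL, user, and option list are placeholders for illustration.
def build_sqoop_import(jdbc_url, user, table, hive_table, truncate=True):
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--table", table,            # source table in MySQL
        "--hive-import",             # load the result into Hive
        "--hive-table", hive_table,  # target Hive table
    ]
    if truncate:
        # Corresponds to the TRUNCATE=true IKM option: replace existing rows.
        cmd.append("--hive-overwrite")
    return cmd

cmd = build_sqoop_import("jdbc:mysql://localhost/odidemo", "root", "MOVIE", "movie")
print(" ".join(cmd))
```

Setting the TRUNCATE option in the KM is what turns an append into an overwrite of the target Hive table.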
For this mapping we will use two Hive source tables movie and movieapp_log_avro as sources
and the Hive table movie_rating as target.
1. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, and right-click on Mappings and click
New Mapping
2. In the New Mapping dialog change the name to B - Hive Calc Ratings and press OK.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastores
movie and movieapp_log_avro as sources and movie_rating as target into the new
mapping.
4. First we would like to filter the movie activities to only include rating activities (ID 1).
For this drag a Filter from the Component Palette behind the movieapp_log_avro
source.
5. Drag the attribute activity from movieapp_log_avro onto the FILTER component. This
will connect the components and use the attribute activity in the filter condition.
6. Select the FILTER component and go to the Property Editor. Expand the section
Condition and complete the condition to movieapp_log_avro.activity = 1
7. We now want to aggregate all activities based on the movie watched and calculate an
average rating. Drag an Aggregate component from the palette onto the mapping.
8. Drag and drop the attributes movieid and rating from movieapp_log_avro directly
onto AGGREGATE in order to map them. They are automatically routed through the
filter.
9. Select the attribute AGGREGATE.rating and go to the Property Editor. Expand the
section Target and complete the expression to AVG(movieapp_log_avro.rating).
Note: The Expression Editor (the icon to the right of the Expression field) can be used to edit
expressions and provides lists of available functions.
10. Now we would like to join the aggregated ratings with the movie table to obtain
enriched movie information. Drag a Join component from the Component Palette to
the mapping.
11. Drop the attributes movie.movie_id and AGGREGATE.movieid onto the JOIN
component. These two attributes will be used to create an equijoin condition.
Note: The join condition can also be changed in the Property Editor
12. Highlight the JOIN component and go to the property editor. Expand the Condition
section and check the property “Generate ANSI Syntax”
13. Drag from the output port of JOIN to the input port of the target movie_rating.
14. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
16. The logical flow has now been set up. Compare the diagram below with your actual
mapping to spot any differences. To set the physical implementation, click on the
Physical tab of the editor.
17. The physical tab shows that in this mapping everything is performed in the same
system, the Hive server. Because of this no LKM is necessary.
Select the target MOVIE_RATING to select an IKM.
18. Go to the Property Editor and expand the section Integration Knowledge Module. The
correct IKM Hive Control Append.GLOBAL has already been selected by default; no
change is necessary. In the IKM options, change TRUNCATE to True and leave all other
options at their defaults.
19. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
20. Click OK for the run dialog. After a moment a Session started dialog will appear, press
OK there as well.
21. To review execution go to the Operator navigator and expand the All Executions node
to see the current execution.
22. Once the load is complete, expand the Execution tree to see the individual tasks of the
execution. Double-click on Task 50 – Insert (new) rows to see details of the execution
23. In the Session Task Editor that opens click on the Code tab on the left. The generated
SQL code will be shown. The code is generated from the mapping logic and contains a
WHERE condition, JOIN and GROUP BY statement that is directly related to the
mapping components.
24. Go to the Designer navigator and Models, right-click on HiveMovie.movie_rating, and
select View Data from the menu.
25. A data view editor appears with all rows of the movie_rating table in Hive.
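The generated code follows the same filter → aggregate → join logic as the mapping components. The following Python sketch replays that logic on a handful of made-up sample rows (the data is illustrative only):

```python
from collections import defaultdict

# Toy re-implementation of the mapping logic: filter rating activities,
# average ratings per movie, then join with the movie table.
activities = [  # (movieid, activity, rating) -- illustrative sample rows
    (10, 1, 4.0), (10, 1, 2.0), (20, 1, 5.0), (20, 2, None),
]
movies = {10: "Movie A", 20: "Movie B"}  # movie_id -> title

# FILTER: keep only rating activities (activity = 1)
ratings = [(m, r) for (m, a, r) in activities if a == 1]

# AGGREGATE: AVG(rating) GROUP BY movieid
by_movie = defaultdict(list)
for m, r in ratings:
    by_movie[m].append(r)
avg_rating = {m: sum(rs) / len(rs) for m, rs in by_movie.items()}

# JOIN: movie.movie_id = AGGREGATE.movieid
movie_rating = {movies[m]: avg for m, avg in avg_rating.items()}
print(movie_rating)  # {'Movie A': 3.0, 'Movie B': 5.0}
```

Note how the non-rating activity row (activity = 2) is dropped by the filter before aggregation, just as the WHERE condition in the generated HiveQL drops it.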
1. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, and right-click on Mappings and click
New Mapping
2. In the New Mapping dialog change the name to C - OLH Load Oracle and press OK.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastore
movie_rating as source into the new mapping. Then open model OracleMovie and
drag in the datastore MOVIE_RATING_ODI as a target.
4. Drag from the output port of the source movie_rating to the input port of the target
MOVIE_RATING_ODI.
5. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
6. The logical flow has now been set up. To set the physical implementation, click on the
Physical tab of the editor.
9. Review the list of IKM options for this KM. These options are used to configure and
tune the OLH or OSCH process to load data. We will use the default setting of OLH
through JDBC. Change the option TRUNCATE to true.
10. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
11. Click OK for the run dialog. We will use all defaults and run this mapping on the local
agent that is embedded in the ODI Studio UI. After a moment a Session started dialog
will appear, press OK there as well.
12. To review execution, go to the Operator navigator and expand the All Executions node
to see the current execution. Wait until the execution is finished, checking by refreshing
the view.
13. Go to the Designer navigator and Models, right-click on OracleMovie.MOVIE_RATING_ODI,
and select View Data from the menu.
14. A data view editor appears with all rows of the table MOVIE_RATING_ODI in Oracle.
1. To create a new package, open the Designer navigator and Project accordion on the
Big Data HOL / First Folder, then right-click on Packages and select New Package.
Info: A package is a task flow to orchestrate execution of multiple mappings and define
additional logic, such as conditional execution and actions such as sending emails, calling web
services, uploads/downloads, file manipulation, event handling, and others.
Notice the green arrow on this mapping, which means it is the first step.
4. Drag the mappings B – Hive Calc Ratings and C – OLH Load Oracle onto the panel
6. Drag and drop from the A - Sqoop Movie Load to the B – Hive Calc Ratings to set the
link. Then drag and drop from B – Hive Calc Ratings to C – OLH Load Oracle.
Note: If you need to rearrange steps, switch back to the select mode.
7. The package is now set up and can be executed. To execute the package, click the
Execute button in the toolbar. When prompted to save, click Yes.
8. Click OK in the Run dialog. After a moment a Session started dialog will appear, press
OK there as well.
9. To review execution, go to the Operator navigator and open the latest session
execution. The three steps are shown separately and contain the same tasks as the
mapping executions in the prior tutorials.
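Conceptually, the package runs its steps in sequence, with each step starting only after the previous one finishes successfully. This is a sketch of that idea, not ODI's execution engine; only the step names are taken from the lab.

```python
# Sketch of sequential package execution: run each step in order and
# stop the flow if a step fails. Step names mirror the lab's mappings.
def run_package(steps):
    log = []
    for name, step in steps:
        ok = step()
        log.append((name, "Done" if ok else "Error"))
        if not ok:
            break  # a failed step halts the remaining flow
    return log

steps = [
    ("A - Sqoop Movie Load", lambda: True),
    ("B - Hive Calc Ratings", lambda: True),
    ("C - OLH Load Oracle", lambda: True),
]
print(run_package(steps))
```

ODI packages additionally support branching on failure (the red "ko" links), which this linear sketch omits.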
1. Start a terminal window from the menu bar by single-clicking on the Terminal icon
Note: Ignore any errors shown from the stop and delete commands at the beginning.
6. Start a second terminal window from the menu bar and enter the command:
mysql --user=root --password=welcome1 odidemo
7. Insert a new row into the MySQL table movie by executing the following command:
insert into MOVIE (MOVIE_ID,TITLE,YEAR,BUDGET,GROSS,PLOT_SUMMARY) values
(1, 'Sharknado 2', 2014, 500000, 20000000, 'Flying sharks attack city');
8. Go to the ODI Studio and open the Designer navigator and Models accordion. Right-
click on datastore HiveMovie.movie and select View Data.
9. In the View Data window choose the Move to last row toolbar button. The inserted row
with movie_id 1 should be in the last row; you might have to scroll all the way down to
see it. Refresh the screen if you don’t see the entry.
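The GoldenGate flow you just exercised is change data capture: rows committed to the MySQL table are detected and applied to the Hive table. As a minimal illustration of the inserts-only replication idea (this sketch compares snapshots for simplicity; OGG actually reads the database transaction log):

```python
# Minimal change-data-capture sketch: find rows present in the source
# that are missing from the target, and append them (inserts only).
def apply_inserts(source_rows, target_rows, key=lambda row: row[0]):
    existing = {key(r) for r in target_rows}
    new_rows = [r for r in source_rows if key(r) not in existing]
    target_rows.extend(new_rows)
    return new_rows

mysql_movie = [(1, "Sharknado 2", 2014)]  # row inserted in step 7
hive_movie = []                           # Hive movie table before replication
applied = apply_inserts(mysql_movie, hive_movie)
print(applied)  # [(1, 'Sharknado 2', 2014)]
```

Running the function again with the same source applies nothing, since the row already exists in the target keyed by movie_id.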
Summary
You have now successfully completed the Hands-on Lab, performing an end-to-end load
through a Hadoop data reservoir using Oracle Data Integrator and Oracle GoldenGate. The
strength of these products is providing an easy-to-use approach to developing performant
data integration flows that use the strengths of the underlying environments without adding
proprietary transformation engines. This is especially relevant in the age of Big Data.