Oracle Data Integrator For Big Data: Alex Kotopoulis
Alex Kotopoulis
Senior Principal Product Manager
Oracle Data Integrator for Big Data: Hands-on Lab
The following lessons walk through the steps needed to create the Oracle Data Integrator
mappings, packages, and Oracle GoldenGate processes required to load and transform the data.
Architecture Overview
This Hands-on lab is based on a fictional movie streaming company that provides online access
to movie media. The goal of this lab is to load customer activity data that includes movie rating
actions as well as a movie database sourced from a MySQL DB into Hadoop Hive, aggregate
and join average ratings per movie, and load this data into an Oracle DB target.
[Architecture diagram: Flume streams activity logs into an HDFS file exposed as a Hive external table (Activity). Task 1: review the Hive topology and models. Task 2: a Sqoop map loads the MySQL movie table into the Hive movie table. Task 3: a Hive map aggregates average movie ratings into the Hive movie_rating table. Task 4: an OLH map loads movie_rating into the Oracle MOVIE_RATING table. Task 5: an ODI package orchestrates the loads. Task 6: OGG replicates movie inserts from MySQL to Hive.]
1. Review the prepared ODI topology and models connecting to MySQL, Hadoop, and
Oracle DB.
2. Create a mapping that uses Apache Sqoop to load movie data from MySQL to Hive
tables.
3. Create a mapping that joins data from customer activity with movie data and
aggregates average movie ratings into a target Hive table.
4. Load the movie rating information from Hive to Oracle DB using Oracle Loader for
Hadoop.
5. Create a package workflow that orchestrates the mappings of tasks 2, 3, and 4 in one
end-to-end load.
6. Create Oracle GoldenGate processes that will detect inserts in the MySQL movie
database and add them to the Hive movie table in real time.
Overview
Time to Complete
Perform all 6 tasks – 60 Minutes
Prerequisites
Before you begin this tutorial, you should
2. In the Start/Stop Services window, scroll down with arrow keys to ORCL Oracle
Database 12c and select it. Press OK.
Note: The ORCL option is not initially visible; you need to scroll down.
1. Start ODI Studio: On the toolbar, single-click (do not double-click!) the ODI Studio icon.
4. Within the Physical Architecture accordion on the left, expand the Technologies folder
Note: For this HOL the setting “Hide Unused Technologies” has been set to hide all
technologies without a configured data server.
5. For this HOL the connectivity information has already been set up for the Hive, MySQL,
and Oracle sources and targets. Please expand these technologies to see the configured
data servers.
Info: A technology is a type of datasource that can be used by ODI as source, target, or other
connection. A data server is an individual server of a given technology, for example a database
server. A data server can have multiple schemas. ODI uses a concept of logical and physical
schemas to allow execution of the same mapping on different environments, for example on
development, QA, and production environments.
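Conceptually, logical-to-physical schema resolution works like a per-context lookup table. The sketch below is an illustration of the idea only, not ODI's actual API; the context names and server URLs are made up for this example.

```python
# Conceptual sketch of ODI's logical/physical schema resolution.
# Context names and server URLs below are illustrative, not from the lab.
topology = {
    ("HiveMovie", "Development"): "hive://dev-cluster:10000/moviedemo",
    ("HiveMovie", "Production"):  "hive://prod-cluster:10000/moviedemo",
}

def resolve(logical_schema: str, context: str) -> str:
    """Return the physical schema a mapping runs against in a given context."""
    return topology[(logical_schema, context)]

# The same mapping, executed in two contexts, targets two different servers:
print(resolve("HiveMovie", "Development"))  # hive://dev-cluster:10000/moviedemo
print(resolve("HiveMovie", "Production"))   # hive://prod-cluster:10000/moviedemo
```

This is why a mapping designed once can run unchanged against development, QA, or production: only the context changes, and the topology supplies the physical connection.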
7. Click on the JDBC tab on the left to view Hive connection information.
8. Switch to the Designer navigator and open the Models accordion. Expand all models.
Info: A model is a set of metadata definitions regarding a source, such as a database schema or
a set of files. A model can contain multiple datastores, which follow the relational concept of
columns and rows and can be database tables, structured files, or XML elements within an
XML document.
1. The first mapping to be created will load the MySQL MOVIE table into the Hive movie
table.
2. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, right-click on Mappings, and click New Mapping.
Info: A mapping is a data flow to move and transform data from sources into targets. It
contains declarative and graphical rules about data joining and transformation.
3. In the New Mapping dialog change the name to A - Sqoop Movie Load and press OK.
4. For this mapping we will load the table MOVIE from model MySQL to the table movie
within the model HiveMovie.
5. Drag the datastore MOVIE from model MySQL as a source and the datastore movie
from Model HiveMovie as a target onto the mapping diagram panel.
6. Drag from the output port of the source MOVIE to the input port of the target movie.
7. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
8. The logical flow has now been set up. To set the physical implementation, click on the
Physical tab of the editor.
9. The physical tab shows the actual systems involved in the transformation, in this case
the MySQL source and the Hive target.
In the physical tab users can choose the Load Knowledge Module (LKM) that controls
data movement between systems as well as the Integration Knowledge Module (IKM)
that controls transformation of data.
Select the access point MOVIE_AP to select an LKM.
Note: The KMs that will be used have already been imported into the project.
Info: A knowledge module (KM) is a template that represents best practices to perform an
action in an interface, such as loading from/to a certain technology (Load knowledge module
or LKM), integrating data into the target (Integration Knowledge Module or IKM), checking
data constraints (Check Knowledge Module or CKM), and others. Knowledge modules can be
customized by the user.
10. Go to the Properties Editor underneath the Mapping editor. There is a section Loading
Knowledge Module; you might have to scroll down to see it. Open this section and pick
the LKM SQL Multi-Connect.GLOBAL. This LKM allows the IKM to perform loading
activities.
Note: If the Property Editor is not visible in the UI, go to the menu Window > Properties
to open it.
Depending on the available size of the Property Editor, the sections within the editor
(such as “General”) might be shown as titles or tabs on the left.
12. In the Property Editor open section Integration Knowledge Module and pick IKM SQL
to Hive-HBase-File (SQOOP).GLOBAL.
Note: If this IKM is not visible in the list, make sure that you performed the previous
tutorial step and chose the LKM SQL Multi-Connect.
13. Review the list of IKM Options for this KM. These options are used to configure and
tune the Sqoop process to load data. Change the option TRUNCATE to true.
14. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
15. Click OK for the run dialog. We will use all defaults and run this mapping on the local
agent that is embedded in the ODI Studio UI. After a moment a Session started dialog
will appear, press OK there as well.
16. To review execution, go to the Operator navigator and expand the All Executions node
to see the current execution. If the execution has not yet finished, it will show the icon
for an ongoing task. Use the refresh buttons to refresh the view once or to refresh
automatically every 5 seconds.
17. Once the load is complete, the warning icon will be displayed. A warning icon is ok for
this run and still means the load was successful. You can expand the Execution tree to
see the individual tasks of the execution.
18. Go to Designer navigator and Models and right-click HiveMovie.movie. Select View
Data from the menu to see the loaded rows.
19. A Data editor appears with all rows of the movie table in Hive.
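Under the covers, the IKM SQL to Hive-HBase-File (SQOOP) generates and runs a sqoop import command. The Python sketch below assembles the kind of command line involved; the connection details and the exact option set are illustrative assumptions, not what ODI literally emits.

```python
# Sketch of the kind of sqoop import command the Sqoop IKM generates.
# The JDBC URL, user, and option list are placeholders for illustration.
def build_sqoop_import(jdbc_url, user, table, hive_table, truncate=True):
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--table", table,            # source table in MySQL
        "--hive-import",             # load the result into Hive
        "--hive-table", hive_table,  # target Hive table
    ]
    if truncate:
        # Corresponds to the TRUNCATE=true IKM option: replace existing rows.
        cmd.append("--hive-overwrite")
    return cmd

cmd = build_sqoop_import("jdbc:mysql://localhost/odidemo", "root", "MOVIE", "movie")
print(" ".join(cmd))
```

Setting the TRUNCATE option in the KM is what turns an append into an overwrite of the target Hive table.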
For this mapping we will use two Hive source tables movie and movieapp_log_avro as sources
and the Hive table movie_rating as target.
1. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, and right-click on Mappings and click
New Mapping
2. In the New Mapping dialog change the name to B - Hive Calc Ratings and press OK.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastores
movie and movieapp_log_avro as sources and movie_rating as target into the new
mapping.
4. First we would like to filter the movie activities to only include rating activities (ID 1).
For this drag a Filter from the Component Palette behind the movieapp_log_avro
source.
5. Drag the attribute activity from movieapp_log_avro onto the FILTER component. This
will connect the components and use the attribute activity in the filter condition.
6. Select the FILTER component and go to the Property Editor. Expand the section
Condition and complete the condition to movieapp_log_avro.activity = 1
7. We now want to aggregate all activities based on the movie watched and calculate an
average rating. Drag an Aggregate component from the palette onto the mapping.
8. Drag and drop the attributes movieid and rating from movieapp_log_avro directly
onto AGGREGATE in order to map them. They are automatically routed through the
filter.
9. Select the attribute AGGREGATE.rating and go to the Property Editor. Expand the
section Target and complete the expression to AVG(movieapp_log_avro.rating).
Note: The Expression Editor (the icon to the right of the Expression field) can be used to edit
expressions and provides lists of available functions.
10. Now we would like to join the aggregated ratings with the movie table to obtain
enriched movie information. Drag a Join component from the Component Palette to
the mapping.
11. Drop the attributes movie.movie_id and AGGREGATE.movieid onto the JOIN
component. These two attributes will be used to create an equijoin condition.
Note: The join condition can also be changed in the Property Editor
12. Highlight the JOIN component and go to the property editor. Expand the Condition
section and check the property “Generate ANSI Syntax”
13. Drag from the output port of JOIN to the input port of the target movie_rating.
14. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
16. The logical flow has now been set up. Compare the diagram below with your actual
mapping to spot any differences. To set the physical implementation, click on the
Physical tab of the editor.
17. The physical tab shows that in this mapping everything is performed in the same
system, the Hive server. Because of this no LKM is necessary.
Select the target MOVIE_RATING to select an IKM.
18. Go to the Property Editor and expand the section Integration Knowledge Module. The
correct IKM Hive Control Append.GLOBAL has already been selected by default; no
change is necessary. In the IKM options, change TRUNCATE to True and leave all other
options at their defaults.
19. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
20. Click OK for the run dialog. After a moment a Session started dialog will appear, press
OK there as well.
21. To review execution go to the Operator navigator and expand the All Executions node
to see the current execution.
22. Once the load is complete, expand the Execution tree to see the individual tasks of the
execution. Double-click on Task 50 – Insert (new) rows to see details of the execution
23. In the Session Task Editor that opens click on the Code tab on the left. The generated
SQL code will be shown. The code is generated from the mapping logic and contains a
WHERE condition, JOIN and GROUP BY statement that is directly related to the
mapping components.
24. Go to the Designer navigator and Models, right-click on HiveMovie.movie_rating, and
select View Data from the menu.
25. A data view editor appears with all rows of the movie_rating table in Hive.
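The generated code follows the same filter → aggregate → join logic as the mapping components. The following Python sketch replays that logic on a handful of made-up sample rows (the data is illustrative only):

```python
from collections import defaultdict

# Toy re-implementation of the mapping logic: filter rating activities,
# average ratings per movie, then join with the movie table.
activities = [  # (movieid, activity, rating) -- illustrative sample rows
    (10, 1, 4.0), (10, 1, 2.0), (20, 1, 5.0), (20, 2, None),
]
movies = {10: "Movie A", 20: "Movie B"}  # movie_id -> title

# FILTER: keep only rating activities (activity = 1)
ratings = [(m, r) for (m, a, r) in activities if a == 1]

# AGGREGATE: AVG(rating) GROUP BY movieid
by_movie = defaultdict(list)
for m, r in ratings:
    by_movie[m].append(r)
avg_rating = {m: sum(rs) / len(rs) for m, rs in by_movie.items()}

# JOIN: movie.movie_id = AGGREGATE.movieid
movie_rating = {movies[m]: avg for m, avg in avg_rating.items()}
print(movie_rating)  # {'Movie A': 3.0, 'Movie B': 5.0}
```

Note how the non-rating activity row (activity = 2) is dropped by the filter before aggregation, just as the WHERE condition in the generated HiveQL drops it.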
1. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, and right-click on Mappings and click
New Mapping
2. In the New Mapping dialog change the name to C - OLH Load Oracle and press OK.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastore
movie_rating as source into the new mapping. Then open model OracleMovie and
drag in the datastore MOVIE_RATING_ODI as a target.
4. Drag from the output port of the source movie_rating to the input port of the target
MOVIE_RATING_ODI.
5. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
6. The logical flow has now been set up. To set the physical implementation, click on the
Physical tab of the editor.
9. Review the list of IKM options for this KM. These options are used to configure and
tune the OLH or OSCH process to load data. We will use the default setting of OLH
through JDBC. Change the option TRUNCATE to true.
10. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
11. Click OK for the run dialog. We will use all defaults and run this mapping on the local
agent that is embedded in the ODI Studio UI. After a moment a Session started dialog
will appear, press OK there as well.
12. To review execution, go to the Operator navigator and expand the All Executions node
to see the current execution. Wait until the execution is finished, checking by refreshing
the view.
13. Go to the Designer navigator and Models, right-click on OracleMovie.MOVIE_RATING_ODI,
and select View Data from the menu.
14. A data view editor appears with all rows of the table MOVIE_RATING_ODI in Oracle.
1. To create a new package, open the Designer navigator and Project accordion on the
Big Data HOL / First Folder, then right-click on Packages and select New Package.
Info: A package is a task flow to orchestrate execution of multiple mappings and define
additional logic, such as conditional execution and actions such as sending emails, calling web
services, uploads/downloads, file manipulation, event handling, and others.
Notice the green arrow on this mapping, which means it is the first step.
4. Drag the mappings B – Hive Calc Ratings and C – OLH Load Oracle onto the panel
6. Drag and drop from the A - Sqoop Movie Load to the B – Hive Calc Ratings to set the
link. Then drag and drop from B – Hive Calc Ratings to C – OLH Load Oracle.
Note: If you need to rearrange steps, switch back to the select mode.
7. The package is now set up and can be executed. To execute the package, click the
Execute button in the toolbar. When prompted to save, click Yes.
8. Click OK in the Run dialog. After a moment a Session started dialog will appear, press
OK there as well.
9. To review execution, go to the Operator navigator and open the latest session
execution. The three steps are shown separately and contain the same tasks as the
mapping executions in the prior tutorials.
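Conceptually, the package runs its steps in sequence, with each step starting only after the previous one finishes successfully. This is a sketch of that idea, not ODI's execution engine; only the step names are taken from the lab.

```python
# Sketch of sequential package execution: run each step in order and
# stop the flow if a step fails. Step names mirror the lab's mappings.
def run_package(steps):
    log = []
    for name, step in steps:
        ok = step()
        log.append((name, "Done" if ok else "Error"))
        if not ok:
            break  # a failed step halts the remaining flow
    return log

steps = [
    ("A - Sqoop Movie Load", lambda: True),
    ("B - Hive Calc Ratings", lambda: True),
    ("C - OLH Load Oracle", lambda: True),
]
print(run_package(steps))
```

ODI packages additionally support branching on failure (the red "ko" links), which this linear sketch omits.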
1. Start a terminal window from the menu bar by single-clicking on the Terminal icon
Note: Ignore any errors shown from the stop and delete commands at the beginning.
6. Start a second terminal window from the menu bar and enter the command:
mysql --user=root --password=welcome1 odidemo
7. Insert a new row into the MySQL table movie by executing the following command:
insert into MOVIE (MOVIE_ID,TITLE,YEAR,BUDGET,GROSS,PLOT_SUMMARY) values
(1, 'Sharknado 2', 2014, 500000, 20000000, 'Flying sharks attack city');
8. Go to the ODI Studio and open the Designer navigator and Models accordion. Right-
click on datastore HiveMovie.movie and select View Data.
9. In the View Data window choose the Move to last row toolbar button. The inserted row
with movie_id 1 should be in the last row; you might have to scroll all the way down to
see it. Refresh the screen if you don’t see the entry.
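The GoldenGate flow you just exercised is change data capture: rows committed to the MySQL table are detected and applied to the Hive table. As a minimal illustration of the inserts-only replication idea (this sketch compares snapshots for simplicity; OGG actually reads the database transaction log):

```python
# Minimal change-data-capture sketch: find rows present in the source
# that are missing from the target, and append them (inserts only).
def apply_inserts(source_rows, target_rows, key=lambda row: row[0]):
    existing = {key(r) for r in target_rows}
    new_rows = [r for r in source_rows if key(r) not in existing]
    target_rows.extend(new_rows)
    return new_rows

mysql_movie = [(1, "Sharknado 2", 2014)]  # row inserted in step 7
hive_movie = []                           # Hive movie table before replication
applied = apply_inserts(mysql_movie, hive_movie)
print(applied)  # [(1, 'Sharknado 2', 2014)]
```

Running the function again with the same source applies nothing, since the row already exists in the target keyed by movie_id.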
Summary
You have now successfully completed the Hands-on Lab, performing an end-to-end load
through a Hadoop data reservoir using Oracle Data Integrator and Oracle GoldenGate. The
strength of these products is providing an easy-to-use approach to developing performant
data integration flows that use the strengths of the underlying environments without adding
proprietary transformation engines. This is especially relevant in the age of Big Data.