1

Data Mesh Part 4: Data Monolith to Data Mesh
Future of Data Integration Tools and a Focus on Oracle GoldenGate and Stream Processing
Oracle Development
October 2020
Copyright © 2020, Oracle and/or its affiliates
Channel: https://www.youtube.com/user/oraclegoldengate
Data Mesh Playlist: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe

2

For more than 30 years, a “Hub and Spoke” architecture:
ETL Tools… Kimball EDWs… Data Lakes… Data Hub Vendor DI Tools…
[Diagram: ETL Hub, ODS Hub, Big Data Hub]
Is “Hub and Spoke” our destiny forever… or could we be on a journey to somewhere else?

3

The world around us will keep moving faster and faster…
…IT systems and the data that fuel them will need to become faster and more agile, and the people processes that we follow for DevOps and DataOps must also become faster and more agile.
Our old ways of Data Integration are no longer sufficient to meet the future.

4

Data Fabric | Stream Processing | Data Mesh

5

The future of Data Integration… is Mesh
…a new generation of Data Mesh capabilities will leave behind the monolithic tools of the past to interconnect modern, multi-cloud, data-driven applications and create innovative, high-value data products of all types.

6

Data Mesh Series
https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe

Part 1: CDC and Distributed Commit Logs
• Best Practices for Maintaining Transaction Consistency with Replication and Kafka
• Managing Table-to-Topic Mappings, Accounting for Schema Evolution etc.
• How to Handle the Change Stream: Partial Supplemental CDC, Caching, Lookups etc.
• Deployment Topologies: Mid-tier, End-Point, Topic Partitions etc.

Part 2: Microservices Data Architecture w/CDC
• Microservices Design Patterns for the Data Tier
• Understanding the GoldenGate Microservices Architecture
• Event Patterns for CDC:
  • Transaction Outbox
  • CQRS with CDC
  • Event Sourcing with CDC
  • Saga with CDC
• Event Driven Processing, with CEP, ESP Time Series, & GoldenGate Stream Analytics

Part 3: Demo of Application & Data Integration Mesh
• What’s in a Name? Data Fabric, Data Hub, Data Mesh, Service Mesh
• Purpose of a Data Architecture: Operational vs. Analytic Use Cases
• Demonstration Video: Retail / Inventory Analysis
  • Sources: Weather.com, Oracle DB, Retail Cloud, AWS S3, Salesforce
  • Targets: Data Lake (Object Storage and Autonomous Data Warehouse), Data Services (Mobile APIs), Stream Analytics

Part 4: Monolith to Mesh, the Future of DI Tools
• Brief History of (Monolithic) Integration Tools
• Future of Data Integration is Mesh
• DevOps and Deployment of the Data Mesh
• Business Value of a Data Mesh (vs. the Monoliths)

7

Agenda
1. Brief History of Integration Tools
2. Data Mesh as a Next Step
3. GoldenGate Strategy for Data Mesh
4. Call to Action

8

Brief History of Enterprise-class Integration Tools
[Timeline, 1980 to 2020, across the Biz Process, APIs, Messaging & Event Systems and Data Consistency tiers]
Historically, integration tools have focused on specific tiers of the software architecture:
• App Integration: Transaction Processing Systems (TPS), Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA), Integration Platform as a Service (iPaaS), B2B, Business Process Management (BPM), Enterprise Service Bus (ESB), Robotic Process Automation (RPA)
• Messaging & Event Systems: Message Queue (MQ); Messaging: Kafka, Pulsar etc.
• Data Integration: Extract Transform Load (ETL) Data Integration (DI), Change Data Capture (CDC) & Data Replication, Data Federation/Virtualization, Complex Event Processing (CEP), Big Data Event Stream Processing (ESP), Stream Integration. Focus on committed, reliable and ACID-grade data, typically for both Operational (OLTP) and Analytic (OLAP) workloads…
• Data Governance: Catalog etc., Data Quality (DQ), Master Data (MDM), Data Hubs etc.

9

Meanwhile… Apps shift from Monoliths → Microservices → Mesh
[Diagram: App / Component boxes moving from shared frameworks on a single host, to fewer shared frameworks, to isolated components with sidecars on serverless or container hosts, coordinated by a Mesh Controller and a distributed commit log]
• Classic Monolith: Very coarse grained, many components within a single App boundary, layers of shared frameworks, dependencies on one or more schemas; a single host is often mandatory.
• “Minilith” / Client Server: Coarse grained; some components may be independently upgraded, but cross-component dependencies are still generally tightly coupled. A single host is preferred. Dependencies between Apps and shared schemas still exist.
• Microservice & Mesh: Component isolation. Encapsulated App schema, or Event Sourcing instead. All communication is via public APIs. Components may often be stateless, to run in an IT-managed service mesh (e.g., K8S) or in public-cloud serverless runtimes (fully managed).
Note: a monolith with REST APIs is still a monolith, and putting a monolith in a container/K8S doesn’t make it a microservice!

10

Inflexion Point for DI Tools to Modernize…
Monolithic Data Hub (Classic Monolith):
• Batch ETL: Ab Initio, DataStage Grid; PowerCenter (pre-10.x); Hadoop / Hive (CDH, HDP etc.)
• Streaming Data & Realtime Events: IBM Streams, Software AG Streams; Lambda Big Data Architecture (e.g., Apache Hadoop + Apache Storm)
Data Hub (Client-server / Minilith):
• Batch ETL/ELT and Cloud Native: PowerCenter (10.x and higher); Talend/Stitch, ODI, SAP, SAS etc.; Databricks “Data Lakehouse”
• Streaming Data & Realtime Events: Qlik/Attunity, IBM IIDR, etc.; Kappa Big Data Architecture (including Confluent KSQL and Flink)
Data Mesh (Serverless / Event-driven Microservices):
• Streaming Data & Realtime Events: GoldenGate, GoldenGate Stream Analytics; AWS Kinesis, Lambda, Glue Streaming; Event Sourcing Pattern with Domain Aggregates (bespoke microservices)
• Batch ETL/ELT and Cloud Native: OCI Data Integration, OCI Data Flow; AWS Glue, Azure Data Factory
[Diagram: Monoliths → Mesh; Compute + Storage inside a Hub vs. separate Compute, Storage and data spread across Physical Sites / Networks]

11

Graph of Hubs != Data Mesh
[Diagram: Monoliths, a graph of Hubs each with its own Compute and Storage, vs. Distributed, with Compute, Shared Storage and Event Logs spread across VCN 1, VCN 2 and VCN 3, and pipelines (ingest, prepare, cleanse, analyze, sink; pipe A / pipe B) running on a Service Mesh, a Serverless Runtime and Managed Containers]
Note: a monolith with REST APIs is still a monolith, and putting a monolith in a container/K8S doesn’t make it a microservice!

12

Agenda
2. Data Mesh as a Next Step

13

What is a Data Mesh?
Data Mesh is a data-tier architecture to integrate and govern enterprise data assets across distributed, multi-cloud environments. Its three defining characteristics are:
Data Product Oriented:
• Low-code management of high-value data services that support operational data stores, analytics, data lakes and data science
De-Centralized Processing:
• De-centralized data processing; no ETL/Hub/Lake monoliths
• Microservices / Service Mesh and Serverless deployments, use of “sidecar proxy” patterns, encapsulation, etc.
• Simplified continuous integration / continuous delivery (CICD) and lifecycle management (LCM) across public/private clouds
Event-Driven, Stream Centric:
• Real-time by default, batch patterns only when necessary
• Immutable event logs for messaging and data store events
• Trusted data semantics for consistent (ACID, https://en.wikipedia.org/wiki/ACID) and polyglot data
[Diagram: Data Mesh at the intersection of Microservice Patterns, Log-based Integrations and Polyglot Replication, drawing on Event Streaming, Immutable Logs, Data Replication, Polyglot Persistence, Edge / 5G Frameworks, Domain Driven Design, Service Mesh “Sidecars” and a Data Product orientation. Examples: a distributed commit log in Kafka; a Kubernetes controller + kubelet; data consistency guarantees; low-code, data-domain-centric design]
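The “immutable event logs” characteristic can be illustrated with a toy in-memory append-only log. This is a minimal sketch, not Kafka or GoldenGate: the class and method names (`Event`, `EventLog`, `append`, `read_from`) are hypothetical. Records are only ever appended, and each consumer tracks its own offset, so any consumer can replay history from any point.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One log record; frozen=True forbids reassigning its fields after creation."""
    key: str
    payload: Dict[str, Any]


@dataclass
class EventLog:
    """Hypothetical in-memory append-only log (a toy stand-in for a Kafka topic)."""
    _events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        """Append an event to the end of the log and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> List[Tuple[int, Event]]:
        """Return (offset, event) pairs from `offset` onward; nothing is consumed or deleted."""
        return [(i, self._events[i]) for i in range(offset, len(self._events))]


log = EventLog()
log.append(Event("customer:1", {"op": "insert", "state": "NY"}))
log.append(Event("customer:1", {"op": "update", "state": "CA"}))

# Each consumer keeps its own offset: one replays from the start,
# another only reads what it has not yet seen.
for offset, event in log.read_from(0):
    print("replay", offset, event.payload)
for offset, event in log.read_from(1):
    print("catch-up", offset, event.payload)
```

A change is recorded as a new event rather than as an edit to an old one, which is what lets downstream consumers rebuild state or audit history independently.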

14

From Data Monolith to… Data Mesh
• Data as an IT artifact → Data as a Product
• Monolithic & Centralized → Distributed & Decentralized
• Waterfall Data/DevOps (dominant) → Agile, CICD Data/DevOps
• Batch Processing Centric → Event-Driven, Streaming by Default
• OLTP vs. OLAP → OLTP ∩ OLAP

15

Data Mesh Purpose is for Data Products
[Diagram: Data Producers (Enterprise Data Producers: ERP Apps, DBs, Middleware etc.; IoT Data Producers: Devices & Things) feed a Data Mesh of Data Domains A, B and C that refine Raw Data into Prepared Data and Canonical Data. Raw Event Consumers (Automated Devices, Edge Nodes (5G), Machine-to-Machine (M2M)) consume raw events, while Business Data Product Owners deliver APIs, M2M, Marts, Models and Analytics to Data Consumers.]

16

Optimized for Data Product Owners
[Diagram: Enterprise Data Producers (System of Record (SoR) apps and DBs, user actions, App APIs and system log events) detect committed events as Logical Change Records (LCRs) via CDC Replication. Data is refined from Raw Data (LCR) and Schema Events (DDL), to Prepared Data Topics, to “Master” Data Topics (JSON, XML, Avro, Parquet, CSV), and delivered to Consumers as Data Objects, Table Data, Raw Data / Alerts and SQL. Axis labels: Speed & Fidelity, Trusted Views, Ease of Consumption.]
• Direct to Database (high fidelity transaction semantics fully preserved)
• Raw data, Time Series & Alerting events are pushed
• Prepared data events are pushed
• Canonical data events
Consumers: Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP
Data Product Owners / Managers are responsible for translating IT deliverables into trusted data that delivers business value.

17

Decentralized by Design
[Diagram: Data Domain Producers (System of Record (SoR) apps and DBs, user actions) emit committed events as Logical Change Records (LCRs) via CDC Replication, while Microservices and Edge Compute or Cloud emit raw data events. Stream processing prepares Technical Data Views and then Business Data Views, delivering Data Products payloads to Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; and OLTP and OLAP DBs.]
• Direct to Database (high fidelity transaction semantics fully preserved)
• Raw data, Time Series & Alerting events are pushed (ephemeral or persisted)
• Prepared data events are pushed (persisted)
• Canonical data events (persisted)

18

DevOps Attributes for a Data Integration Mesh
Physical Deployments
• Runtime should be deployable on most infrastructure
• Mesh nodes may also be tightly coupled to a single infrastructure (e.g., a serverless environment in a proprietary public cloud)
Mesh Controls
• A “data mesh controller” should provide Observability, Security and Routing controls over node deployments in various infrastructures (of customer choice)
Data Latency
• Event-driven, streaming pipelines by default (single-digit seconds)
• Pipelines may execute as micro-batches or batches (e.g., fixed windows)
• Large batch processing “by reference” for files or direct-path DB utilities (not all data transfers are suitable for event protocols)
Enterprise Data Semantics
• Polyglot data handling means both highly structured and semi-structured data
• Must preserve ACID / full relational semantics into targets
• Must handle non-relational document payloads
Data Governance
• Must have data governance features comparable with mainstream data integration tools (e.g., Catalog, Lineage, Data Validation, etc.)
Simple DevOps / CICD Lifecycle
• Customer Managed: runtime is microservices-based (note: just a REST API is not sufficient); runs in containers (e.g., Docker), with an optional service mesh: Kubernetes, OpenShift etc.
• Cloud Vendor Managed: serverless execution
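The Data Latency attributes distinguish event-at-a-time streaming from micro-batches over fixed windows. As a minimal sketch in plain Python (not any product's API; the function name `tumbling_windows`, the epoch-second timestamps and the 5-second window size are assumptions), events can be grouped into fixed, non-overlapping (tumbling) windows and processed one batch per window:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (event time in epoch seconds, payload)


def tumbling_windows(events: Iterable[Event], size_s: float) -> Dict[float, List[str]]:
    """Group events into fixed, non-overlapping windows of `size_s` seconds.

    Each window is keyed by its start time; an event at time t falls in the
    window starting at floor(t / size_s) * size_s.
    """
    windows: Dict[float, List[str]] = defaultdict(list)
    for ts, payload in events:
        start = (ts // size_s) * size_s
        windows[start].append(payload)
    return dict(sorted(windows.items()))


events = [(100.2, "a"), (101.9, "b"), (104.99, "c"), (105.0, "d"), (112.3, "e")]
for start, batch in tumbling_windows(events, size_s=5.0).items():
    # In a real pipeline each batch would be written to the target in one step.
    print(f"window [{start}, {start + 5.0}): {batch}")
```

Shrinking the window toward a single event approaches event-at-a-time streaming; larger windows trade latency for fewer, bigger writes.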

19

Example 1: Mesh of Data Integration Microservices
[Diagram: small data integration microservices (capture, dist., ingest, filter, xform, join, load, replicat and λ serverless functions) connected across Edge Gateways, Edge, Multi-Cloud, Enterprise Applications (including Exadata Cloud@Customer) and Analytics, managed from a single pane of glass]

20

Data Mesh Workload Coexistence
[Diagram: Compute, Shared Storage and Event Logs across VCN 1, VCN 2 and VCN 3; pipelines (ingest, prepare, cleanse, analyze, sink; pipe A / pipe B) run on a Service Mesh, a Serverless Runtime and Managed Containers to produce Data Products]
Mixed Workloads: can run in different infrastructure & “engines”
• Customer Managed: runs “as a Service” using containers and service mesh
• Vendor Managed: runs as “Serverless”; the customer pays only for the minutes that workloads are running
• Hybrid Infrastructure: runs within managed cloud containers

21

Example 2: Maintain Data Consistency in Pipelines
SCN (System Change Number) is the Oracle DB clock: every time a transaction commits, the clock increments. The SCN marks a consistent point in time in the database.
CSN (Commit Sequence Number) is the GoldenGate clock: GG uses the CSN during apply to identify the point in time at which a transaction committed, in order to maintain transaction consistency and data integrity. A CSN is available for all source DB transactions captured via GoldenGate:
https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/commit-sequence-number.html
[Diagram: records A and B, from the same source transaction, written to a single Kafka partition]
A: { "customer_id": "1", "first_name": "Debra", "last_name": "Burks", "phone": "", "email": "debra.burks@yahoo.com", "SCN": "130", "CSN": "130" }
B: { "customer_id": "1", "street": "9273 Thome Ave.", "city": "Orchard Park", "state": "NY", "zip_code": "14127", "SCN": "130", "CSN": "130" }
• The data consumer is responsible for maintaining transaction boundaries.
• OLTP updates and deletes both show up in Kafka as new messages; consumers must interpret the flags correctly.
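The consumer-side responsibility described for this example can be sketched in Python. The sketch assumes each Kafka message has already been decoded into a dict carrying the `CSN` field from the example records plus an operation flag; the field name `op_type` with values `I`/`U`/`D` is an assumption modeled on GoldenGate's JSON output and should be checked against the actual formatter configuration. Within a single partition messages arrive in commit order, so consecutive messages sharing a CSN can be applied together as one target transaction. In a live stream a group is only known to be complete once the next CSN arrives (or a transaction-end marker is seen); this sketch simply works on a finite list.

```python
from itertools import groupby
from typing import Dict, Iterable, List

Row = Dict[str, str]


def group_by_transaction(messages: Iterable[Row]) -> List[List[Row]]:
    """Split an ordered stream (one partition) into per-transaction groups.

    groupby merges only *adjacent* messages with an equal CSN, which matches
    commit-ordered delivery within a single partition.
    """
    return [list(group) for _, group in groupby(messages, key=lambda m: m["CSN"])]


def apply_transaction(table: Dict[str, Row], txn: List[Row]) -> None:
    """Apply one transaction to an in-memory target keyed by customer_id.

    Updates and deletes arrive as new messages, so the op flag (not the mere
    arrival of a message) decides what happens to the target row.
    """
    staged = dict(table)  # stage changes so the whole transaction applies all-or-nothing
    for msg in txn:
        key = msg["customer_id"]
        op = msg["op_type"]  # assumed flag: I = insert, U = update, D = delete
        columns = {k: v for k, v in msg.items() if k not in ("op_type", "SCN", "CSN")}
        if op == "I":
            staged[key] = columns
        elif op == "U":
            staged[key] = {**staged.get(key, {}), **columns}
        elif op == "D":
            staged.pop(key, None)
        else:
            raise ValueError(f"unknown op_type {op!r}")
    table.clear()
    table.update(staged)


messages = [
    {"op_type": "I", "customer_id": "1", "first_name": "Debra", "CSN": "130"},
    {"op_type": "U", "customer_id": "1", "city": "Orchard Park", "CSN": "130"},
    {"op_type": "D", "customer_id": "1", "CSN": "131"},
]
target: Dict[str, Row] = {}
for txn in group_by_transaction(messages):
    apply_transaction(target, txn)
    print(txn[0]["CSN"], target)
```

Staging into a copy means an unknown flag raises an error without leaving half a transaction applied, which mirrors the transaction-boundary responsibility the slide assigns to the consumer.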

22

Fast data architecture for customer satisfaction (user activity stream, trusted transactions)
“Solution had to be fast, but most importantly the data had to be correct!”
https://www.slideshare.net/r39132/big-data-fast-data-paypal-yow-2018

23

Example 3: Continuous Transformation and Loading (CTL)
[Diagram: Enterprise Data Producers (ERP Apps, DBs, Middleware etc.) and IoT Data Producers (Devices & Things) feed continuous processing that delivers Data Objects, Table Data, Raw Data / Alerts and SQL to Data Owners & Data Products and to Consumers: Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP]
Processing is driven by Queries, Data Patterns, Windowing, Data Policies and Business Rules:
• Filter, Aggregate, Correlate/Enrich, Thresholds, Joins
• Time Series, Spatial Analytics, Anomalies, Classification, Scoring Models
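The CTL operators named in this example (filter, correlate/enrich, aggregate, thresholds) are chained over an unbounded stream rather than run as a nightly batch. A minimal sketch with Python generators, assuming hypothetical IoT readings (`device_id`, `temp_c`), a hypothetical device-to-store lookup table and an arbitrary alert threshold; a production pipeline would run in a stream engine such as GoldenGate Stream Analytics rather than in-process:

```python
from typing import Dict, Iterable, Iterator

Reading = Dict[str, object]

# Hypothetical reference data used for enrichment (e.g., loaded from a DB table).
DEVICE_TO_STORE = {"d1": "store-7", "d2": "store-9"}
THRESHOLD_C = 8.0  # hypothetical alert threshold for a refrigeration sensor


def filter_valid(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Filter: drop readings without a numeric temperature."""
    for r in readings:
        if isinstance(r.get("temp_c"), (int, float)):
            yield r


def enrich(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Correlate/Enrich: join each reading with reference data."""
    for r in readings:
        yield {**r, "store": DEVICE_TO_STORE.get(str(r["device_id"]), "unknown")}


def running_max(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Aggregate: track the maximum temperature seen so far per store."""
    max_by_store: Dict[str, float] = {}
    for r in readings:
        store = str(r["store"])
        max_by_store[store] = max(max_by_store.get(store, float("-inf")), float(r["temp_c"]))
        yield {**r, "store_max_c": max_by_store[store]}


def alerts(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Thresholds: emit only readings above the alert threshold."""
    for r in readings:
        if float(r["temp_c"]) > THRESHOLD_C:
            yield r


stream = [
    {"device_id": "d1", "temp_c": 4.5},
    {"device_id": "d2", "temp_c": None},
    {"device_id": "d1", "temp_c": 9.1},
    {"device_id": "d2", "temp_c": 7.9},
]
# Each stage pulls one record at a time, so results flow out as input arrives.
for alert in alerts(running_max(enrich(filter_valid(stream)))):
    print(alert)
```

Because every stage is a generator, nothing waits for the end of the input, which is the essential difference between continuous TL and batch ETL.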

24

Top Opportunities for a Data Mesh Solution
IT / tactical solutions:
1. 100% correct, Cloud-native Apps
• Sync of backend OLTP stores
• Multi-active, cross-region DBs
2. Distributed Data Lake
• Fast data from anywhere
3. Shift from Batch to Streaming DI
• ETL to Continuous-TL
4. More Agile, DataOps Lifecycle
• Microservices DevOps benefits
Business Transformation / strategic:
1. Next Best Action
• Realtime customer engagement
2. Smart Inventory Management
• Eliminate supply chain lag
3. Location Intelligence
• Correlate app + device events
4. Predictive Analytics
• Data monetization, new services for sale

25

Agenda
3. GoldenGate Strategy for Data Mesh

26

Why GoldenGate for a Data Mesh? What’s so special?
• Event detection for all popular data stores, relational and NoSQL
• High-speed data replication
• Trusted to never lose data; availability is a core use case
• Transaction-safe (for dependable analytics and applications)
• Event stream processing at web scale, on open source or Cloud
[Diagram: Data Products]

27

GoldenGate Microservices
GoldenGate is itself a set of microservices that human users or other services may interact with.
[Diagram: Native GUIs and Full REST APIs, fronted by an API Gateway or Proxy Service, over the GG Service Manager, GG Admin Service, GG Distro Service, GG Receiver Service and GG Metrics Service alongside Your Services; Extracts, Replicats and Client Libraries run in Cloud, Containers or Edge Devices for bi-directional data replication with ms/sec updates and consistency guarantees]
Embedded User Interface:
• C-based microservices with an embedded HTTP client for a native JavaScript-based GUI
• Oracle JET frameworks for an intuitive interaction model
REST native APIs:
• Fully REST native
• A Command Line Interface (CLI) is also available; it produces REST calls to the native services
Full GoldenGate Replication Capabilities:
• 100% coverage of all traditional GG replication patterns; fully capable of HA/DR use cases

28

GoldenGate Stream Analytics
Trusted Transaction Outbox for the Whole Enterprise
[Diagram: Trusted Replication of Real-time Data (Transactions & Events) connecting DB2/z, DBMS, Cloud, Big Data, NoSQL, Streams and Apps with Relational and Non-Relational stores, Object Storage, and ETL & ML]
https://www.oracle.com/middleware/technologies/goldengate.html

29

Single Pane of Glass
[Diagram: connect sources such as DB2/z for Real-Time Stream Data Processing and Raw Data delivery to Data Owners & Data Products and to Consumers (Data Objects, Table Data, Raw Data / Alerts, SQL): Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP]
Deploys in a Mesh Across Containers, Public Clouds and 5G Edge Devices

30

GoldenGate: Mesh Platform for Fast Data
Microservices, Distributed Deployments
Single Pane of Glass for Key Personas: Data Engineer, Data Analyst, DataOps, Cloud Admin
Capabilities:
• Operations
• Continuous Transformation & Loading (CTL)
• Change Data Capture (CDC)
• Governance
• Stream Analytics (ML, Time-Series etc.)
• Data Replication (source/target)
Platform services: Workspaces, Catalog, Data Verification, Security, Management, Monitoring, Metering (cloud), Administration
Sample Apps, Pre-Built Templates, Accelerators
Optional Containers, Service Mesh (K8S Pods) or Serverless
Oracle Cloud | 3rd Party Cloud | On-Prem Data Centers | Embedded Edge
[Diagram: Data Producers (Apps, DBs, IoT, etc.) flow to Data Owners & Data Products and to Consumers (Data Objects, Table Data, Raw Data / Alerts, SQL): Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP]

31

Empower the Data Product Owners
[Diagram: Data Product Owners connect the Business with Data Consumers]
The Age of Data Product Managers: https://medium.com/swlh/the-age-of-data-product-managers-how-to-prepare-24c0fedc163f
More:
• https://airfocus.com/glossary/what-is-a-data-product-manager/
• https://hbr.org/2018/10/how-to-build-great-data-products
Data products include:
1. Analytics
• Reports and dashboards
• Historic and real-time
2. Models
• Data domain objects
• Data models / ML features
3. Algorithms (for Business Rules)
• ML models, AI/data science
• Pipeline policies and mappings
4. Data Services and APIs
• Data payloads
• REST APIs, Pub/Sub Topics etc.

32

Compelling Data Products
• Low Code Event-based Data Services
• Time Series Analytics
• Event Driven Dashboards
• Streaming Data Patterns
• Geo-Spatial Analysis & Geo-Fencing
• Predictive Analytics / ML

33

Strong Governance
Workspace Management
• Low-code User Experience
• Role-based Access Controls
Data Catalog
• Asset Tagging, REST APIs
• Lineage Viewer for Pipelines
Security
• Certificates, Encryption
• Key Stores, Single Sign-On
• LDAP Integrations
• SSL, TLS 1.2/1.3
Conflict Detection and Resolution (CDR)
• Automatic or User Defined
Data Verification
• Hash-based Digital Compare Tool
• Fast, 100% Certainty
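The idea behind hash-based data verification can be sketched in a few lines: hash each row on both source and target, then compare only the digests, so full rows never need to be shipped for comparison. This is a generic illustration using Python's `hashlib`, not the algorithm of any GoldenGate product; the row layout (primary key in the first column) and the string normalization of values are assumptions.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Sequence

Row = Sequence[object]


def row_digest(row: Row) -> str:
    """SHA-256 of the row's values, normalized to strings (None kept as null).

    json.dumps of a list gives an unambiguous encoding, so ("a|b", "c") and
    ("a", "b|c") cannot collide the way a naive "|".join would.
    """
    normalized = [None if v is None else str(v) for v in row]
    encoded = json.dumps(normalized, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def digests_by_key(rows: Iterable[Row], key_index: int = 0) -> Dict[str, str]:
    """Map each row's primary key (assumed to sit at `key_index`) to its digest."""
    return {str(row[key_index]): row_digest(row) for row in rows}


def compare(source: Iterable[Row], target: Iterable[Row]) -> Dict[str, List[str]]:
    """Compare only digests: report keys missing on either side and changed rows."""
    src, tgt = digests_by_key(source), digests_by_key(target)
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "missing_in_source": sorted(tgt.keys() - src.keys()),
        "out_of_sync": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


source_rows = [(1, "Debra", "NY"), (2, "Ana", "CA"), (3, "Li", "TX")]
target_rows = [(1, "Debra", "NY"), (2, "Ana", "NV"), (4, "Sam", "FL")]
print(compare(source_rows, target_rows))
```

Differing digests always mean the normalized rows differ; matching digests mean equal rows up to SHA-256's negligible collision probability.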

34

Event-driven Data Mesh
Use Cases for Event Driven, Stream Data Processing
“…driven by business demands for more flexible, easier-to-change, loosely-coupled applications […] there has been a widespread awakening to the benefits of Event Driven Architecture (EDA) for increasing the scalability and agility of business systems.”
W. Roy Schulte (of Gartner), March 2020: “EDA is Suddenly Popular: Will Stream Analytics be Next?”
[Diagram: Data & Microservice Events driving Event/Data Pipelines, Geospatial Actions, Time-Series Analysis, Real-time AI/ML and Continuous ETL]

35

Agenda
4. Call to Action

36

More “Hub and Spoke”?
Which journey are you on?
[Diagram: Data Hub]
…or, are you ready to move to the Mesh?

37

What Next?
• Ask Oracle for a demo!
• Oracle #1 in Data Fabric Strategy: https://blogs.oracle.com/dataintegration/oracle_forresterwave_datafabric_2020?xd_co_f=66bcf41f-e285-4ccc-a5b5-1c790cab0db0
• GoldenGate YouTube | Data Mesh: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
• Free Trial of GoldenGate Streaming: https://cloudmarketplace.oracle.com/marketplace/en_US/listing/70961838
Customer Success

38

Data Mesh Part 4 Monolith to Mesh

39

Our mission is to help people see
data in new ways, discover insights,
unlock endless possibilities.

More Related Content

Data Mesh Part 4 Monolith to Mesh

  • 1. Data Mesh Part 4: Data Monolith to Data Mesh Future of Data Integration Tools and a focus on Oracle GoldenGate and Stream Processing Oracle Development October 2020 Copyright © 2020, Oracle and/or its affiliates1 Channel: https://www.youtube.com/user/oraclegoldengate Data Mesh Playlist: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
  • 2. for more than 30yrs, a “Hub and Spoke” Architecture: Copyright © 2020, Oracle and/or its affiliates2 Is “Hub and Spoke” our destiny forever…or could we be on a journey to somewhere else? ETL Tools… Kimball EDWs… Data Lakes… Data Hub Vendor DI Tools… ETL Hub ODS Hub Big Data Hub
  • 3. The world around us will keep moving faster and faster… …IT systems and the data that fuel them are going to need to become faster and more agile…the people processes that we follow for DevOps and DataOps must also be faster and more agile Our old ways of Data Integration are no longer sufficient to meet the future. Copyright © 2020, Oracle and/or its affiliates3
  • 4. Data Fabric | Stream Processing | Data Mesh Copyright © 2020, Oracle and/or its affiliates4
  • 5. The future of Data Integration…is Mesh …a new generation of Data Mesh capabilities will leave behind the Monolithic Tools of the past to interconnect modern, multi-cloud, data-driven applications and create innovative, high-value data products of all types Copyright © 2020, Oracle and/or its affiliates5
  • 6. Data Mesh Series 6 https://www.youtube.com/playlist?list=PL bqmhpwYrlZJ-583p3KQGDAd6038i1ywe Part 1: CDC and Distributed Commit Logs Best Practices for Maintaining Transaction Consistency with Replication and Kafka Managing Table to Topic Mappings, Accounting for Schema Evolution etc. How to Handle the Change Stream: Partial Supplemental CDC, Caching, Lookups etc. Deployment Topologies: Mid-tier, End-Point, Topic Partitions etc. Part 2: Microservices Data Architecture w/CDC Microservices Design Patterns for the Data Tier Understanding the GoldenGate Microservices Architecture Event Patterns for CDC: • Transaction Outbox • CQRS with CDC • Event Sourcing with CDC • Saga with CDC Event Driven Processing, with CEP, ESP Time Series, & GoldenGate Stream Analytics Part 3: Demo of Application & Data Integration Mesh What’s in a Name? Data Fabric, Data Hub, Data Mesh, Service Mesh Purpose of a Data Architecture: Operational vs. Analytic Use Cases Demonstration Video: Retail / Inventory Analysis • Sources: Weather.com, Oracle DB, Retail Cloud, AWS S3, Salesforce • Targets: Data Lake (Object Storage and Autonomous Data Warehouse), Data Services (Mobile APIs), Stream Analytics Part 4: Monolith to Mesh, the Future of DI Tools Brief History of (Monolithic) Integration Tools Future of Data Integration is Mesh DevOps and Deployment of the Data Mesh Business Value of a Data Mesh (vs. the Monoliths) Copyright © 2020, Oracle and/or its affiliates
  • 7. Agenda Copyright © 2020, Oracle and/or its affiliates7 1 2 3 4 Brief History of Integration Tools Data Mesh as a Next Step GoldenGate Strategy for Data Mesh Call to Action
  • 8. Messaging & Event Systems Brief History of Enterprise-class Integration Tools Copyright © 2020, Oracle and/or its affiliates Biz Process APIs Data Consistency App Integration Data Integration Transaction Processing Systems (TPS) Enterprise Application Integration (EAI) Service-Oriented Architecture (SOA) Integration Platform as a Service (iPaaS) B2B Business Process Management (BPM) Enterprise Service Bus (ESB) Robotic Process Automation (RPA) Message Queue (MQ) Messaging: Kafka, Pulsar etc. Extract Transform Load (ETL) Data Integration (DI) Change Data Capture (CDC) & Data Replication Data Federation/Virtualization Complex Event Processing (CEP) Big Data Event Stream Processing (ESP) Stream Integration 1980 1990 2000 2010 2020Historically, integration tools have focused on specific tiers of the software architecture: Focus on committed, reliable and ACID-grade data, typically for both Operational (OLTP) and Analytic (OLAP) workloads… Data Governance Catalog etc. Data Quality (DQ) Master Data (MDM) Data Hubs etc. 8
  • 9. Meanwhile… Apps shift from Monoliths → Microservices → Mesh Copyright © 2020, Oracle and/or its affiliates shared frameworks shared frameworks host App / Component App / Component App / Component host shared frameworksframeworks frameworks App / Component App / Component App / Component App / Component App / Component App / Component host eg; serverless eg; container Mesh Controller Distributed commit log Classic Monolith “Minilith” / Client Server Microservice & Mesh Very coarse grained, many components within single App boundary, layers of shared frameworks, dependencies on one/more schema, single host is often mandatory Coarse grained, some components may be independently upgraded, but cross-component dependencies still generally tightly-coupled. Single host is preferred. Dependencies between Apps and shared schema still exist. Component isolation. Encapsulated App schema, or Event Sourcing instead. All comms via public APIs. Components may often be stateless, to run in IT-managed service mesh (eg; K8S) or in public cloud based serverless runtimes (fully managed) shared frameworks sidecar Note: a monolith with REST APIs is still a monolith, and putting a monolith in a container/K8S doesn’t make it a microservice! Monoliths Mesh 9
  • 10. Inflexion Point for DI Tools to Modernize… Copyright © 2020, Oracle and/or its affiliates Monolithic Data Hub Classic Monolith Data Hub Client-server / Minilith Data Mesh Serverless / Event-driven Microservices Batch ETL: • Ab Initio, DataStage Grid • PowerCenter (pre-10.x) • Hadoop / Hive (CDH, HDP etc) Streaming Data & Realtime Events: • IBM Streams, Software AG Streams • Lambda Big Data Architecture (eg: Apache Hadoop + Apache Storm) Batch ETL/ ELT and Cloud Native: • PowerCenter (10.x and higher) • Talend/Stitch, ODI, SAP, SAS etc • Databricks “Data Lakehouse” Streaming Data & Realtime Events: • Qlik/Attunity, IBM IIDR, etc. • Kappa Big Data Architecture (including: Confluent KSQL and Flink) Streaming Data & Realtime Events: • GoldenGate, GoldenGate Stream Analytics • AWS Kinesis, Lambda, Glue Streaming • Event Sourcing Pattern with Domain Aggregates (bespoke microservices) Batch ET/ ELT and Cloud Native: • OCI Data Integration, OCI Data Flow • AWS Glue, Azure Data Factory Compute + Storage Compute Storage data data data Physical Site / Network Physical Site / Network Monoliths Mesh Hub Hub 10
  • 11. Graph of Hubs != Data Mesh Copyright © 2020, Oracle and/or its affiliates Monoliths Distributed Note: a monolith with REST APIs is still a monolith, and putting a monolith in a container/K8S doesn’t make it a microservice! Hub Hub Hub Compute Storage Compute Shared Storage Event Logs Event Logs VCN 1 VCN 2 VCN 3 data Service Mesh Serverless Runtime Managed Containers ingest prepare pipe A pipe B analyze cleanse sink 11
  • 12. Agenda Copyright © 2020, Oracle and/or its affiliates12 1 2 3 4 Brief History of Integration Tools Data Mesh as a Next Step GoldenGate Strategy for Data Mesh Call to Action
  • 13. What is a Data Mesh? 13 Microservice Patterns Log-based Integrations Polyglot Replication Data Mesh is a data-tier architecture to integrate and govern enterprise data assets across distributed multi-cloud environments – three defining characteristics are: Data Product Oriented: • Low code management of high-value data services that support operational data stores, analytics, data lakes and data science De-Centralized Processing: • De-centralized data processing; no ETL/Hubs/Lake monoliths • Microservices / Service Mesh and Serverless deployments, utilization of “sidecar proxy” patterns, encapsulation, etc. • Simplified continuous integration continuous delivery (CICD) and lifecycle management (LCM) across public/private clouds Event-Driven, Stream Centric: • Real-time by default, batch patterns only when necessary • Immutable event logs for messaging and data store events • Trusted data semantics for consistent (ACID) and polyglot data https://en.wikipedia.org/wiki/ACID Data Mesh Event Streaming Immutable Logs Data Replication Polyglot Persistence Edge / 5G Frameworks Domain Driven Design Service Mesh “Sidecars” Data Mesh Data Product Oriented Eg: distributed commit log in Kafka Eg: Kubernetes controller + kubelet Eg: data consistency guarantees Eg: low code, data domain centric Copyright © 2020, Oracle and/or its affiliates
  • 14. Copyright © 2020, Oracle and/or its affiliates14 Data Monolith Data as an IT artifact Data as a Product Monolithic & Centralized Distributed & Decentralized Waterfall Data/DevOps (dominant) Agile, CICD Data/DevOps Batch Processing Centric Event-Driven Streaming by Default OLTP vs. OLAP OLTP ∩ OLAP Data Meshto…
  • 15. Copyright © 2020, Oracle and/or its affiliates15 Enterprise Data Producers: ERP Apps, DBs, Middleware etc. IoT Data Producers: Devices & Things Raw Data Prepared Data Canonical Data Data Consumers Data Domain A Data Domain C Raw Event Consumers Automated Devices, Edge Nodes (5G), Machine-to- Machine (M2M) Data Domain B Business Data Product Owners APIs M2M Marts Models Analytics Data Producers Data Mesh Data Mesh Purpose is for Data Products
  • 16. Raw data, Time Series & Alerting events are pushed Direct to Database (high fidelity transaction semantics fully preserved) Optimized for Data Product Owners Copyright © 2020, Oracle and/or its affiliates16 Enterprise Data Producers Detect Event Logical Change Records (LCRs) App DB committed! CDC Replication Data Objects Table Data Raw Data / Alerts SQL Consumers Raw Data Prepared Data Canonical Data Raw Data (LCR) Schema Events (DDL) Prepared Data Topics “Master” Data Topics JSON, XML, Avro, Parquet, CSV Prepared data events are pushed Canonical data events Speed & Fidelity Trusted Views Ease of Consumption LCR/TFs Applications, Data Services Biz Consumers Analytics & Data Marts Data Science & Streaming Applications DBAs for HA, DR and OLTP Data Product Owners / Managers are responsible to translate IT deliverables into trusted data that delivers business value Data Model Object Model System Of Record (SoR) User Action App APIs and system log events Data Product Owners
  • 17. Decentralized by Design. Within each data domain, producers (user actions, and systems of record with their data and object models) commit transactions to the app DB; CDC replication detects each event as LCRs and pushes raw data, time series and alerting events direct to database (high-fidelity transaction semantics fully preserved). Microservices at the edge or in the cloud handle raw data events, prepare technical data views from LCRs and build business data views; prepared data events and canonical data events (ephemeral or persisted) flow through stream processing (events persisted). Data product payloads serve applications, data services and business consumers; analytics and data marts; data science and streaming applications; OLTP and OLAP DBs; and SQL consumers of data objects, table data and raw data / alerts.
  • 18. DevOps Attributes for a Data Integration Mesh. Physical Deployments: • The runtime should be deployable on most infrastructure • Mesh nodes may also be tightly coupled to a single infrastructure (e.g., a serverless environment in a proprietary public cloud). Mesh Controls: • A “data mesh controller” should provide observability, security and routing controls over node deployments in various infrastructures (of the customer's choice). Data Latency: • Event-driven, streaming pipelines by default (single-digit seconds) • Pipelines may execute as micro-batches or batches (e.g., fixed windows) • Large batch processing “by reference” for files or direct-path DB utilities (not all data transfers are suitable for event protocols). Enterprise Data Semantics: • Polyglot data handling means both highly structured and semi-structured data • Must preserve ACID / full relational semantics into targets • Must handle non-relational document payloads. Data Governance: • Must have data governance features comparable to mainstream data integration tools (e.g., catalog, lineage, data validation). Simple DevOps / CI/CD Lifecycle: Customer Managed: • Runtime is microservices-based (note: a REST API alone is not sufficient) • Runs in containers (e.g., Docker), with an optional service mesh: Kubernetes, OpenShift, etc. Cloud Vendor Managed: • Serverless execution.
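For the “fixed windows” micro-batch option, a tumbling window assigns each event to exactly one non-overlapping time bucket. The Python sketch below assumes events arrive as `(timestamp_seconds, value)` pairs; the function name `tumbling_windows` is hypothetical, and a production stream processor would typically also need to handle late-arriving events.

```python
from collections import defaultdict


def tumbling_windows(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows.

    A window's start is the timestamp rounded down to a multiple of window_seconds.
    """
    windows = defaultdict(list)
    for ts, value in events:
        start = ts - (ts % window_seconds)
        windows[start].append(value)
    # Return windows in time order.
    return dict(sorted(windows.items()))


events = [(0, 5), (3, 7), (9, 1), (10, 4), (14, 2), (21, 8)]
for start, values in tumbling_windows(events, 10).items():
    print(f"[{start}, {start + 10}) sum={sum(values)}")
```

With 10-second windows, this prints sums of 13, 6 and 8 for the windows starting at 0, 10 and 20.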
  • 19. Example 1: Mesh of Data Integration Microservices. Diagram: edge gateways, edge, multi-cloud, enterprise applications and analytics, managed from a single pane of glass and connected by microservices for capture, distributed capture, filter, λ, distributed ingest, ingest, transform (xform), join, replicat and load; Exadata Cloud@Customer appears as a target.
  • 20. Data Mesh Workload Coexistence. Mixed workloads can run in different infrastructure and “engines”. Customer Managed: runs “as a service” using containers and a service mesh. Vendor Managed: runs as “serverless”; the customer pays only for the minutes that workloads are running. Hybrid Infrastructure: runs within managed cloud containers. Diagram: pipelines A and B (ingest, prepare, cleanse, analyze, sink) span a service mesh, a serverless runtime and managed containers across VCN 1, VCN 2 and VCN 3, with compute, shared storage and event logs, producing data products.
  • 21. Example 2: Maintain Data Consistency in Pipelines. SCN (System Change Number) is the Oracle DB clock: every time a transaction commits, the clock increments, so an SCN marks a consistent point in time in the database. CSN (Commit Sequence Number) is the GoldenGate clock: during apply, GoldenGate uses the CSN to identify the point in time at which a transaction committed, in order to maintain transaction consistency and data integrity. A CSN is available for all source DB transactions captured via GoldenGate: https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/commit-sequence-number.html. In a single Kafka partition, messages A and B carry the same SCN/CSN and therefore belong to the same transaction: A { "customer_id": "1", "first_name": "Debra", "last_name": "Burks", "phone": "", "email": "debra.burks@yahoo.com", "SCN": "130", "CSN": "130" } and B { "customer_id": "1", "9273 Thome Ave.", "city": "Orchard Park", "state": "NY", "zip_code": "14127", "SCN": "130", "CSN": "130" }. The data consumer is responsible for maintaining transaction boundaries. OLTP updates and deletes both show up in Kafka as new messages, so consumers must interpret the operation flags correctly.
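The consumer's responsibility can be sketched as follows: read one ordered partition, group consecutive messages that share a CSN, and apply each group as one unit. This is a hypothetical Python illustration, not GoldenGate code; it assumes each message carries a `CSN`, an operation flag `op_type` (`I`, `U`, `D`) and a `row` of changed columns, which the simplified messages on the slide do not show.

```python
from itertools import groupby


def apply_transaction(target: dict, messages: list) -> None:
    """Apply all changes that share one CSN as a single all-or-nothing unit.

    'target' stands in for a table keyed by customer_id; a real consumer
    would wrap these writes in one database transaction instead.
    """
    staged = {k: dict(v) for k, v in target.items()}  # copy: a failure leaves target untouched
    for msg in messages:
        key, op = msg["customer_id"], msg["op_type"]
        if op == "D":
            staged.pop(key, None)
        elif op in ("I", "U"):
            # Insert or update: merge the changed column values into the row.
            staged[key] = {**staged.get(key, {}), **msg["row"]}
        else:
            raise ValueError(f"unknown op_type {op!r}")
    target.clear()
    target.update(staged)


def consume(partition: list, target: dict) -> list:
    """Read one ordered partition and apply consecutive same-CSN messages together."""
    applied = []
    for csn, group in groupby(partition, key=lambda m: m["CSN"]):
        apply_transaction(target, list(group))
        applied.append(csn)
    return applied


partition = [
    {"CSN": "130", "op_type": "I", "customer_id": 1,
     "row": {"first_name": "Debra", "last_name": "Burks"}},
    {"CSN": "130", "op_type": "U", "customer_id": 1,
     "row": {"city": "Orchard Park", "state": "NY"}},
    {"CSN": "131", "op_type": "I", "customer_id": 2, "row": {"first_name": "Ann"}},
    {"CSN": "132", "op_type": "D", "customer_id": 2, "row": {}},
]
target: dict = {}
print(consume(partition, target))
print(target)
```

`groupby` only merges consecutive messages, which is why the single-partition ordering matters. In a live stream, a consumer cannot know a CSN group is complete until a message with the next CSN (or an end-of-transaction marker) arrives, so it would buffer the current group until then.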
  • 22. Fast data architecture for customer satisfaction (user activity stream, trusted transactions): the solution had to be fast, but most importantly the data had to be correct! https://www.slideshare.net/r39132/big-data-fast-data-paypal-yow-2018
  • 23. Example 3: Continuous Transformation and Loading (CTL). Enterprise data producers (ERP apps, DBs, middleware, etc.) and IoT data producers (devices and things) feed a continuous pipeline of queries, data patterns, windowing, data policies and business rules: filter, aggregate, correlate/enrich, thresholds, joins, time series, spatial analytics, anomalies, classification and scoring models. Results reach data owners and data products: SQL consumers of data objects, table data and raw data / alerts; applications, data services and business consumers; analytics and data marts; data science and streaming applications; and DBAs for HA, DR and OLTP.
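Several of the CTL operators above (filter, correlate/enrich, thresholds) can be chained as Python generators, so each event flows through the whole pipeline as soon as it arrives rather than waiting for a batch. This is a minimal sketch; the reference table `DEVICE_SITES`, the threshold value and the event fields are hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical reference data used to enrich (join) each event.
DEVICE_SITES = {"dev-1": "Plant A", "dev-2": "Plant B"}
TEMP_THRESHOLD = 80.0  # hypothetical alerting threshold


def filter_valid(events: Iterable[dict]) -> Iterator[dict]:
    """Filter: drop events that carry no reading."""
    for e in events:
        if e.get("temp") is not None:
            yield e


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Correlate/enrich: join each event with its site from reference data."""
    for e in events:
        yield {**e, "site": DEVICE_SITES.get(e["device"], "unknown")}


def alert_on_threshold(events: Iterable[dict]) -> Iterator[dict]:
    """Threshold: pass through only readings above the limit."""
    for e in events:
        if e["temp"] > TEMP_THRESHOLD:
            yield e


stream = [
    {"device": "dev-1", "temp": 72.5},
    {"device": "dev-2", "temp": 85.1},
    {"device": "dev-3", "temp": None},
    {"device": "dev-3", "temp": 91.0},
]
for alert in alert_on_threshold(enrich(filter_valid(stream))):
    print(alert)
```

Swapping the list for an unbounded source (for example, a consumer loop over a topic) would leave the operator chain unchanged, which is the practical difference between CTL and a scheduled ETL batch.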
  • 24. Top Opportunities for a Data Mesh Solution. IT / tactical solutions: 1. 100% correct, cloud-native apps • Sync of backend OLTP stores • Multi-active, cross-region DBs 2. Distributed data lake • Fast data from anywhere 3. Shift from batch to streaming DI • ETL to continuous TL (CTL) 4. More agile DataOps lifecycle • Microservices DevOps benefits. Business transformation / strategic: 1. Next best action • Real-time customer engagement 2. Smart inventory management • Eliminate supply chain lag 3. Location intelligence • Correlate app + device events 4. Predictive analytics • Data monetization, new services for sale.
  • 25. Agenda: 1. Brief History of Integration Tools 2. Data Mesh as a Next Step 3. GoldenGate Strategy for Data Mesh 4. Call to Action
  • 26. Why GoldenGate for a Data Mesh? What’s so special? • Event detection for all popular data stores, relational and NoSQL • High-speed data replication • Trusted to never lose data; availability is a core use case • Transaction-safe (for dependable analytics and applications) • Event stream processing at web scale, on open source or cloud
  • 27. GoldenGate Microservices. GoldenGate is itself a set of microservices that human users or other services may interact with: the GG Service Manager, GG Admin Service, GG Distro Service, GG Receiver Service and GG Metrics Service, alongside your own services. Extracts and Replicats run in the cloud, in containers or on edge devices, providing bi-directional data replication with ms/sec updates and consistency guarantees; clients reach them through client libraries, native GUIs and full REST APIs, optionally via an API gateway or proxy service. Embedded User Interface: • C-based microservices with an embedded HTTP client for a native JavaScript-based GUI • Oracle JET frameworks for an intuitive interaction model. REST-native APIs: • Fully REST native • Also available: a Command Line Interface (CLI) that produces REST calls to the native services. Full GoldenGate Replication Capabilities: • 100% coverage for all traditional GG replication patterns; fully capable of HA/DR use cases.
  • 28. GoldenGate Stream Analytics: Trusted, Transaction Outbox for the Whole Enterprise. Trusted replication of real-time data transactions and events across DB2/z, relational and non-relational DBMS, apps, cloud, big data, NoSQL, streams, object storage, and ETL & ML. https://www.oracle.com/middleware/technologies/goldengate.html
  • 29. Single Pane of Glass. GoldenGate deploys in a mesh across containers, public clouds and 5G edge devices, connecting sources such as DB2/z and delivering raw data and real-time stream data processing to data owners and data products: SQL consumers of data objects, table data and raw data / alerts; applications, data services and business consumers; analytics and data marts; data science and streaming applications; and DBAs for HA, DR and OLTP.
  • 30. GoldenGate: Mesh Platform for Fast Data. Microservices and distributed deployments, with a single pane of glass for key personas (data engineer, data analyst, DataOps, cloud admin). Capabilities: change data capture (CDC), data replication (source/target), continuous transformation & loading (CTL), stream analytics (ML, time series, etc.), operations and governance • Workspaces • Catalog • Data Verification • Security • Management • Monitoring • Metering (cloud) • Administration. Includes sample apps, pre-built templates and accelerators; runs on Oracle Cloud, 3rd-party clouds, on-prem data centers and embedded edge, optionally in containers, a service mesh (K8s pods) or serverless. Data producers (apps, DBs, IoT, etc.) feed data owners and data products: SQL consumers of data objects, table data and raw data / alerts; applications, data services and business consumers; analytics and data marts; data science and streaming applications; and DBAs for HA, DR and OLTP.
  • 31. Empower the Data Product Owners. The Age of Data Product Managers: https://medium.com/swlh/the-age-of-data-product-managers-how-to-prepare-24c0fedc163f More: • https://airfocus.com/glossary/what-is-a-data-product-manager/ • https://hbr.org/2018/10/how-to-build-great-data-products Data products include: 1. Analytics • Reports and dashboards • Historic and real-time 2. Models • Data domain objects • Data models / ML features 3. Algorithms (for business rules) • ML models, AI/data science • Pipeline policies and mappings 4. Data Services and APIs • Data payloads • REST APIs, Pub/Sub topics, etc. Diagram labels: Business, Data Product Owners, Data Consumers.
  • 32. Compelling Data Products: • Low-code, event-based data services • Time series analytics • Event-driven dashboards • Streaming data patterns • Geo-spatial analysis & geo-fencing • Predictive analytics / ML
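Geo-fencing reduces to checking each location event against a fence and emitting an event only when the inside/outside state changes. The Python sketch below assumes a circular fence and uses the haversine great-circle distance; the function names and the 1 km fence are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fence_transition(prev_inside, lat, lon, center, radius_km):
    """Return ("enter" | "exit" | None, inside) for one location event."""
    inside = haversine_km(lat, lon, *center) <= radius_km
    if inside and not prev_inside:
        return "enter", inside
    if prev_inside and not inside:
        return "exit", inside
    return None, inside


center, radius = (0.0, 0.0), 1.0  # hypothetical 1 km fence
inside = False
for lat, lon in [(0.0, 0.02), (0.0, 0.005), (0.0, 0.004), (0.0, 0.03)]:
    event, inside = fence_transition(inside, lat, lon, center, radius)
    if event:
        print(event, (lat, lon))
```

Only state changes are emitted, so a device that stays inside the fence produces one "enter" event rather than an alert for every position update.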
  • 33. Strong Governance. Workspace Management: • Low-code user experience • Role-based access controls. Data Catalog: • Asset tagging, REST APIs • Lineage viewer for pipelines. Security: • Certificates, encryption • Key stores, single sign-on • LDAP integrations • SSL, TLS 1.2/1.3. Conflict Detection: • Automatic CDR (conflict detection and resolution) or user-defined. Data Verification: • Hash-based digital compare tool • Fast, 100% certainty.
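A hash-based compare can be sketched by hashing each row deterministically and comparing digests keyed by primary key, so only small digests need to be exchanged between source and target. This Python example illustrates the general technique under that assumption; it is not how GoldenGate's compare tool is implemented, and the function names are hypothetical.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """Hash a row deterministically: sorted keys give a stable serialization."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare(source: dict, target: dict) -> dict:
    """Compare two tables given as {primary_key: row} and classify the differences."""
    src = {k: row_digest(r) for k, r in source.items()}
    tgt = {k: row_digest(r) for k, r in target.items()}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "out_of_sync": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


source = {
    1: {"name": "Debra", "city": "Orchard Park"},
    2: {"name": "Ann", "city": "Buffalo"},
    3: {"name": "Lee", "city": "Troy"},
}
target = {
    1: {"name": "Debra", "city": "Orchard Park"},
    2: {"name": "Ann", "city": "Albany"},
    4: {"name": "Kim", "city": "Utica"},
}
print(compare(source, target))
```

Sorting keys before serialization makes the digest independent of column order; values must also be normalized consistently (types, encodings) on both sides for the comparison to be meaningful.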
  • 34. Event-driven Data Mesh: Use Cases for Event-Driven, Stream Data Processing. “driven by business demands for more flexible, easier-to-change, loosely-coupled applications […] there has been a widespread awakening to the benefits of Event Driven Architecture (EDA) for increasing the scalability and agility of business systems.” W. Roy Schulte (of Gartner), March 2020: “EDA is Suddenly Popular. Will Stream Analytics be Next?” Use cases: • Data & microservice events • Event/data pipelines • Geospatial actions • Time-series analysis • Real-time AI/ML • Continuous ETL
  • 35. Agenda: 1. Brief History of Integration Tools 2. Data Mesh as a Next Step 3. GoldenGate Strategy for Data Mesh 4. Call to Action
  • 36. More “Hub and Spoke”? Which journey are you on: another data hub, or are you ready to move to the Mesh?
  • 37. What Next? Ask Oracle for a demo! • GoldenGate YouTube | Data Mesh: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe • Free Trial of GoldenGate Streaming: https://cloudmarketplace.oracle.com/marketplace/en_US/listing/70961838 • Oracle #1 in Data Fabric Strategy: https://blogs.oracle.com/dataintegration/oracle_forresterwave_datafabric_2020?xd_co_f=66bcf41f-e285-4ccc-a5b5-1c790cab0db0 • Customer Success
  • 39. Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.