
1

Productizing AsyncAPI for Data Replication / CDC
Julien Testut
Senior Principal Product Manager, Oracle Development
With thanks to Jagdev Dhillon & Tianshu Li
Copyright © 2023, Oracle and/or its affiliates

2

Safe harbor statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
The materials in this presentation pertain to Oracle Health, Oracle, Oracle Cerner, and Cerner Enviza which are all wholly owned subsidiaries of Oracle Corporation. Nothing in this presentation should be taken as indicating that any decisions regarding the integration of any EMEA Cerner and/or Enviza entities have been made where an integration has not already occurred.

3

Agenda
1. Brief background on CDC & GoldenGate
2. Why AsyncAPI?
3. AsyncAPI with GoldenGate
4. Roadmap

4

Databases, Change Data Capture (CDC) & Data Replication
• In databases, the most important system events are Transactions (Tx’s)
  • DML (data manipulation language): inserts, updates, deletes
  • DDL (data definition language): schema changes, alter table, etc.
• All OLTP databases, and most databases overall, have centralized logging
  • Users/applications can open short- or long-running Tx’s, affecting single rows or billions
  • Transactions achieve “durability” when they are committed
• Change Data Capture (CDC) is fundamentally about “capturing” the Tx’s from the source
  • Typically at the moment of “commit” in the logs, when the Tx’s become durable
• Replication is about transmitting the “captured” Tx’s to other places (e.g., Targets)
  • Some databases have their own CDC & Replication layers (e.g., proprietary to only that DB)
  • CDC/Replication tools are built to work with many different databases
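
The capture-at-commit idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the log format (tuples of `tx_id`, `kind`, `payload`) and the function name `capture_committed` are hypothetical, and this is not how GoldenGate reads database logs.

```python
from collections import defaultdict


def capture_committed(log_entries):
    """Yield (tx_id, operations) for each transaction, at its commit record.

    log_entries is a hypothetical, simplified log: tuples of
    (tx_id, kind, payload) where kind is "op", "commit", or "rollback".
    """
    pending = defaultdict(list)  # operations of transactions still open
    for tx_id, kind, payload in log_entries:
        if kind == "op":
            pending[tx_id].append(payload)
        elif kind == "commit":
            # The transaction is durable now, so it can be captured.
            yield tx_id, pending.pop(tx_id, [])
        elif kind == "rollback":
            # Rolled-back work is never captured.
            pending.pop(tx_id, None)


log = [
    (1, "op", "INSERT orders id=10"),
    (2, "op", "UPDATE stock sku=7"),
    (2, "rollback", None),
    (1, "op", "UPDATE orders id=10"),
    (1, "commit", None),
]
captured = list(capture_committed(log))
```

Only transaction 1 is captured here; the rolled-back work of transaction 2 never leaves the source.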

5

CDC Tools and Oracle GoldenGate (GG)
• CDC tools are a long-standing part of the enterprise software domain
  • GoldenGate was one of the first tools in this area, dating back to the mid-1990s
  • Open-source tools like Debezium have gained popularity since ~2020
• GoldenGate CDC/Replication technology was acquired by Oracle in 2009
  • To provide a solution for “logical replication” supporting High Availability in distributed DBs
  • To replace older technologies: Oracle Streams, Oracle CDC
  • To become the foundation of Oracle’s data integration portfolio
• Today, GoldenGate is a ~$1B global ecosystem for mission-critical systems
  • ~10,000 customers in 180+ countries, historically mostly large multinationals and governments
  • GG runs in most of your banks, payment systems, e-commerce retailers, telcos, airlines, etc.
  • GG supports hundreds of different databases, clouds, warehouses, lakehouses, messaging systems, etc.
  • GG is more than CDC/Replication: it also includes Data Integration, Streaming Data, Cloud Pipelines, Data Governance, and Real-time Observability

6

History of GoldenGate and “Why AsyncAPI Now?”
• 1995 – 2005: Decade of database replication
  • Use cases focus on DML/DDL replication between databases
• 2005 – 2015: Emergence of MPP and Big Data
  • Massive expansion into Data Warehousing and eventually Hadoop-based Big Data tech
• 2015 – today: Shift to Microservices, Cloud and distributed data architecture
  • GG is refactored to a microservices architecture, with massive growth in cloud delivery
• Adoption of AsyncAPI is a natural part of the evolution of GoldenGate
  • CDC/Replication is inherently an asynchronous activity
  • More and more use cases feature “Event Sourcing” designs (Tx Outbox, Saga patterns)
  • Event streams are becoming a valued part of a “Data Product” architecture
  • Kafka is non-transactional (e.g., for DMLs), which makes it difficult to maintain the “C” (consistency) in ACID
  • Kafka is sometimes “overkill”, with too much overhead for many use cases

7

AsyncAPI with GoldenGate
Automated, machine-generated client applications that subscribe to a stream of exactly-once transaction events, JSON formatted, via REST pub/sub AsyncAPI
[Diagram labels: CDC/Replication of Transactions; Standardized Pub/Sub APIs; YAML descriptor; Automated code generators; Message; Real-time data events]
Data Tx’s:
• Inserts
• Updates
• Deletes
• GET/PUT
• Schema Changes
Benefits:
• Real-time data events to any data consumers
• Bypass need for Kafka for data consumers
• Simple event sourcing & transaction outbox patterns
• AsyncAPI is the future of event-driven architecture

8

Two important “big picture” use cases for CDC/Replication + AsyncAPI
[Diagram labels: Data Product Producers (Apps); Tx’s; streaming data products (JSON, Parquet, etc.); Data Product Consumers]
• Application Microservice: Consumers are App Microservices, for CQRS/Outbox type design patterns
• Analytic or data science consumers, or bespoke clients
Why bind from the data tier? (a) commit point for durable data, (b) lowest-latency transmission, and (c) very high levels of automation

9

DB Event Streams with CDC and AsyncAPI
[Diagram: Data Product Producers (Apps) → DB Transactions/Commits → Base Tables (Application Schema) → GoldenGate Microservices → DML events as JSON → Data Product Consumers (Apps: transform data, react to changes)]
JSON as Tx’s: Data and Schema changes are in stream
Pros:
• Easy for Producer (highly automated, very low effort to publish)
• No application changes are required
• Schema metadata in payloads (i.e., the Consumer can decide how to handle schema changes)
Cons:
• Consumer binding is to Base Tables, exposing some implementation details such as structure

10

Transaction Outbox (without CDC or AsyncAPI tooling)
[Diagram: Data Product Producers (Apps, dev code) commit JSON to Base Tables and an Outbox Table (DB Transactions/Commits) → Message Relay (Read: polling for changes; Publish: distribute) → Broker → Data Product Consumers (Apps)]
JSON as Biz Objects (*JSON may be in consumer or producer formats)
Pros:
• Outbox pattern ensures data consistency at commit-point
• JSON schema may be defined by either Producer or Consumer
Cons:
• Latency & load, when using a polling-based relay service
• Burden of change lifecycle is on the Producer
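
The outbox pattern with a polling relay can be sketched with Python's built-in `sqlite3` module. The table names, the event shape, and the list standing in for a broker are all assumptions made for illustration, not taken from the slides.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
)


def place_order(order_id):
    """Write the base table row and the outbox row in one transaction."""
    with conn:  # commits both inserts together, or rolls both back
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "NEW"))
        event = {"type": "OrderPlaced", "order_id": order_id}
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),)
        )


def poll_outbox(publish):
    """One polling pass of the relay: publish unsent rows, then mark them sent."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


broker = []  # stand-in for a message broker
place_order(1)
place_order(2)
first_pass = poll_outbox(broker.append)
second_pass = poll_outbox(broker.append)
```

The relay has to keep querying the outbox table even when nothing has changed (the second pass finds no rows), which is the latency and load cost listed under Cons.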

11

Transaction Outbox with CDC & AsyncAPI
[Diagram: Data Product Producers (Apps, dev code) commit JSON to Base Tables and an Outbox Table (DB Transactions/Commits) → GoldenGate Microservices → DML events → Data Product Consumers (Apps)]
JSON as Biz Objects in CDC payload (*JSON may be in consumer or producer formats)
Pros:
• CDC + AsyncAPI provides very low latency and less impact on the source DB
• Easy for Consumer (e.g., in many cases the Consumer may define the format of the JSON)
• Outbox pattern may be favored by Producer application developers (for Tx consistency)
Cons:
• Burden of change lifecycle is on the Producer
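
On the consumer side, "JSON as Biz Objects in CDC payload" could look roughly like the sketch below. It assumes each change record is a dict with `table`, `op_type`, and `after` fields, that inserts are marked `"I"`, and that the outbox table stores the business object as a JSON string in a `PAYLOAD` column. All of these names and values are illustrative assumptions.

```python
import json


def business_objects(records, outbox_table="APP.OUTBOX", payload_column="PAYLOAD"):
    """Yield the business objects carried by inserts into the outbox table."""
    for record in records:
        if record["table"] == outbox_table and record["op_type"] == "I":
            yield json.loads(record["after"][payload_column])


# Hypothetical change records as a consumer might receive them.
stream = [
    {"table": "APP.ORDERS", "op_type": "I", "after": {"ID": 1, "STATUS": "NEW"}},
    {
        "table": "APP.OUTBOX",
        "op_type": "I",
        "after": {"SEQ": 1, "PAYLOAD": '{"type": "OrderPlaced", "order_id": 1}'},
    },
]
objects = list(business_objects(stream))
```

The consumer ignores base-table changes and sees only the business objects the producer chose to publish.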

12

Using AsyncAPI with GoldenGate
Data Product Producers:
1. Decide which Databases, Tables & Columns to publish
2. Use the GoldenGate Admin Microservice to set up the “capture” trail (GG’s ledger)
3. Use the GoldenGate Distribution Microservice to define the AsyncAPI Channel (and associate it with a GG Trail)
Data Product Consumers:
4. Use REST/GUI to browse AsyncAPI Channels
5. Authenticate using a GG user/role and download the YAML document
6. Build or generate a client to receive and parse the Tx payload from GG Data Streams
7. Consume transactions
[Diagram labels: Apps; Tx’s; JSON]
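
As a rough illustration of the consumer steps, the sketch below walks an AsyncAPI 2.x style document that has already been parsed from YAML into a Python dict and lists what a client could subscribe to. The server URL, channel name, and message name are invented, and the descriptor GoldenGate actually generates may look different.

```python
# Hypothetical descriptor, already parsed from YAML into a dict.
# The server URL, channel name, and message name are invented for illustration.
descriptor = {
    "asyncapi": "2.0.0",
    "info": {"title": "Example data stream", "version": "1.0.0"},
    "servers": {
        "example": {"url": "gg.example.com:443", "protocol": "wss"},
    },
    "channels": {
        "orders-stream": {
            "subscribe": {"message": {"name": "exampleRecord"}},
        },
    },
}


def list_subscriptions(doc):
    """Return (protocol, url, channel) for every channel offering a subscribe operation."""
    found = []
    for server in doc.get("servers", {}).values():
        for channel, operations in doc.get("channels", {}).items():
            if "subscribe" in operations:
                found.append((server["protocol"], server["url"], channel))
    return found


subscriptions = list_subscriptions(descriptor)
```

A code generator does the same kind of traversal before emitting a client for each channel.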

13

“Data Producer” using GoldenGate to create AsyncAPI Channels
Data Product Producers
• Part of the GG microservice called “Distribution Service”
• Create “Data Streams”
• Associated with a “GG Trail”

14

“Data Producer” can filter payloads, per Channel
Data Product Producers
Filtering can happen at the Object level: Tables, Columns, Data Values (e.g., sensitive data, or JSON payloads, etc.)
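
The three filtering levels (tables, columns, data values) can be sketched as a plain function over a change record. The record shape and the function `filter_record` are hypothetical; in GoldenGate the filtering is configured by the producer per channel rather than written as consumer code.

```python
def filter_record(record, allowed_tables, hidden_columns, predicate=lambda row: True):
    """Return a filtered copy of record, or None if the record is dropped."""
    if record["table"] not in allowed_tables:
        return None  # table-level filter: this table is not published
    row = record["after"]
    if not predicate(row):
        return None  # data-value filter: the row does not match
    # Column-level filter: strip columns that must not leave the producer.
    visible = {k: v for k, v in row.items() if k not in hidden_columns}
    return {"table": record["table"], "after": visible}


customer = {
    "table": "SALES.CUSTOMERS",
    "after": {"ID": 1, "NAME": "Ada", "SSN": "000-00-0000", "REGION": "EU"},
}
payroll = {"table": "HR.PAYROLL", "after": {"ID": 9, "REGION": "EU"}}


def only_eu(row):
    return row["REGION"] == "EU"


published = filter_record(customer, {"SALES.CUSTOMERS"}, {"SSN"}, only_eu)
dropped = filter_record(payroll, {"SALES.CUSTOMERS"}, {"SSN"}, only_eu)
```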

15

GoldenGate (typically) publishes via WebSocket Secure (WSS)
Oracle’s objective is to contribute a WSS client template back to AsyncAPI for all to use

16

“Data Producer” using GoldenGate to create AsyncAPI Channels
Data Product Producers
Individual Data Streams consist of a GoldenGate payload, in any of 6 possible schema types

17

GoldenGate generated client code
Data Product Consumers
Example JavaScript to define and initialize the WebSocket for streaming

18

GoldenGate Data Streams Payload
Data Product Consumers
Records consist of before/after images and op_type information (type of transaction)
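
A consumer might use the before/after images and `op_type` along the lines of the sketch below, which applies records to an in-memory replica. The single-letter values `"I"`, `"U"`, and `"D"`, the `ID` key column, and the exact record layout are assumptions for illustration; the real payload carries more fields.

```python
def apply_record(replica, record, key="ID"):
    """Apply one change record to an in-memory replica keyed by primary key."""
    op = record["op_type"]
    if op == "I":
        replica[record["after"][key]] = dict(record["after"])
    elif op == "U":
        # The before image identifies the old row; the after image replaces it.
        replica.pop(record["before"][key], None)
        replica[record["after"][key]] = dict(record["after"])
    elif op == "D":
        replica.pop(record["before"][key], None)
    else:
        raise ValueError(f"unknown op_type: {op!r}")
    return replica


records = [
    {"op_type": "I", "after": {"ID": 1, "STATUS": "NEW"}},
    {"op_type": "I", "after": {"ID": 2, "STATUS": "NEW"}},
    {
        "op_type": "U",
        "before": {"ID": 1, "STATUS": "NEW"},
        "after": {"ID": 1, "STATUS": "SHIPPED"},
    },
    {"op_type": "D", "before": {"ID": 2, "STATUS": "NEW"}},
]
replica = {}
for rec in records:
    apply_record(replica, rec)
```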

19

Example JSON records
Data Product
Consumers

20

Options for the service
Data Product Producers and Data Product Consumers
• Connection protocol (set by the Producer)
  • ws (WebSocket) or wss (WebSocket Secure)
• Payload service levels (set by the Producer)
  • Exactly-once: GoldenGate handles all deduplication of records to guarantee that DML/DDL events are sent exactly one time
  • At-most-once: tolerates gaps in streaming data records, e.g., gaps in data that may have been purged by the Producer
  • At-least-once: the Producer service may from time to time reprocess source DML/DDL, so this service level may send duplicates
• Start position (set by the Consumer)
  • Current: begins streaming Tx’s from the current position
  • Earliest: fetches Tx’s starting from the earliest available in the GoldenGate Trail (retention is defined by Data Producers)
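
With the at-least-once level, a consumer that cannot tolerate duplicates has to discard them itself. A minimal sketch, assuming every record carries a hypothetical `pos` field that only increases along the stream:

```python
def deduplicate(records):
    """Yield each record once, assuming positions only increase in the stream."""
    last_pos = None
    for record in records:
        pos = record["pos"]
        if last_pos is not None and pos <= last_pos:
            continue  # a redelivered record: already processed
        last_pos = pos
        yield record


# Positions 2 and 3 are delivered twice, as if the producer reprocessed them.
delivered = [{"pos": p} for p in (1, 2, 3, 2, 3, 4)]
unique_positions = [r["pos"] for r in deduplicate(delivered)]
```

With the exactly-once level this bookkeeping is done on the GoldenGate side, so the consumer would not need it.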

21

Roadmap – what’s on the horizon
• Formatters
  • For App Consumers: JSON (default), Avro, XML, Protobuf, etc.
  • For Analytic Consumers: Parquet, Iceberg, Delta, etc.
• CloudEvents payload format option
  • Adds more overhead, latency, etc.
  • May help simplify how some clients parse the transactions
• Business object semantics
  • When integrated with Oracle JSON-Relational Duality, producers may choose to share the Business Object structure rather than the physical tables
• Stream processing sink
  • AsyncAPI channels as output of streaming data pipelines (pipelines enable data integration/prep or analytic actions)
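
To make the CloudEvents trade-off concrete, the sketch below wraps a change record in a CloudEvents 1.0 style envelope and measures the extra bytes. The `source` and `type` values and the record itself are invented, and this is not the format GoldenGate will necessarily emit.

```python
import json
import uuid
from datetime import datetime, timezone


def to_cloudevent(record, source, event_type):
    """Wrap a change record in a CloudEvents 1.0 style JSON envelope."""
    return {
        # Required context attributes in CloudEvents 1.0.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Optional attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": record,
    }


record = {"op_type": "I", "after": {"ID": 1, "STATUS": "NEW"}}
event = to_cloudevent(record, "/example/orders", "com.example.order.insert")
overhead_bytes = len(json.dumps(event)) - len(json.dumps(record))
```

The envelope gives every client the same attributes to dispatch on, at the cost of a larger message for each event.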

22

Apidays Paris 2023 - Productizing AsyncAPI for Data Replication and Changed Data Capture, Julien Testut, Oracle

23

Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.
