Introduction To Confluent Components
Customer Success Engineering
June 2022
Agenda
1. Confluent Platform
What components make up the Confluent Platform?
2. Kafka Concepts
Events, Distributed Commit Log, Event Streaming/Processing
3. Confluent Platform Components
Brokers & ZooKeeper, Clients, REST Proxy, Schema Registry, Kafka Connect, Kafka Streams, ksqlDB
4. Additional Features
Multi-Region Clusters, Tiered Storage, Cluster Linking and Self-Balancing Clusters
5. Deployment
How can I deploy the Confluent Platform?
Copyright 2022, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
1. Confluent Platform
Motivation
Destination
What is the Confluent Platform?
Confluent Platform Components
[Figure: Confluent Platform reference architecture]
• Clients: applications and microservices, including Kafka Streams applications
• Schema Registry: leader and follower
• Kafka Connect: workers, each running connectors
• REST Proxy: proxies behind a sticky load balancer
• ksqlDB: ksqlDB servers
• Kafka Brokers: brokers, each with a rebalancer
• ZooKeeper Nodes: five ZK nodes
• Confluent Control Center
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Confluent completes Apache Kafka
For: Developer, Operator, Architect, Executive
Apache Kafka, available as:
• Cloud service
• Software
Committer-driven Expertise:
• Enterprise Support
• Professional Services
• Training
• Partners
Confluent Platform: Features and Licensing
Open Source features
• Apache Kafka® (with Connect & Streams)
• Apache ZooKeeper™
• Non-Java Clients
• Ansible Playbooks
Licensing: Apache 2.0 License. Free. Unlimited Kafka brokers. Community support.

Community features
• Pre-built Connectors
• REST Proxy
• ksqlDB
• Schema Registry
Licensing: Confluent Community License. Free. Unlimited Kafka brokers. Community support.

Commercial features
• Pre-built Connectors
• Control Center
• Health+
• Confluent for Kubernetes
• Replicator
• Secret Protection
• Auto Data Balancer
• MQTT Proxy
• Role-Based Access Control
• Structured Audit Logs
• Schema Validation
• Confluent Server
• Self-Balancing Clusters
• Tiered Storage
• Multi-Region Clusters
• Cluster Linking (preview): best-effort Confluent support
Licensing:
Enterprise License (paid)
• Annual subscription
• 24x7 Confluent support
Developer License
• Free
• Limited to 1 Kafka broker
• Community support
Evaluation License
• Free 30-day trial
• Unlimited Kafka brokers
• Community support
2. Kafka Concepts
Apache Kafka is a Distributed Commit Log
Store streams of events
In a fault-tolerant way
Process streams of events and produce new ones
In real time, as they occur
Anatomy of a Kafka Topic
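[Figure: a topic with three partitions; each partition is an ordered log running from old to new, and producers write at the new end]
Partition 0: 1 2 3 4 5 6 7 8 9 10 11 12
Partition 1: 1 2 3 4 5 6 7 8 9
Partition 2: 1 2 3 4 5 6 7 8 9 10 11 12
[Figure: a single partition with records 1 to 12; Consumer A reads at offset=4 and Consumer B reads at offset=7, while producers write at the end]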
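The partition picture above can be sketched as an append-only list where each consumer keeps its own read position. This is a toy model in plain Python, not a Kafka client; the class name `Partition`, the consumer names, and the `event-N` payloads are illustrative assumptions. Note that real Kafka offsets start at 0.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a record's offset is its position in the log."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # Reading never removes data; it only returns a slice of the log.
        return self.records[offset:offset + max_records]


partition = Partition()
for i in range(12):
    partition.append(f"event-{i}")  # producers always write at the end

# Each consumer tracks its own position, independently of the others.
positions = {"consumer-a": 4, "consumer-b": 7}
batch_a = partition.read(positions["consumer-a"], max_records=2)
batch_b = partition.read(positions["consumer-b"], max_records=2)
positions["consumer-a"] += len(batch_a)
positions["consumer-b"] += len(batch_b)
```

Because reads do not consume records, Consumer A can later read the same records Consumer B has already passed.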
3. Confluent Platform Components
Brokers & ZooKeeper
Apache Kafka: Scale Out vs. Failover
Apache ZooKeeper - Cluster coordination
[Figure: a producer (P) writing to topic partitions]
Records are batched per partition, and the batches are bundled into a request to the broker.
Batching and delivery guarantees can be tuned with compression.type, batch.size, linger.ms and acks.
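A rough sketch of per-partition batching, assuming illustrative values for the four settings named above. The helper `build_batches` is hypothetical and only models the size limit; it does not model the `linger.ms` timer or compression, and it is not how the real producer is implemented.

```python
from collections import defaultdict

# Producer settings named on the slide; the values are illustrative assumptions.
producer_config = {
    "compression.type": "lz4",
    "batch.size": 16384,   # bytes per partition batch
    "linger.ms": 5,        # how long to wait for a batch to fill
    "acks": "all",
}


def build_batches(records, batch_size):
    """Group (partition, payload) pairs into per-partition batches of at most batch_size bytes."""
    open_batches = defaultdict(list)
    closed = []
    for partition, payload in records:
        batch = open_batches[partition]
        if batch and sum(len(p) for p in batch) + len(payload) > batch_size:
            closed.append((partition, batch))          # batch is full: close it
            batch = open_batches[partition] = []       # and start a new one
        batch.append(payload)
    closed.extend((p, b) for p, b in open_batches.items() if b)
    return closed


records = [(0, b"aaaa"), (1, b"bb"), (0, b"cccc"), (0, b"dd")]
batches = build_batches(records, batch_size=8)
```

Larger batches and a longer linger time generally trade latency for throughput.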
Producer
[Figure: a producer (P) writing to a partition replicated across three brokers: Replica 1 on Broker 1, Replica 2 on Broker 2, Replica 3 on Broker 3]
acks=all
min.insync.replicas=2
replication.factor=3
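How these three settings interact can be sketched as a small decision function. This is a simplified model of the broker's behaviour, assuming the function name `write_accepted` and the plain integer counts as stand-ins; with `acks=all`, the leader rejects a write when fewer than `min.insync.replicas` replicas are in sync.

```python
def write_accepted(acks, in_sync_replicas, min_insync_replicas):
    """Return True if the partition leader would accept a produce request."""
    if acks == "all":
        # With acks=all the leader rejects writes when the in-sync set is too small.
        return in_sync_replicas >= min_insync_replicas
    # acks=0 or acks=1: min.insync.replicas is not enforced, only a leader is needed.
    return in_sync_replicas >= 1


# replication.factor=3 with min.insync.replicas=2 tolerates one broker failure.
all_healthy = write_accepted("all", in_sync_replicas=3, min_insync_replicas=2)
one_down = write_accepted("all", in_sync_replicas=2, min_insync_replicas=2)
two_down = write_accepted("all", in_sync_replicas=1, min_insync_replicas=2)
```

With two brokers down, producers using `acks=all` are refused rather than allowed to write data that exists on a single replica.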
Consumer
[Figure: a consumer (C) polls records from Partitions 1 to 4, commits offsets, and sends heartbeats]
Consumers - Consumer group members
Within the same application (consumer group), different partitions can be assigned to different consumers to increase parallel consumption as well as to support failover
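A minimal sketch of how partitions might be spread over the members of one consumer group, and what happens when a member fails. The round-robin function `assign` and the member names are assumptions for illustration; Kafka's built-in assignors are more elaborate.

```python
def assign(partitions, consumers):
    """Round-robin sketch: spread partitions across the group's live consumers."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = [1, 2, 3, 4]
before = assign(partitions, ["c1", "c2"])   # two members share the work
after = assign(partitions, ["c1"])          # c2 failed: c1 takes over everything
```

Adding members raises parallelism only up to the number of partitions; beyond that, extra consumers sit idle.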
Consumers - Consumer Groups
[Figure: consumer groups C1 and C2 reading the same topic]
Different applications can independently read from the same topic partitions at their own pace
Make Kafka Widely Accessible to Developers
Confluent Clients
Battle-tested and high performing producer and consumer APIs (plus admin client)
REST Proxy
Connect Any Application to Kafka
• Provides a RESTful interface to a Kafka cluster
• Communicate via HTTP-connected devices
[Figure: non-Java applications reach Kafka through REST Proxy over REST / HTTP]
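As a sketch of what producing over HTTP can look like, the snippet below builds (but does not send) a request in the shape of the REST Proxy v2 produce API. The base URL `http://localhost:8082`, the topic name `orders`, and the helper `build_produce_request` are assumptions for illustration.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build a POST that asks REST Proxy to produce JSON records to a topic."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


request = build_produce_request("http://localhost:8082", "orders", [{"id": 1}])
# urllib.request.urlopen(request) would send it; that needs a running REST Proxy.
```

Any language or device that can issue an HTTP POST can produce this way, without a native Kafka client.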
Schema Registry
• Works transparently when used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams
Schema Registry: Key Features
Schema Validation
Scale schemas reliably
confluent.value.schema.validation=true
[Figure: 1. a message with an invalid schema is produced; 2. an error message is returned; schemas are checked against Schema Registry]
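A simplified sketch of the idea behind the validation step in the figure. It assumes the Confluent wire format (a zero magic byte followed by a 4-byte big-endian schema ID) and models Schema Registry as a plain set of known IDs; the function names and return strings are hypothetical, and the real broker-side check is more involved.

```python
import struct

MAGIC_BYTE = 0  # first byte of a value serialized in the Confluent wire format


def schema_id_of(message_value):
    """Extract the schema ID from a serialized value, or None if it has no valid header."""
    if len(message_value) < 5 or message_value[0] != MAGIC_BYTE:
        return None
    (schema_id,) = struct.unpack(">I", message_value[1:5])
    return schema_id


def validate(message_value, registered_ids):
    """Accept the message only if it references a schema ID the registry knows."""
    schema_id = schema_id_of(message_value)
    if schema_id is None or schema_id not in registered_ids:
        return "error: invalid schema"
    return "ok"


valid = b"\x00" + struct.pack(">I", 42) + b"payload"
```

Here `validate(valid, {42})` passes, while a value without the header or with an unknown ID is rejected.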
Kafka Connect
No-code connectivity to many systems
A no-code way of connecting known systems (databases, object storage, queues, etc.) to Apache Kafka
Custom transforms and data conversions can be written in code, though many Single Message Transforms and Converters exist out of the box
[Figure: data sources feed Kafka Connect; Kafka Connect feeds data sinks]
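"No code" here means a connector is described by configuration rather than written as a program. Below is a sketch of such a configuration, expressed as a Python dict and serialized to the JSON shape the Connect REST API accepts. The connector name, file path, topic, and the transform alias `route` are illustrative assumptions; the classes named are the file source connector, string converter, and RegexRouter transform that ship with Apache Kafka.

```python
import json

connector = {
    "name": "demo-file-source",  # hypothetical connector name
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo.txt",   # hypothetical input file
        "topic": "demo-lines",     # hypothetical target topic
        # Converter: how record values are serialized when written to Kafka.
        "value.converter": "org.apache.kafka.connect.storage.StringConverter",
        # Single Message Transform: rewrite the topic name of every record.
        "transforms": "route",
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.route.regex": "(.*)",
        "transforms.route.replacement": "staging-$1",
    },
}

payload = json.dumps(connector, indent=2)
```

A payload like this would typically be sent with a POST to the `/connectors` endpoint of a Connect worker.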
Instantly Connect Popular Data Sources & Sinks
Data Diode
190+ pre-built connectors
Pipelines
Confluent Hub
Easily browse connectors by:
• Sources vs. Sinks
• Confluent vs Partner supported
• Commercial vs Free
• Available in Confluent Cloud
confluent.io/hub
Kafka Streams
Build apps with stream processing inside
Stream Processing by Analogy
Kafka Cluster
Stream Processing in Kafka
Flexibility
Simplicity
Where does the processing code run?
ksqlDB
Stream processing using SQL and much more
Stream Processing in Kafka
Flexibility Simplicity
Can I do stream processing with SQL?
What if I could describe an anomaly detector in SQL and have it write the
results to a topic? You can do that too!
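To make the anomaly-detector idea concrete, here is a sketch of the computation such a query might express: count events per key in fixed (tumbling) time windows and report the keys that exceed a threshold. This is plain Python rather than ksqlDB syntax, and the function name, card IDs, window size, and threshold are all illustrative assumptions.

```python
from collections import Counter


def detect_anomalies(events, window_seconds, threshold):
    """Count events per (key, tumbling window) and report counts above the threshold."""
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        counts[(key, window_start)] += 1
    return sorted(
        (key, start, n) for (key, start), n in counts.items() if n > threshold
    )


# (timestamp in seconds, card id): card-1 is used four times within 30 seconds.
events = [(0, "card-1"), (5, "card-1"), (9, "card-1"),
          (12, "card-2"), (31, "card-1"), (3, "card-1")]
anomalies = detect_anomalies(events, window_seconds=30, threshold=3)
```

In a streaming system the same logic runs continuously, and each detected anomaly becomes a new record in an output topic.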
Control Center
For Operators
Centrally manage and monitor multi-cluster
environments and security.
For Developers
View messages, topics and schemas,
manage connectors and build ksqlDB
queries
Adhere to Established Event Streaming SLAs
Broker overview
4. Additional Features
Cluster Linking
Enables the direct connection of clusters to mirror
topics between them
Self-Balancing Clusters
Rebalances are required regularly to optimize cluster performance:
• Uneven load
• Expansion
• Shrinkage
Self-Balancing Clusters automate partition rebalances to improve Kafka’s performance, elasticity, and ease of operations
Manual Rebalance Process:
$ kafka-reassign-partitions ...
Confluent Platform:
Self-Balancing
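For a sense of what the manual process involves, the sketch below builds the kind of JSON plan that `kafka-reassign-partitions` reads through its `--reassignment-json-file` option. The topic name, broker IDs, and the helper `reassignment_plan` are illustrative assumptions; choosing a good replica placement is the hard part that an operator must otherwise work out by hand.

```python
import json


def reassignment_plan(topic, replicas_by_partition):
    """Build a reassignment plan: which broker IDs should host each partition."""
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": replicas}
            for partition, replicas in sorted(replicas_by_partition.items())
        ],
    }


# Move replicas onto a newly added broker 4 (hypothetical broker IDs).
plan = reassignment_plan("orders", {0: [1, 2, 4], 1: [2, 3, 4]})
plan_json = json.dumps(plan)
```

Self-Balancing removes the need to compute and apply such plans manually after load changes or when brokers are added or removed.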
[Figure labels: Splunk, SFDC, Microservice, App, ...]
Platform elasticity
Scale compute and storage independently
Multi-Region Clusters
Change the game for disaster recovery for Kafka
Minimal downtime:
• Automated client failover
Streamlined DR operations
• Leverages Kafka’s internal replication
• No separate Connect clusters
Single multi-region cluster with high write throughput
• Asynchronous replication using “Observer” replicas
Low bandwidth costs and high read throughput
• Remote consumers read data locally, directly from Observers
[Figure: a single Kafka cluster spanning us-west-1 (brokers w-1 to w-6, ZK1) and us-east-1 (brokers e-1 to e-6, ZK2), holding replicas and Observer replicas; Client A and Client B fail over automatically; ZK3 acts as the “tie-breaker” in a us-central-1 datacenter]
Cluster Linking
Sharing data between independent clusters or migrating clusters presents two challenges:
2. Offsets are not preserved, so messages are at risk of being skipped or reread
Topic 1, DC 1: 0 1 2 3 4 ...
Topic 1, DC 2: 4 5 6 7 8 ...
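The offset problem can be shown with a few lines of Python. This sketch assumes one reading of the example above: the same five records sit at offsets 0 to 4 in DC 1 and at offsets 4 to 8 in DC 2. The `msg-N` names and variable names are illustrative.

```python
# Assumption for illustration: the same five records sit at offsets 0..4 in
# DC 1 and at offsets 4..8 in DC 2.
dc1 = {offset: f"msg-{offset}" for offset in range(5)}
dc2 = {offset + 4: f"msg-{offset}" for offset in range(5)}

committed_in_dc1 = 5  # the consumer has already processed msg-0..msg-4

# Reusing the DC 1 offset in DC 2 lands on msg-1, so four records are reread.
reread = [dc2[offset] for offset in sorted(dc2) if offset >= committed_in_dc1]

# A correct resume position has to be translated, not copied.
translated = max(dc2) + 1
```

When offsets are identical on both sides, the committed position can be reused as-is and no translation step is needed.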
Cluster Linking requires no additional
infrastructure and preserves offsets:
Migrate clusters to Confluent Cloud
5. Deployment
Confluent Platform Deployment Options
Confluent Platform
Deploy to Confluent Cloud
Terraform
Infrastructure as code done right
Benefit from:
• Industry Standard
• Human Readable Configuration
• Manage Critical Confluent Cloud
Resources
• Consistent Deployability
• Multi-Cloud With Ease
• Scale Quickly
Accelerate Deployment to Production
Non-containerized environments
Production-ready Ansible Playbooks
Deploy a complete event streaming platform at scale:
• Kafka Brokers & ZooKeeper
• Kafka Connect
• REST Proxy
• ksqlDB
• Control Center
• Schema Registry
• Schema Validation
• RBAC
Simplify Day-to-Day Operations
Confluent for Kubernetes
Deploy a production-ready event streaming platform in minutes
Automated rolling upgrades
Perform automated rolling upgrades after a Confluent Platform version, configuration, or resource update without impacting Kafka availability
Questions?