
Kafka For Developers

Using
Spring Boot

Dilip Sundarraj
About Me
• Dilip

• Building software since 2008

• Teaching on Udemy since 2016


What's Covered?
• Introduction to Kafka and the internals of Kafka

• Building enterprise-standard Kafka clients using Spring Kafka/Spring Boot

• Resilient Kafka client applications using error handling/retry/recovery

• Writing unit/integration tests using JUnit


Targeted Audience
• Focused on developers

• Interested in learning the internals of Kafka

• Interested in building Kafka clients using Spring Boot

• Interested in building enterprise-standard Kafka client applications using Spring Boot
Source Code
Thank You!
Introduction to Apache Kafka
Prerequisites
Course Prerequisites
• Prior Knowledge or Working Experience with Spring Boot/Framework

• Knowledge about building Kafka Clients using Producer and Consumer API

• Knowledge about building RESTFUL APIs using Spring Boot

• Experience working with Spring Data JPA

• Automated tests using JUnit

• Experience Working with Mockito

• Java 11 or higher is needed

• IntelliJ, Eclipse or any other IDE is needed


Software Development

[Diagram: Past vs Current. In the past, a monolithic Retail App contained the Order, Payment, Inventory, and Notification services. Currently these run as independent services: Order, Payment, Inventory, Delivery, and Notification.]
MicroServices Architecture

[Diagram: the Order, Inventory, and Cart services each expose an API and publish events through their interfaces into an Event Streaming Platform; the Payment, Notification, and Delivery services consume those events through their interfaces and expose their own APIs.]
What is an Event Streaming Platform?
• Producers and Consumers subscribe to a stream of records

[Diagram: Producer -> Streaming platform -> Consumer]

• Stores the stream of events (1, 2, 3, 4 … N)

• Analyzes and processes events as they occur
Apache Kafka (Event Streaming Platform)

[Diagram: the same microservices architecture, with Apache Kafka as the Event Streaming Platform. The Order, Inventory, and Cart services publish events into Kafka; the Payment, Notification, and Delivery services consume them.]
Traditional Messaging System vs Kafka Streaming Platform

Traditional Messaging System:
• Transient message persistence
• Broker's responsibility to keep track of consumed messages
• Targets a specific consumer
• Not a distributed system

Kafka Streaming Platform:
• Stores events based on a retention time; events are immutable
• Consumer's responsibility to keep track of consumed messages
• Any consumer can access a message from the broker
• It's a distributed streaming system
Kafka Use Cases

Transportation:
• Driver-rider notifications
• Food delivery notifications
• Tracking online order deliveries

Retail:
• Sale notifications
• Real-time purchase recommendations

Banking:
• Fraud transactions
• New feature/product notifications
Kafka Terminology & Client APIs

[Diagram: a Kafka Cluster of brokers (Broker1 to Broker4) at the center. Kafka Producers write using the Producer API and Kafka Consumers read using the Consumer API. The Connect API moves data in from external systems (DB, File System) via Source Connectors and back out via Sink Connectors. Kafka Streams applications use the Streams API.]
Download Kafka
Kafka Topics
&
Partitions
Kafka Topics
• A Topic is an entity in Kafka with a name

[Diagram: a Kafka Producer sends the record "ABC" to TopicA inside the Kafka Broker; a Kafka Consumer polls TopicA and receives "ABC".]
Topic and Partitions
• A Partition is where the message lives inside the topic

• Each topic will be created with one or more partitions

Topic and Partitions

[Diagram: TopicA with Partition 0 holding records ABC, DEF, GHI, JKL at offsets 0-3, and Partition 1 holding ABC, DEF, GHI, JKL, BOB, DAD, KIM at offsets 0-6.]

- Each partition is an ordered, immutable sequence of records
- Each record is assigned a sequential number called an offset
- Each partition is independent of the others
- Ordering is guaranteed only at the partition level
- A partition continuously grows as new records are produced
- All the records are persisted in a commit log in the file system where Kafka is installed
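The properties above can be sketched in a few lines of plain Java: a partition behaves like an append-only list, where a record's offset is simply its position in the sequence. This is only an in-memory illustration; a real partition lives in Kafka's commit log files on disk.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    private final List<String> records = new ArrayList<>();

    // Records are only ever appended, never modified: the log is immutable.
    long append(String record) {
        records.add(record);
        return records.size() - 1; // the assigned offset is the position in the log
    }

    // Reading does not remove the record; any consumer can re-read any offset.
    String read(long offset) {
        return records.get((int) offset);
    }

    public static void main(String[] args) {
        PartitionSketch partition0 = new PartitionSketch();
        for (String r : List.of("ABC", "DEF", "GHI", "JKL")) {
            System.out.println(r + " -> offset " + partition0.append(r));
        }
        System.out.println(partition0.read(0)); // still "ABC": records are immutable
    }
}
```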
Topics and Partitions

[Diagram: a Kafka Producer sends to TopicA; records land in Partition 0 (offsets 0-5) and Partition 1 (offsets 0-7).]
Setting up
Zookeeper
&
Kafka Broker
Setting up Kafka in Local

[Diagram: on startup, the broker registers itself with ZooKeeper.]
Sending
Kafka Messages
With
Key and Value
Kafka Message
• A Kafka message sent from the producer has two properties

• Key (optional)

• Value
Sending Message Without Key

[Diagram: the producer sends records (Apple, Adam, Alpha, Angel) to test-topic without a key; the Partitioner distributes them across partitions - Apple to Partition 0, Adam to Partition 1, Alpha to Partition 2, Angel to Partition 3.]
Sending Message With Key

[Diagram: records with the same key always land in the same partition of test-topic - key A (values Apple, Adam, Alpha, Angel) goes to Partition 0, key B (Boy, Ben, Beta, Becky) to Partition 1, and key C (Cat, Carl, Cam, Cathy) to Partition 2.]
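The key-to-partition mapping can be sketched as hash-then-modulo. Note this is a simplified illustration: Kafka's DefaultPartitioner actually uses murmur2 hashing on the serialized key, and String.hashCode() stands in here purely for demonstration.

```java
public class KeyPartitionerSketch {

    // Same key -> same hash -> same partition, which is what gives
    // per-key ordering guarantees.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("A", 4);
        int p2 = partitionFor("A", 4);
        System.out.println(p1 == p2); // records with key "A" always share a partition
    }
}
```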
Consumer Offsets
Consumer Offsets
• Consumers have three options for where to start reading:

• from-beginning

• latest

• a specific offset

[Diagram: test-topic with four partitions, each holding records such as ABC, DEF, GHI, JKL at offsets 0-6.]
Consumer Offsets

[Diagram: Consumer 1 (group.id = group1) reads test-topic Partition 0 (offsets 0-7) from the beginning, and its progress is tracked in the internal __consumer_offsets topic.]

Consumer Offset

• The consumer offset behaves like a bookmark, letting the consumer resume reading messages from the point where it left off.
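The bookmark idea can be sketched in plain Java: after processing a record, the consumer commits offset + 1, so a restart resumes exactly where it left off. Kafka stores these commits in the __consumer_offsets topic; a plain Map stands in here purely for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetBookmarkSketch {
    // partition -> next offset to read (the "bookmark")
    private final Map<Integer, Long> committed = new HashMap<>();

    long nextOffset(int partition) {
        return committed.getOrDefault(partition, 0L); // no commit yet -> from beginning
    }

    void markProcessed(int partition, long offset) {
        committed.put(partition, offset + 1); // bookmark points past the last processed record
    }

    public static void main(String[] args) {
        OffsetBookmarkSketch bookmark = new OffsetBookmarkSketch();
        List<String> records = List.of("ABC", "DEF", "GHI");
        for (int offset = 0; offset < records.size(); offset++) {
            bookmark.markProcessed(0, offset);
        }
        System.out.println(bookmark.nextOffset(0)); // prints 3: resume at offset 3
    }
}
```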
Consumer Groups
Consumer Groups
• group.id is mandatory

• group.id plays a major role when it comes to scalable message consumption

[Diagram: test-topic with Partitions 0-3; a single Consumer 1 with group.id = group1 reads all four partitions.]

Consumer Groups

[Diagram: a single Consumer A in group1 handles all four partitions (P0-P3) of test-topic.]
Consumer Groups

[Diagram: two Consumer A instances in group1 - one handles P0 and P1, the other handles P2 and P3.]
Consumer Groups

[Diagram: four Consumer A instances in group1, one per partition (P0, P1, P2, P3).]
Consumer Groups

[Diagram: five Consumer A instances in group1 for only four partitions - the fifth instance sits idle.]
Consumer Groups

[Diagram: two groups reading test-topic independently - group1 has four Consumer A instances, one per partition (P0-P3), while group2 has two Consumer B instances splitting the four partitions between them.]
Consumer Groups: Summary
• Consumer Groups are used for scalable message consumption

• Each different application will have a unique consumer group

• Who manages the consumer group?

• Kafka Broker manages the consumer-groups

• Kafka Broker acts as a Group Co-ordinator
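The scaling behavior above can be sketched as dealing partitions out to the group's consumers. This is only an illustration of the idea: Kafka's real assignors (range, round-robin, sticky) managed by the group coordinator are more involved, but the sketch shows why consumers beyond the partition count sit idle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionAssignmentSketch {

    // Assigns each partition to a consumer, round-robin style.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new TreeMap<>();
        consumers.forEach(c -> assignment.put(c, new ArrayList<>()));
        for (int p = 0; p < numPartitions; p++) {
            // Deal partitions out like cards: partition p goes to consumer p mod N.
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 partitions, 5 consumers in the group: the fifth consumer ends up idle.
        Map<String, List<Integer>> result =
            assign(List.of("c1", "c2", "c3", "c4", "c5"), 4);
        System.out.println(result); // c5 gets no partitions
    }
}
```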


Commit Log
&
Retention Policy
Commit Log

[Diagram: the producer sends the record "ABC" to TopicA; the broker serializes it to bytes and appends it to a commit log file (00000000000000000000.log) under log.dirs=/tmp/kafka-logs in the file system, from which the consumer polls it.]
Retention Policy
• Determines how long a message is retained

• Configured using the log.retention.hours property in the server.properties file

• Default retention period is 168 hours (7 days)
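As a concrete illustration, the retention period can be overridden in server.properties (the 24-hour value below is just an example, not a recommendation):

```properties
# server.properties - keep records for 24 hours instead of the 168-hour default
log.retention.hours=24
```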


Kafka
as a
Distributed Streaming
System
Apache Kafka® is a distributed streaming platform
What is a Distributed System?
• Distributed systems are a collection of systems working together to deliver a value

[Diagram: clients talk to a cluster made up of System 1, System 2, and System 3.]
Characteristics of Distributed System
• Availability and Fault Tolerance

• Reliable Work Distribution

• Easily Scalable

• Handling Concurrency is fairly easy

[Diagram: multiple clients interacting with Systems 1-4.]
Kafka as a Distributed System

[Diagram: many producers and many consumers all connected to a single Kafka Broker.]
Kafka as a Distributed System

[Diagram: producers and consumers connected to a Kafka Cluster of Brokers 1-3, each broker backed by its own file system.]

- Client requests are distributed between brokers

- Easy to scale by adding more brokers based on the need

- Handles data loss using Replication

Set Up
Kafka Cluster
Using
Three Brokers
Start Kafka Broker

./kafka-server-start.sh ../config/server.properties
Setting up Kafka Cluster
- A new server.properties file for each new broker, with that broker's details:

broker.id=<unique-broker-id>
listeners=PLAINTEXT://localhost:<unique-port>
log.dirs=/tmp/<unique-kafka-folder>
auto.create.topics.enable=false (optional)

Example: server-1.properties
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs-1
auto.create.topics.enable=false (optional)
How Kafka Distributes the
Client Requests?
How Topics are distributed?

./kafka-topics.sh --create --topic test-topic-replicated \
  --zookeeper localhost:2181 \
  --replication-factor 3 \
  --partitions 3

[Diagram: in the Kafka Cluster, the Controller assigns one partition leader per broker for test-topic-replicated - Broker 1 leads Partition 0, Broker 2 leads Partition 1, and Broker 3 leads Partition 2.]
How Kafka Distributes Client Requests?
Kafka Producer

[Diagram: the producer's Partitioner routes each record of test-topic-replicated to the broker that leads its partition - ABC to Partition 0 on Broker 1, DEF to Partition 1 on Broker 2, GHI to Partition 2 on Broker 3 - and each broker persists to its own file system.]
How Kafka Distributes Client Requests?
Kafka Consumer

[Diagram: the consumer polls test-topic-replicated, fetching each partition from its leader broker (Partition 0 from Broker 1, Partition 1 from Broker 2, Partition 2 from Broker 3) and processing the records successfully.]
How Kafka Distributes Client Requests?
Kafka Consumer Groups

[Diagram: three consumers with group.id = group1 each poll one partition of test-topic (partition 0, partition 1, partition 2) from the broker that leads it, and each processes its records successfully.]
Summary: How Kafka Distributes the Client Requests?
• Partition leaders are assigned during topic creation

• Clients will only invoke the leader of a partition to produce and consume data

• Load is evenly distributed between the brokers
How Kafka handles Data Loss?
How Kafka handles Data Loss?

[Diagram: a producer and a consumer working against test-topic-replicated spread across Brokers 1-3 in the Kafka Cluster, each broker leading one partition and persisting to its own file system.]

Replication

./kafka-topics.sh --create --topic test-topic-replicated \
  --zookeeper localhost:2181 \
  --replication-factor 3 \
  --partitions 3
Replication

[Diagram: with replication factor = 3, each broker holds one leader partition plus follower replicas of the other two - Broker 1 leads Partition 0 and follows Partitions 1 and 2, Broker 2 leads Partition 1, and Broker 3 leads Partition 2; each broker persists to its own file system.]

Replication

[Diagram: when a broker fails, leadership moves to a follower replica - here Broker 1 takes over as leader of Partitions 0 and 1, so producers and consumers continue uninterrupted.]
In-Sync Replica (ISR)
• Represents the number of replicas in sync with each other in the cluster

• Includes both leader and follower replicas

• Recommended value is always greater than 1

• Ideal value is ISR == Replication Factor

• This can be controlled by the min.insync.replicas property

• It can be set at the broker or topic level
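For example, a broker-level default could be set in server.properties like this (the value 2 is illustrative, matching a replication factor of 3):

```properties
# server.properties - require at least 2 in-sync replicas
# before an acks=all produce request succeeds
min.insync.replicas=2
```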


Fault Tolerance
&
Robustness
Application Overview
Library Inventory
Library Inventory Flow

[Diagram: a Librarian interacts with the Library Inventory application.]
Library Inventory Architecture

[Diagram: the Librarian calls the API of MicroService 1 (Library Events Producer), whose Kafka Producer publishes to the library-events topic; MicroService 2 (Library Events Consumer) reads the topic with a Kafka Consumer and persists to an in-memory H2 database.]
Library Event Domain

[Diagram: the same architecture, highlighting the Library Event domain object that flows from the API through the Kafka Producer into the library-events topic and on to the Library Events Consumer.]
Library Events Producer API

[Diagram: the Librarian calls POST (new book) and PUT (update book) endpoints on MicroService 1; the Library Events Producer publishes to the library-events topic via KafkaTemplate.]
Kafka Producer in Spring
KafkaTemplate

• Produces records into a Kafka topic

• Similar to JdbcTemplate for databases

How KafkaTemplate Works?

[Diagram: KafkaTemplate.send() publishes records to the library-events topic.]

Behind the Scenes

[Diagram: a send() call passes through the Serializer (key.serializer, value.serializer) and the Partitioner (DefaultPartitioner), then lands in the RecordAccumulator, which buffers records into RecordBatches (each up to batch.size, bounded overall by buffer.memory) and flushes a batch to the library-events topic when it fills up or linger.ms elapses.]
Configuring KafkaTemplate

Mandatory Values:

bootstrap-servers: localhost:9092,localhost:9093,localhost:9094
key-serializer: org.apache.kafka.common.serialization.IntegerSerializer
value-serializer: org.apache.kafka.common.serialization.StringSerializer

KafkaTemplate AutoConfiguration

application.yml

spring:
  profiles: local
  kafka:
    producer:
      bootstrap-servers: localhost:9092,localhost:9093,localhost:9094
      key-serializer: org.apache.kafka.common.serialization.IntegerSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
KafkaAdmin
• Create topics programmatically

• Part of Spring Kafka

• How to create a topic from code?

• Create a bean of type KafkaAdmin in the Spring configuration

• Create a bean of type NewTopic in the Spring configuration
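A minimal sketch of those two steps, assuming a Spring Boot application with spring-kafka on the classpath (the topic name and counts below are illustrative; Spring Boot's auto-configuration usually supplies the KafkaAdmin bean itself, so declaring the NewTopic bean is often enough):

```java
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class AutoCreateConfig {

    // NewTopic beans are picked up by KafkaAdmin, which creates the
    // topic on the broker at application startup if it does not exist.
    @Bean
    public NewTopic libraryEvents() {
        return TopicBuilder.name("library-events")
                .partitions(3)
                .replicas(3)
                .build();
    }
}
```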


Introduction
To
Automated Tests
Why Automated Tests?
• Manual testing is time consuming

• Manual testing slows down development

• Adding new changes is error-prone

What are Automated Tests?
• Automated tests run against your code base

• Automated tests run as part of the build

• This is a requirement for today's software development

• Easy to catch bugs

• Types of Automated Tests:

• Unit Tests

• Integration Tests

• End to End Tests

Tools for Automated Tests
• JUnit

• Spock
Integration Tests
Using
JUnit5
What is an Integration Test?
• A test that combines the different layers of the code and verifies the behavior is working as expected

[Diagram: in MicroService 1, the integration test exercises (1) the Controller, (2) the Kafka Producer, and (3) the library-events topic together.]

Integration Test

[Diagram: a test client calls the Controller of the Library Events Producer, which publishes through the Kafka Producer to the library-events topic.]
Embedded Kafka
What is EmbeddedKafka?
• In-memory Kafka

• Integration tests can interact with EmbeddedKafka

[Diagram: LibraryControllerIntegrationTest runs against the Library Events Producer backed by Embedded Kafka.]

Why Embedded Kafka?
• Easy to write integration tests

• Test all the code just as you would when interacting with a real Kafka
Unit Tests
Using
JUnit5
What is a Unit Test?
• A test that focuses on a single unit (method)

• Mocks the external dependencies

[Diagram: in MicroService 1, only the Controller is under test; the Kafka Producer and the library-events topic sit outside the unit.]
What is a Unit Test?

[Diagram: the Controller is tested in isolation; the Kafka Producer is replaced by a Mockito mock.]
Why Unit Test?
• Unit tests are handy to mock external dependencies

• Unit tests are faster compared to integration tests

• Unit tests cover scenarios that are not possible with integration tests
Library Events Producer API

[Diagram: the Librarian calls POST (new book) and PUT (update book) on MicroService 1, whose Kafka Producer publishes to the library-events topic.]
PUT - "/v1/libraryevent"
• libraryEventId is a mandatory field

{
  "libraryEventId": 123,
  "eventStatus": null,
  "book": {
    "bookId": 456,
    "bookName": "Kafka Using Spring Boot",
    "bookAuthor": "Dilip"
  }
}
Kafka Producer Configurations
Kafka Producer Configurations
• acks

• acks = 0, 1 and all

• acks = 1 -> guarantees the message is written to the leader (default)

• acks = all -> guarantees the message is written to the leader and to all the replicas

• acks = 0 -> no guarantee (not recommended)

Kafka Producer Configurations
• retries

• Integer value in the range [0 - 2147483647]

• In Spring Kafka, the default value is 2147483647

• retry.backoff.ms

• Integer value represented in milliseconds

• Default value is 100 ms
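These producer settings map onto Spring Boot's application.yml like so (the values shown are illustrative overrides, not requirements):

```yaml
spring:
  kafka:
    producer:
      acks: all        # wait for the leader and all in-sync replicas
      retries: 10      # override the very large Spring Kafka default
      properties:
        retry.backoff.ms: 1000   # wait 1 s between retry attempts
```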


Library Events Consumer

[Diagram: the full architecture again - the Library Events Producer (MicroService 1) publishes library events from its API through the Kafka Producer to the library-events topic; the Library Events Consumer (MicroService 2) reads them with a Kafka Consumer and persists to an in-memory H2 database.]
Spring Kafka Consumer
Kafka Consumer

[Diagram: the Kafka Consumer polls the library-events topic and the records are processed successfully.]
Spring Kafka Consumer
• MessageListenerContainer

• KafkaMessageListenerContainer

• ConcurrentMessageListenerContainer

• @KafkaListener Annotation

• Uses ConcurrentMessageListenerContainer behind the scenes

KafkaMessageListenerContainer
• Implementation of MessageListenerContainer

• Polls the records

• Commits the offsets

• Single threaded

ConcurrentMessageListenerContainer

• Represents multiple KafkaMessageListenerContainers
@KafkaListener
• This is the easiest way to build a Kafka Consumer

• KafkaListener Sample Code

@KafkaListener(topics = {"${spring.kafka.topic}"})
public void onMessage(ConsumerRecord<Integer, String> consumerRecord) {
    log.info("OnMessage Record : {}", consumerRecord);
}

• Configuration Sample Code

@Configuration
@EnableKafka
@Slf4j
public class LibraryEventsConsumerConfig {
}
KafkaConsumer Config

key-deserializer: org.apache.kafka.common.serialization.IntegerDeserializer
value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
group-id: library-events-listener-group
Consumer Groups
&
Rebalance
Consumer Groups

Multiple instances of the same application with the same group id.
Rebalance
• Changing the partition ownership from one consumer to another

[Diagram: a single library-events-consumer instance (group.id = library-events-listener-group) owns partitions P0, P1, and P2 of library-events.]
Rebalance
• Changing the partition ownership from one consumer to another

[Diagram: when a second library-events-consumer instance joins the group, the Group Coordinator triggers a rebalance - the first instance keeps P0 and P2 while the new instance takes over P1.]
Committing Offsets

[Diagram: the Kafka Consumer polls library-events; once the records are processed successfully, the offsets are committed to the __consumer_offsets topic.]
Library Events Consumer

[Diagram: MicroService 2 - the Kafka Consumer reads the library-events topic and persists to the in-memory H2 database.]
Integration Testing
For
Real DataBases
Integration Testing using Real Databases
• Different aspects of writing unit and integration tests

• Integration testing using TestContainers

TestContainers
• What are TestContainers?

• Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.

• More info about TestContainers - https://www.testcontainers.org/

Retry in Kafka Consumer
Error in Kafka Consumer
Retry in Kafka Consumer
Recovery in Kafka Consumer
Recovery in Kafka Consumer

[Diagram: once the retries are exhausted, the failed record moves to RECOVERY.]

Retry and Recovery
Recovery - Type 1

[Diagram: after retries are exhausted, the record is handled by a recovery path.]

Recovery - Type 2

[Diagram: after retries are exhausted, an alternative recovery path is applied.]

Issues with Recovery?

• Recovery can alter the order of events

Recovery - Type 1

[Diagram: on exhaustion, a producer re-publishes the failed record.]
Error Handling
in
Kafka Producer
Library Events Producer API

[Diagram: the Librarian calls POST (new book) and PUT (update book) on MicroService 1; the Kafka Producer publishes to the library-events topic.]
Kafka Producer Errors
• Kafka Cluster is not available

• If acks = all, some brokers are not available

• min.insync.replicas config

• Example: min.insync.replicas = 2, but only one broker is available

min.insync.replicas

min.insync.replicas = 2

[Diagram: a Kafka Cluster of Brokers 1-3; with min.insync.replicas = 2, produce requests fail unless at least two replicas are in sync.]
Retain/Recover
Failed
Records
Retain/Recover Failed Records
Retain/Recover Failed Records
