Chapter 1 - Introduction To KAFKA: Objectives

This chapter introduces Kafka and provides an overview of key concepts. It discusses microservices architecture and how messaging fits into it. Kafka is an open-source messaging system that allows publishing and subscribing to streams of records. It is highly scalable, fault-tolerant, and very fast. The chapter covers Kafka's architecture, including topics that messages are published to, producers that publish messages, consumers that subscribe to topics, and brokers that manage the data.

Uploaded by

Suchismita Sahu

Chapter 1 - Introduction to KAFKA

Objectives
Key objectives of this chapter
 What are Microservices?
 Messaging Architectures
 What is Kafka?
 Need for Kafka
 Where is Kafka useful?
 Architecture
 Core concepts in Kafka
 Overview of ZooKeeper
 Cluster, Kafka Brokers, Producer, Consumer, Topic

1.1 Microservices
 Small, autonomous services which work well together.
 Being able to change individual components independently.
 Independent processes
 Communicate over APIs, rather than using databases directly
 High degree of autonomy
 Small, focused on doing one thing well
 A form of SOA. Typical SOA-based applications used to be monolithic.
 The microservices concept facilitates the adoption of Agile software development.

1.2 Microservices vs Classic SOA


SOA                     Microservices
XML                     JSON
Complex to integrate    Easy to integrate
Heavy                   Lightweight
HTTP/SOAP               HTTP/REST

1.3 Traditional Enterprise Application Architecture


 Classical architecture
 Typical 3 layers:
◊ client-side UI (Browser, HTML + JS)
◊ a database (RDBMS, NoSQL …)
◊ server-side application (Java, .NET, PHP, …)
 Any change to the system involves building and deploying a new version
of the application. Changes are expensive.
 Scaling requires scaling the entire application, rather than only the parts
that need more resources.
 Long release cycles.

1.4 Sample Microservices Architecture


 Applications naturally start as monoliths; as they scale, they evolve toward
a microservices architecture.
 Applications are decomposed into components: smaller, independent
service applications.
 Components are loosely coupled.

1.5 Microservices Architecture – Pros


 Multiple developers and teams can deliver relatively independently of each
other
 Can be written in different programming languages
 Can be managed by different teams
 Can use different data storage technologies
 Centralized management is minimal
 Independently deployable by fully automated deployment machinery
 Works well with Continuous Delivery
 Allows frequent releases while keeping the rest of the system available
and stable

1.6 Messaging Architectures – What is Messaging?


 Application-to-application communication
 Supports asynchronous operations.
 Message:
 A message is a self-contained package of business data and network
routing headers.

1.7 Messaging Architectures – Steps to Messaging


 Messaging connects multiple applications in an exchange of data.
 Messaging uses an encapsulated asynchronous approach to exchange
data through a network.
 A traditional messaging system has two models of abstraction:
◊ Queue - a message channel where each message is received by
exactly one consumer, in a point-to-point message-queue pattern. If
no consumer is available, the message is retained until a
consumer processes it.
◊ Topic - a message feed that implements the publish-subscribe pattern
and broadcasts messages to all consumers that subscribe to that topic.
 A single message is transmitted in five steps:
◊ Create
◊ Send
◊ Deliver
◊ Receive
◊ Process

1.8 Messaging Architectures – Messaging Models


 1. Point to Point
 2. Publish and Subscribe
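
The difference between the two models can be sketched in a few lines of Python. This is only an in-memory illustration, not how any real broker is implemented; the class names PointToPointQueue and PubSubTopic and the sample messages are hypothetical.

```python
from collections import deque


class PointToPointQueue:
    """Hypothetical queue: each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # The message is retained until some consumer receives it.
        self._messages.append(message)

    def receive(self):
        # Return the oldest pending message, or None if the queue is empty.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Hypothetical topic: each message is broadcast to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


# Point to point: only the first receiver gets the message.
queue = PointToPointQueue()
queue.send("order-1")
first = queue.receive()
second = queue.receive()

# Publish and subscribe: every subscriber gets its own copy.
topic = PubSubTopic()
inbox_a, inbox_b = [], []
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("price-update")
```

In the queue case the second call to receive finds nothing left, while in the topic case both inboxes end up holding the same message.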

1.9 What is Kafka?


 In modern applications, real-time information is continuously generated by
applications (publishers/producers) and routed to other applications
(subscribers/consumers).
 Apache Kafka is an open source, distributed publish-subscribe messaging
system.
 Kafka integrates information from producers and consumers without
requiring the application at either end to be rewritten.
 Kafka addresses the challenge of consuming real-time data whose volume
may grow by orders of magnitude.
 Kafka also supports parallel data loading into Hadoop systems.

1.10 What is Kafka? (Contd.)


 Kafka is a distributed publish-subscribe messaging system written in
Scala and Java. It runs on the Java Virtual Machine (JVM), and client
libraries are available for multiple languages.
 Kafka has traditionally relied on another service, ZooKeeper (a distributed
coordination system), to function; recent Kafka releases can run without it
in KRaft mode.
 Kafka has high throughput and is built to scale out in a distributed model
across multiple servers.
 Kafka persists messages on disk and can be used for batched
consumption as well as real-time applications.

1.11 Kafka Overview


 When used in the right way and for the right use case, Kafka has unique
attributes that make it a highly attractive option for data integration.
 Data integration is the combination of technical and business processes
used to combine data from disparate sources into meaningful and valuable
information.
 A complete data integration solution encompasses discovery, cleansing,
monitoring, transforming and delivery of data from a variety of sources.
 Messaging is a key data integration strategy employed in many distributed
environments such as the cloud.
 Messaging supports asynchronous operations, enabling you to decouple a
process that consumes a service from the process that implements the
service.

1.12 Kafka Overview (Contd.)

1.13 Need for Kafka


 High Throughput
◊ Supports hundreds of thousands of messages per second on modest
hardware
 Scalability
◊ Highly scalable distributed system that can be expanded with no downtime
 Replication
◊ Messages can be replicated across a cluster, which supports
multiple subscribers and allows consumers to be rebalanced in case of
failure
 Durability
◊ Persists messages to disk, where they can also be used for
batch consumption
 Stream Processing
◊ Kafka can be used along with real-time stream processing frameworks
such as Spark, Flink, and Storm
 Data Loss
◊ With proper configuration, Kafka can ensure zero data loss

1.14 Kafka Architecture

1.15 Core concepts in Kafka


 Topic
◊ A category or feed to which messages are published
 Producer
◊ Publishes messages to a Kafka topic
 Consumer
◊ Subscribes to a Kafka topic and consumes messages from it
 Broker
◊ A server in the Kafka cluster; handles hundreds of megabytes of reads
and writes per second

1.16 Kafka Topic


 A user-defined category to which messages are published
 For each topic, Kafka maintains a partitioned log
 Each partition contains an ordered, immutable sequence of
messages, where each message is assigned a sequential ID number called
the offset
 Writes to a partition are generally sequential, which reduces the number
of hard disk seeks
 A consumer can read a partition from the beginning, or rewind or skip
to any point in it by supplying an offset value
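
The idea of an append-only partition addressed by offsets can be sketched as follows. This is a minimal in-memory model, assuming a single partition held in a Python list; the class PartitionLog and its methods are hypothetical and are not part of any Kafka client API.

```python
class PartitionLog:
    """Hypothetical model of one partition: an append-only list of messages.

    The list index plays the role of the offset.
    """

    def __init__(self):
        self._records = []

    def append(self, message):
        # New messages always go to the end; existing ones are never changed.
        offset = len(self._records)
        self._records.append(message)
        return offset

    def read_from(self, offset, max_records=10):
        # Reading does not remove anything, so a reader can rewind or skip
        # simply by asking for a different offset.
        return self._records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(m) for m in ["a", "b", "c", "d"]]

from_start = log.read_from(0)                 # read from the beginning
skipped = log.read_from(2)                    # skip ahead to offset 2
replayed = log.read_from(1, max_records=2)    # rewind and re-read a window
```

Because reads never consume the data, several independent readers can each hold their own offset into the same partition.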

1.17 Kafka Producer


 An application that publishes messages to a topic in the Kafka cluster
 Can be of any kind, such as a front end or a streaming application
 When writing a message, the producer can attach a key to it
 Attaching a key guarantees that all messages with the same key arrive
in the same partition
 Supports both asynchronous and synchronous modes
 Can publish messages as fast as the brokers in the cluster can handle them
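
Key-based routing can be illustrated with a stable hash of the key taken modulo the number of partitions. The sketch below is an assumption-laden stand-in: it uses CRC32 from the Python standard library, whereas Kafka's default partitioner uses a different hash (murmur2), and the function choose_partition and the sample events are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stable hash of the key, modulo the partition count.
    # CRC32 is a stand-in here (assumption); Kafka itself uses murmur2.
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "click"),
    (b"user-1", "logout"),
]
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

# All events for user-1 sit in one partition, in the order they were sent.
user1_partition = choose_partition(b"user-1", NUM_PARTITIONS)
user1_events = [v for k, v in partitions[user1_partition] if k == b"user-1"]
```

The same-key guarantee in this scheme holds only while the number of partitions stays fixed, since changing it changes the result of the modulo.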

1.18 Kafka Consumer


 An application that subscribes to topics and consumes messages from
brokers in the Kafka cluster
 Can be of any kind, such as real-time consumers or NoSQL consumers
 A consumer group can be configured with multiple consumers that
consume messages from a topic together
 Each consumer in a consumer group reads messages from a unique subset
of the partitions in each topic it subscribes to
 Within a consumer group, messages with the same key arrive at the same
consumer, because they are in the same partition
 Supports both queuing and publish-subscribe
 Consumers have to keep track of their position (offset) in each partition
they read
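
How a consumer group divides partitions among its members can be sketched with a simple round-robin assignment. This is a simplified illustration only: Kafka's actual assignment strategies are more involved, and the function assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Hypothetical round-robin assignment.

    Every partition goes to exactly one consumer in the group, so no two
    members of the same group read the same partition.
    """
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by a group of three consumers.
group = assign_partitions(partitions=[0, 1, 2, 3, 4, 5],
                          consumers=["c1", "c2", "c3"])

# If c2 fails, its partitions are redistributed among the survivors.
after_failure = assign_partitions(partitions=[0, 1, 2, 3, 4, 5],
                                  consumers=["c1", "c3"])
```

A group with a single consumer receives every partition, which behaves like a queue; several groups subscribed to the same topic each receive every message, which behaves like publish-subscribe.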

1.19 Kafka Broker


 A Kafka cluster consists of one or more servers
 Each server in the cluster is called a broker
 A broker handles hundreds of megabytes of writes from producers and
reads from consumers
 Retains all published messages, whether or not they have been consumed
 If retention is configured for n days, a published message remains
available for consumption for n days and is discarded thereafter
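
Time-based retention can be sketched as a filter on message age that ignores whether a message has been consumed. This is a simplified per-message model under stated assumptions: real brokers discard data in larger units (log segments) rather than message by message, and the names StoredMessage and apply_retention are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class StoredMessage:
    value: str
    published_at: float  # seconds since an arbitrary starting point


def apply_retention(messages, now, retention_days):
    """Keep messages younger than the retention period.

    Whether a message was already consumed plays no part in the decision.
    """
    cutoff = now - retention_days * SECONDS_PER_DAY
    return [m for m in messages if m.published_at >= cutoff]


now = 10 * SECONDS_PER_DAY
stored = [
    StoredMessage("old", published_at=1 * SECONDS_PER_DAY),
    StoredMessage("recent", published_at=8 * SECONDS_PER_DAY),
    StoredMessage("new", published_at=9.5 * SECONDS_PER_DAY),
]

# With n = 7 days, only messages published in the last 7 days survive.
kept = apply_retention(stored, now=now, retention_days=7)
kept_values = [m.value for m in kept]
```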

1.20 Kafka Cluster


 A Kafka cluster is generally a fast, highly scalable messaging system
 A publish-subscribe messaging system
 Can be used effectively in place of ActiveMQ, RabbitMQ, and other brokers
based on Java Message Service (JMS) or Advanced Message Queuing
Protocol (AMQP)
 Can be integrated with the Hadoop ecosystem
 The cluster can be expanded with ease
 Effective for applications which involve large-scale message processing

1.21 Why Kafka Cluster?


 Why is Kafka preferred over more traditional JMS- and AMQP-based
brokers?
◊ Kafka can easily handle hundreds of thousands of messages
per second, which makes it a high-throughput messaging system
◊ The cluster can be expanded with no downtime, making Kafka highly
scalable
◊ Messages are replicated, which provides reliability and durability
◊ Fault-tolerant

1.22 Sample Multi-Broker Cluster

1.23 Overview of ZooKeeper


 An open source Apache project
 Provides a centralized infrastructure and services that enable
synchronization across a cluster
 Common objects used across large cluster environments, such as
configuration and a hierarchical namespace, are maintained in ZooKeeper
 ZooKeeper services are used by large-scale applications to coordinate
distributed processing across large clusters

1.24 Kafka Cluster & ZooKeeper

1.25 Kafka Integration


 Databases: MongoDB/CosmosDB/CouchDB/Oracle
 Big Data: Hadoop, Spark
 Logging: Logstash (ELK stack)
 IoT

1.26 Who Uses Kafka?

1.27 Courses
 WA2708 – Kafka for Application Modernization
 WA2684 – Developing Microservices

1.28 Summary
 Kafka is a distributed publish-subscribe messaging system written in
Scala and Java. It runs on the Java Virtual Machine (JVM), and client
libraries are available for multiple languages.
 Kafka has traditionally relied on another service, ZooKeeper (a distributed
coordination system), to function; recent Kafka releases can run without it
in KRaft mode.
 Kafka has high throughput and is built to scale out in a distributed model
across multiple servers.
 Kafka persists messages on disk and can be used for batched
consumption as well as real-time applications.
