Getting To Know Kafka: Ola Is The First Course in The Series of Courses Covering All The Aspects of Kafka

Getting to Know Kafka
Welcome to Kafka, a platform for building real-time streaming applications.

Kafka - Premiera Ola is the first course in a series of courses covering all aspects of Kafka.
This course will walk you through the following concepts:
• Overview of Kafka and its fundamental components
• Publish-Subscribe messaging workflow
• Kafka as a messaging system
• Kafka as a storage system
• Stream processing using Kafka
• Final assessment
How Kafka Came to Be

During the initial days of big data, the focus was mainly on batch processing. In batch
processing, applications would be run daily or weekly to load and analyze data from big data
stores.
More recently, businesses have an increased need to handle real-time data feeds, i.e. to
analyze and process data and events as they happen.
To meet these demands, engineers at LinkedIn developed and open-sourced Kafka, a
stream processing platform that scales on commodity hardware.
• Kafka is a publish-subscribe messaging system, written in Scala and Java, that is fast,
distributed and durable.
• Kafka is fault-tolerant and enables you to build distributed applications that scale on
commodity hardware.
What is a Messaging System?

Before we explore Kafka, let's understand the types of messaging systems.
A messaging system is a medium that allows data transfer from one application to another
so that the applications can focus on data without worrying about how to share it.
The two types of messaging patterns are:
• Point to Point Messaging System
• Publish-Subscribe Messaging System
Point to Point Messaging System

In a point-to-point messaging system, senders send messages to a queue and receivers
consume messages from the queue.
The restriction is that a particular message can be consumed by at most one receiver.
Once a receiver consumes a message, the message disappears from the queue.
Publish-Subscribe Messaging System

In a publish-subscribe messaging system, senders, also known as publishers, classify
messages and publish them to a topic. Receivers, or subscribers, receive messages
only after subscribing to that topic.
Unlike point-to-point messaging, in a publish-subscribe messaging system:
• A message on a topic is broadcast to all subscribed consumers.
• Consumers can subscribe to multiple topics to receive messages.
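As a rough illustration of the difference between the two patterns, the following Python sketch models a point-to-point queue and a publish-subscribe topic in memory. The class names PointToPointQueue and PubSubTopic are made up for this example and do not belong to any messaging library; the sketch only shows who receives each message.

```python
from collections import deque


class PointToPointQueue:
    """Each message is consumed by at most one receiver, then it is gone."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Each published message is delivered to every current subscriber."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name):
        self._subscribers[name] = []

    def publish(self, message):
        for inbox in self._subscribers.values():
            inbox.append(message)

    def inbox(self, name):
        return list(self._subscribers[name])


queue = PointToPointQueue()
queue.send("order-1")
first = queue.receive()    # one receiver gets the message
second = queue.receive()   # nothing is left for a second receiver

topic = PubSubTopic()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.publish("order-1")   # both subscribers get their own copy
```

In the queue, the second receive finds nothing because the message was already consumed; in the topic, both subscribers see the same message.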
What is Kafka?
Apache Kafka is a distributed publish-subscribe messaging system used for collecting and
delivering high volumes of data with low latency, similar to a traditional message broker.
Apache Kafka originated at LinkedIn and became an open source project in 2011. It is
written in Scala and Java.
How Kafka Works
• Kafka stores messages coming from multiple applications called producers.
• Messages are written to different topics based on their classification, and each topic is
split into partitions.
• The messages in a partition are indexed and stored with a timestamp.
• Multiple applications called consumers poll messages from partitions.
• Kafka runs as a cluster of servers.
• Each topic partition is replicated over the nodes of the Kafka cluster.
• Clients and Kafka nodes communicate over TCP.
We will see this in detail in the upcoming topics.
What is Kafka Used For?

Kafka is used for:
• Building real-time streaming pipelines that move data between different applications.
• Building real-time streaming applications that process streams of data.
• Building a fault-tolerant storage system that stores streams of records.
We will discuss these points in detail in the upcoming topics.
Benefits of Kafka
• Reliability: Kafka's distributed design, topic partitioning, and data replication across
servers make it reliable.
• Scalability: A Kafka system exists as a cluster of brokers. The number of brokers can
grow over time as the data volume grows. The failure of an individual broker is handled
by the cluster, so service continues uninterrupted.
• Durability: Disk-based data retention makes Kafka durable. Messages remain on
disk according to the retention rule configured on a per-topic basis. Even if a consumer
falls behind for any reason, the data continues to reside on the broker until the retention
period expires and is not lost.
• High performance: Together, the above features make Kafka a high-performance
messaging system.
Kafka for Messaging
• Kafka works well as a traditional message broker.
• Compared to other messaging systems, Kafka has better performance, built-in partitioning,
fault tolerance, and replication.
Website Activity Tracking
• Kafka is used for tracking user activities on websites, including logins, page views,
clicks, likes, shares, comments, and searches.
• Site activities are published to topics, with one topic per activity type.
• These topics are available for subscription by real-time processing and monitoring
applications.
Operational Metrics
• Kafka is used for monitoring operational data.
• This includes aggregating operational data such as service call stacks, errors, call
latency, and disk, CPU, memory and network utilization.
Log Aggregation
• Log aggregation involves collecting log files from servers and saving them in a central
file system.
• Kafka abstracts the log data as a stream of messages and makes it available to consumers
in a standard format.
Stream Processing
In stream processing, Kafka consumes data from a topic, processes it, and writes the
processed data to a new topic, which is then made available to users.
Some use cases of stream processing are as follows:
1. Travel companies can build applications using the Kafka Streams API to make real-time
decisions, such as determining the best packages and pricing for individual customers and
making reservations.
2. The banking industry can leverage Kafka Streams to detect fraudulent transactions.
3. Automotive and manufacturing companies can gain real-time insights into their supply
chains and monitor telematics data from connected cars.
Kafka Installation on Ubuntu

The following steps install Kafka on Ubuntu.
Step 1 — Create a User for Kafka

• To reduce the chances of the Kafka server being compromised, create a dedicated user.
• Log in as the root user and create a user called kafka using the useradd command:
useradd kafka -m
• Set its password using the passwd command:
passwd kafka
• Add the newly created kafka user to the sudo group so that it has the privileges
required to install Kafka's dependencies. You can do this using the adduser command:
adduser kafka sudo
• Your kafka user is now ready. Log in to it using su:
su - kafka
Installing Prerequisites for Kafka ...

Step 2 — Install Java
• Update the list of available packages before installing anything else, so that you install
the latest versions available in the repository. Type the following command:
sudo apt-get update
• Apache Kafka needs a Java runtime environment. Install the default-jre package using
apt-get. Type the following command:
sudo apt-get install default-jre
Installing Prerequisites...
Step 3 — Install ZooKeeper
• Install the zookeeperd package, which is available in Ubuntu's default repositories. Type the
following command:
sudo apt-get install zookeeperd
• Once the installation is complete, ZooKeeper starts automatically as a daemon.
ZooKeeper listens on port 2181 by default. To check that it works, you can connect to it via
Telnet:
telnet localhost 2181
• At the Telnet prompt, type ruok and press ENTER. If everything is fine, ZooKeeper
replies imok and ends the Telnet session.
Downloading and Installing Kafka

Step 4 — Downloading and Extracting Kafka Binaries
Now that all prerequisites are installed, it is time to download and extract Kafka. Start
by creating a directory called Downloads to store all downloads.
mkdir -p ~/Downloads
Download the Kafka binaries using wget.
wget "http://www-eu.apache.org/dist/kafka/0.11.0.1/kafka_2.11-0.11.0.1.tgz" -O ~/Downloads/kafka.tgz
Create the base directory kafka for the Kafka installation and change to this directory.
mkdir -p ~/kafka && cd ~/kafka
Extract the archive you downloaded using the tar command.
tar -xvzf ~/Downloads/kafka.tgz --strip 1
Configuring Kafka Server

Step 5 — Configuring Kafka Server :
Open the file server.properties in the vi editor using the following command:
vi ~/kafka/config/server.properties
The default configuration of this Kafka version does not allow topic deletion. To enable it,
add the following line at the end of ~/kafka/config/server.properties:
delete.topic.enable = true
Save the file and exit vi.
Starting Kafka Server

Step 6 — Starting the Kafka Server
You can start the Kafka server (also known as the Kafka broker) as a background process
that is independent of your shell session by using nohup:
nohup ~/kafka/bin/kafka-server-start.sh ~/kafka/config/server.properties > ~/kafka/kafka.log 2>&1 &
Wait for a few seconds. Once the server starts, you will see messages like the following
in ~/kafka/kafka.log:
[2015-07-29 06:02:41,736] INFO New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2015-07-29 06:02:41,776] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
You now have a Kafka server listening on port 9092. To check the port, you may use:
grep port ~/kafka/kafka.log
Fundamental Components
Before going deeper into Kafka, we need to understand some frequently used Kafka terms:
• Topics and Partitions
• Producer and Consumer
• Broker
• ZooKeeper
Topic
• A Kafka topic is a category or feed name under which messages are stored.
• A Kafka producer publishes messages to a topic, to which zero or more consumers may
subscribe.
• As shown in the figure, the Kafka cluster maintains a partitioned log for each topic.
Each partition contains messages (records) in an immutable, ordered sequence.
Partitions
A topic partition is a structured commit log to which records are continually appended.
For each topic, Kafka keeps at least one partition.
• Each record in a partition is assigned a sequential id called the offset, which
uniquely identifies the record within that partition.
• Partitions enable a topic to scale beyond a single server and act as the unit of
parallelism.
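To make the idea of an append-only log with offsets concrete, here is a minimal in-memory sketch in Python. The class PartitionLog is hypothetical and only mimics the behaviour described above; it is not how Kafka stores data on disk.

```python
class PartitionLog:
    """Append-only log; a record's offset is its position in the log."""

    def __init__(self):
        self._records = []

    def append(self, record):
        offset = len(self._records)   # next sequential id
        self._records.append(record)
        return offset

    def read(self, offset):
        return self._records[offset]


log = PartitionLog()
offsets = [log.append(record) for record in ("a", "b", "c")]
second_record = log.read(1)
```

Records are only ever added at the end, so an offset, once assigned, always refers to the same record in that partition.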
What is a Producer?
• Kafka producers publish messages to one or more Kafka topics.
• Every time a producer sends a message to a broker, the broker appends it to the
corresponding topic's partition. Producers can also send messages to a partition of
their choice.
• Producers write to the leader of each partition. Because partition leaders are spread
across brokers, writes to different partitions are served by different brokers, which
helps in load balancing.
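One common way to decide which partition a message goes to is to hash the message key. The sketch below assumes each message carries a string key and uses crc32 purely as a stand-in hash; the function name choose_partition is made up for this example, and this is not the hash function Kafka itself uses.

```python
import zlib


def choose_partition(key, num_partitions, explicit_partition=None):
    """Pick a partition for a record.

    crc32 is a stand-in hash for illustration only.
    """
    if explicit_partition is not None:
        return explicit_partition   # the producer chose the partition itself
    return zlib.crc32(key.encode("utf-8")) % num_partitions


p1 = choose_partition("user-42", 4)
p2 = choose_partition("user-42", 4)                        # same key, same partition
p3 = choose_partition("user-42", 4, explicit_partition=3)  # producer's own choice
```

Because the same key always hashes to the same partition, all messages for one key end up in a single partition and keep their relative order.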
How does a Producer Work?

The image shows a producer writing to partition 0 of a topic, whose leader is on broker 1.
As the leader for partition 0, broker 1 replicates the write to the available followers,
broker 2 and broker 3.
When every replica has acknowledged that it received the message, the replicas are in sync.
Explaining Kafka Producers

This video describes:
• The Kafka producer concept
• How a producer writes data to a broker
• The different acknowledgment modes of a producer
What is a Consumer?
• A consumer subscribes to a topic and consumes published messages by pulling data
from the brokers.
• Within a consumer group, each partition is read by a single consumer, so you can scale
the throughput of message consumption in the same way as message production.
• If there are more consumers than partitions, some consumers remain idle because they
have no partitions to read from.
• If there are more partitions than consumers, at least some consumers receive messages
from multiple partitions.
• If the number of consumers equals the number of partitions, each consumer reads
messages in order from exactly one partition.
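The three cases above can be reproduced with a small assignment sketch. Round-robin is assumed here only for simplicity, and the function assign_partitions is hypothetical; Kafka's real assignment strategies are configurable and may distribute partitions differently.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# More consumers than partitions: c3 is left idle.
more_consumers = assign_partitions([0, 1], ["c1", "c2", "c3"])

# More partitions than consumers: each consumer reads several partitions.
more_partitions = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

# Equal numbers: one partition per consumer.
equal = assign_partitions([0, 1], ["c1", "c2"])
```

In every case a partition has exactly one owner within the group, which is what preserves per-partition ordering.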
Consumers & Consumer Groups

Consumers can be organized into consumer groups for a given topic.
Each message published on a topic is delivered to one consumer instance within each
subscribed consumer group. These consumer instances may run in separate processes
or on separate machines.
• If all the consumer instances belong to the same consumer group, the records
are load balanced over the instances.
• If every consumer instance belongs to a different consumer group, each record
is broadcast to all the consumer processes.
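The delivery rule (one consumer instance per subscribed group) can be sketched as follows. This is a simplified model with made-up names: it picks the receiving instance per record by its index, whereas Kafka actually distributes work per partition.

```python
def deliver(record_index, groups):
    """Return, for each group, the consumer that receives this record."""
    return {group: members[record_index % len(members)]
            for group, members in groups.items()}


same_group = {"g1": ["c1", "c2"]}                # both consumers in one group
separate_groups = {"g1": ["c1"], "g2": ["c2"]}   # one consumer per group

# Same group: records alternate between c1 and c2 (load balancing).
load_balanced = [deliver(i, same_group) for i in range(4)]

# Separate groups: every record reaches both c1 and c2 (broadcast).
broadcast = [deliver(i, separate_groups) for i in range(2)]
```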
Consumers & Consumer Groups

The image from the Kafka documentation depicts a scenario with four partitions of a
single topic.
Partitions 0 and 3 are kept on server 1, and partitions 1 and 2 are kept on server 2.
There are two consumer groups, A and B. A is composed of two consumers, and B of four
consumers.
• Consumer Group A consists of two consumers, each reading two partitions, which
together cover all four partitions of the topic.
• Consumer Group B, on the other hand, has as many consumers as there are partitions,
so each consumer reads exactly one partition of the topic.
Consumer Offset
• The offset, or position of a consumer in the partition log, is the only metadata
retained for that consumer.
• The consumer controls the offset.
• As a consumer reads records, its offset advances linearly along the partition log.
• The consumer can read data from any position in the partition log: it can move back to
an older offset to re-read older data, or jump ahead to the latest record and start
consuming from there.
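The following sketch shows a consumer that owns its offset and can rewind or skip ahead. SimpleConsumer is a hypothetical class written for this example, and a plain Python list stands in for the partition log.

```python
class SimpleConsumer:
    """Tracks its own read position (offset) in a partition log."""

    def __init__(self, log):
        self._log = log   # a list standing in for a partition log
        self.offset = 0

    def poll(self):
        """Return the next record and advance the offset, or None at the end."""
        if self.offset >= len(self._log):
            return None
        record = self._log[self.offset]
        self.offset += 1
        return record

    def seek(self, offset):
        self.offset = offset

    def seek_to_end(self):
        self.offset = len(self._log)


log = ["r0", "r1", "r2"]
consumer = SimpleConsumer(log)
read_once = [consumer.poll(), consumer.poll()]   # offset advances from 0 to 2
consumer.seek(0)                                 # rewind to re-read older data
replayed = consumer.poll()
consumer.seek_to_end()                           # skip everything already written
log.append("r3")
latest = consumer.poll()                         # only the new record is read
```

Because the log itself is never modified by reading, rewinding is just a matter of changing one number on the consumer side.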
Describing Consumer and Consumer Groups

This video explains the following:
• The concept of a consumer
• How a consumer reads data from topic partitions
• Consumer groups
• Consumer offset
Kafka Broker
• Being a distributed system, Kafka runs on a cluster of machines, where each node in
the cluster is called a Kafka broker.
• A Kafka cluster is a Kafka distribution with more than one broker.
• A Kafka cluster can be expanded without downtime.
• Each broker may hold zero or more partitions of a topic. For example, if you have a
topic with 24 partitions and a cluster with 3 Kafka brokers, an even distribution places
8 partitions of the topic on each broker.
• Kafka and ZooKeeper handle the distribution of load across these partitions and
redistribute it correctly when any broker goes down.
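The 24-partition example can be checked with a short sketch. It assumes a plain round-robin placement and ignores replicas; the function spread_partitions and the broker names are invented for the illustration.

```python
from collections import Counter


def spread_partitions(num_partitions, brokers):
    """Map each partition id to a broker in round-robin order."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


placement = spread_partitions(24, ["broker-1", "broker-2", "broker-3"])
per_broker = dict(Counter(placement.values()))   # partitions held by each broker
```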
Leader and Followers

Each partition held by a broker is replicated across the Kafka cluster for fault recovery.
Each topic partition has one broker acting as the leader for that partition.
The nodes that replicate the leader are called followers.
• The leader is the node that handles all read and write requests for a given partition.
• It updates the followers (replicas) with new data.
• If the leader fails, a follower takes over as the new leader.
Assume we have a Kafka cluster with three brokers and topic partitions replicated over them.
As shown in the figure, for partition 0, broker 1 acts as the leader and brokers 2 and 3 are
followers (replicas).
The read and write requests for a partition are handled by the leader, and the followers
replicate the leader's data across the nodes of the cluster.
Each broker in the cluster is the leader for some of its partitions and a follower for
others, which keeps the load balanced.
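Leader failover for a single partition can be sketched as below. The rule used here (take the first replica in the list that is still alive) is a simplifying assumption for illustration, and elect_leader is a made-up helper, not Kafka's actual election logic.

```python
def elect_leader(replicas, alive):
    """Return the first replica that is still alive, or None if none is."""
    for broker in replicas:
        if broker in alive:
            return broker
    return None


replicas = [1, 2, 3]   # partition 0 is replicated on brokers 1, 2 and 3
leader = elect_leader(replicas, alive={1, 2, 3})    # all brokers are up
new_leader = elect_leader(replicas, alive={2, 3})   # broker 1 has failed
```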
What is ZooKeeper?
• ZooKeeper is a centralized service for coordinating and managing large sets of hosts in
a distributed system.
• ZooKeeper provides a configuration service, naming registry, synchronization,
and group services for distributed applications.
Role of Zookeeper in Kafka

Kafka uses ZooKeeper for the following:
1. Electing a controller:
• The controller is one of the brokers and is responsible for maintaining the
leader/follower relationship for all the partitions.
• When a node crashes or shuts down, the controller tells other replicas to become
partition leaders, replacing the leaders on the node that is going away.
• ZooKeeper elects a controller, makes sure there is only one, and elects a new one if
it crashes.
2. Cluster membership:
• ZooKeeper monitors which brokers are alive and part of the cluster.
3. Topic configuration:
• ZooKeeper keeps track of topics, their partitions and replicas, which broker is the
preferred leader, and what configuration overrides are set for each topic.
4. Quotas:
• ZooKeeper tracks how much data each client is allowed to read and write.
5. ACLs:
• ZooKeeper tracks who is allowed to read from and write to which topic, which consumer
groups exist, who their members are, and the latest offset each group received from
each partition.
Queuing and Publish-Subscribe in Kafka

Kafka acts as both a publish-subscribe and a queue-based messaging system.
In both cases, producers send messages to a topic and consumers read them from the topic.
Consumers are associated with a consumer group through a GroupId.
Consumers with the same GroupId belong to the same consumer group.
• If all consumers belong to different consumer groups, every consumer group
consumes all the messages. (This is the publish-subscribe model.)
• If all consumers belong to the same consumer group, the partitions are distributed
evenly among the consumers in that group. (This is the queuing model.)
Workflow as Publish-Subscribe Messaging System

1. Producers send messages to a topic at regular intervals.
2. The Kafka broker stores all messages in the partitions configured for that topic, such that
the messages are divided equally among the partitions of the topic.
3. A consumer subscribes to the topic.
4. On subscription, Kafka sends the current offset of the topic to the consumer. It also
saves a copy of the current offset in the ZooKeeper ensemble.
5. The consumer then requests new messages from Kafka at regular intervals.
6. Once Kafka receives a message from the producer, it forwards the message to the consumer.
7. The consumer receives the message and processes it.
8. Once the message is processed, the consumer sends an acknowledgment to the Kafka broker.
9. On receiving the acknowledgment, the Kafka broker changes the offset to the new value and
updates it in ZooKeeper.
10. This flow repeats until the consumer stops requesting messages.
11. At any time, the consumer can rewind or skip to a desired offset and read the subsequent
messages.
Workflow as Queue Messaging System

1. A producer sends messages to the topic "Topic-01" at regular intervals.
2. The broker stores those messages in the partitions configured for the topic "Topic-01", such
that the messages are divided equally among the partitions of the topic.
3. A consumer with GroupId "Group-01" subscribes to the topic "Topic-01".
4. Kafka communicates with the consumer using the same steps as in the publish-subscribe
messaging system.
5. Another consumer with the same GroupId "Group-01" also subscribes to the same
topic "Topic-01".
6. Now the data is shared between the two consumers in the consumer group, such that
each topic partition is read by one consumer in the consumer group.
7. This sharing of partitions continues until the number of consumers in the consumer
group reaches the total number of partitions of the topic.
8. If yet another consumer with the same GroupId "Group-01" then subscribes to the same
topic "Topic-01", it has to wait until one of the other consumers in the consumer
group unsubscribes.
Kafka as a Messaging system

As discussed at the beginning of the course, traditional messaging has two models:
queuing and publish-subscribe.
In Kafka, a consumer group divides the processing of messages among its consumer
instances, similar to a queue. At the same time, Kafka broadcasts messages to all subscribing
consumer groups, as with publish-subscribe.
Thus, Kafka combines the strengths of both messaging models, enabling it to
scale easily.
• Kafka also provides better ordering guarantees than a traditional messaging system
by using topic partitions.
• Kafka assigns topic partitions to the consumers within a consumer group in such a
way that each partition is consumed by only one consumer in the group.
• This guarantees that the consumer is the sole reader of that partition and consumes the
data in order.
Kafka as a Storage system

Kafka acts as an efficient storage system for the following reasons:
• It stores data in a distributed fashion with high performance.
• It writes data to disk and replicates it for fault tolerance.
• Producers can wait for an acknowledgement that the data write is complete, i.e.
replicated and persisted.
A configurable retention period can be set to retain all published records in Kafka,
irrespective of whether they have been consumed.
For example, if the retention period is set to three days, a record is available for
consumption for three days after it is published. It is discarded after the retention period.
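The three-day example can be expressed as a simple time comparison. This sketch treats retention as a per-record check, which is a simplification, and the function is_available and the dates are invented for the illustration.

```python
from datetime import datetime, timedelta


def is_available(published_at, now, retention=timedelta(days=3)):
    """A record stays readable until its retention period has elapsed."""
    return now - published_at < retention


published = datetime(2024, 1, 1, 12, 0)
after_two_days = is_available(published, published + timedelta(days=2))
after_four_days = is_available(published, published + timedelta(days=4))
```

Whether or not any consumer has read the record plays no part in the check; only its age matters.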
Kafka in a nutshell
Kafka for Stream Processing

So far, we discussed how to read and write data to Kafka topics using producers and
consumers. Next, we will learn to perform real-time processing of data streams using Kafka
Streams.
APIs
Kafka includes five core APIs:
1. The Producer API allows applications to send streams of data to topics in the Kafka
cluster.
2. The Consumer API allows applications to read streams of data from topics in the Kafka
cluster.
3. The Streams API allows transforming streams of data from input topics to output topics.
4. The Connect API allows implementing connectors that continually pull from some source
system or application into Kafka or push from Kafka into some sink system or application.
5. The AdminClient API allows managing and inspecting topics, brokers, and other Kafka
objects.
Kafka exposes all its functionality over a language-independent protocol, which has clients
available in many programming languages. However, only the Java clients are maintained as part
of the main Kafka project; the others are available as independent open source projects. A list of
non-Java clients is linked from the Kafka documentation.
Kafka Streams API

A stream processor in Kafka reads streams of data from input topics, processes this data, and
produces continuous streams of data to output topics. For this purpose, Kafka provides a fully
integrated Streams API.
For example, an application might take in input streams of data, perform computations that
handle out-of-order data or reprocess input as code changes, and then output a stream of
transformed data.
• The Streams API uses the Producer and Consumer APIs for input.
• It uses Kafka for stateful storage.
• For fault tolerance among stream processor instances, it uses the same group mechanism
as consumers.
Key Concepts
1. Stream - The primary abstraction in Kafka Streams. It represents an unbounded and
continuously updating data set. A stream contains a sequence of immutable data records
that is ordered and fault tolerant.
2. Stream processing application - A program that uses the Kafka Streams library to implement
its computational logic through one or more processor topologies.
3. Processor topology - A graph of stream processors (nodes) that are connected by streams
(edges).
4. Stream processor - A node in the processor topology. It denotes a processing step that
operates on input stream data by receiving one input record at a time from its upstream
processors in the topology, applying transformations, and then producing output records for
its downstream processors.
Two special stream processors:
1. Source Processor - Does not have any upstream processors. It produces an input
stream to its topology by consuming records from one or more Kafka topics and
forwarding them to downstream processors.
2. Sink Processor - Does not have any downstream processors. It sends any records
received from its upstream processors to a specified Kafka topic.
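The shape of a processor topology can be imitated with Python generators. This is only a conceptual sketch with made-up function names and lists standing in for topics; it does not use the Kafka Streams library.

```python
def source(topic_records):
    """Source processor: no upstream; reads records from a topic."""
    yield from topic_records


def uppercase(records):
    """Stream processor: transforms one record at a time."""
    for record in records:
        yield record.upper()


def sink(records, output_topic):
    """Sink processor: no downstream; writes records to a topic."""
    for record in records:
        output_topic.append(record)


input_topic = ["a", "b"]   # stands in for an input Kafka topic
output_topic = []          # stands in for an output Kafka topic

# Topology: source -> uppercase -> sink
sink(uppercase(source(input_topic)), output_topic)
```

Each function is a node and each generator handed from one function to the next is an edge, so records flow through the graph one at a time.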
Quick Facts
• Simple, lightweight client library.
• No external dependencies on systems other than Kafka.
• Supports exactly-once processing semantics.
• Supports event-time based windowing operations.
Course Summary
In this course, you have learned the fundamental concepts of Kafka, such as topics, partitions,
producers and consumers, and how they work together.
The course also explained how Kafka can be used as a messaging, storage and stream
processing tool.
Quiz:
1. In Kafka, the communication between the clients and servers is done with the ----- protocol. Answer: TCP
2. Which one functions as a messaging system? Answer: Kafka
3. Based on the classification of messages, Kafka categorizes messages into: Answer: Topics
4. __________ is the subset of the replicas list that is currently alive and caught up to the leader. Answer: In-sync replicas
5. Which of the following is incorrect? Answer: Single line
6. Each record in the partition is assigned a sequential id called the _______. Answer: Offset
7. Which service monitors cluster membership? Answer: ZooKeeper
8. Which node is responsible for all reads and writes for a given partition? Answer: Leader
9. A _______ is a structured commit log to which records are appended continually. Answer: Topic partition
10. Which concept of Kafka helps scale processing and multi-subscription? Answer: Consumer group
11. How are messages stored in topic partitions? Answer: Immutable
12. Kafka is run as a cluster comprised of one or more servers, each of which is called a: Answer: Broker
13. The _________ is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. Answer: Controller
14. Kafka supports both the 'queuing' model and the 'publish-subscribe' model. Answer: True
15. If the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Answer: True
16. If multiple consumers subscribe to a topic and each belongs to a different consumer group, the messages of the topic will be consumed by all consumers. This is: Answer: Publish-subscribe
17. Kafka combines the strengths of both queuing and publish-subscribe models, enabling it to scale easily. Answer: True
18. When a consumer subscribes to a topic, Kafka provides the current offset of the topic to: Answer: Both
19. A configurable ________ can be set to retain all published records in Kafka irrespective of whether they have been consumed or not. Answer: Retention period
20. If only one consumer group subscribes to a topic and there are many consumers in this consumer group, messages of the topic will be evenly load balanced between the consumers of the group. This is: Answer: Queuing model
21. Kafka also provides better ordering guarantees than a traditional messaging system using ________. Answer: Topic partitions
22. A source processor does not have any downstream processors. Answer: False
23. Which processor sends any received records from its upstream processors to a specified Kafka topic? Answer: Sink
24. A ________ is a logical abstraction for stream processing code. Answer: Processor topology
25. Kafka Streams supports exactly-once processing semantics. Answer: True
26. A _________ in Kafka reads streams of data from input topics, processes this data and produces continual streams of data to output topics. Answer: Stream processor
27. A graph of stream processors (nodes) that are connected by streams (edges) is called a: Answer: Processor topology
28. A ________ is a logical abstraction for stream processing code. Answer: Processor topology
29. A Kafka distribution with more than one broker is called a Kafka cluster. Answer: True
30. _______ are servers that replicate a partition log regardless of their role as leader or follower. Answer: Replicas
31. In a _______, a pool of consumers may read from a server and each record goes to one of them; in _________ the record is broadcast to all consumers. Answer: queue, publish-subscribe
32. Kafka Streams employs one-record-at-a-time processing to achieve millisecond processing latency. Answer: True
33. A _______ subscribes to a topic and consumes published messages by pulling data from the brokers. Answer: Consumer
34. The _______ allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics. Answer: Streams API
35. Point out the wrong statement. Answer: The Kafka cluster does not retain all published messages.
36. Kafka can be used for which of the following? Answer: All the options
37. The __________ property is the unique and permanent name of each node in the cluster. Answer: broker.id
38. A Kafka cluster can enforce quotas on requests to control the broker resources used by clients. Answer: True
39. A consumer cannot reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from now. Answer: False
40. Each record published to a topic is delivered to _______ consumer instance within each subscribing consumer group. Answer: one
41. A sink processor does not have any upstream processors. Answer: False
42. Each message published to a topic is delivered to _______ within each subscribing consumer group. Answer: one consumer instance
43. The ________ allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. Answer: Connector API
44. Kafka provides better ordering guarantees than a traditional messaging system using topic partitions. Answer: True
45. Which processor consumes records from one or more Kafka topics and forwards them to downstream processors? Answer: Source
46. The ________ allows an application to publish a stream of records to one or more Kafka topics. Answer: Producer API
47. If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes. Answer: True
48. Producers can also send messages to a partition of their choice. Answer: True
49. Engineers at _________ developed and open-sourced Kafka. Answer: LinkedIn
50. It is possible to delete a Kafka topic. Answer: True
51. Which acknowledgement setting means that the leader should wait for the full set of in-sync replicas to acknowledge the record? Answer: all (equivalent to -1)
52. In a cluster, Kafka can work without ZooKeeper. Answer: False
53. "isr" is the set of "in-sync" replicas. Answer: True
54. Producers write to a single leader, so that each write is serviced by a separate broker. Answer: True
55. Kafka only provides a total order over records within a partition, not between different partitions in a topic. Answer: True
56. _________ data retention makes Kafka a durable system. Answer: Disk-based
57. Which method of the KafkaConsumer class is used to manually assign a list of partitions to a consumer? Answer: assign()
58. Which is the correct order of steps to create a simple messaging system in Kafka? i. Start the ZooKeeper server. ii. Start the Kafka server. iii. Run the producer and send some messages. iv. Create a topic. v. Start the consumer and see the messages sent from the producer.
59. __________ keeps track of topics, their partitions and replicas, which broker is the preferred leader, and what configuration overrides are set for each topic. Answer: ZooKeeper
60. Which API supports managing and inspecting topics, brokers and other Kafka objects? Answer: AdminClient API
61. A hashing-based partitioner takes ___ and generates a hash to locate the partition the message should go to. Answer: Key
62. Which messaging semantics does Kafka use to handle the failure of any broker in the cluster? Answer: Retries
63. The __________ allows an application to subscribe to one or more topics and process the stream of records produced to them. Answer: Consumer API
64. What is the default retention period for a Kafka topic? Answer: 7 days
65. Which configuration in the Producer API controls the criteria under which requests are considered complete? Answer: acks
66. Which of the following statements is incorrect? Answer: In a queue-based messaging system, message ordering is lost during parallel processing.
67. Which one below is not a parameter to the Kafka ProducerRecord class constructor? Answer: Offset
68. For stream processing, Kafka provides which of the following? Answer: Streams API
69. A _____ is the primary abstraction in Kafka Streams, and represents an unbounded and continuously updating data set. Answer: Stream
70. Kafka Streams has no external dependencies on systems other than Apache Kafka itself. Answer: True
71. Kafka has a push-based consumer, where data is pushed from the broker to the consumer. Answer: False
72. Each record consists of a key, a value, and a ________. Answer: Timestamp
73. Kafka Streams supports both stateful and stateless operations. Answer: True
74. The only metadata retained on a per-consumer basis is the position of the consumer in the log, called the: Answer: Offset
75. Kafka stores metadata with basic information about topics, brokers and consumer offsets in the: Answer: ZooKeeper ensemble
76. The banking industry can leverage Kafka Streams for detecting fraudulent transactions. Answer: True
