Kafka's basic terminology, its architecture, its protocol, and how it works.
Kafka at scale: its caveats, guarantees, and the use cases it serves.
How we use it at @ZaprMediaLabs.
2. Agenda
● What is Apache Kafka?
● Why Apache Kafka?
● Fundamentals
○ Terminology
○ Architecture
○ Protocol
● What does Kafka offer?
● Where is Kafka used?
11. Birth of Kafka
Kafka decouples data pipelines.
[Diagram: multiple source systems publish through producers to Kafka brokers; consumers fan the data out to Hadoop, security systems, real-time monitoring, and the data warehouse.]
14. Terminology
• Topic - a message stream (a queue)
• Partition - a topic is divided into partitions
  • Each partition is an ordered, immutable, structured commit log
• Segment - a partition is stored as log segments on disk
  • Partitions are broken down into physical files
• Offset - a segment holds messages
  • Each message is assigned a sequential id, its offset
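A minimal sketch of how these terms relate, in Python. The class and method names here are hypothetical, and real Kafka splits each partition into segment files on disk; this only models the topic/partition/offset relationship:

```python
class Partition:
    """An ordered, immutable, append-only log of messages."""
    def __init__(self):
        self.log = []  # real Kafka splits this into segment files on disk

    def append(self, message) -> int:
        self.log.append(message)
        return len(self.log) - 1  # the offset: a sequential id per partition

    def read_from(self, offset):
        """Fetch all messages starting at a given offset."""
        return self.log[offset:]

class Topic:
    """A message stream, divided into partitions."""
    def __init__(self, num_partitions: int):
        self.partitions = [Partition() for _ in range(num_partitions)]

topic = Topic(num_partitions=3)
off = topic.partitions[0].append("hello")
print(off)  # 0: offsets start at zero and increase sequentially
```

Note that offsets are only meaningful within a single partition; two partitions of the same topic each have their own offset 0.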
24. What does Kafka offer?
• Fast
  • A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
• Scalable
  • Data streams are partitioned and spread over a cluster of machines, allowing streams larger than any single machine could handle.
• Durable
  • Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
• Distributed by design
  • Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
29. Efficiency - high throughput & low latency
• Disks are fast when used sequentially
  • Append to the end of the log
  • Fetch messages from a partition starting at a particular offset
• Batching makes the best use of network and disk I/O
  • Batched send and receive
  • Batched compression (GZIP, Snappy, and LZ4)
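The payoff of batched compression is easy to demonstrate: similar messages share a lot of structure, so compressing a whole batch removes redundancy that per-message compression cannot see. A small sketch using gzip from the Python standard library (the message contents are made up for illustration):

```python
import gzip

# 100 similar JSON-ish event messages, as a producer might send them
messages = [f'{{"user": {i}, "event": "page_view"}}'.encode() for i in range(100)]

# Compressing each message individually: per-message overhead, and the
# redundancy *across* messages is never exploited.
individual = sum(len(gzip.compress(m)) for m in messages)

# Compressing the batch as one blob, as Kafka does for a message set:
batch = len(gzip.compress(b"\n".join(messages)))

print(batch < individual)  # True: the batch compresses far better
```

The same reasoning applies to Snappy and LZ4; gzip is used here only because it ships with the standard library.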
30. Efficiency - high throughput & low latency
• No message caching in the JVM heap; the OS page cache is used instead
• Zero-copy transfer from file to socket (Java NIO)
34. What do people use it for?
• Activity tracking
  • Page views, click tracking, profile updates
• Real-time event processing
  • Provides low latency
  • Good integration with Spark, Samza, Storm, etc.
• Collecting operational metrics
  • Monitoring & alerting
• Commit log
  • Database changes can be published to Kafka
35. Why Replication?
• Brokers can go down
  • Controlled: rolling restart for a code/config push
  • Uncontrolled: isolated broker failure
• If a broker is down
  • Some partitions become unavailable
  • Data loss could be permanent
• Replication ensures higher availability & durability
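A toy simulation of why replication matters for availability. The round-robin assignment below is a simplified stand-in for Kafka's actual replica placement, and all names are made up:

```python
def assign_replicas(num_partitions, brokers, rf):
    """Round-robin replica assignment (simplified stand-in for Kafka's)."""
    return {p: [brokers[(p + i) % len(brokers)] for i in range(rf)]
            for p in range(num_partitions)}

def unavailable_partitions(assignment, dead_broker):
    """Partitions with no surviving replica after a broker failure."""
    return [p for p, replicas in assignment.items()
            if all(b == dead_broker for b in replicas)]

no_repl = assign_replicas(6, [0, 1, 2], rf=1)
with_repl = assign_replicas(6, [0, 1, 2], rf=2)

# Without replication, losing broker 0 takes its partitions offline:
print(len(unavailable_partitions(no_repl, dead_broker=0)))    # 2
# With rf=2, every partition still has a replica on a surviving broker:
print(len(unavailable_partitions(with_repl, dead_broker=0)))  # 0
```

This is the availability half of the story; durability follows from the same surviving replicas holding the data.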
36. Caveats
• Not designed for large payloads.
• Decoding is done on a whole message (no streaming decode).
• Rebalancing can screw things up
  • especially if you're doing any aggregation in the consumer by the partition key.
• The number of partitions cannot be easily changed (choose wisely).
• Lots of topics can hurt I/O performance.
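The partition-count caveat has a concrete reason: keyed messages are routed by hashing the key modulo the partition count, so changing the count silently remaps keys to different partitions, breaking any per-key ordering or consumer-side aggregation. A sketch, using CRC32 as a stand-in for the client's hash function (the real partitioner varies by client):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition by hashing, as keyed producers do."""
    return zlib.crc32(key) % num_partitions

keys = [f"user-{i}".encode() for i in range(1000)]

# With a fixed partition count, each key always lands on the same partition.
before = {k: partition_for(k, 10) for k in keys}

# Grow the topic from 10 to 12 partitions and most keys move:
after = {k: partition_for(k, 12) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys changed partition")
```

This is why the slide says to choose the partition count wisely up front: adding partitions later does not rebalance existing data, and it changes where new keyed messages land.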
37. Guarantees
• At-least-once delivery
• In-order delivery, per partition
• For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without losing any messages committed to the log
38. Summary
• A high-throughput distributed messaging system, rethought as a commit log
• Originally developed at LinkedIn, open-sourced in 2011
• Written in Scala, with clients for every popular language
• Used by LinkedIn, Twitter, Netflix, and many more companies
• When should we use Kafka?
• When should we not use Kafka?
40. Agenda
• Kafka clusters at Zapr
• Kafka in AWS
• Challenges in managing Kafka brokers
• Monitoring Kafka clusters
41. Kafka at Zapr
Kafka version: 0.8.2
Total Kafka clusters: 5
Number of topics: 185
Brokers: 17
Partitions: 7827
Regions: 2 (us-east-1, ap-southeast-1)
42. Kafka Clusters on Amazon Web Services
• Quorum of 3 ZooKeeper nodes for high availability
• Kafka brokers
43. Resource Utilization

Family            | Type       | vCPU | Memory (GB) | CPU usage | Pricing
Compute Optimized | c4.2xlarge | 8    | 16          | ~75%      | $338
General Purpose   | m4.2xlarge | 8    | 32          | ~25%      | $366

After moving from C4-class to M4-class machines: 40% improvement in CPU usage.
45. Production Kafka Broker Configurations

# Default partitions and replication factor for Kafka topics
num.partitions=10
default.replication.factor=2

# Controlled shutdown, for gracefully taking a broker down for maintenance
num.recovery.threads.per.data.dir=4
controlled.shutdown.enable=true
controlled.shutdown.max.retries=5
controlled.shutdown.retry.backoff.ms=60000

# Creation and deletion of Kafka topics
delete.topic.enable=true
auto.create.topics.enable=true
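As a sanity check on the retry settings above, it is worth knowing the worst-case backoff a controlled shutdown implies before the broker gives up (this counts only the configured backoff between attempts, not the time each attempt itself takes):

```python
# Values from the broker config above
max_retries = 5            # controlled.shutdown.max.retries
retry_backoff_ms = 60_000  # controlled.shutdown.retry.backoff.ms

worst_case_seconds = max_retries * retry_backoff_ms / 1000
print(worst_case_seconds)  # 300.0 -> up to 5 minutes of backoff alone
```

Any orchestration (rolling restarts, deploy scripts) should budget at least this long per broker before treating a shutdown as stuck.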
50. Introducing Kafka Manager
https://github.com/yahoo/kafka-manager
It supports the following:
● Managing multiple clusters
● Easy inspection of cluster state (topics, consumers, offsets, brokers)
● Generating partition assignments, with the option to select which brokers to use
● Running reassignment of partitions (based on generated assignments)
● Creating a topic with optional topic configs (0.8.1.1 has different configs than 0.8.2+)
● Deleting a topic (only supported on 0.8.2+; remember to set delete.topic.enable=true in the broker config)
● Adding partitions to an existing topic
● Updating the config of an existing topic
● Optionally enabling JMX polling for broker-level and topic-level metrics
61. Kafka Offset Monitoring
https://github.com/quantifind/KafkaOffsetMonitor
● An app to monitor your Kafka consumers and their position (offset) in the queue
● You can see the current consumer groups and, for each group, the topics they are consuming
● Offsets are useful for understanding how quickly you are consuming from a queue and how fast the queue is growing
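The quantity such monitors derive from offsets is consumer lag: for each partition, the distance between the log end offset (head of the queue) and the consumer group's committed offset. A sketch with made-up numbers:

```python
# Latest offset per partition (the head of each log) - hypothetical values
log_end_offsets = {0: 1500, 1: 900, 2: 1200}
# The consumer group's last committed offset per partition
committed = {0: 1400, 1: 900, 2: 700}

# Lag per partition: how far the consumer trails the head of the log
lag = {p: log_end_offsets[p] - committed[p] for p in log_end_offsets}
total_lag = sum(lag.values())

print(lag)        # {0: 100, 1: 0, 2: 500}
print(total_lag)  # 600
```

A lag that grows over time means the consumer is falling behind the producers; a stable or shrinking lag means it is keeping up.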
64. Future plans
• Upgrading EC2 instances to the next generation, from M4 to M5
• Upgrading Kafka from version 0.8.2 to the latest release
• Using ST1 (throughput-optimized HDD) EBS volumes