
Apache Kafka 0.8 basic training
Michael G. Noll, Verisign
mnoll@verisign.com / @miguno

July 2014
Update 2015-08-01:

Shameless plug! Since publishing this Kafka training deck about a year ago,
I joined Confluent Inc. as their Developer Evangelist.

Confluent is the US startup founded in 2014 by the creators of Apache Kafka,
who developed Kafka while at LinkedIn (see Forbes about Confluent). Next to
building the world's best stream data platform we are also providing
professional Kafka trainings, which go even deeper than, and beyond, my
extensive training deck below.

http://www.confluent.io/training

I can say with confidence that these are the best and most effective Apache
Kafka trainings available on the market. But you don't have to take my word
for it: feel free to take a look yourself and reach out to us if you're interested.

Michael

Verisign Public 2
Part 1: Introducing Kafka
Why should I stay awake for the full duration of this workshop?
Part 2: Kafka core concepts
Topics, partitions, replicas, producers, consumers, brokers
Part 3: Operating Kafka
Architecture, hardware specs, deploying, monitoring, P&S tuning
Part 4: Developing Kafka apps
Writing to Kafka, reading from Kafka, testing, serialization, compression, example apps
Part 5: Playing with Kafka using Wirbelsturm
Wrapping up

Verisign Public 3
Part 1: Introducing Kafka

Verisign Public 4
Overview of Part 1: Introducing Kafka
Kafka?
Kafka adoption and use cases in the wild
At LinkedIn
At other companies
How fast is Kafka, and why?
Kafka + X for processing
Storm, Samza, Spark Streaming, custom apps

Verisign Public 5
Kafka?

http://kafka.apache.org/
Originated at LinkedIn, open sourced in early 2011
Implemented in Scala, some Java
9 core committers, plus ~ 20 contributors

https://kafka.apache.org/committers.html
https://github.com/apache/kafka/graphs/contributors
Verisign Public 6
Kafka?
LinkedIn's motivation for Kafka was:
A unified platform for handling all the real-time data feeds a large company might have.

Must haves
High throughput to support high volume event feeds.
Support real-time processing of these feeds to create new, derived feeds.
Support large data backlogs to handle periodic ingestion from offline systems.
Support low-latency delivery to handle more traditional messaging use cases.
Guarantee fault-tolerance in the presence of machine failures.

http://kafka.apache.org/documentation.html#majordesignelements

Verisign Public 7
Kafka @ LinkedIn, 2014

(Numbers have increased since.)

https://twitter.com/SalesforceEng/status/466033231800713216/photo/1
http://www.hakkalabs.co/articles/site-reliability-engineering-linkedin-kafka-service
Verisign Public 8
Data architecture @ LinkedIn, Feb 2013

(Numbers are aggregated across all their clusters.)

http://gigaom.com/2013/12/09/netflix-open-sources-its-data-traffic-cop-suro/

Verisign Public 9
Kafka @ LinkedIn, 2014
Multiple data centers, multiple clusters
Mirroring between clusters / data centers

What type of data is being transported through Kafka?


Metrics: operational telemetry data
Tracking: everything a LinkedIn.com user does
Queuing: between LinkedIn apps, e.g. for sending emails

To transport data from LinkedIn's apps to Hadoop, and back


In total ~ 200 billion events/day via Kafka
Tens of thousands of data producers, thousands of consumers
7 million events/sec (write), 35 million events/sec (read) <<< may include replicated events
But: LinkedIn is not even the largest Kafka user anymore as of 2014

http://www.hakkalabs.co/articles/site-reliability-engineering-linkedin-kafka-service
http://www.slideshare.net/JayKreps1/i-32858698
http://search-hadoop.com/m/4TaT4qAFQW1
Verisign Public 10
Kafka @ LinkedIn, 2014

For reference, here are the stats on one of LinkedIn's busiest clusters (at peak):

15 brokers
15,500 partitions (replication factor 2)
400,000 msg/s inbound
70 MB/s inbound
400 MB/s outbound

https://kafka.apache.org/documentation.html#java
Verisign Public 11
Staffing: Kafka team @ LinkedIn
Team of 8+ engineers
Site reliability engineers (Ops): at least 3
Developers: at least 5
SREs as well as DEVs are on call 24x7

https://kafka.apache.org/committers.html
http://www.hakkalabs.co/articles/site-reliability-engineering-linkedin-kafka-service

Verisign Public 12
Kafka adoption and use cases
LinkedIn: activity streams, operational metrics, data bus
400 nodes, 18k topics, 220B msg/day (peak 3.2M msg/s), May 2014
Netflix: real-time monitoring and event processing
Twitter: as part of their Storm real-time data pipelines
Spotify: log delivery (from 4h down to 10s), Hadoop
Loggly: log collection and processing
Mozilla: telemetry data
Airbnb, Cisco, Gnip, InfoChimps, Ooyala, Square, Uber, …

https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

Verisign Public 13
Kafka @ Spotify

https://www.jfokus.se/jfokus14/preso/Reliable-real-time-processing-with-Kafka-and-Storm.pdf (Feb 2014)

Verisign Public 14
How fast is Kafka?
Up to 2 million writes/sec on 3 cheap machines
Using 3 producers on 3 different machines, 3x async replication
Only 1 producer/machine because NIC already saturated
Sustained throughput as stored data grows
Slightly different test config than 2M writes/sec above.

Test setup
Kafka trunk as of April 2013, but 0.8.1+ should be similar.
3 machines: 6-core Intel Xeon 2.5 GHz, 32GB RAM, 6x 7200rpm SATA, 1GigE
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Verisign Public 15
Why is Kafka so fast?
Fast writes:
While Kafka persists all data to disk, essentially all writes go to the
page cache of OS, i.e. RAM.
Cf. hardware specs and OS tuning (we cover this later)

Fast reads:
Very efficient to transfer data from page cache to a network socket
Linux: sendfile() system call

Combination of the two = fast Kafka!


Example (Operations): "On a Kafka cluster where the consumers are
mostly caught up you will see no read activity on the disks as they will be
serving data entirely from cache."

http://kafka.apache.org/documentation.html#persistence
Verisign Public 16
Why is Kafka so fast?
Example: Loggly.com, who run Kafka & Co. on Amazon AWS
"99.99999% of the time our data is coming from disk cache and RAM; only
very rarely do we hit the disk."
"One of our consumer groups (8 threads) which maps a log to a customer
can process about 200,000 events per second draining from 192 partitions
spread across 3 brokers."
Brokers run on m2.xlarge Amazon EC2 instances backed by provisioned IOPS

http://www.developer-tech.com/news/2014/jun/10/why-loggly-loves-apache-kafka-how-unbreakable-infinitely-scalable-messaging-makes-log-management-better/

Verisign Public 17
Kafka + X for processing the data?
Kafka + Storm often used in combination, e.g. Twitter

Kafka + custom
Normal Java multi-threaded setups
Akka actors with Scala or Java, e.g. Ooyala

Recent additions:
Samza (since Aug '13) also by LinkedIn
Spark Streaming, part of Spark (since Feb '13)

Kafka + Camus for Kafka->Hadoop ingestion

https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

Verisign Public 18
Part 2: Kafka core concepts

Verisign Public 19
Overview of Part 2: Kafka core concepts
A first look
Topics, partitions, replicas, offsets
Producers, brokers, consumers
Putting it all together

Verisign Public 20
A first look
The who is who
Producers write data to brokers.
Consumers read data from brokers.
All this is distributed.

The data
Data is stored in topics.
Topics are split into partitions, which are replicated.

Verisign Public 21
A first look

http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/

Verisign Public 22
Topics
Topic: feed name to which messages are published
Example: zerg.hydra
Kafka prunes head based on age or max size or key

[Diagram: producers A1…An publish to a Kafka topic on the broker(s); new messages are always appended to the tail (think: append to a file), older msgs sit towards the head]

Verisign Public 23
Topics

Consumers use an offset pointer to track/control their read progress (and decide the pace of consumption).

[Diagram: consumer groups C1 and C2 each keep their own offset pointer into the topic, while producers A1…An keep appending new messages to the tail on the broker(s)]

Verisign Public 24
Topics
Creating a topic
CLI
$ kafka-topics.sh --zookeeper zookeeper1:2181 --create --topic zerg.hydra \
    --partitions 3 --replication-factor 2 \
    --config x=y

API
https://github.com/miguno/kafka-storm-starter/blob/develop/src/main/scala/com/miguno/kafkastorm/storm/KafkaStormDemo.scala
Auto-create via auto.create.topics.enable = true

Modifying a topic
https://kafka.apache.org/documentation.html#basic_ops_modify_topic

Deleting a topic: DON'T in 0.8.1.x!


Verisign Public 25
Partitions
A topic consists of partitions.
Partition: ordered + immutable sequence of messages
that is continually appended to

Verisign Public 26
Partitions
#partitions of a topic is configurable
#partitions determines max consumer (group) parallelism
Cf. parallelism of Storm's KafkaSpout via builder.setSpout(..., ..., N)

Consumer group A, with 2 consumers, reads from a 4-partition topic.
Consumer group B, with 4 consumers, reads from the same topic.

Verisign Public 27
Partition offsets
Offset: messages in the partitions are each assigned a
unique (per partition) and sequential id called the offset
Consumers track their pointers via (offset, partition, topic) tuples
[Diagram: consumer group C1 tracking its per-partition offsets]

Verisign Public 28
Replicas of a partition
Replicas: backups of a partition
They exist solely to prevent data loss.
Replicas are never read from, never written to.
They do NOT help to increase producer or consumer parallelism!
Kafka tolerates (numReplicas - 1) dead brokers before losing data
LinkedIn: numReplicas == 2 => 1 broker can die

Verisign Public 29
Topics vs. Partitions vs. Replicas

http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/

Verisign Public 30
Inspecting the current state of a topic
--describe the topic
$ kafka-topics.sh --zookeeper zookeeper1:2181 --describe --topic zerg.hydra
Topic:zerg2.hydra  PartitionCount:3  ReplicationFactor:2  Configs:
Topic: zerg2.hydra Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: zerg2.hydra Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: zerg2.hydra Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0

Leader: brokerID of the currently elected leader broker


Replica IDs = broker IDs
ISR = in-sync replica, replicas that are in sync with the leader

In this example:
Broker 0 is leader for partition 1.
Broker 1 is leader for partitions 0 and 2.
All replicas are in-sync with their respective leader partitions.
Verisign Public 31
Let's recap
The who is who
Producers write data to brokers.
Consumers read data from brokers.
All this is distributed.

The data
Data is stored in topics.
Topics are split into partitions which are replicated.

Verisign Public 32
Putting it all together

http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/

Verisign Public 33
Side note (opinion)
Drawing a conceptual line from Kafka to Clojure's core.async

Cf. talk "Clojure core.async Channels", by Rich Hickey, at ~ 31m54


http://www.infoq.com/presentations/clojure-core-async

Verisign Public 34
Part 3: Operating Kafka

Verisign Public 35
Overview of Part 3: Operating Kafka
Kafka architecture
Kafka hardware specs
Deploying Kafka
Monitoring Kafka
Kafka apps
Kafka itself
ZooKeeper
"Auditing" Kafka (not: security audit)
P&S tuning
Ops-related Kafka references

Verisign Public 36
Kafka architecture
Kafka brokers
You can run clusters with 1+ brokers.
Each broker in a cluster must have a unique broker.id.
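
A minimal broker configuration sketch to make this concrete; values, paths, and hostnames below are illustrative placeholders, not LinkedIn's settings:

# server.properties (sketch)
broker.id=1                        # must be unique per broker in the cluster
port=9092
log.dirs=/data/kafka               # where partition data lives on disk
zookeeper.connect=zookeeper1:2181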

Verisign Public 37
Kafka architecture
Kafka requires ZooKeeper
LinkedIn runs (old) ZK 3.3.4,
but latest 3.4.5 works, too.

ZooKeeper
v0.8: used by brokers and consumers, but not by producers.
Brokers: general state information, leader election, etc.
Consumers: primarily for tracking message offsets (cf. later)
v0.9: used by brokers only
Consumers will use special Kafka topics instead of ZooKeeper
Will substantially reduce the load on ZooKeeper for large deployments

Verisign Public 38
Kafka broker hardware specs @ LinkedIn
Solely dedicated to running Kafka, run nothing else.
1 Kafka broker instance per machine
2x 4-core Intel Xeon (info outdated?)
64 GB RAM (up from 24 GB)
Only 4 GB used for Kafka broker, remaining 60 GB for page cache
Page cache is what makes Kafka fast
RAID10 with 14 spindles
More spindles = higher disk throughput
Cache on RAID, with battery backup
Before H/W upgrade: 8x SATA drives (7200rpm), not sure about RAID
1 GigE (?) NICs

EC2 example: m2.2xlarge @ $0.34/hour, with provisioned IOPS


Verisign Public 39
ZooKeeper hardware specs @ LinkedIn
ZooKeeper servers
Solely dedicated to running ZooKeeper, run nothing else.
1 ZooKeeper instance per machine
SSDs dramatically improve performance
In v0.8.x, brokers and consumers must talk to ZK. In large-scale
environments (many consumers, many topics and partitions) this
means ZK can become a bottleneck because it processes requests
serially. And this processing depends primarily on I/O performance.
1 GigE (?) NICs

ZooKeeper in LinkedIn's architecture


5-node ZK ensembles = tolerates 2 dead nodes
1 ZK ensemble for all Kafka clusters within a data center
LinkedIn runs multiple data centers, with multiple Kafka clusters
Verisign Public 40
Deploying Kafka

Puppet module
https://github.com/miguno/puppet-kafka
Hiera-compatible, rspec tests, Travis CI setup (e.g. to test against multiple
versions of Puppet and Ruby, Puppet style checker/lint, etc.)

RPM packaging script for RHEL 6


https://github.com/miguno/wirbelsturm-rpm-kafka
Digitally signed by yum@michael-noll.com
RPM is built on a Wirbelsturm-managed build server

Public (Wirbelsturm) S3-backed yum repo


https://s3.amazonaws.com/yum.miguno.com/bigdata/
Verisign Public 41
Deploying Kafka
Hiera example

Verisign Public 42
Operating Kafka
Typical operations tasks include:
Adding or removing brokers
Example: ensure a newly added broker actually receives data, which
requires moving partitions from existing brokers to the new broker
Kafka provides helper scripts (cf. below) but still manual work involved
Balancing data/partitions to ensure best performance
Add new topics, re-configure topics
Example: Increasing #partitions of a topic to increase max parallelism
Apps management: new producers, new consumers

See Ops-related references at the end of this part

Verisign Public 43
Lessons learned from operating Kafka at LinkedIn
Biggest challenge has been to manage hyper growth
Growth of Kafka adoption: more producers, more consumers, …
Growth of data: more LinkedIn.com users, more user activity, …

Typical tasks at LinkedIn


Educating and coaching Kafka users.
Expanding Kafka clusters, shrinking clusters.
Monitoring consumer apps: "Hey, my stuff stopped. Kafka's fault!"

http://www.hakkalabs.co/articles/site-reliability-engineering-linkedin-kafka-service

Verisign Public 44
Kafka security
Original design was not created with security in mind.
Discussion started in June 2014 to add security features.
Covers transport layer security, data encryption at rest, non-repudiation, A&A, …
See [DISCUSS] Kafka Security Specific Features

At the moment there's basically no security built-in.

Verisign Public 45
Monitoring Kafka

Verisign Public 46
Monitoring Kafka
Nothing fancy built into Kafka (e.g. no UI) but see:
https://cwiki.apache.org/confluence/display/KAFKA/System+Tools
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

[Screenshots: Kafka Web Console, Kafka Offset Monitor]

Verisign Public 47
Monitoring Kafka
Use of standard monitoring tools recommended
Graphite
Puppet module: https://github.com/miguno/puppet-graphite
Java API, also used by Kafka: http://metrics.codahale.com/
JMX
https://kafka.apache.org/documentation.html#monitoring

Collect logging files into a central place


Logstash/Kibana and friends
Helps with troubleshooting, debugging, etc. notably if you can correlate
logging data with numeric metrics

Verisign Public 48
Monitoring Kafka apps
Almost all problems are due to:
1. Consumer lag
2. Rebalancing <<< we cover this later in part 4

Verisign Public 49
Monitoring Kafka apps: consumer lag
Lag = how far your consumer is behind the producers

[Diagram: producers A1…An append new messages at the tail; consumer group C1 is still reading older messages; the distance between the two is its lag]

Lag is a consumer problem


Too slow, too much GC, losing connection to ZK or Kafka, …
Bug or design flaw in consumer
Operational mistakes: e.g. you brought up 6 servers in parallel, each one
in turn triggering rebalancing, then hit Kafka's rebalance limit;
cf. rebalance.max.retries (default: 4) & friends
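
To measure lag in 0.8 you can use the offset checker shipped with Kafka. A sketch (exact flags may vary slightly between versions; the group name is a placeholder):

$ kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
    --zkconnect zookeeper1:2181 --group my-consumer-group

It prints, per partition, the consumer's current offset, the log end offset, and the difference between the two, i.e. the lag.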

Verisign Public 50
Monitoring Kafka itself (1 of 3)
Under-replicated partitions
For example, because a broker is down.
Means cluster runs in degraded state.
FYI: LinkedIn runs with replication factor of 2 => 1 broker can die.

Offline partitions
Even worse than under-replicated partitions!
Serious problem (data loss) if anything but 0 offline partitions.

Verisign Public 51
Monitoring Kafka itself (2 of 3)
Data size on disk
Should be balanced across disks/brokers
Data balance even more important than partition balance
FYI: New script in v0.8.1 to balance data/partitions across brokers

Broker partition balance


Count of partitions should be balanced evenly across brokers
See new script above.

Verisign Public 52
Monitoring Kafka itself (3 of 3)
Leader partition count
Should be balanced across brokers so that each broker gets the same
amount of load
Only 1 broker is ever the leader of a given partition, and only this broker is
going to talk to producers + consumers for that partition
Non-leader replicas are used solely as safeguards against data loss
Feature in v0.8.1 to auto-rebalance the leaders and partitions in case a
broker dies, but it does not work that well yet (SREs still have to do this
manually at this point).
Network utilization
Maxed network is one reason for under-replicated partitions.
LinkedIn don't run anything but Kafka on the brokers, so a maxed network is
due to Kafka. Hence, when they max the network, they need to add more
capacity across the board.

Verisign Public 53
Monitoring ZooKeeper
Ensemble (= cluster) availability
LinkedIn run 5-node ensembles = tolerates 2 dead
Twitter run 13-node ensembles = tolerates 6 dead

Latency of requests
Metric target is 0 ms when using SSDs in ZooKeeper machines.
Why? Because SSDs are so fast they typically bring down latency below ZK's
metric granularity (which is per-ms).

Outstanding requests
Metric target is 0.
Why? Because ZK processes all incoming requests serially. Non-zero
values mean that requests are backing up.

Verisign Public 54
"Auditing" Kafka
LinkedIn's way to detect data loss etc.

Verisign Public 55
Auditing Kafka
LinkedIn's way to detect data loss etc. in Kafka
Not part of open source stack yet. May come in the future.
In short: custom producer+consumer app that is hooked into monitoring.

Value proposition
Monitor whether you're losing messages/data.
Monitor whether your pipelines can handle the incoming data load.

http://www.hakkalabs.co/articles/site-reliability-engineering-linkedin-kafka-service

Verisign Public 56
LinkedIn's Audit UI: a first look
Example 1: Count discrepancy
Caused by messages failing to
reach a downstream Kafka
cluster

Example 2: Load lag

Verisign Public 57
Auditing Kafka
Every producer is also writing messages into a special topic about
how many messages it produced, every 10mins.
Example: "Over the last 10mins, I sent N messages to topic X."
This metadata gets mirrored like any other Kafka data.

Audit consumer
1 audit consumer per Kafka cluster
Reads every single message out of "its" Kafka cluster. It then calculates
counts for each topic, and writes those counts back into the same special
topic, every 10mins.
Example: "I saw M messages in the last 10mins for topic X in THIS cluster."
And the next audit consumer in the next, downstream cluster does the
same thing.

Verisign Public 58
Auditing Kafka
Monitoring audit consumers
Completeness check
"#msgs according to producer == #msgs seen by audit consumer?"
Lag
"Can the audit consumers keep up with the incoming data rate?"
If audit consumers fall behind, then all your tracking data falls behind
as well, and you don't know how many messages got produced.

Verisign Public 59
Auditing Kafka
Audit UI
Only reads data from that special "metrics/monitoring" topic, but
this data is read from every Kafka cluster at LinkedIn:
What the producers said they wrote in.
What the audit consumers said they saw.
Shows correlation graphs (producers vs. audit consumers)
For each tier, it shows how many messages there were in each topic
over any given period of time.
Percentage of how much data got through (from cluster to cluster).
If the percentage drops below 100%, then emails are sent to Kafka
SRE+DEV as well as their Hadoop ETL team because that stops the
Hadoop pipelines from functioning properly.

Verisign Public 60
LinkedIn's Audit UI: a closing look
Example 1: Count discrepancy
Caused by messages failing to
reach a downstream Kafka
cluster

Example 2: Load lag

Verisign Public 61
Kafka performance tuning

Verisign Public 62
OS tuning
Kernel tuning
Don't swap! vm.swappiness = 0 (RHEL 6.5 onwards: 1)
Allow more dirty pages but less dirty cache.
LinkedIn have lots of RAM in servers, most of it is for page cache (60
of 64 GB). They let dirty pages build up, but the cache should be available
as Kafka does lots of disk and network I/O.
See vm.dirty_*_ratio & friends

Disk throughput
Longer commit interval on mount points. (ext3 or ext4?)
Normal interval for ext3 mount point is 30s (?) between flushes;
LinkedIn: 120s. They can tolerate losing 2 mins' worth of data (because
of partition replicas) so they prefer higher throughput here.
More spindles (RAID10 w/ 14 disks)
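
A sketch of what those two knobs can look like; the values, device, and mount point below are illustrative placeholders, not recommendations:

# /etc/sysctl.conf (kernel tuning sketch)
vm.swappiness = 1
# allow a larger dirty-page buildup before forced writeback
vm.dirty_background_ratio = 5
vm.dirty_ratio = 60

# /etc/fstab: longer commit interval on the Kafka data mount (ext3/ext4)
/dev/sdb1  /data/kafka  ext4  defaults,noatime,commit=120  0 0
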
Verisign Public 63
Java/JVM tuning
Biggest issue: garbage collection
And, most of the time, the only issue

Goal is to minimize GC pause times


Aka "stop-the-world" events: apps are halted until GC finishes

Verisign Public 64
Java garbage collection in Kafka @ Spotify

[Graphs: GC pause times before tuning vs. after tuning]

https://www.jfokus.se/jfokus14/preso/Reliable-real-time-processing-with-Kafka-and-Storm.pdf

Verisign Public 65
Java/JVM tuning
Good news: use JDK7u51 or later and have a quiet life!
LinkedIn: Oracle JDK, not OpenJDK

Silver bullet is the new G1 "garbage-first" garbage collector


Available since JDK7u4.
Substantial improvement over all previous GCs, at least for Kafka.

$ java -Xms4g -Xmx4g -XX:PermSize=48m -XX:MaxPermSize=48m
       -XX:+UseG1GC
       -XX:MaxGCPauseMillis=20
       -XX:InitiatingHeapOccupancyPercent=35

Verisign Public 66
Kafka configuration tuning
Often not much to do beyond using the defaults, yay.

Key candidates for tuning:


num.io.threads should be >= #disks (start testing with == #disks)
num.network.threads adjust it based on (concurrent) #producers, #consumers,
and replication factor
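
As a sketch, assuming a broker with 6 data disks (starting points only, to be validated by testing against your own workload):

# server.properties (tuning sketch)
num.io.threads=6        # >= number of disks
num.network.threads=4   # scale with concurrent producers/consumers + replication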

Verisign Public 67
Kafka usage tuning lessons learned from others
Don't break things up into separate topics unless the data in them is
truly independent.
Consumer behavior can (and will) be extremely variable; don't assume you
will always be consuming as fast as you are producing.

Keep time-related messages in the same partition.


Consumer behavior can be extremely variable; don't assume the lag on all
your partitions will be similar.
Design a partitioning scheme so that the owner of one partition can stop
consuming for a long period of time and your application will be minimally
impacted (for example, partition by transaction id).

http://grokbase.com/t/kafka/users/145qtx4z1c/topic-partitioning-strategy-for-large-data

Verisign Public 68
Ops-related references
Kafka FAQ
https://cwiki.apache.org/confluence/display/KAFKA/FAQ

Kafka operations
https://kafka.apache.org/documentation.html#operations

Kafka system tools


https://cwiki.apache.org/confluence/display/KAFKA/System+Tools
Consumer offset checker, get offsets for a topic, print metrics via JMX to console, read from topic A and write to topic B, verify consumer rebalance
Kafka replication tools
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
Caveat: Some sections of this document are slightly outdated.
Controlled shutdown, preferred leader election tool, reassign partitions tool
Kafka tutorial
http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
Verisign Public 69
Part 4: Developing Kafka apps

Verisign Public 70
Overview of Part 4: Developing Kafka apps
Writing data to Kafka with producers
Example producer
Producer types (async, sync)
Message acking and batching of messages
Write operations behind the scenes: caveats ahead!
Reading data from Kafka with consumers
High-level consumer API and simple consumer API
Consumer groups
Rebalancing
Testing Kafka
Serialization in Kafka
Data compression in Kafka
Example Kafka applications
Dev-related Kafka references

Verisign Public 71
Writing data to Kafka

Verisign Public 72
Writing data to Kafka
You use Kafka producers to write data to Kafka brokers.
Available for JVM (Java, Scala), C/C++, Python, Ruby, etc.
The Kafka project only provides the JVM implementation.
Has risk that a new Kafka release will break non-JVM clients.

A simple example producer:
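
A minimal sketch in the spirit of the linked 0.8 producer example; broker addresses, topic, key, and values are placeholders:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // bootstrap brokers
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");  // ack once the leader has the data
props.put("producer.type", "async");      // non-blocking, batched sends

Producer<String, String> producer =
    new Producer<String, String>(new ProducerConfig(props));
producer.send(new KeyedMessage<String, String>("zerg.hydra", "myKey", "hello, Kafka"));
producer.close();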

Full details at:


https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

Verisign Public 73
Producers
The Java producer API is very simple.
We'll talk about the slightly confusing details next.

Verisign Public 74
Producers
Two types of producers: async and sync

Same API and configuration, but slightly different semantics.


What applies to a sync producer almost always applies to async, too.
Async producer is preferred when you want higher throughput.

Important configuration settings for either producer type:


client.id identifies producer app, e.g. in system logs
producer.type async or sync
request.required.acks acking semantics, cf. next slides
serializer.class configure encoder, cf. slides on Avro usage
metadata.broker.list cf. slides on bootstrapping list of brokers

Verisign Public 75
Sync producers
Straightforward, so I won't cover sync producers here
Please go to https://kafka.apache.org/documentation.html

Most important thing to remember: producer.send() will block!

Verisign Public 76
Async producer
Sends messages in background = no blocking in client.
Provides more powerful batching of messages (see later).
Wraps a sync producer, or rather a pool of them.
Communication from async->sync producer happens via a queue.
Which explains why you may see kafka.producer.async.QueueFullException
Each sync producer gets a copy of the original async producer config,
including the request.required.acks setting (see later).
Implementation details: Producer, async.AsyncProducer,
async.ProducerSendThread, ProducerPool, async.DefaultEventHandler#send()

Verisign Public 77
Async producer
Caveats
Async producer may drop messages if its queue is full.
Solution 1: Don't push data to the producer faster than it is able to send to brokers.
Solution 2: Queue full == need more brokers, add them now! Use this solution in
favor of solution 3 particularly if your producer cannot block (async producers).
Solution 3: Set queue.enqueue.timeout.ms to -1 (default). Now the producer
will block indefinitely and will never willingly drop a message.
Solution 4: Increase queue.buffering.max.messages (default: 10,000).

In 0.8 an async producer does not have a callback for send() to register
error handlers. Callbacks will be available in 0.9.

Verisign Public 78
Producers
Two aspects worth mentioning because they significantly influence
Kafka performance:

1. Message acking
2. Batching of messages

Verisign Public 79
1) Message acking
Background:
In Kafka, a message is considered "committed" when "any required" ISR (in-
sync replicas) for that partition have applied it to their data log.
Message acking is about conveying this "Yes, committed!" information back
from the brokers to the producer client.
Exact meaning of "any required" is defined by request.required.acks.

Only producers must configure acking


Exact behavior is configured via request.required.acks, which
determines when a produce request is considered completed.
Allows you to trade latency (speed) <-> durability (data safety).
Consumers: Acking and how you configured it on the side of producers do
not matter to consumers because only committed messages are ever given
out to consumers. They don't need to worry about potentially seeing a
message that could be lost if the leader fails.
Verisign Public 80
1) Message acking
Typical values of request.required.acks (trading latency against durability):

0: producer never waits for an ack from the broker.
Gives the lowest latency but the weakest durability guarantees.

1: producer gets an ack after the leader replica has received the data.
Gives better durability, as we wait until the lead broker acks the request. Only msgs that
were written to the now-dead leader but not yet replicated will be lost.

-1: producer gets an ack after all ISR have received the data.
Gives the best durability, as Kafka guarantees that no data will be lost as long as at least
one ISR remains.

Beware of interplay with request.timeout.ms!


"The amount of time the broker will wait trying to meet the request.required.acks
requirement before sending back an error to the client."
Caveat: Message may be committed even when the broker sends a timeout error to the
client (e.g. because not all ISR acked in time). One reason for this is that the producer
acknowledgement is independent of the leader-follower replication, and ISRs send
their acks to the leader, the latter of which will reply to the client.
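
A durability-leaning producer configuration sketch (the values are examples, not recommendations):

# favor durability over latency: wait for all in-sync replicas
request.required.acks=-1
# how long (ms) the broker may wait to satisfy the acks requirement
request.timeout.ms=10000
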
Verisign Public 81
2) Batching of messages
Batching improves throughput
Tradeoff is data loss if client dies before pending messages have been sent.

You have two options to batch messages in 0.8:


1. Use send(listOfMessages).

Sync producer: will send this list ("batch") of messages right now. Blocks!
Async producer: will send this list of messages in background "as usual", i.e.
according to batch-related configuration settings. Does not block!

2. Use send(singleMessage) with async producer.

For async the behavior is the same as send(listOfMessages).

Verisign Public 82
2) Batching of messages
Option 1: How send(listOfMessages) works behind the scenes

The original list of messages is partitioned (randomly if the default
partitioner is used) based on their destination partitions/topics, i.e. split into
smaller batches (cf. partitioner.class).

Each post-split batch is sent to the respective leader broker/ISR (the
individual send()s happen sequentially), and each is acked by its
respective leader broker according to request.required.acks.

[Diagram: a mixed list of messages destined for partitions p1, p4, and p6 is split
into per-partition batches; each batch is then sent via send() to the current
leader ISR (broker) of its partition]
Verisign Public 83
2) Batching of messages
Option 2: Async producer
Standard behavior is to batch messages
Semantics are controlled via producer configuration settings (see the sketch below):
batch.num.messages
queue.buffering.max.ms + queue.buffering.max.messages
queue.enqueue.timeout.ms
And more, see producer configuration docs.
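
A sketch of those knobs in producer-properties form; the values are illustrative only:

producer.type=async
# max number of messages per batch
batch.num.messages=200
# max time (ms) to buffer messages before sending a batch
queue.buffering.max.ms=5000
# max number of unsent messages the producer may buffer
queue.buffering.max.messages=10000
# -1 = block the client instead of dropping messages when the queue is full
queue.enqueue.timeout.ms=-1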

Remember: Async producer simply wraps sync producer!


But the batch-related config settings above have no effect on true
sync producers, i.e. when used without a wrapping async producer.

Verisign Public 84
FYI: upcoming producer configuration changes

Kafka 0.8                     Kafka 0.9 (unreleased)

metadata.broker.list     ->   bootstrap.servers
request.required.acks    ->   acks
batch.num.messages       ->   batch.size
message.send.max.retries ->   retries

(This list is not complete, see Kafka docs for details.)

Verisign Public 85
Write operations behind the scenes
When writing to a topic in Kafka, producers write directly to the
partition leaders (brokers) of that topic
Remember: Writes always go to the leader ISR of a partition!

This raises two questions:


How to know the right partition for a given topic?
How to know the current leader broker/replica of a partition?
Verisign Public 86
1) How to know the right partition when sending?
In Kafka, a producer (i.e. the client) decides to which target
partition a message will be sent.
Can be random ~ load balancing across receiving brokers.
Can be semantic, based on the message key, e.g. by user ID or domain
name.
Here, Kafka guarantees that all data for the same key will go to the same
partition, so consumers can make locality assumptions.

But there's one catch with the no-key case in Kafka 0.8.

Verisign Public 87
Keyed vs. non-keyed messages in Kafka 0.8
If a key is not specified:

Producer will ignore any configured partitioner.


It will pick a random partition from the list of available partitions and stick to it for
some time before switching to another one = NOT round robin or similar!
Why? To reduce number of open sockets in large Kafka deployments (KAFKA-1017).
Default: 10mins, cf. topic.metadata.refresh.interval.ms
See implementation in DefaultEventHandler#getPartition()
If there are fewer producers than partitions at a given point of time, some partitions
may not receive any data. How to fix if needed?
Try to reduce the metadata refresh interval topic.metadata.refresh.interval.ms
Specify a message key and a customized random partitioner.
In practice it is not trivial to implement a correct "random" partitioner in Kafka 0.8.
Partitioner interface in Kafka 0.8 lacks sufficient information to let a partitioner select a
random and available partition. Same issue with DefaultPartitioner.
Verisign Public 88
Keyed vs. non-keyed messages in Kafka 0.8
If a key is specified:

Key is retained as part of the msg, will be stored in the broker.


One can design a partition function to route the msg based on the key.
The default partitioner assigns messages to a partition based on
their key hashes, via key.hashCode % numPartitions.
Caveat:
If you specify a key for a message but do not explicitly wire in a custom
partitioner via partitioner.class, your producer will use the default partitioner.
So without a custom partitioner, messages with the same key will still end up in
the same partition! (cf. the default partitioner's behavior above)
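
A minimal sketch of a custom partitioner against the 0.8.1-style interface; the class name UserIdPartitioner is hypothetical, and the VerifiableProperties constructor is needed because Kafka instantiates partitioners via reflection:

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

public class UserIdPartitioner implements Partitioner {

    public UserIdPartitioner(VerifiableProperties props) {
        // no-op; Kafka calls this constructor via reflection
    }

    @Override
    public int partition(Object key, int numPartitions) {
        // mask the sign bit so the result is never negative, then route
        // all messages with the same key to the same partition
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}

Wire it in via props.put("partitioner.class", "com.example.UserIdPartitioner");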

Verisign Public 89
2) How to know the current leader of a partition?
Producers: broker discovery aka bootstrapping
Producers don't talk to ZooKeeper, so it's not through ZK.
Broker discovery is achieved by providing producers with a bootstrapping
broker list, cf. metadata.broker.list
These brokers inform the producer about all alive brokers and where to find
current partition leaders. The bootstrap brokers do use ZK for that.

Impacts on failure handling


In Kafka 0.8 the bootstrap list is static/immutable during producer run-time.
This has limitations and problems as shown in next slide.
The current bootstrap approach will improve in Kafka 0.9. This change will
make the life of Ops easier.

Verisign Public 90
Bootstrapping in Kafka 0.8
Scenario: N=5 brokers total, 2 of which are for bootstrap

broker1 broker2 broker3 broker4 broker5


Dos:
Take down one bootstrap broker (e.g. broker2), repair it, and bring it back.
In terms of impacts on broker discovery, you can do whatever you want to
brokers 3-5.
Don'ts:
Stop all bootstrap brokers 1+2. If you do, the producer stops working!
To improve operational flexibility, use VIPs or similar for the values in
metadata.broker.list.

Verisign Public 91
Reading data from Kafka

Verisign Public 92
Reading data from Kafka
You use Kafka consumers to read data from Kafka brokers.
Available for JVM (Java, Scala), C/C++, Python, Ruby, etc.
The Kafka project only provides the JVM implementation.
Has risk that a new Kafka release will break non-JVM clients.

Examples will be shown later in the Example Kafka apps section.


Three API options for JVM users:
1. High-level consumer API <<< in most cases you want to use this one!

2. Simple consumer API


3. Hadoop consumer API

Most noteworthy: The "simple" API is anything but simple.

Prefer to use the high-level consumer API if it meets your needs (it should).
Counter-example: Kafka spout in Storm 0.9.2 uses the simple consumer API to
integrate well with Storm's model of guaranteed message processing.
(A high-level consumer sketch follows below.)
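
A minimal high-level consumer sketch, assuming a single stream/thread; ZooKeeper address, group id, and topic are placeholders:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

Properties props = new Properties();
props.put("zookeeper.connect", "zookeeper1:2181");
props.put("group.id", "my-consumer-group");

ConsumerConnector consumer =
    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put("zerg.hydra", 1);  // 1 = number of streams (threads) for this topic
Map<String, List<KafkaStream<byte[], byte[]>>> streams =
    consumer.createMessageStreams(topicCountMap);

// iterate over the single stream; blocks until messages arrive
for (MessageAndMetadata<byte[], byte[]> mam : streams.get("zerg.hydra").get(0)) {
    System.out.println(new String(mam.message()));  // payload is a raw byte[]
}
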
Verisign Public 93
Reading data from Kafka
Consumers pull from Kafka (there's no push)
Allows consumers to control their pace of consumption.
Allows to design downstream apps for average load, not peak load (cf. Loggly talk)

Consumers are responsible for tracking their read positions aka "offsets"


High-level consumer API: takes care of this for you, stores offsets in ZooKeeper
Simple consumer API: nothing provided, it's totally up to you
What does this offset management allow you to do?
Consumers can deliberately rewind "in time" (up to the point where Kafka prunes), e.g. to
replay older messages.
Cf. Kafka spout in Storm 0.9.2.
Consumers can decide to only read a specific subset of partitions for a given topic.
Cf. Loggly's setup of (down)sampling a production Kafka topic to a manageable volume for testing
Run offline, batch ingestion tools that write (say) from Kafka to Hadoop HDFS every hour.
Cf. LinkedIn Camus, Pinterest Secor

Verisign Public 94
Reading data from Kafka
Important consumer configuration settings
group.id assigns an individual consumer to a group
zookeeper.connect to discover brokers/topics/etc., and to store consumer
state (e.g. when using the high-level consumer API)
fetch.message.max.bytes number of message bytes to (attempt to) fetch for each
partition; must be >= the broker's message.max.bytes

Verisign Public 95
Reading data from Kafka
Consumer groups
Allows multi-threaded and/or multi-machine consumption from Kafka topics.
Consumers join a group by using the same group.id
Kafka guarantees a message is only ever read by a single consumer in a group.
Kafka assigns the partitions of a topic to the consumers in a group so that each partition is
consumed by exactly one consumer in the group.
Maximum parallelism of a consumer group: #consumers (in the group) <= #partitions

Verisign Public 96
Guarantees when reading data from Kafka
A message is only ever read by a single consumer in a group.
A consumer sees messages in the order they were stored in the log.
The order of messages is only guaranteed within a partition.
No order guarantee across partitions, which includes no order guarantee per-topic.
If total order (per topic) is required you can consider, for instance:
Use #partition = 1. Good: total order. Bad: Only 1 consumer process at a time.
Add total ordering in your consumer application, e.g. a Storm topology.

Some gotchas:
If you have multiple partitions per thread there is NO guarantee about the order you
receive messages, other than that within the partition the offsets will be sequential.
Example: You may receive 5 messages from partition 10 and 6 from partition 11, then 5
more from partition 10 followed by 5 more from partition 10, even if partition 11 has data
available.
Adding more processes/threads will cause Kafka to rebalance, possibly changing
the assignment of a partition to a thread (whoops).

Verisign Public 97
Rebalancing: how consumers meet brokers
Remember?

The assignment of brokers (via the partitions of a topic) to
consumers is quite important, and it is dynamic at run-time.

Verisign Public 98
Rebalancing: how consumers meet brokers
Why dynamic at run-time?
Machines can die, be added, …
Consumer apps may die, be re-configured, added, …

Whenever this happens a rebalancing occurs.


Rebalancing is a normal and expected lifecycle event in Kafka.
But it's also a nice way to shoot yourself (or Ops) in the foot.

Why is this important?


Most Ops issues are due to 1) rebalancing and 2) consumer lag.
So Dev + Ops must understand what goes on.

Verisign Public 99
Rebalancing: how consumers meet brokers
Rebalancing?
Consumers in a group come into consensus on which consumer is
consuming which partitions; required for distributed consumption
Divides broker partitions evenly across consumers, tries to reduce the
number of broker nodes each consumer has to connect to
When does it happen? Each time:
a consumer joins or leaves a consumer group, OR
a broker joins or leaves, OR
a topic "joins/leaves" via a filter, cf. createMessageStreamsByFilter()
Examples:
If a consumer or broker fails to heartbeat to ZK => rebalance!
createMessageStreams() registers consumers for a topic, which results
in a rebalance of the consumer-broker assignment.

Verisign Public 100


Testing Kafka apps

Verisign Public 101


Testing Kafka apps
Won't have the time to cover testing in this workshop.
Some hints:
Unit-test your individual classes like usual
When integration testing, use in-memory instances of Kafka and ZK
Test-drive your producers/consumers in virtual Kafka clusters via
Wirbelsturm
Starting points:
Kafka's own test suite
0.8.1: https://github.com/apache/kafka/tree/0.8.1/core/src/test
trunk: https://github.com/apache/kafka/tree/trunk/core/src/test/
Kafka tests in kafka-storm-starter
https://github.com/miguno/kafka-storm-starter/

Verisign Public 102


Serialization in Kafka

Verisign Public 103


Serialization in Kafka
Kafka does not care about data format of msg payload
Up to developer (= you) to handle serialization/deserialization
Common choices in practice: Avro, JSON


Verisign Public 104


Serialization in Kafka: using Avro
One way to use Avro in Kafka is via Twitter Bijection.
https://github.com/twitter/bijection

Approach: Convert pojo to byte[], then send byte[] to Kafka.


Bijection in Scala: see KafkaSpec in kafka-storm-starter (full example linked below)

Bijection in Java: https://github.com/twitter/bijection/wiki/Using-bijection-from-java
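
A small Java sketch of this approach, using bijection-avro's binary codec; the Avro schema and field are made-up placeholders:

import com.twitter.bijection.Injection;
import com.twitter.bijection.avro.GenericAvroCodecs;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// placeholder schema: a record with a single string field
Schema schema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
    + "[{\"name\":\"username\",\"type\":\"string\"}]}");
Injection<GenericRecord, byte[]> codec = GenericAvroCodecs.toBinary(schema);

GenericRecord record = new GenericData.Record(schema);
record.put("username", "miguno");

byte[] bytes = codec.apply(record);                // serialize; this byte[] goes to Kafka
GenericRecord decoded = codec.invert(bytes).get(); // deserialize on the consumer side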

Full Kafka/Bijection example:


KafkaSpec in kafka-storm-starter

Alternatives to Bijection:
e.g. https://github.com/miguno/kafka-avro-codec
Verisign Public 105
Data compression in Kafka

Verisign Public 106


Data compression in Kafka
Again, no time to cover compression in this training.
But worth looking into!
Interplay with batching of messages, e.g. larger batches typically achieve
better compression ratios.

Details about compression in Kafka:


https://cwiki.apache.org/confluence/display/KAFKA/Compression
Blog post by Neha Narkhede, Kafka committer @ LinkedIn:
http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/

Verisign Public 107


Example Kafka applications

Verisign Public 108


kafka-storm-starter

Written by yours truly


https://github.com/miguno/kafka-storm-starter

$ git clone https://github.com/miguno/kafka-storm-starter
$ cd kafka-storm-starter

# Now ready for mayhem!

(Must have JDK 6 installed.)

Verisign Public 109


kafka-storm-starter: run the test suite
$ ./sbt test

Will run unit tests plus end-to-end tests of Kafka, Storm, and Kafka-
Storm integration.

Verisign Public 110


kafka-storm-starter: run the KafkaStormDemo app
$ ./sbt run

Starts in-memory instances of ZooKeeper, Kafka, and Storm. Then


runs a Storm topology that reads from Kafka.

Verisign Public 111


Kafka related code in kafka-storm-starter
KafkaProducerApp
https://github.com/miguno/kafka-storm-starter/blob/develop/src/main/scala/com/miguno/kafkastorm/kafka/KafkaProducerApp.scala

KafkaConsumerApp
https://github.com/miguno/kafka-storm-starter/blob/develop/src/main/scala/com/miguno/kafkastorm/kafka/KafkaConsumerApp.scala

KafkaSpec: test-drives producer and consumer above


https://github.com/miguno/kafka-storm-starter/blob/develop/src/test/scala/com/miguno/kafkastorm/integration/KafkaSpec.scala

Verisign Public 112


Dev-related references
Kafka documentation
Kafka FAQ
Kafka tutorials
http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
Code examples
https://github.com/miguno/kafka-storm-starter/

Verisign Public 113


Part 5: Playing with Kafka using Wirbelsturm
1-click Kafka deployments

Verisign Public 114


Deploying Kafka via Wirbelsturm

Written by yours truly


https://github.com/miguno/wirbelsturm

$ git clone https://github.com/miguno/wirbelsturm.git
$ cd wirbelsturm
$ ./bootstrap
$ vi wirbelsturm.yaml    # uncomment Kafka section
$ vagrant up zookeeper1 kafka1

(Must have Vagrant 1.6.1+ and VirtualBox 4.3+ installed.)

Verisign Public 115


What can I do with Wirbelsturm?
Get a first impression of Kafka
Test-drive your producer apps and consumer apps
Test failure handling
Stop/kill brokers, check what happens to producers or consumers.
Stop/kill ZooKeeper instances, check what happens to brokers.
Use as sandbox environment to test/validate deployments
"What will actually happen when I run this reassign partition tool?"
"What will actually happen when I delete a topic?"
"Will my Hiera changes actually work?"
Reproduce production issues, share results with Dev
Also helpful when reporting back to Kafka project and mailing lists.
Any further cool ideas?

Verisign Public 116


Wrapping up

Verisign Public 117


Where to find help
No (good) Kafka book available yet.
Kafka documentation
http://kafka.apache.org/documentation.html
https://cwiki.apache.org/confluence/display/KAFKA/Index
Kafka ecosystem, e.g. Storm integration, Puppet
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Mailing lists
http://kafka.apache.org/contact.html
Code examples
examples/ directory in Kafka, https://github.com/apache/kafka/
https://github.com/miguno/kafka-storm-starter/

Verisign Public 118


© 2014 VeriSign, Inc. All rights reserved. VERISIGN and other trademarks, service marks, and designs are registered or unregistered trademarks of
VeriSign, Inc. and its subsidiaries in the United States and in foreign countries. All other trademarks are property of their respective owners.
