Kafka Python
Release 1.4.7.dev
Dana Powers
Contents

1 KafkaConsumer
2 KafkaProducer
3 Thread safety
4 Compression
5 Protocol
6 Low-level
    6.1 Usage
    6.2 kafka-python API
    6.3 Simple APIs (DEPRECATED)
    6.4 Install
    6.5 Tests
    6.6 Compatibility
    6.7 Support
    6.8 License
    6.9 Changelog
Index
Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much
like the official java client, with a sprinkling of pythonic interfaces (e.g., consumer iterators).
kafka-python is best used with newer brokers (0.9+), but is backwards-compatible with older versions (to 0.8.0).
Some features will only be enabled on newer brokers. For example, fully coordinated consumer groups – i.e., dynamic
partition assignment to multiple consumers in the same group – requires use of 0.9 kafka brokers. Supporting this
feature for earlier broker releases would require writing and maintaining custom leadership election and membership
/ health check code (perhaps using zookeeper or consul). For older brokers, you can achieve something similar by
manually assigning different partitions to each consumer instance with config management tools like chef, ansible,
etc. This approach will work fine, though it does not support rebalancing on failures. See Compatibility for more
details.
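For instance, a minimal sketch of such manual assignment (the broker address, topic name, and partition number are placeholder assumptions; each consumer instance would be given a different partition):

from kafka import KafkaConsumer, TopicPartition

# this instance takes partition 0; a second instance would take partition 1, etc.
consumer = KafkaConsumer(bootstrap_servers='localhost:9092',
                         group_id=None, enable_auto_commit=False)
consumer.assign([TopicPartition('my-topic', 0)])
for msg in consumer:
    print(msg.offset, msg.value)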
Please note that the master branch may contain unreleased features. For release documentation, please see readthedocs
and/or python’s inline help.
CHAPTER 1
KafkaConsumer
KafkaConsumer is a high-level message consumer, intended to operate as similarly as possible to the official java
client. Full support for coordinated consumer groups requires use of kafka brokers that support the Group APIs: kafka
v0.9+.
See KafkaConsumer for API and configuration details.
The consumer iterator returns ConsumerRecords, which are simple namedtuples that expose basic message attributes:
topic, partition, offset, key, and value:
>>> # join a consumer group for dynamic partition assignment and offset commits
>>> from kafka import KafkaConsumer
>>> consumer = KafkaConsumer('my_favorite_topic', group_id='my_favorite_group')
>>> for msg in consumer:
... print (msg)
CHAPTER 2
KafkaProducer
KafkaProducer is a high-level, asynchronous message producer. The class is intended to operate as similarly as
possible to the official java client. See KafkaProducer for more details.
>>> # Block until all pending messages are at least put on the network
>>> # NOTE: This does not guarantee delivery or success! It is really
>>> # only useful if you configure internal batching using linger_ms
>>> producer.flush()
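For context, a minimal sketch of creating a producer and sending a message before flushing (the broker address and topic name are assumptions):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
future = producer.send('my-topic', b'hello')  # returns immediately with a future
producer.flush()  # block until the pending batch has been handed to the network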
CHAPTER 3
Thread safety
The KafkaProducer can be used across threads without issue, unlike the KafkaConsumer which cannot.
While it is possible to use the KafkaConsumer in a thread-local manner, multiprocessing is recommended.
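For example, a rough sketch of one-consumer-per-process fan-out using the standard library's multiprocessing module (topic, group, and broker address are placeholders; on 0.9+ brokers the group coordinator splits partitions across the processes):

from multiprocessing import Process
from kafka import KafkaConsumer

def consume(worker_id):
    # each process builds its own consumer; never share a KafkaConsumer across processes
    consumer = KafkaConsumer('my-topic', group_id='my-group',
                             bootstrap_servers='localhost:9092')
    for msg in consumer:
        print(worker_id, msg.partition, msg.offset)

if __name__ == '__main__':
    workers = [Process(target=consume, args=(i,)) for i in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()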
CHAPTER 4
Compression
kafka-python supports gzip compression/decompression natively. To produce or consume lz4-compressed messages, install python-lz4 (pip install lz4). To enable snappy compression/decompression, install python-snappy (which also requires the snappy C library). See Installation for more information.
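For instance, a short sketch of enabling lz4 on the producer side (broker address and topic are assumptions; consumers decompress automatically as long as the matching codec package is installed):

from kafka import KafkaProducer

# assumes `pip install lz4` has been run so the codec is available
producer = KafkaProducer(bootstrap_servers='localhost:9092',
                         compression_type='lz4')
producer.send('my-topic', b'compressed payload')
producer.flush()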
CHAPTER 5
Protocol
A secondary goal of kafka-python is to provide an easy-to-use protocol layer for interacting with kafka brokers via
the python repl. This is useful for testing, probing, and general experimentation. The protocol support is leveraged to
enable a check_version() method that probes a kafka broker and attempts to identify which version it is running
(0.8.0 to 1.1+).
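For instance, a rough sketch of probing a broker version with the low-level client (assuming a broker at localhost:9092; KafkaConsumer and KafkaProducer run the same probe internally when api_version is None):

from kafka.client_async import KafkaClient

client = KafkaClient(bootstrap_servers='localhost:9092')
print(client.check_version(timeout=5))  # e.g. (0, 10) or (1, 0)
client.close()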
CHAPTER 6
Low-level
Legacy support is maintained for low-level consumer and producer classes, SimpleConsumer and SimpleProducer.
6.1 Usage
6.1.1 KafkaConsumer
# consume msgpack-serialized values (requires the msgpack package)
import msgpack
from kafka import KafkaConsumer
consumer = KafkaConsumer(value_deserializer=msgpack.unpackb)
There are many configuration options for the consumer class. See KafkaConsumer API documentation for more
details.
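For example, a consumer that joins a group, starts from the earliest available offset, and decodes UTF-8 values might be configured as follows (broker address and names are placeholders):

from kafka import KafkaConsumer

consumer = KafkaConsumer('my-topic',
                         bootstrap_servers='localhost:9092',
                         group_id='my-group',
                         auto_offset_reset='earliest',
                         enable_auto_commit=True,
                         value_deserializer=lambda v: v.decode('utf-8'))
for msg in consumer:
    print(msg.key, msg.value)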
6.1.2 KafkaProducer
import logging
from kafka import KafkaProducer

log = logging.getLogger(__name__)

producer = KafkaProducer(bootstrap_servers=['broker1:1234'])

# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')

# produce asynchronously
for _ in range(100):
    producer.send('my-topic', b'msg')

def on_send_success(record_metadata):
    print(record_metadata.topic)
    print(record_metadata.partition)
    print(record_metadata.offset)

def on_send_error(excp):
    log.error('I am an errback', exc_info=excp)
    # handle exception

# produce asynchronously with callbacks
producer.send('my-topic', b'raw_bytes').add_callback(on_send_success).add_errback(on_send_error)
6.2 kafka-python API

6.2.1 KafkaConsumer

class kafka.KafkaConsumer(*topics, **configs)
• fetch_min_bytes (int) – Minimum amount of data the server should return for a fetch
request, otherwise wait up to fetch_max_wait_ms for more data to accumulate. Default: 1.
• fetch_max_wait_ms (int) – The maximum amount of time in milliseconds the server
will block before answering the fetch request if there isn’t sufficient data to immediately
satisfy the requirement given by fetch_min_bytes. Default: 500.
• fetch_max_bytes (int) – The maximum amount of data the server should return for a
fetch request. This is not an absolute maximum, if the first message in the first non-empty
partition of the fetch is larger than this value, the message will still be returned to ensure that
the consumer can make progress. NOTE: consumer performs fetches to multiple brokers in
parallel so memory usage will depend on the number of brokers containing partitions for the
topic. Supported Kafka version >= 0.10.1.0. Default: 52428800 (50 MB).
• max_partition_fetch_bytes (int) – The maximum amount of data per-partition
the server will return. The maximum total memory used for a request = #partitions *
max_partition_fetch_bytes. This size must be at least as large as the maximum message
size the server allows or else it is possible for the producer to send messages larger than
the consumer can fetch. If that happens, the consumer can get stuck trying to fetch a large
message on a certain partition. Default: 1048576.
• request_timeout_ms (int) – Client request timeout in milliseconds. Default:
305000.
• retry_backoff_ms (int) – Milliseconds to backoff when retrying on errors. Default:
100.
• reconnect_backoff_ms (int) – The amount of time in milliseconds to wait before
attempting to reconnect to a given host. Default: 50.
• reconnect_backoff_max_ms (int) – The maximum amount of time in milliseconds
to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the
backoff per host will increase exponentially for each consecutive connection failure, up to
this maximum. To avoid connection storms, a randomization factor of 0.2 will be applied to
the backoff resulting in a random range between 20% below and 20% above the computed
value. Default: 1000.
• max_in_flight_requests_per_connection (int) – Requests are pipelined to
kafka brokers up to this number of maximum requests per broker connection. Default: 5.
• auto_offset_reset (str) – A policy for resetting offsets on OffsetOutOfRange errors: 'earliest' will move to the oldest available message, 'latest' will move to the most recent. Any other value will raise an exception. Default: 'latest'.
• enable_auto_commit (bool) – If True, the consumer's offset will be periodically committed in the background. Default: True.
• auto_commit_interval_ms (int) – Number of milliseconds between automatic off-
set commits, if enable_auto_commit is True. Default: 5000.
• default_offset_commit_callback (callable) – Called as callback(offsets, re-
sponse) response will be either an Exception or an OffsetCommitResponse struct. This
callback can be used to trigger custom actions when a commit request completes.
• check_crcs (bool) – Automatically check the CRC32 of the records consumed. This
ensures no on-the-wire or on-disk corruption to the messages occurred. This check adds
some overhead, so it may be disabled in cases seeking extreme performance. Default: True
• metadata_max_age_ms (int) – The period of time in milliseconds after which we
force a refresh of metadata, even if we haven’t seen any partition leadership changes to
proactively discover any new brokers or partitions. Default: 300000
• ssl_password (str) – Optional password to be used when loading the certificate chain.
Default: None.
• ssl_crlfile (str) – Optional filename containing the CRL to check for certificate
expiration. By default, no CRL check is done. When providing a file, only the leaf certificate
will be checked against this CRL. The CRL can only be checked with Python 3.4+ or 2.7.9+.
Default: None.
• ssl_ciphers (str) – optionally set the available ciphers for ssl connections. It should
be a string in the OpenSSL cipher list format. If no cipher can be selected (because compile-
time options or other configuration forbids use of all the specified ciphers), an ssl.SSLError
will be raised. See ssl.SSLContext.set_ciphers
• api_version (tuple) – Specify which Kafka API version to use. If set to None, the
client will attempt to infer the broker version by probing various APIs. Different versions
enable different functionality.
Examples:
(0, 9) enables full group coordination features with automatic partition assignment and rebalancing;
(0, 8, 2) enables kafka-storage offset commits with manual partition assignment only;
(0, 8, 1) enables zookeeper-storage offset commits with manual partition assignment only;
(0, 8, 0) enables basic functionality but requires manual partition assignment and offset management.
Default: None
• api_version_auto_timeout_ms (int) – number of milliseconds to throw a time-
out exception from the constructor when checking the broker api version. Only applies if
api_version set to None.
• connections_max_idle_ms – Close idle connections after the number of mil-
liseconds specified by this config. The broker closes idle connections after connec-
tions.max.idle.ms, so this avoids hitting unexpected socket disconnected errors on the client.
Default: 540000
• metric_reporters (list) – A list of classes to use as metrics reporters. Implementing
the AbstractMetricsReporter interface allows plugging in classes that will be notified of new
metric creation. Default: []
• metrics_num_samples (int) – The number of samples maintained to compute met-
rics. Default: 2
• metrics_sample_window_ms (int) – The maximum age in milliseconds of samples
used to compute metrics. Default: 30000
• selector (selectors.BaseSelector) – Provide a specific selector implementa-
tion to use for I/O multiplexing. Default: selectors.DefaultSelector
• exclude_internal_topics (bool) – Whether records from internal topics (such as
offsets) should be exposed to the consumer. If set to True the only way to receive records
from an internal topic is subscribing to it. Requires 0.10+ Default: True
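To illustrate how a few of the fetch-related options above are passed in practice (the broker address, topic name, and specific values are illustrative assumptions, not recommendations):

from kafka import KafkaConsumer

consumer = KafkaConsumer('my-topic',
                         bootstrap_servers='localhost:9092',
                         fetch_min_bytes=1024,
                         fetch_max_wait_ms=200,
                         max_partition_fetch_bytes=1048576,
                         auto_offset_reset='earliest')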
assign(partitions)
Manually assign a list of TopicPartitions to this consumer.
Parameters partitions (list of TopicPartition) – Assignment for this instance.
Raises IllegalStateError – If consumer has already called subscribe().
Warning: It is not possible to use both manual partition assignment with assign() and group
assignment with subscribe().
Note: This interface does not support incremental assignment and will replace the previous assignment
(if there was one).
Note: Manual topic assignment through this method does not use the consumer’s group management
functionality. As such, there will be no rebalance operation triggered when group membership or cluster
and topic metadata change.
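For example, a small sketch of manual assignment (the broker address and topic name are placeholders):

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
consumer.assign([TopicPartition('my-topic', 0), TopicPartition('my-topic', 1)])
for msg in consumer:
    print(msg.partition, msg.offset, msg.value)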
assignment()
Get the TopicPartitions currently assigned to this consumer.
If partitions were directly assigned using assign(), then this will simply return the same partitions that
were previously assigned. If topics were subscribed using subscribe(), then this will give the set of
topic partitions currently assigned to the consumer (which may be None if the assignment hasn’t happened
yet, or if the partitions are in the process of being reassigned).
Returns {TopicPartition, . . . }
Return type set
beginning_offsets(partitions)
Get the first offset for the given partitions.
This method does not change the current consumer position of the partitions.
Note: This method may block indefinitely if the partition does not exist.
close(autocommit=True)
Close the consumer, waiting indefinitely for any needed cleanup.
Keyword Arguments autocommit (bool) – If auto-commit is configured for this consumer,
this optional flag causes the consumer to attempt to commit any pending consumed offsets
prior to close. Default: True
commit(offsets=None)
Commit offsets to kafka, blocking until success or error.
This commits offsets only to Kafka. The offsets committed using this API will be used on the first fetch
after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka,
this API should not be used. To avoid re-processing the last message read if a consumer is restarted, the
committed offset should be the next message your application should consume, i.e.: last_offset + 1.
Blocks until either the commit succeeds or an unrecoverable error is encountered (in which case it is thrown
to the caller).
Currently only supports kafka-topic offset storage (not zookeeper).
Parameters offsets (dict, optional) – {TopicPartition: OffsetAndMetadata} dict to
commit with the configured group_id. Defaults to currently consumed offsets for all sub-
scribed partitions.
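For example, a minimal sketch of committing manually after processing each record (the broker address, topic, and the handle() function are hypothetical):

from kafka import KafkaConsumer
from kafka.structs import TopicPartition, OffsetAndMetadata

consumer = KafkaConsumer('my-topic', group_id='my-group',
                         bootstrap_servers='localhost:9092',
                         enable_auto_commit=False)
for msg in consumer:
    handle(msg)  # hypothetical application-level processing
    tp = TopicPartition(msg.topic, msg.partition)
    # commit the *next* offset to consume, i.e. last_offset + 1
    consumer.commit({tp: OffsetAndMetadata(msg.offset + 1, '')})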
commit_async(offsets=None, callback=None)
Commit offsets to kafka asynchronously, optionally firing callback.
This commits offsets only to Kafka. The offsets committed using this API will be used on the first fetch
after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka,
this API should not be used. To avoid re-processing the last message read if a consumer is restarted, the
committed offset should be the next message your application should consume, i.e.: last_offset + 1.
This is an asynchronous call and will not block. Any errors encountered are either passed to the callback
(if provided) or discarded.
Parameters
• offsets (dict, optional) – {TopicPartition: OffsetAndMetadata} dict to commit
with the configured group_id. Defaults to currently consumed offsets for all subscribed
partitions.
Note: This method may block indefinitely if the partition does not exist.
highwater(partition)
Last known highwater offset for a partition.
A highwater offset is the offset that will be assigned to the next message that is produced. It may be useful
for calculating lag, by comparing with the reported position. Note that both position and highwater refer
to the next offset – i.e., highwater offset is one greater than the newest available message.
Highwater offsets are returned in FetchResponse messages, so will not be available if no FetchRequests
have been sent for this partition yet.
Parameters partition (TopicPartition) – Partition to check
Returns Offset if available
Return type int or None
metrics(raw=False)
Get metrics on consumer performance.
This is ported from the Java Consumer, for details see: https://kafka.apache.org/documentation/#new_consumer_monitoring
Warning: This is an unstable interface. It may change in future releases without warning.
offsets_for_times(timestamps)
Look up the offsets for the given partitions by timestamp. The returned offset for each partition is the ear-
liest offset whose timestamp is greater than or equal to the given timestamp in the corresponding partition.
This is a blocking call. The consumer does not have to be assigned the partitions.
If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps,
None will be returned for that partition. None will also be returned for the partition if there are no
messages in it.
Note: This method may block indefinitely if the partition does not exist.
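For example, a rough sketch of finding the offset closest to a point in time (broker address and topic are assumptions; requires 0.10.1+ brokers and message timestamps):

import time
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
tp = TopicPartition('my-topic', 0)
one_hour_ago = int(time.time() * 1000) - 60 * 60 * 1000
offsets = consumer.offsets_for_times({tp: one_hour_ago})
print(offsets[tp])  # OffsetAndTimestamp(offset=..., timestamp=...) or None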
partitions_for_topic(topic)
This method first checks the local metadata cache for information about the topic. If the topic is not found
(either because the topic does not exist, the user is not authorized to view the topic, or the metadata cache
is not populated), then it will issue a metadata update call to the cluster.
Parameters topic (str) – Topic to check.
Returns Partition ids
Return type set
pause(*partitions)
Suspend fetching from the requested partitions.
Future calls to poll() will not return any records from these partitions until they have been resumed
using resume().
Note: This method does not affect partition subscription. In particular, it does not cause a group rebalance
when automatic assignment is used.
Parameters *partitions (TopicPartition) – Partitions to pause.
paused()
Get the partitions that were previously paused using pause().
Returns {partition (TopicPartition), . . . }
Return type set
poll(timeout_ms=0, max_records=None)
Fetch data from assigned topics / partitions.
Records are fetched and returned in batches by topic-partition. On each poll, consumer will try to use the
last consumed offset as the starting offset and fetch sequentially. The last consumed offset can be manually
set through seek() or automatically set as the last committed offset for the subscribed list of partitions.
Incompatible with iterator interface – use one or the other, not both.
Parameters
• timeout_ms (int, optional) – Milliseconds spent waiting in poll if data is not available in the buffer. If 0, returns immediately with any records that are currently available in the buffer, otherwise returns an empty dict. Must not be negative. Default: 0
• max_records (int, optional) – The maximum number of records returned in a
single call to poll(). Default: Inherit value from max_poll_records.
Returns
Topic to list of records since the last fetch for the subscribed list of topics and partitions.
Return type dict
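For example, a brief sketch of a poll-based loop (broker address and topic are placeholders):

from kafka import KafkaConsumer

consumer = KafkaConsumer('my-topic', bootstrap_servers='localhost:9092')
while True:
    batch = consumer.poll(timeout_ms=500, max_records=100)
    for tp, records in batch.items():
        for record in records:
            print(tp.partition, record.offset, record.value)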
position(partition)
Get the offset of the next record that will be fetched
Parameters partition (TopicPartition) – Partition to check
Returns Offset
Return type int
resume(*partitions)
Resume fetching from the specified (paused) partitions.
Parameters *partitions (TopicPartition) – Partitions to resume.
seek(partition, offset)
Manually specify the fetch offset for a TopicPartition.
Overrides the fetch offsets that the consumer will use on the next poll(). If this API is invoked for the
same partition more than once, the latest offset will be used on the next poll().
Note: You may lose data if this API is arbitrarily used in the middle of consumption to reset the fetch
offsets.
Parameters
• partition (TopicPartition) – Partition for seek operation
• offset (int) – Message offset in partition
Raises AssertionError – If offset is not an int >= 0; or if partition is not currently assigned.
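For example, a minimal sketch of replaying a partition from a known offset (the names and the offset value are assumptions):

from kafka import KafkaConsumer, TopicPartition

tp = TopicPartition('my-topic', 0)
consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
consumer.assign([tp])
consumer.seek(tp, 42)  # the next poll() starts at offset 42
records = consumer.poll(timeout_ms=1000)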
seek_to_beginning(*partitions)
Seek to the oldest available offset for partitions.
Parameters *partitions – Optionally provide specific TopicPartitions, otherwise default to
all assigned partitions.
Raises AssertionError – If any partition is not currently assigned, or if no partitions are
assigned.
seek_to_end(*partitions)
Seek to the most recent available offset for partitions.
unsubscribe()
Unsubscribe from all topics and clear all assigned partitions.
6.2.2 KafkaProducer
class kafka.KafkaProducer(**configs)
A Kafka client that publishes records to the Kafka cluster.
The producer is thread safe and sharing a single producer instance across threads will generally be faster than
having multiple instances.
The producer consists of a pool of buffer space that holds records that haven’t yet been transmitted to the server
as well as a background I/O thread that is responsible for turning these records into requests and transmitting
them to the cluster.
send() is asynchronous. When called it adds the record to a buffer of pending record sends and immediately
returns. This allows the producer to batch together individual records for efficiency.
The ‘acks’ config controls the criteria under which requests are considered complete. The “all” setting will
result in blocking on the full commit of the record, the slowest but most durable setting.
If the request fails, the producer can automatically retry, unless ‘retries’ is configured to 0. Enabling retries
also opens up the possibility of duplicates (see the documentation on message delivery semantics for details:
https://kafka.apache.org/documentation.html#semantics ).
The producer maintains buffers of unsent records for each partition. These buffers are of a size specified by the
‘batch_size’ config. Making this larger can result in more batching, but requires more memory (since we will
generally have one of these buffers for each active partition).
By default a buffer is available to send immediately even if there is additional unused space in the buffer.
However if you want to reduce the number of requests you can set ‘linger_ms’ to something greater than 0.
This will instruct the producer to wait up to that number of milliseconds before sending a request in hope that
more records will arrive to fill up the same batch. This is analogous to Nagle’s algorithm in TCP. Note that
records that arrive close together in time will generally batch together even with linger_ms=0 so under heavy
load batching will occur regardless of the linger configuration; however setting this to something larger than 0
can lead to fewer, more efficient requests when not under maximal load at the cost of a small amount of latency.
The buffer_memory controls the total amount of memory available to the producer for buffering. If records are
sent faster than they can be transmitted to the server then this buffer space will be exhausted. When the buffer
space is exhausted additional send calls will block.
The key_serializer and value_serializer instruct how to turn the key and value objects the user provides into
bytes.
Keyword Arguments
• bootstrap_servers – ‘host[:port]’ string (or list of ‘host[:port]’ strings) that the pro-
ducer should contact to bootstrap initial cluster metadata. This does not have to be the full
node list. It just needs to have at least one broker that will respond to a Metadata API
Request. Default port is 9092. If no servers are specified, will default to localhost:9092.
• client_id (str) – a name for this client. This string is passed in each request to servers
and can be used to identify specific server-side log entries that correspond to this client.
Default: ‘kafka-python-producer-#’ (appended with a unique number per instance)
• key_serializer (callable) – used to convert user-supplied keys to bytes If not
None, called as f(key), should return bytes. Default: None.
• value_serializer (callable) – used to convert user-supplied message values to
bytes. If not None, called as f(value), should return bytes. Default: None.
• acks (0, 1, 'all') – The number of acknowledgments the producer requires the
leader to have received before considering a request complete. This controls the durabil-
ity of records that are sent. The following settings are common:
0: Producer will not wait for any acknowledgment from the server. The message will
immediately be added to the socket buffer and considered sent. No guarantee can be
made that the server has received the record in this case, and the retries configuration will
not take effect (as the client won’t generally know of any failures). The offset given back
for each record will always be set to -1.
1: Wait for leader to write the record to its local log only. Broker will respond without
awaiting full acknowledgement from all followers. In this case should the leader fail
immediately after acknowledging the record but before the followers have replicated it
then the record will be lost.
all: Wait for the full set of in-sync replicas to write the record. This guarantees that the
record will not be lost as long as at least one in-sync replica remains alive. This is the
strongest available guarantee.
If unset, defaults to acks=1.
• compression_type (str) – The compression type for all data generated by the pro-
ducer. Valid values are ‘gzip’, ‘snappy’, ‘lz4’, or None. Compression is of full batches
of data, so the efficacy of batching will also impact the compression ratio (more batching
means better compression). Default: None.
• retries (int) – Setting a value greater than zero will cause the client to resend any
record whose send fails with a potentially transient error. Note that this retry is no different
than if the client resent the record upon receiving the error. Allowing retries without setting
max_in_flight_requests_per_connection to 1 will potentially change the ordering of records
because if two batches are sent to a single partition, and the first fails and is retried but the
second succeeds, then the records in the second batch may appear first. Default: 0.
• batch_size (int) – Requests sent to brokers will contain multiple batches, one for each
partition with data available to be sent. A small batch size will make batching less common
and may reduce throughput (a batch size of zero will disable batching entirely). Default:
16384
• linger_ms (int) – The producer groups together any records that arrive in between re-
quest transmissions into a single batched request. Normally this occurs only under load
when records arrive faster than they can be sent out. However in some circumstances the
client may want to reduce the number of requests even under moderate load. This setting ac-
complishes this by adding a small amount of artificial delay; that is, rather than immediately
sending out a record the producer will wait for up to the given delay to allow other records
to be sent so that the sends can be batched together. This can be thought of as analogous
to Nagle’s algorithm in TCP. This setting gives the upper bound on the delay for batching:
once we get batch_size worth of records for a partition it will be sent immediately regardless
of this setting, however if we have fewer than this many bytes accumulated for this partition
we will ‘linger’ for the specified time waiting for more records to show up. This setting
defaults to 0 (i.e. no delay). Setting linger_ms=5 would have the effect of reducing the
number of requests sent but would add up to 5ms of latency to records sent in the absence
of load. Default: 0.
• partitioner (callable) – Callable used to determine which partition each message
is assigned to. Called (after key serialization): partitioner(key_bytes, all_partitions, avail-
able_partitions). The default partitioner implementation hashes each non-None key using
the same murmur2 algorithm as the java client so that messages with the same key are as-
signed to the same partition. When a key is None, the message is delivered to a random partition.
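As a brief sketch of how several of these options combine (the broker address and the particular values are illustrative assumptions, not recommended defaults):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092',
                         acks='all',
                         retries=3,
                         linger_ms=5,
                         batch_size=32768,
                         compression_type='gzip')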
close(timeout=None)
Close this producer.
Parameters timeout (float, optional) – timeout in seconds to wait for completion.
flush(timeout=None)
Invoking this method makes all buffered records immediately available to send (even if linger_ms is greater
than 0) and blocks on the completion of the requests associated with these records. The post-condition
of flush() is that any previously sent record will have completed (e.g. Future.is_done() == True).
A request is considered completed when either it is successfully acknowledged according to the ‘acks’
configuration for the producer, or it results in an error.
Other threads can continue sending messages while one thread is blocked waiting for a flush call to com-
plete; however, no guarantee is made about the completion of messages sent after the flush call begins.
Parameters timeout (float, optional) – timeout in seconds to wait for completion.
Raises KafkaTimeoutError – failure to flush buffered records within the provided timeout
metrics(raw=False)
Get metrics on producer performance.
This is ported from the Java Producer, for details see: https://kafka.apache.org/documentation/#producer_monitoring
Warning: This is an unstable interface. It may change in future releases without warning.
partitions_for(topic)
Returns set of all known partitions for the topic.
send(topic, value=None, key=None, headers=None, partition=None, timestamp_ms=None)
Publish a message to a topic.
Parameters
• topic (str) – topic where the message will be published
• value (optional) – message value. Must be type bytes, or be serializable to bytes via configured value_serializer. If value is None, key is required and message acts as a 'delete'. See kafka compaction documentation for more details: https://kafka.apache.org/documentation.html#compaction (compaction requires kafka >= 0.8.1)
• partition (int, optional) – optionally specify a partition. If not set, the partition
will be selected using the configured ‘partitioner’.
• key (optional) – a key to associate with the message. Can be used to determine which
partition to send the message to. If partition is None (and producer’s partitioner config is
left as default), then messages with the same key will be delivered to the same partition
(but if key is None, partition is chosen randomly). Must be type bytes, or be serializable
to bytes via configured key_serializer.
• headers (optional) – a list of header key value pairs. List items are tuples of str key
and bytes value.
• timestamp_ms (int, optional) – epoch milliseconds (from Jan 1 1970 UTC) to
use as the message timestamp. Defaults to current time.
Returns resolves to RecordMetadata
Return type FutureRecordMetadata
Raises KafkaTimeoutError – if unable to fetch topic metadata, or unable to obtain memory
buffer prior to configured max_block_ms
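For illustration, a short sketch of sending a keyed message with headers and blocking on the result (broker address, topic, key, and the header name are assumptions; headers require message format v2, i.e. Kafka 0.11+ brokers):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
future = producer.send('my-topic',
                       key=b'user-1',
                       value=b'payload',
                       headers=[('source', b'docs-example')])
metadata = future.get(timeout=10)  # block until acknowledged or an error is raised
print(metadata.topic, metadata.partition, metadata.offset)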
6.2.3 KafkaAdminClient
class kafka.admin.KafkaAdminClient(**configs)
A class for administering the Kafka cluster.
Warning: This is an unstable interface that was recently added and is subject to change without warning.
In particular, many methods currently return raw protocol tuples. In future releases, we plan to make these
into nicer, more pythonic objects. Unfortunately, this will likely break those interfaces.
The KafkaAdminClient class will negotiate for the latest version of each message protocol format supported by
both the kafka-python client library and the Kafka broker. Usage of optional fields from protocol versions that
are not supported by the broker will result in IncompatibleBrokerVersion exceptions.
Use of this class requires a minimum broker version >= 0.10.0.0.
Keyword Arguments
• bootstrap_servers – ‘host[:port]’ string (or list of ‘host[:port]’ strings) that the con-
sumer should contact to bootstrap initial cluster metadata. This does not have to be the
full node list. It just needs to have at least one broker that will respond to a Metadata API
Request. Default port is 9092. If no servers are specified, will default to localhost:9092.
• client_id (str) – a name for this client. This string is passed in each request to servers
and can be used to identify specific server-side log entries that correspond to this client. Also
submitted to GroupCoordinator for logging with respect to consumer group administration.
Default: ‘kafka-python-{version}’
• reconnect_backoff_ms (int) – The amount of time in milliseconds to wait before
attempting to reconnect to a given host. Default: 50.
• reconnect_backoff_max_ms (int) – The maximum amount of time in milliseconds
to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the
backoff per host will increase exponentially for each consecutive connection failure, up to
this maximum. To avoid connection storms, a randomization factor of 0.2 will be applied to
the backoff resulting in a random range between 20% below and 20% above the computed
value. Default: 1000.
• request_timeout_ms (int) – Client request timeout in milliseconds. Default: 30000.
• connections_max_idle_ms – Close idle connections after the number of mil-
liseconds specified by this config. The broker closes idle connections after connec-
tions.max.idle.ms, so this avoids hitting unexpected socket disconnected errors on the client.
Default: 540000
Warning: This is currently broken for BROKER resources because those must be sent to that specific
broker, versus this always picks the least-loaded node. See the comment in the source code for details.
We would happily accept a PR fixing this.
close()
Close the KafkaAdminClient connection to the Kafka broker.
create_partitions(topic_partitions, timeout_ms=None, validate_only=False)
Create additional partitions for an existing topic.
Parameters
• topic_partitions – A map of topic name strings to NewPartition objects.
• timeout_ms – Milliseconds to wait for new partitions to be created before the broker
returns.
• validate_only – If True, don’t actually create new partitions. Default: False
Returns Appropriate version of CreatePartitionsResponse class.
create_topics(new_topics, timeout_ms=None, validate_only=False)
Create new topics in the cluster.
Parameters
• new_topics – A list of NewTopic objects.
• timeout_ms – Milliseconds to wait for new topics to be created before the broker re-
turns.
• validate_only – If True, don’t actually create new topics. Not supported by all ver-
sions. Default: False
Returns Appropriate version of CreateTopicResponse class.
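For example, a minimal sketch of creating a topic (the topic name, partition count, replication factor, and broker address are placeholder assumptions):

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers='localhost:9092')
admin.create_topics([NewTopic(name='example-topic',
                              num_partitions=3,
                              replication_factor=1)])
admin.close()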
delete_topics(topics, timeout_ms=None)
Delete topics from the cluster.
Parameters
• topics – A list of topic name strings.
• timeout_ms – Milliseconds to wait for topics to be deleted before the broker returns.
Returns Appropriate version of DeleteTopicsResponse class.
describe_configs(config_resources, include_synonyms=False)
Fetch configuration parameters for one or more Kafka resources.
Parameters
• config_resources – A list of ConfigResource objects. Any keys in ConfigRe-
source.configs dict will be used to filter the result. Setting the configs dict to None will get
all values. An empty dict will get zero values (as per Kafka protocol).
• include_synonyms – If True, return synonyms in response. Not supported by all
versions. Default: False.
Returns Appropriate version of DescribeConfigsResponse class.
describe_consumer_groups(group_ids, group_coordinator_id=None)
Describe a set of consumer groups.
Any errors are immediately raised.
Parameters
• group_ids – A list of consumer group IDs. These are typically the group names as
strings.
• group_coordinator_id – The node_id of the groups’ coordinator broker. If set to
None, it will query the cluster for each group to find that group’s coordinator. Explicitly
specifying this can be useful for avoiding extra network round trips if you already know the
group coordinator. This is only useful when all the group_ids have the same coordinator,
otherwise it will error. Default: None.
Returns A list of group descriptions. For now the group descriptions are the raw results from
the DescribeGroupsResponse. Long-term, we plan to change this to return namedtuples as
well as decoding the partition assignments.
list_consumer_group_offsets(group_id, group_coordinator_id=None, partitions=None)
Fetch Consumer Offsets for a single consumer group.
Note: This does not verify that the group_id or partitions actually exist in the cluster.
As soon as any error is encountered, it is immediately raised.
Parameters
• group_id – The consumer group id name for which to fetch offsets.
• group_coordinator_id – The node_id of the group’s coordinator broker. If set to
None, will query the cluster to find the group coordinator. Explicitly specifying this can be
useful to prevent that extra network round trip if you already know the group coordinator.
Default: None.
• partitions – A list of TopicPartitions for which to fetch offsets. On brokers >= 0.10.2,
this can be set to None to fetch all known offsets for the consumer group. Default: None.
Return dictionary A dictionary with TopicPartition keys and OffsetAndMetadata values. Parti-
tions that are not specified and for which the group_id does not have a recorded offset are
omitted. An offset value of -1 indicates the group_id has no offset for that TopicPartition. A
-1 can only happen for partitions that are explicitly specified.
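As a short sketch, assuming a broker at localhost:9092 and an existing group named 'my-group':

from kafka.admin import KafkaAdminClient

admin = KafkaAdminClient(bootstrap_servers='localhost:9092')
offsets = admin.list_consumer_group_offsets('my-group')
for tp, meta in offsets.items():
    print(tp.topic, tp.partition, meta.offset)
admin.close()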
list_consumer_groups(broker_ids=None)
List all consumer groups known to the cluster.
This returns a list of Consumer Group tuples. The tuples are composed of the consumer group name and
the consumer group protocol type.
Only consumer groups that store their offsets in Kafka are returned. The protocol type will be an empty
string for groups created using Kafka < 0.9 APIs because, although they store their offsets in Kafka, they
don’t use Kafka for group coordination. For groups created using Kafka >= 0.9, the protocol type will
typically be “consumer”.
As soon as any error is encountered, it is immediately raised.
Parameters broker_ids – A list of broker node_ids to query for consumer groups. If set to
None, will query all brokers in the cluster. Explicitly specifying broker(s) can be useful for
determining which consumer groups are coordinated by those broker(s). Default: None
Return list List of tuples of Consumer Groups.
Raises
• GroupCoordinatorNotAvailableError – The coordinator is not available, so
cannot process requests.
• GroupLoadInProgressError – The coordinator is loading and hence can’t process
requests.
6.2.4 KafkaClient
class kafka.client.KafkaClient(**configs)
A network client for asynchronous request/response network I/O.
This is an internal class used to implement the user-facing producer and consumer clients.
This class is not thread-safe!
cluster
Local cache of cluster metadata, retrieved via MetadataRequests during poll().
Type ClusterMetadata
Keyword Arguments
• bootstrap_servers – ‘host[:port]’ string (or list of ‘host[:port]’ strings) that the client
should contact to bootstrap initial cluster metadata. This does not have to be the full node
list. It just needs to have at least one broker that will respond to a Metadata API Request.
Default port is 9092. If no servers are specified, will default to localhost:9092.
• client_id (str) – a name for this client. This string is passed in each request to servers
and can be used to identify specific server-side log entries that correspond to this client. Also
submitted to GroupCoordinator for logging with respect to consumer group administration.
Default: ‘kafka-python-{version}’
• reconnect_backoff_ms (int) – The amount of time in milliseconds to wait before
attempting to reconnect to a given host. Default: 50.
• ssl_ciphers (str) – optionally set the available ciphers for ssl connections. It should
be a string in the OpenSSL cipher list format. If no cipher can be selected (because compile-
time options or other configuration forbids use of all the specified ciphers), an ssl.SSLError
will be raised. See ssl.SSLContext.set_ciphers
• api_version (tuple) – Specify which Kafka API version to use. If set to None, Kafka-
Client will attempt to infer the broker version by probing various APIs. Example: (0, 10, 2).
Default: None
• api_version_auto_timeout_ms (int) – number of milliseconds to throw a time-
out exception from the constructor when checking the broker api version. Only applies if
api_version is None
• selector (selectors.BaseSelector) – Provide a specific selector implementa-
tion to use for I/O multiplexing. Default: selectors.DefaultSelector
• metrics (kafka.metrics.Metrics) – Optionally provide a metrics instance for
capturing network IO stats. Default: None.
• metric_group_prefix (str) – Prefix for metric names. Default: ‘’
• sasl_mechanism (str) – Authentication mechanism when security_protocol is config-
ured for SASL_PLAINTEXT or SASL_SSL. Valid values are: PLAIN, GSSAPI, OAUTH-
BEARER.
• sasl_plain_username (str) – username for sasl PLAIN authentication. Required if
sasl_mechanism is PLAIN.
• sasl_plain_password (str) – password for sasl PLAIN authentication. Required if
sasl_mechanism is PLAIN.
• sasl_kerberos_service_name (str) – Service name to include in GSSAPI sasl
mechanism handshake. Default: ‘kafka’
• sasl_kerberos_domain_name (str) – kerberos domain name to use in GSSAPI sasl
mechanism handshake. Default: one of bootstrap servers
• sasl_oauth_token_provider (AbstractTokenProvider) – OAuthBearer to-
ken provider instance. (See kafka.oauth.abstract). Default: None
add_topic(topic)
Add a topic to the list of topics tracked via metadata.
Parameters topic (str) – topic to track
Returns resolves after metadata request/response
Return type Future
check_version(node_id=None, timeout=2, strict=False)
Attempt to guess the version of a Kafka broker.
Note: It is possible that this method blocks longer than the specified timeout. This can happen if the
entire cluster is down and the client enters a bootstrap backoff sleep. This is only possible if node_id
is None.
Returns: version tuple, i.e. (0, 10), (0, 9), (0, 8, 2), . . .
Raises
• NodeNotReadyError (if node_id is provided)
• NoBrokersAvailable (if node_id is None)
6.2.5 BrokerConnection
• ssl_certfile (str) – optional filename of file in pem format containing the client
certificate, as well as any ca certificates needed to establish the certificate’s authenticity.
default: None.
• ssl_keyfile (str) – optional filename containing the client private key. default: None.
• ssl_password (callable, str, bytes, bytearray) – optional password or
callable function that returns a password, for decrypting the client private key. Default:
None.
• ssl_crlfile (str) – optional filename containing the CRL to check for certificate ex-
piration. By default, no CRL check is done. When providing a file, only the leaf certificate
will be checked against this CRL. The CRL can only be checked with Python 3.4+ or 2.7.9+.
default: None.
• ssl_ciphers (str) – optionally set the available ciphers for ssl connections. It should
be a string in the OpenSSL cipher list format. If no cipher can be selected (because compile-
time options or other configuration forbids use of all the specified ciphers), an ssl.SSLError
will be raised. See ssl.SSLContext.set_ciphers
• api_version (tuple) – Specify which Kafka API version to use. Accepted values are:
(0, 8, 0), (0, 8, 1), (0, 8, 2), (0, 9), (0, 10). Default: (0, 8, 2)
• api_version_auto_timeout_ms (int) – number of milliseconds to throw a time-
out exception from the constructor when checking the broker api version. Only applies if
api_version is None
• selector (selectors.BaseSelector) – Provide a specific selector implementa-
tion to use for I/O multiplexing. Default: selectors.DefaultSelector
• state_change_callback (callable) – function to be called when the connection
state changes from CONNECTING to CONNECTED etc.
• metrics (kafka.metrics.Metrics) – Optionally provide a metrics instance for
capturing network IO stats. Default: None.
• metric_group_prefix (str) – Prefix for metric names. Default: ‘’
• sasl_mechanism (str) – Authentication mechanism when security_protocol is config-
ured for SASL_PLAINTEXT or SASL_SSL. Valid values are: PLAIN, GSSAPI, OAUTH-
BEARER.
• sasl_plain_username (str) – username for sasl PLAIN authentication. Required if
sasl_mechanism is PLAIN.
• sasl_plain_password (str) – password for sasl PLAIN authentication. Required if
sasl_mechanism is PLAIN.
• sasl_kerberos_service_name (str) – Service name to include in GSSAPI sasl
mechanism handshake. Default: ‘kafka’
• sasl_kerberos_domain_name (str) – kerberos domain name to use in GSSAPI sasl
mechanism handshake. Default: one of bootstrap servers
• sasl_oauth_token_provider (AbstractTokenProvider) – OAuthBearer to-
ken provider instance. (See kafka.oauth.abstract). Default: None
blacked_out()
Return true if we are disconnected from the given node and can’t re-establish a connection yet
can_send_more()
Return True unless max_in_flight_requests_per_connection requests are already in flight.
6.2.6 ClusterMetadata
class kafka.cluster.ClusterMetadata(**configs)
A class to manage kafka cluster metadata.
This class does not perform any IO. It simply updates internal state given API responses (MetadataResponse,
GroupCoordinatorResponse).
Keyword Arguments
• retry_backoff_ms (int) – Milliseconds to backoff when retrying on errors. Default:
100.
• metadata_max_age_ms (int) – The period of time in milliseconds after which we
force a refresh of metadata even if we haven’t seen any partition leadership changes to
proactively discover any new brokers or partitions. Default: 300000
• bootstrap_servers – ‘host[:port]’ string (or list of ‘host[:port]’ strings) that the client
should contact to bootstrap initial cluster metadata. This does not have to be the full node
list. It just needs to have at least one broker that will respond to a Metadata API Request.
Default port is 9092. If no servers are specified, will default to localhost:9092.
add_group_coordinator(group, response)
Update with metadata for a group coordinator
Parameters
• group (str) – name of group from GroupCoordinatorRequest
• response (GroupCoordinatorResponse) – broker response
Returns coordinator node_id if metadata is updated, None on error
Return type string
add_listener(listener)
Add a callback function to be called on each metadata update
available_partitions_for_topic(topic)
Return set of partitions with known leaders
Parameters topic (str) – topic to check for partitions
Returns {partition (int), . . . } None if topic not found.
Return type set
broker_metadata(broker_id)
Get BrokerMetadata
Parameters broker_id (int) – node_id for a broker to check
Returns BrokerMetadata or None if not found
brokers()
Get all BrokerMetadata
Returns {BrokerMetadata, . . . }
Return type set
coordinator_for_group(group)
Return node_id of group coordinator.
Parameters group (str) – name of consumer group
Returns node_id for group coordinator None if the group does not exist.
Return type int
failed_update(exception)
Update cluster state given a failed MetadataRequest.
leader_for_partition(partition)
Return node_id of leader, -1 unavailable, None if unknown.
partitions_for_broker(broker_id)
Return TopicPartitions for which the broker is a leader.
Parameters broker_id (int) – node id for a broker
Returns {TopicPartition, . . . } None if the broker either has no partitions or does not exist.
Return type set
partitions_for_topic(topic)
Return set of all partitions for topic (whether available or not)
6.3 Simple APIs (DEPRECATED)

from kafka import SimpleClient, SimpleConsumer, MultiProcessConsumer, SimpleProducer

# To consume messages
client = SimpleClient('localhost:9092')
consumer = SimpleConsumer(client, "my-group", "my-topic")
for message in consumer:
    # message is raw byte string -- decode if necessary!
    # e.g., for unicode: `message.decode('utf-8')`
    print(message)

# This will spawn processes such that each handles 2 partitions max
consumer = MultiProcessConsumer(client, "my-group", "my-topic",
                                partitions_per_proc=2)

client.close()
Asynchronous Mode
Synchronous Mode
producer = SimpleProducer(client)

# Note that the application is responsible for encoding messages to type bytes
producer.send_messages('my-topic', b'some message')
producer.send_messages('my-topic', b'this method', b'is variadic')
import time

from kafka import SimpleClient
from kafka.errors import LeaderNotAvailableError, NotLeaderForPartitionError
from kafka.protocol import create_message
from kafka.structs import ProduceRequestPayload

kafka = SimpleClient('localhost:9092')
payload = ProduceRequestPayload(topic='my-topic', partition=0,
                                messages=[create_message(b"some message")])

retries = 5
resps = []
while retries and not resps:
    retries -= 1
    try:
        resps = kafka.send_produce_request(
            payloads=[payload], fail_on_error=True)
    except (LeaderNotAvailableError, NotLeaderForPartitionError):
        # retry after refreshing metadata for the topic
        kafka.load_metadata_for_topics()
        time.sleep(1)

kafka.close()

resps[0].topic      # 'my-topic'
resps[0].partition  # 0
resps[0].error      # 0
resps[0].offset     # offset of the first message sent in this request
6.4 Install
Pip:

pip install kafka-python
6.4.2 Bleeding-Edge
Optional Snappy install

Ubuntu:

apt-get install libsnappy-dev

OSX:

brew install snappy
From Source:
wget https://github.com/google/snappy/releases/download/1.1.3/snappy-1.1.3.tar.gz
tar xzvf snappy-1.1.3.tar.gz
cd snappy-1.1.3
./configure
make
sudo make install
Optional crc32c install

Highly recommended if you are using Kafka 0.11+ brokers. For those, kafka-python uses a new message protocol version that requires calculation of crc32c, which differs from the zlib.crc32 hash implementation. By default kafka-python calculates it in pure python, which is quite slow. To speed it up, we optionally support the https://pypi.python.org/pypi/crc32c package if it is installed.
6.5 Tests
Test environments are managed via tox. The test suite is run via pytest. Individual tests are written using unittest,
pytest, and in some cases, doctest.
Linting is run via pylint, but is generally skipped on pypy due to pylint compatibility / performance issues.
For test coverage details, see https://coveralls.io/github/dpkp/kafka-python
The test suite includes unit tests that mock network interfaces, as well as integration tests that set up and tear down kafka broker (and zookeeper) fixtures for client / consumer / producer testing.
tox -e py27
tox -e py35
Integration tests start Kafka and Zookeeper fixtures. This requires downloading kafka server binaries:
./build_integration.sh
By default, this will install the broker versions listed in build_integration.sh’s ALL_RELEASES into the servers/ direc-
tory. To install a specific version, set the KAFKA_VERSION variable:
KAFKA_VERSION=1.0.1 ./build_integration.sh
Then to run the tests against a specific Kafka version, simply set the KAFKA_VERSION env variable to the server
build you want to use for testing:
To test against the kafka source tree, set KAFKA_VERSION=trunk [optionally set SCALA_VERSION (defaults to
the value set in build_integration.sh)]
6.6 Compatibility
kafka-python is compatible with (and tested against) Kafka broker versions 0.8.0 through 1.1. kafka-python is not compatible with the 0.8.2-beta release.
Because the kafka server protocol is backwards compatible, kafka-python is expected to work with newer broker
releases as well (2.0+).
6.7 Support
6.8 License
6.9 Changelog
This is a patch release primarily focused on bugs related to concurrency, SSL connections and testing, and SASL
authentication:
• Wrap SSL sockets after connecting for python3.7 compatibility (dpkp / PR #1754)
• Allow configuration of SSL Ciphers (dpkp / PR #1755)
• Maintain shadow cluster metadata for bootstrapping (dpkp / PR #1753)
• Generate SSL certificates for local testing (dpkp / PR #1756)
• Rename ssl.keystore.location and ssl.truststore.location config files (dpkp)
• Reset reconnect backoff on SSL connection (dpkp / PR #1777)
• Fix 0.8.2 protocol quick detection / fix SASL version check (dpkp / PR #1763)
• Update sasl configuration docstrings to include supported mechanisms (dpkp)
• Support SASL OAuthBearer Authentication (pt2pham / PR #1750)
Miscellaneous Bugfixes
• Dont force metadata refresh when closing unneeded bootstrap connections (dpkp / PR #1773)
• Fix possible AttributeError during conn._close_socket (dpkp / PR #1776)
• Return connection state explicitly after close in connect() (dpkp / PR #1778)
• Fix flaky conn tests that use time.time (dpkp / PR #1758)
• Add py to requirements-dev (dpkp)
• Fixups to benchmark scripts for py3 / new KafkaFixture interface (dpkp)
This release is primarily focused on addressing lock contention and other coordination issues between the KafkaCon-
sumer and the background heartbeat thread that was introduced in the 1.4 release.
Consumer
Client
Admin Client
Core/Protocol
• Fix default protocol parser version / 0.8.2 version probe (dpkp / PR #1740)
• Make NotEnoughReplicasError/NotEnoughReplicasAfterAppendError retriable (le-linh / PR #1722)
Bugfixes
Test Infrastructure
Compatibility
• Catch thrown OSError by python 3.7 when creating a connection (danjo133 / PR #1694)
• Update travis test coverage: 2.7, 3.4, 3.7, pypy2.7 (jeffwidman, dpkp / PR #1614)
• Drop dependency on sphinxcontrib-napoleon (stanislavlevin / PR #1715)
• Remove unused import from kafka/producer/record_accumulator.py (jeffwidman / PR #1705)
• Fix SSL connection testing in Python 3.7 (seanthegeek, silentben / PR #1669)
Bugfixes
• (Attempt to) Fix deadlock between consumer and heartbeat (zhgjun / dpkp #1628)
• Fix Metrics dict memory leak (kishorenc #1569)
Client
Admin Client
Consumer
Core / Protocol
Documentation
Test Infrastructure
Compatibility
Compatibility
• Fix for python 3.7 support: remove ‘async’ keyword from SimpleProducer (dpkp #1454)
Client
Consumer
• Check for immediate failure when looking up coordinator in heartbeat thread (dpkp #1457)
Core / Protocol
• Always acquire client lock before coordinator lock to avoid deadlocks (dpkp #1464)
• Added AlterConfigs and DescribeConfigs apis (StephenSorriaux #1472)
• Fix CreatePartitionsRequest_v0 (StephenSorriaux #1469)
• Add codec validators to record parser and builder for all formats (tvoinarovskyi #1447)
• Fix MemoryRecord bugs re error handling and add test coverage (tvoinarovskyi #1448)
• Force lz4 to disable Kafka-unsupported block linking when encoding (mnito #1476)
• Stop shadowing ConnectionError (jeffwidman #1492)
Documentation
Test Infrastructure
Bugfixes
Client
Consumer
• Avoid tight poll loop in consumer when brokers are down (dpkp #1415)
• Validate max_records in KafkaConsumer.poll (dpkp #1398)
• KAFKA-5512: Awake heartbeat thread when it is time to poll (dpkp #1439)
Producer
• Validate that serializers generate bytes-like (or None) data (dpkp #1420)
Core / Protocol
Test Infrastructure
Bugfixes
• Fix consumer poll stuck error when no available partition (ckyoog #1375)
• Increase some integration test timeouts (dpkp #1374)
• Use raw in case string overriden (jeffwidman #1373)
• Fix pending completion IndexError bug caused by multiple threads (dpkp #1372)
This is a substantial release. Although there are no known ‘showstopper’ bugs as of release, we do recommend you
test any planned upgrade to your application prior to running in production.
Some of the major changes include:
• We have officially dropped python 2.6 support
• The KafkaConsumer now includes a background thread to handle coordinator heartbeats
• API protocol handling has been separated from networking code into a new class, KafkaProtocol
• Added support for kafka message format v2
• Refactored DNS lookups during kafka broker connections
• SASL authentication is working (we think)
• Removed several circular references to improve gc on close()
Thanks to all contributors – the state of the kafka-python community is strong!
A detailed changelog is listed below:
Client
Consumer
• KAFKA-3977: Defer fetch parsing for space efficiency, and to raise exceptions to user (dpkp #1245)
• KAFKA-4034: Avoid unnecessary consumer coordinator lookup (dpkp #1254)
• Handle lookup_coordinator send failures (dpkp #1279)
• KAFKA-3888 Use background thread to process consumer heartbeats (dpkp #1266)
• Improve KafkaConsumer cleanup (dpkp #1339)
• Fix coordinator join_future race condition (dpkp #1338)
• Avoid KeyError when filtering fetchable partitions (dpkp #1344)
• Name heartbeat thread with group_id; use backoff when polling (dpkp #1345)
• KAFKA-3949: Avoid race condition when subscription changes during rebalance (dpkp #1364)
• Fix #1239 regression to avoid consuming duplicate compressed messages from mid-batch (dpkp #1367)
Producer
Core / Protocol
Bugfixes
Test Infrastructure
• Use 0.11.0.2 kafka broker for integration testing (dpkp #1357 #1244)
• Add a Makefile to help build the project, generate docs, and run tests (tvoinarovskyi #1247)
• Add fixture support for 1.0.0 broker (dpkp #1275)
• Add kafka 1.0.0 to travis integration tests (dpkp #1365)
• Change fixture default host to localhost (asdaraujo #1305)
• Minor test cleanups (dpkp #1343)
• Use latest pytest 3.4.0, but drop pytest-sugar due to incompatibility (dpkp #1361)
Documentation
Bugfixes
Client
Consumer
Documentation
• Small fixes to SASL documentation and logging; validate security_protocol (dpkp #1231)
• Various typo and grammar fixes (jeffwidman)
Bugfixes
Client
Consumer
Producer
Core / Protocol
Test Infrastructure
• pylint 1.7.0+ supports python 3.6 and merge py36 into common testenv (jianbin-wei #1095)
• Add kafka 0.10.2.1 into integration testing version (jianbin-wei #1096)
• Disable automated tests for python 2.6 and kafka 0.8.0 and 0.8.1.1 (jianbin-wei #1096)
• Support manual py26 testing; don’t advertise 3.3 support (dpkp)
• Add 0.11.0.0 server resources, fix tests for 0.11 brokers (dpkp)
• Use fixture hostname, don’t assume localhost (dpkp)
• Add 0.11.0.0 to travis test matrix, remove 0.10.1.1; use scala 2.11 artifacts (dpkp #1176)
Documentation
Core / Protocol
• Derive all api classes from Request / Response base classes (dpkp 1030)
• Prefer python-lz4 if available (dpkp 1024)
• Fix kwarg handling in kafka.protocol.struct.Struct (dpkp 1025)
• Fixed a couple of “leaks” when gc is disabled (Mephius 979)
• Added max_bytes option and FetchRequest_v3 usage (Drizzt1991 962)
• CreateTopicsRequest / Response v1 (dpkp 1012)
• Add MetadataRequest_v2 and MetadataResponse_v2 structures for KIP-78 (Drizzt1991 974)
• KIP-88 / KAFKA-3853: OffsetFetch v2 structs (jeffwidman 971)
• DRY-up the MetadataRequest_v1 struct (jeffwidman 966)
• Add JoinGroup v1 structs (jeffwidman 965)
• DRY-up the OffsetCommitResponse Structs (jeffwidman 970)
• DRY-up the OffsetFetch structs (jeffwidman 964)
• Rename time -> timestamp to match the Java API (jeffwidman 969)
• Add support for offsetRequestV1 messages (jlafaye 951)
• Add FetchRequest/Response_v3 structs (jeffwidman 943)
• Add CreateTopics / DeleteTopics Structs (jeffwidman 944)
Test Infrastructure
Consumer
Producer
Client
Bugfixes
• Add client info logging re bootstrap; log connection attempts to balance with close (dpkp)
• Minor additional logging for consumer coordinator (dpkp)
• Add more debug-level connection logging (dpkp)
• Do not need str(self) when formatting to %s (dpkp)
• Add new broker response errors (dpkp)
• Small style fixes in kafka.errors (dpkp)
• Include the node id in BrokerConnection logging (dpkp 1009)
Documentation
Legacy Client
Core
Consumer
Producer
Client
Bugfixes
• Always include an error for logging when the coordinator is marked dead (dpkp 890)
• Only string-ify BrokerResponseError args if provided (dpkp 889)
• Update warning re advertised.listeners / advertised.host.name (jeffwidman 878)
• Fix unrecognized sasl_mechanism error message (sharego 883)
Documentation
Bugfixes
Incompatible Changes
Improvements
Bugfixes
• Ignore socket.error when checking for protocol out of sync prior to socket close (dpkp 792)
• Fix offset fetch when partitions are manually assigned (KAFKA-3960 / 786)
• Change pickle_method to use python3 special attributes (jpaulodit 777)
• Fix ProduceResponse v2 throttle_time_ms
• Always encode size with MessageSet (#771)
• Avoid buffer overread when compressing messageset in KafkaProducer
• Explicit format string argument indices for python 2.6 compatibility
• Simplify RecordMetadata; short circuit callbacks (#768)
• Fix autocommit when partitions assigned manually (KAFKA-3486 / #767 / #626)
• Handle metadata updates during consumer rebalance (KAFKA-3117 / #766 / #701)
• Add a consumer config option to exclude internal topics (KAFKA-2832 / #765)
• Protect writes to wakeup socket with threading lock (#763 / #709)
• Fetcher spending unnecessary time during metrics recording (KAFKA-3785)
• Always use absolute_import (dpkp)
Test / Fixtures
Documentation
Bugfixes
Bugfixes
Patch Improvements
Bugfixes
Bugfixes
Consumers
Producers
Clients
Documentation
<none>
Internals
Bugfixes
Consumers
Producers
<none>
Clients
Documentation
Internals
Consumers
Producers
• Fix producer threading bug that can crash sender (dpkp PR 590)
• Fix bug in producer buffer pool reallocation (dpkp PR 585)
• Remove spurious warnings when closing sync SimpleProducer (twm PR 567)
• Fix FutureProduceResult.await() on python2.6 (dpkp)
• Add optional timeout parameter to KafkaProducer.flush() (dpkp); see the sketch after this list
• KafkaProducer optimizations (zackdever PR 598)
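A short, hedged sketch of the optional flush timeout; the broker address and topic are placeholders, and the timeout behaviour described in the comment is an assumption about the public API rather than something stated in this changelog entry:
>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers='localhost:9092')
>>> producer.send('my_topic', b'payload')
>>> # block for at most 5 seconds while pending batches are delivered;
>>> # if delivery does not complete in time a KafkaTimeoutError may be raised
>>> producer.flush(timeout=5)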
Clients
Documentation
Internals
Consumers
• Add RangePartitionAssignor (and use as default); add assignor tests (dpkp PR 550)
• Make sure all consumers are in same generation before stopping group test
• Verify node ready before sending offset fetch request from coordinator
• Improve warning when offset fetch request returns unknown topic / partition
Producers
Clients
Documentation
Internals
• Don’t override system rcvbuf or sndbuf unless configured explicitly (dpkp PR 557)
• Some attributes may not exist in __del__ if we failed assertions
• Break up some circular references and close client wake pipes on __del__ (aisch PR 554)
This release includes significant code changes. Users of older kafka-python versions are encouraged to test upgrades
before deploying to production as some interfaces and configuration options have changed.
Users of SimpleConsumer / SimpleProducer / SimpleClient (formerly KafkaClient) from prior releases should migrate to KafkaConsumer / KafkaProducer. The low-level (Simple*) APIs are no longer actively maintained and will be removed in a future release; a minimal migration sketch follows below.
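A minimal sketch of code that needs migration, assuming the deprecated Simple* entry points are still importable from the top-level kafka package; the broker address, topic, and group names are placeholders:
>>> # legacy (deprecated) API, shown only to identify code that should move
>>> # to KafkaProducer / KafkaConsumer
>>> from kafka import SimpleClient, SimpleProducer, SimpleConsumer
>>> client = SimpleClient('localhost:9092')
>>> producer = SimpleProducer(client)               # replace with KafkaProducer
>>> producer.send_messages('my_topic', b'message')  # replace with KafkaProducer.send()
>>> consumer = SimpleConsumer(client, 'my_group', 'my_topic')  # replace with KafkaConsumer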
For comprehensive API documentation, please see python help() / docstrings, kafka-python.readthedocs.org, or run
‘tox -e docs’ from source to build documentation locally.
Consumers
• KafkaConsumer re-written to emulate the new 0.9 kafka consumer (java client) and support coordinated consumer groups (feature requires >= 0.9.0.0 brokers)
– Methods no longer available:
Producers
• New producer class: KafkaProducer. Exposes the same interface as the official java client. Async by default; the returned future.get() can be called for synchronous blocking (see the sketch after this list).
• SimpleProducer is now deprecated and will be removed in a future release. Users are encouraged to migrate to KafkaProducer.
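To make the async-by-default behaviour concrete, here is a minimal, hedged sketch; the topic name and bootstrap_servers address are placeholders:
>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers='localhost:9092')
>>> # send() is asynchronous and returns a future immediately
>>> future = producer.send('my_topic', b'value')
>>> # get() blocks until the broker acknowledges the record (or the timeout expires)
>>> record_metadata = future.get(timeout=10)
>>> print (record_metadata.topic, record_metadata.partition, record_metadata.offset)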
Clients
• synchronous KafkaClient renamed to SimpleClient. For backwards compatibility, you will get a SimpleClient
via ‘from kafka import KafkaClient’. This will change in a future release.
• All client calls use non-blocking IO under the hood.
• Add probe method check_version() to infer broker versions.
Documentation
Internals
• Old protocol stack is deprecated. It has been moved to kafka.protocol.legacy and may be removed in a future
release.
• Protocol layer re-written using Type classes, Schemas and Structs (modeled on the java client).
• Add support for LZ4 compression (including broken framing header checksum).
Consumers
Producers
Clients
Documentation
• Update docs and links wrt maintainer change (mumrah -> dpkp)
Internals
Consumers
Producers
KafkaClient
Documentation
Internals
• Warn users that async producer does not reliably handle failures (dpkp - PR 213)
• Fix spurious ConsumerFetchSizeTooSmall error in consumer (DataDog - PR 136)
• Use PyLint for static error checking (dpkp - PR 208)
• Strictly enforce str message type in producer.send_messages (dpkp - PR 211)
• Add test timers via nose-timer plugin; list 10 slowest timings by default (dpkp)
• Move fetching last known offset logic to a stand alone function (zever - PR 177)
• Improve KafkaConnection and add more tests (dpkp - PR 196)
• Raise TypeError if necessary when encoding strings (mdaniel - PR 204)
• Use Travis-CI to publish tagged releases to pypi (tkuhlman / mumrah)
• Use official binary tarballs for integration tests and parallelize travis tests (dpkp - PR 193)
• Improve new-topic creation handling (wizzat - PR 174)
6.9.31 0.8.0
Index
G
get_api_versions() (kafka.client.KafkaClient method), 37

H
highwater() (kafka.KafkaConsumer method), 21

I
in_flight_request_count() (kafka.client.KafkaClient method), 37
is_disconnected() (kafka.client.KafkaClient method), 37
is_ready() (kafka.client.KafkaClient method), 37

K
KafkaAdminClient (class in kafka.admin), 30
KafkaClient (class in kafka.client), 34
KafkaConsumer (class in kafka), 15
KafkaProducer (class in kafka), 25

L
leader_for_partition() (kafka.cluster.ClusterMetadata method), 42
least_loaded_node() (kafka.client.KafkaClient method), 38
list_consumer_group_offsets() (kafka.admin.KafkaAdminClient method), 33
list_consumer_groups() (kafka.admin.KafkaAdminClient method), 34

M
maybe_connect() (kafka.client.KafkaClient method), 38
metrics() (kafka.KafkaConsumer method), 21
metrics() (kafka.KafkaProducer method), 29

O
offsets_for_times() (kafka.KafkaConsumer method), 22

P
partitions_for() (kafka.KafkaProducer method), 29
partitions_for_broker() (kafka.cluster.ClusterMetadata method), 42
partitions_for_topic() (kafka.cluster.ClusterMetadata method), 42
partitions_for_topic() (kafka.KafkaConsumer method), 22
pause() (kafka.KafkaConsumer method), 22
paused() (kafka.KafkaConsumer method), 22
poll() (kafka.client.KafkaClient method), 38
poll() (kafka.KafkaConsumer method), 22
position() (kafka.KafkaConsumer method), 23

R
ready() (kafka.client.KafkaClient method), 38
recv() (kafka.BrokerConnection method), 41
refresh_backoff() (kafka.cluster.ClusterMetadata method), 43
remove_listener() (kafka.cluster.ClusterMetadata method), 43
request_update() (kafka.cluster.ClusterMetadata method), 43
resume() (kafka.KafkaConsumer method), 23

S
seek() (kafka.KafkaConsumer method), 23
seek_to_beginning() (kafka.KafkaConsumer method), 23
seek_to_end() (kafka.KafkaConsumer method), 23
send() (kafka.BrokerConnection method), 41
send() (kafka.client.KafkaClient method), 38
send() (kafka.KafkaProducer method), 29
send_pending_requests() (kafka.BrokerConnection method), 41
set_topics() (kafka.client.KafkaClient method), 39
subscribe() (kafka.KafkaConsumer method), 24
subscription() (kafka.KafkaConsumer method), 24

T
topics() (kafka.cluster.ClusterMetadata method), 43
topics() (kafka.KafkaConsumer method), 24
ttl() (kafka.cluster.ClusterMetadata method), 43

U
unsubscribe() (kafka.KafkaConsumer method), 24
update_metadata() (kafka.cluster.ClusterMetadata method), 43

W
with_partitions() (kafka.cluster.ClusterMetadata method), 43