Getting Started With Apache Kafka in Python
In this post, I am going to discuss Apache Kafka and how Python programmers can use it
for building distributed systems.
According to Wikipedia:
Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Think of it as a big commit log where data is stored in sequence as it happens. The users of this log can access and use it as per their requirements.
Kafka Use Cases
Kafka has many uses. Here are a few use cases that could help you figure out where it fits.
Activity Monitoring:- Kafka can be used for activity monitoring. The activity could belong to a website or to physical sensors and devices. Producers can publish raw data from data sources that can later be used to find trends and patterns.
Messaging:- Kafka can be used as a message broker among services. If you are implementing a microservice architecture, you can have one microservice act as a producer and another as a consumer. For instance, you could have a microservice that is responsible for creating new accounts and another for sending emails to users about account creation.
Log Aggregation:- You can use Kafka to collect logs from different systems and store them in a centralized system for further processing.
ETL:- Kafka offers near real-time streaming, so you can build an ETL pipeline based on your needs.
Database:- Based on the things I mentioned above, you may say that Kafka also acts as a database. Not a typical database that has the feature of querying the data as per need; what I mean is that you can keep data in Kafka as long as you want without consuming it, as sketched below.
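For example, retention on a stock install is controlled through properties in config/server.properties. These are standard Kafka settings; the values here are just illustrative:

# keep messages for 7 days (168 hours is also the Kafka default)
log.retention.hours=168
# to keep messages indefinitely, disable time-based retention instead:
# log.retention.ms=-1
# size-based retention; -1 means unlimited
log.retention.bytes=-1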
Kafka Concepts
Let’s discuss core Kafka concepts.
Topics
Every message that is fed into the system must be part of some topic. A topic is nothing but a stream of records. Messages are stored in key-value format. Each message is assigned a sequence number, called an Offset. The output of one message could be the input of another for further processing.
Producers
Producers are the apps responsible for publishing data into the Kafka system. They publish data on the topic of their choice.
Consumers
The messages published into topics are then utilized by consumer apps. A consumer subscribes to the topic of its choice and consumes data.
Broker
Every instance of Kafka that is responsible for message exchange is called a Broker. Kafka can be used as a stand-alone machine or as part of a cluster.
Let me try to explain the whole thing with a simple example: there is a warehouse or godown of a restaurant where all the raw material is dumped, like rice, vegetables and so on. The restaurant serves different kinds of dishes: Chinese, Desi, Italian etc. The chefs of each cuisine can refer to the warehouse, pick the desired things and make dishes out of them. There is a possibility that the stuff made from that raw material is later used by the chefs of all departments, for instance, some secret sauce that is used in ALL kinds of dishes. Here, the warehouse is a broker, the vendors of goods are the producers, the goods and the secret sauce made by chefs are topics, while the chefs are consumers. My analogy might sound funny and inaccurate, but at least it'd have helped you to understand the entire thing :-)
Kafka is available in two different flavors: one by the Apache Foundation and the other from Confluent as a package. For this tutorial, I will go with the one provided by the Apache Foundation. By the way, Confluent was founded by the original developers of Kafka.
Starting Zookeeper
Kafka relies on Zookeeper; in order to make Kafka run, we will have to run Zookeeper first.
bin/zookeeper-server-start.sh config/zookeeper.properties
It will display lots of text on the screen; if the startup log finishes without errors, it means Zookeeper is up properly.
Starting Kafka Server
Next, we have to start Kafka broker server:
bin/kafka-server-start.sh config/server.properties
And if the broker prints its startup messages on the console without errors, it means it's up.
Create Topics
Messages are published in topics. Use this command to create a new topic.
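With the kafka_2.11-1.1.0 distribution used later in this post, topic creation goes through ZooKeeper. A typical invocation (single partition and a replication factor of 1, matching this single-broker setup; the topic name test is assumed from the producer section below) looks like this:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test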
You can also list all available topics by running the following command.
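Assuming the same local setup:

bin/kafka-topics.sh --list --zookeeper localhost:2181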
Sending Messages
Next, we have to send messages; producers are used for that purpose. Let's initiate a producer.
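The stock console producer would be started along these lines (broker address and topic name assumed from the surrounding text):

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test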
This starts the console-based producer interface, which connects to the broker running on port 9092 by default. --topic allows you to set the topic to which the messages will be published. In our case the topic is test.
It shows you a > prompt and you can input whatever you want.
Messages are stored locally on your disk. You can learn the path by checking the value of log.dirs in the config/server.properties file. By default it is set to /tmp/kafka-logs/
If you list this folder you will find a folder named test-0. Upon listing it you will find 3 files: 00000000000000000000.index, 00000000000000000000.log and 00000000000000000000.timeindex
^@^@^@^@^@^@^@^@^@^@^@=^@^@^@^@^BÐØR^V^@^@^@^@^@^@^@^@^Acça<9a>o^@^@^
Acça<9a>oÿÿÿÿÿÿÿÿÿÿÿÿÿÿ^@^@^@^A^V^@^@^@^A
Hello^@^@^@^@^@^@^@^@^A^@^@^@=^@^@^@^@^BÉJ^B-
^@^@^@^@^@^@^@^@^Acça<9f>^?^@^@^Acça<9f>^?
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿ^@^@^@^A^V^@^@^@^A
World^@
~
It looks like encoded or delimiter-separated data; I am not sure. If someone knows this format then do let me know.
Anyways, Kafka provides a utility that lets you examine each incoming message.
➜ kafka_2.11-1.1.0 bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files /tmp/kafka-logs/test-0/00000000000000000000.log
Dumping /tmp/kafka-logs/test-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 CreateTime: 1528595323503 isvalid: true keysize: -1 valuesize: 5 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] payload: Hello
offset: 1 position: 73 CreateTime: 1528595324799 isvalid: true keysize: -1 valuesize: 5 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: [] payload: World
You can see each message along with details like offset, position and CreateTime.
Consuming Messages
Messages that are stored should be consumed too. Let's start a console-based consumer.
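The stock console consumer ships with Kafka as well; an invocation matching this setup (bootstrap server and topic name assumed from the earlier sections) would be:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning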
If you run it, it will dump all the messages from the beginning till now. If you are just interested in consuming the messages published after the consumer starts, you can omit the --from-beginning switch and run it again. The reason it does not show the old messages is that the offset is updated once the consumer sends an ACK to the Kafka broker about processed messages. A minimal sketch of that offset behavior in Python follows.
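Here is a rough illustration of that offset/acknowledgement flow with the kafka-python library; the group id is made up for the example:

from kafka import KafkaConsumer

# Consumers in the same group share committed offsets: on restart, the
# consumer resumes after the last committed message instead of replaying all.
consumer = KafkaConsumer(
    'test',
    bootstrap_servers=['localhost:9092'],
    group_id='demo-group',          # hypothetical consumer group
    enable_auto_commit=True,        # commit processed offsets automatically
    auto_offset_reset='earliest',   # used only when no committed offset exists
    consumer_timeout_ms=5000,       # stop iterating after 5s of inactivity
)
for message in consumer:
    print(message.offset, message.value)
consumer.close()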
Accessing Kafka in Python
There are multiple Python libraries available for use:
Kafka-Python — An open-source, community-based library.
PyKafka — This library is maintained by Parse.ly and it's claimed to be a Pythonic API. Unlike Kafka-Python, you can't create dynamic topics.
Confluent Python Kafka:- This library is offered by Confluent as a thin wrapper around librdkafka, hence its performance is better than the other two.
In the last post, about Elasticsearch, I scraped Allrecipes data. In this post, I am going to use the same scraper as a data source. The system we are going to build is an alert system which will send a notification about a recipe if it meets a certain calories threshold. There will be two topics:
raw_recipes:- It will store the raw HTML of each recipe. The idea is to use this topic as the main source of our data that can later be processed and transformed as per need.
parsed_recipes:- As the name suggests, this will be the parsed data of each recipe in JSON format. (Commands to create both topics are shown below.)
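Both topics can be created with the same kafka-topics.sh script shown earlier (again assuming the single-broker setup):

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic raw_recipes
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic parsed_recipes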
import requests
from time import sleep
from bs4 import BeautifulSoup


def fetch_raw(recipe_url):
    # Download the raw HTML of a single recipe page.
    # `headers` is defined in the __main__ block below.
    html = None
    print('Processing..{}'.format(recipe_url))
    try:
        r = requests.get(recipe_url, headers=headers)
        if r.status_code == 200:
            html = r.text
    except Exception as ex:
        print('Exception while accessing raw html')
        print(str(ex))
    finally:
        return html.strip() if html else None


def get_recipes():
    # Collect the raw HTML of the first few recipes on the salad listing page.
    recipes = []
    url = 'https://www.allrecipes.com/recipes/96/salad/'
    print('Accessing list')
    try:
        r = requests.get(url, headers=headers)
        if r.status_code == 200:
            html = r.text
            soup = BeautifulSoup(html, 'lxml')
            links = soup.select('.fixed-recipe-card__h3 a')
            idx = 0
            for link in links:
                sleep(2)  # be polite to the server between requests
                recipe = fetch_raw(link['href'])
                if recipe:
                    recipes.append(recipe)
                idx += 1
                if idx > 2:
                    break
    except Exception as ex:
        print('Exception in get_recipes')
        print(str(ex))
    finally:
        return recipes
This code snippet extracts the markup of each recipe and returns the results as a list.
Next, we need to create a producer object. Before we proceed further, we will make a change in the config/server.properties file. We have to set advertised.listeners to PLAINTEXT://localhost:9092, otherwise the producer may fail to connect to the broker and raise a connection error.
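That is a one-line change in config/server.properties:

advertised.listeners=PLAINTEXT://localhost:9092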
We will now add two methods: connect_kafka_producer(), which will give you an instance of a Kafka producer, and publish_message(), which will just dump the raw HTML of individual recipes.
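These two helpers are the same ones that reappear in the parser script later in this post; they are reproduced here so the producer script is complete:

from kafka import KafkaProducer


def connect_kafka_producer():
    # Returns a KafkaProducer connected to the local broker, or None on failure.
    _producer = None
    try:
        _producer = KafkaProducer(bootstrap_servers=['localhost:9092'], api_version=(0, 10))
    except Exception as ex:
        print('Exception while connecting Kafka')
        print(str(ex))
    finally:
        return _producer


def publish_message(producer_instance, topic_name, key, value):
    # Encodes the key and value as UTF-8 bytes and publishes them to the topic.
    try:
        key_bytes = bytes(key, encoding='utf-8')
        value_bytes = bytes(value, encoding='utf-8')
        producer_instance.send(topic_name, key=key_bytes, value=value_bytes)
        producer_instance.flush()
        print('Message published successfully.')
    except Exception as ex:
        print('Exception in publishing message')
        print(str(ex))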
if __name__ == '__main__':
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/66.0.3359.181 Safari/537.36',
        'Pragma': 'no-cache'
    }

    all_recipes = get_recipes()
    if len(all_recipes) > 0:
        kafka_producer = connect_kafka_producer()
        for recipe in all_recipes:
            publish_message(kafka_producer, 'raw_recipes', 'raw', recipe.strip())
        if kafka_producer is not None:
            kafka_producer.close()
/anaconda3/anaconda/bin/python /Development/DataScience/Kafka/kafka-recipie-alert/producer-raw-recipies.py
Accessing list
Processing..https://www.allrecipes.com/recipe/20762/california-coleslaw/
Processing..https://www.allrecipes.com/recipe/8584/holiday-chicken-salad/
Processing..https://www.allrecipes.com/recipe/80867/cran-broccoli-salad/
Message published successfully.
Message published successfully.
Message published successfully.
I am using a GUI tool named Kafka Tool to browse recently published messages. It is available for OSX, Windows and Linux.
Kafka Tool in action
Recipe Parser
The next script we are going to write will serve as both consumer and producer. First it will consume data from the raw_recipes topic, parse and transform the data into JSON, and then publish it to the parsed_recipes topic. Below is the code that fetches HTML data from the raw_recipes topic, parses it and then feeds it into the parsed_recipes topic.
import json
from time import sleep

from bs4 import BeautifulSoup
from kafka import KafkaConsumer, KafkaProducer


def publish_message(producer_instance, topic_name, key, value):
    try:
        key_bytes = bytes(key, encoding='utf-8')
        value_bytes = bytes(value, encoding='utf-8')
        producer_instance.send(topic_name, key=key_bytes, value=value_bytes)
        producer_instance.flush()
        print('Message published successfully.')
    except Exception as ex:
        print('Exception in publishing message')
        print(str(ex))


def connect_kafka_producer():
    _producer = None
    try:
        _producer = KafkaProducer(bootstrap_servers=['localhost:9092'], api_version=(0, 10))
    except Exception as ex:
        print('Exception while connecting Kafka')
        print(str(ex))
    finally:
        return _producer


def parse(markup):
    title = '-'
    submit_by = '-'
    description = '-'
    calories = 0
    ingredients = []
    rec = {}

    try:
        soup = BeautifulSoup(markup, 'lxml')
        # title
        title_section = soup.select('.recipe-summary__h1')
        # submitter
        submitter_section = soup.select('.submitter__name')
        # description
        description_section = soup.select('.submitter__description')
        # ingredients
        ingredients_section = soup.select('.recipe-ingred_txt')

        # calories
        calories_section = soup.select('.calorie-count')
        if calories_section:
            calories = calories_section[0].text.replace('cals', '').strip()

        if ingredients_section:
            for ingredient in ingredients_section:
                ingredient_text = ingredient.text.strip()
                if 'Add all ingredients to list' not in ingredient_text and ingredient_text != '':
                    ingredients.append({'step': ingredient_text})

        if description_section:
            description = description_section[0].text.strip().replace('"', '')

        if submitter_section:
            submit_by = submitter_section[0].text.strip()

        if title_section:
            title = title_section[0].text

        rec = {'title': title, 'submitter': submit_by, 'description': description,
               'calories': calories, 'ingredients': ingredients}

    except Exception as ex:
        print('Exception while parsing')
        print(str(ex))
    finally:
        return json.dumps(rec)


if __name__ == '__main__':
    print('Running Consumer..')
    parsed_records = []
    topic_name = 'raw_recipes'
    parsed_topic_name = 'parsed_recipes'

    consumer = KafkaConsumer(topic_name, auto_offset_reset='earliest',
                             bootstrap_servers=['localhost:9092'], api_version=(0, 10),
                             consumer_timeout_ms=1000)  # timeout value assumed; original was truncated
    for msg in consumer:
        html = msg.value
        result = parse(html)
        parsed_records.append(result)
    consumer.close()
    sleep(5)

    if len(parsed_records) > 0:
        print('Publishing records..')
        producer = connect_kafka_producer()
        for rec in parsed_records:
            publish_message(producer, parsed_topic_name, 'parsed', rec)
KafkaConsumer accepts a few parameters besides the topic name and host address. By providing auto_offset_reset='earliest' you are telling Kafka to return messages from the beginning. The parameter consumer_timeout_ms helps the consumer disconnect after a certain period of inactivity. Once disconnected, you can close the consumer stream by calling consumer.close()
After this, I am using the same routines to connect a producer and publish the parsed data to the new topic. The Kafka Tool browser gives glad tidings about the newly stored messages.
So far so good. We stored recipes in both raw and JSON format for later use. Next, we have to write a consumer that will connect to the parsed_recipes topic and generate an alert if a certain calories criterion is met.
import json
from time import sleep

from kafka import KafkaConsumer

if __name__ == '__main__':
    parsed_topic_name = 'parsed_recipes'
    # Notify if a recipe has more than 200 calories
    calories_threshold = 200

    consumer = KafkaConsumer(parsed_topic_name, auto_offset_reset='earliest',
                             bootstrap_servers=['localhost:9092'], api_version=(0, 10),
                             consumer_timeout_ms=1000)  # timeout value assumed; original was truncated
    for msg in consumer:
        record = json.loads(msg.value)
        calories = int(record['calories'])
        title = record['title']

        if calories > calories_threshold:
            print('Alert: {} calories count is {}'.format(title, calories))
        sleep(3)

    if consumer is not None:
        consumer.close()
The JSON is decoded and then the calories count is checked; a notification is issued once the criterion is met.
Conclusion
Kafka is a scalable, fault-tolerant, publish-subscribe messaging system that enables you to build distributed applications. Due to its high performance and efficiency, it's getting popular among companies that produce loads of data from various external sources and want to provide real-time findings from it. I have just covered the gist of it. Do explore the docs and existing implementations; they will help you understand how it could be the best fit for your next system.