Getting To Know Apache Kafka's Architecture: Ryan Plant

Apache Kafka is a distributed messaging system consisting of producers, consumers, and brokers within a cluster. It uses characteristics of distributed systems like worker node roles, reliability through replication, and consensus-based communication. The document discusses Kafka's architecture and how it incorporates aspects of distributed systems, with Apache Zookeeper managing metadata and coordination between nodes in the Kafka cluster.

Getting to Know Apache Kafka’s Architecture

Ryan Plant
COURSE AUTHOR
@ryan_plant blog.ryanplant.com

Apache Kafka as a Messaging System

Producers, Topics, Consumers
- Producers address each message to a topic (To: “X”, To: “Y”)
- Consumers retrieve messages from a topic (Retrieve: “X”, Retrieve: “Y”)

Producers, Broker, Consumers
- The broker keeps each topic in its own location:
  A: ~/A/…
  B: ~/B/…
  C: ~/C/…
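
The topic flow on these slides can be sketched in a few lines of Python. This is only a toy in-memory model, not Kafka's client API: the `Broker` class and its `send` and `retrieve` methods are hypothetical names chosen for illustration, and a real broker persists each topic to disk instead of keeping it in a list.

```python
class Broker:
    """Toy in-memory broker (hypothetical): one append-only list per topic."""

    def __init__(self):
        self._topics = {}

    def send(self, topic, message):
        """Producer side: append a message addressed to a topic."""
        log = self._topics.setdefault(topic, [])
        log.append(message)
        return len(log) - 1  # position of the message within its topic

    def retrieve(self, topic, start=0):
        """Consumer side: read a topic's messages from a given position."""
        return list(self._topics.get(topic, [])[start:])


broker = Broker()
broker.send("X", "first message for X")
broker.send("Y", "first message for Y")
broker.send("X", "second message for X")

x_messages = broker.retrieve("X")
y_messages = broker.retrieve("Y")
```

Producers and consumers never talk to each other directly in this sketch. They share only the topic name, which is the decoupling the slides illustrate.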
How Apache Kafka Starts to Differentiate

“A high-throughput distributed messaging system.”

Producers, multiple Brokers, Consumers
LinkedIn: 1,400 brokers => 2 petabytes per week

The Apache Kafka Cluster
Producers, Cluster, Consumers

[Diagram builds: brokers grouped into clusters labeled Size: 1, Size: 2, Size: 2, and Size: 4]

Later…
Distributed Systems

- Collection of resources that are instructed to achieve a specific goal or function
- Consist of multiple workers or nodes (KAFKA BROKERS)
- The system of nodes requires coordination to ensure consistency and progress towards a common goal
- Nodes communicate with each other through messages
Distributed Systems: Controller Election

Work Items Attendance Status


Distributed Systems: The Cluster

KAFKA CLUSTER
Distributed Systems: Getting Work Done

PRODUCER
Worker availability and health
Task redundancy
Distributed Systems: Getting Work Done (Reliably)

“we cannot afford loss… three replicas, please”

[Diagram labels: LEADER, PEER]

“here you go…”
“we have a quorum”
“not ready or able”

[Diagram labels: LEADER, FOLLOWER, PEER]
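
The exchange on these slides can be sketched as a small function. This is an illustrative model, not Kafka's actual replication protocol: `Peer`, `recruit_followers`, and the broker names are hypothetical, and the sketch assumes that the leader itself counts as one of the requested replicas, so "three replicas" means one leader plus two followers.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    available: bool  # False models a peer answering "not ready or able"


def recruit_followers(peers, replication_factor):
    """Ask peers in order until enough of them agree to become followers.

    Assumption: the leader holds one replica, so it needs
    replication_factor - 1 followers to reach the requested count.
    """
    needed = replication_factor - 1
    followers = []
    for peer in peers:
        if len(followers) == needed:
            break
        if peer.available:
            followers.append(peer.name)
    has_quorum = len(followers) == needed
    return followers, has_quorum


peers = [
    Peer("broker-2", True),
    Peer("broker-3", False),  # "not ready or able"
    Peer("broker-4", True),
    Peer("broker-5", True),
]
followers, has_quorum = recruit_followers(peers, replication_factor=3)

# With too few available peers, the requested replica count cannot be met.
few_followers, few_quorum = recruit_followers(peers[:2], replication_factor=3)
```

Peers that are not recruited stay peers, which matches the mix of FOLLOWER and PEER labels in the second diagram.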
Sources of Work in Apache Kafka

PRODUCER CONSUMER

KAFKA CLUSTER
Distributed Systems: Communication and Consensus

- Worker node membership and naming
- Configuration management
- Leader election
- Health status
Apache Zookeeper

Centralized service for maintaining metadata about a cluster of distributed nodes
- Configuration information
- Health status
- Group membership

Hadoop, HBase, Mesos, Solr, Redis, and Neo4j

Distributed system consisting of multiple nodes in an “ensemble”
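
To make the kinds of metadata listed above concrete, here is a minimal single-process sketch of a coordination registry. It does not use the Zookeeper API, and a real ensemble is itself distributed across several nodes. `MetadataRegistry` and its methods are hypothetical, and the "first healthy node to claim the role wins" rule is a simplifying assumption for illustration.

```python
class MetadataRegistry:
    """Toy stand-in for a coordination service (hypothetical names)."""

    def __init__(self):
        self.config = {}        # configuration information
        self.members = {}       # group membership: node name -> healthy flag
        self.controller = None  # outcome of the simplified election

    def join(self, node):
        """Register a node as a healthy group member."""
        self.members[node] = True

    def report_health(self, node, healthy):
        """Record a health status; an unhealthy controller loses its role."""
        self.members[node] = healthy
        if node == self.controller and not healthy:
            self.controller = None

    def healthy_members(self):
        return sorted(n for n, ok in self.members.items() if ok)

    def claim_controller(self, node):
        """Assumed rule: the first healthy node to claim the role holds it."""
        if self.controller is None and self.members.get(node):
            self.controller = node
        return self.controller == node


registry = MetadataRegistry()
for name in ("broker-1", "broker-2", "broker-3"):
    registry.join(name)

first_claim = registry.claim_controller("broker-2")   # role is free
second_claim = registry.claim_controller("broker-1")  # role already held
registry.report_health("broker-2", False)             # controller fails
third_claim = registry.claim_controller("broker-1")   # role is free again
healthy = registry.healthy_members()
```

Keeping this state in one place means every node sees the same answer to "who is in the group, who is healthy, and who is in charge".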
Apache Kafka’s Distributed Architecture

APACHE ZOOKEEPER

PRODUCER CONSUMER

KAFKA CLUSTER
Summary

Apache Kafka is a Pub-Sub messaging system, consisting of:
- Producers and Consumers
- Brokers within a Cluster

Characteristics of distributed systems
- Worker node roles: Controllers, Leaders, and Followers
- Reliability through replication
- Consensus-based communication

Role of Apache Zookeeper
