Distributed Coordination with Python

Ben Bangert
Mozilla

DISTRIBUTED COORDINATION IS NOT...
• Distributed Databases (Cassandra, Riak)
• Distributed Computing (Hadoop, etc.)
• Distributed Event Analysis (Storm)

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

WHY NOT USE...
• Memcached?
• MongoDB?
• Postgres/MySQL?

Hierarchical data structure in znodes

• Session Based
• Znode watches
• Ephemeral and Sequential Znodes

EPHEMERAL ZNODES
• Last for duration of client session
• Session dies when connection is closed or expires
• Can’t have children znodes

SEQUENTIAL ZNODES
• Supply a node name (or not), get node name back with a trailing sequence number (0001, 0002, 0003, etc.)
• Can be combined with ephemeral flag
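
A minimal sketch of both flags used together. It assumes `client` is a started kazoo `KazooClient` (or any object with the same `create` method); the `/workers/worker-` prefix and the function names are hypothetical. ZooKeeper itself appends a zero-padded 10-digit counter, so real names are longer than the short numbers shown above.

```python
def register_worker(client, prefix="/workers/worker-", data=b""):
    """Create an ephemeral, sequential znode and return the path the server chose.

    `client` is assumed to be a started kazoo KazooClient; the prefix is a
    hypothetical example path.
    """
    # ephemeral=True: the node disappears when this client's session ends.
    # sequence=True: the server appends an increasing counter to the name.
    return client.create(prefix, data, ephemeral=True, sequence=True, makepath=True)


def sequence_number(path):
    """Extract the trailing 10-digit sequence counter from a znode path."""
    suffix = path[-10:]
    if len(suffix) != 10 or not suffix.isdigit():
        raise ValueError("no sequence suffix in %r" % path)
    return int(suffix)
```

The value returned by `create` is the only reliable way to learn the final node name, so it is worth storing.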

BASIC COMMANDS
• create(PATH, DATA...)
• get(PATH...)
• get_children(PATH...)
• set(PATH, DATA...)
• delete(PATH...)
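
As a rough illustration, the five commands map directly onto kazoo client methods. The sketch assumes `client` is a started `KazooClient`; the path `/demo/settings` and the function name are made up for the example.

```python
def basic_commands_demo(client, path="/demo/settings"):
    """Walk through create, get, get_children, set and delete on one znode."""
    client.create(path, b"v1", makepath=True)   # create(PATH, DATA...)
    data, stat = client.get(path)               # returns (bytes, stat object)
    children = client.get_children(path)        # list of child node names
    client.set(path, b"v2")                     # overwrite the node data
    new_data, new_stat = client.get(path)
    client.delete(path)                         # remove the znode
    return data, children, new_data
```

Node data is always bytes, so any encoding (text, JSON, and so on) is up to the application.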

PYTHON CLIENTS
• txzookeeper
• kazoo
• unified client that works with gevent
• implements wire protocol in pure Python

EASY TO USE
from kazoo.client import KazooClient

client = KazooClient()  # defaults to a ZooKeeper server at 127.0.0.1:2181
client.start()          # blocks until connected or the timeout expires

CONFIGURATION
• Store settings in node data
• Organize node structure
• Set watches on nodes of interest
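
One way this could look with kazoo's `DataWatch` recipe. The sketch assumes settings are stored as JSON in a single znode, which is a choice made for this example and not something the slides prescribe; `LiveConfig`, `watch_config` and `/config/myapp` are hypothetical names.

```python
import json


class LiveConfig:
    """Keeps the latest settings decoded from a znode's data."""

    def __init__(self):
        self.settings = {}

    def on_change(self, data, stat, event=None):
        # kazoo calls the watch function with the node's data and stat.
        # data is None when the znode does not exist.
        self.settings = json.loads(data.decode("utf-8")) if data else {}


def watch_config(client, path="/config/myapp"):
    """Attach a LiveConfig to `path`; `client` is a started KazooClient."""
    config = LiveConfig()
    # DataWatch calls on_change once immediately and again on every change.
    client.DataWatch(path, config.on_change)
    return config
```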

PARTY MEMBERSHIP
• Join a party, find out who else is around
• Elect a leader if desired
• Recipe in Kazoo
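
A short sketch of the kazoo `Party` and `Election` recipes, assuming `client` is a started `KazooClient`. The paths, identifiers and function names are placeholders for the example.

```python
def join_party(client, identifier, path="/myapp/party"):
    """Join the party at `path` and return (party, sorted member identifiers)."""
    party = client.Party(path, identifier)
    party.join()                  # registers this client as a member
    return party, sorted(party)   # iterating a Party yields member identifiers


def run_for_leader(client, identifier, on_elected, path="/myapp/election"):
    """Block until this client wins the election, then call on_elected()."""
    election = client.Election(path, identifier)
    election.run(on_elected)      # returns after on_elected() has finished
```

Calling `party.leave()` removes the membership explicitly; it also goes away when the session ends.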

LOCKS
• Lock a resource for a single client
• Lock a resource for multiple clients (Semaphore)
• Hard to write properly
• Recipe in Kazoo
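
Both cases are available as kazoo recipes and work as context managers. The sketch below assumes a started `KazooClient`; the paths, the identifier and the wrapper function names are invented for illustration.

```python
def with_exclusive_lock(client, work, path="/myapp/locks/resource", identifier="worker-1"):
    """Run work() while holding an exclusive lock on `path`."""
    lock = client.Lock(path, identifier)
    with lock:                    # blocks until the lock is acquired
        return work()


def with_semaphore(client, work, path="/myapp/semaphores/resource", max_leases=3):
    """Run work() while holding one of `max_leases` leases on `path`."""
    semaphore = client.Semaphore(path, max_leases=max_leases)
    with semaphore:               # blocks until a lease is free
        return work()
```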

BUILDING HIGHER LEVEL ABSTRACTIONS ON ZOOKEEPER

DO NOT IMPLEMENT YOURSELF
USE THE RECIPE

BASIC STEPS
• Create lock parent node if needed
• Create ephemeral+sequence node under parent, store node name returned
• Get children of lock node
• Sort children list by sequence number
• First child in the list has the lock!
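
For illustration only, the steps above can be sketched as a single non-blocking attempt. This is not kazoo's Lock recipe, which remains the thing to use in practice; it assumes a started `KazooClient`, and the `/locks/resource` path, the `lock-` prefix and the function names are hypothetical.

```python
def holds_lock(children, my_node):
    """True if my_node has the lowest sequence number among children."""
    # Sort by the 10-digit counter ZooKeeper appends, not by the full name.
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    return ordered[0] == my_node


def try_lock_once(client, parent="/locks/resource"):
    """Run the basic steps once; return (my_node, whether we hold the lock)."""
    client.ensure_path(parent)                         # create parent if needed
    my_path = client.create(parent + "/lock-", b"",
                            ephemeral=True, sequence=True)
    my_node = my_path.rsplit("/", 1)[1]                # store the returned name
    children = client.get_children(parent)
    return my_node, holds_lock(children, my_node)
```

The sketch leaves out waiting, retries and connection loss, which is where most of the difficulty lies.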

THINGS TO WATCH OUT FOR
• Avoid the thundering herd, use watches only when needed
• When our node isn’t the lowest, watch the one in front of us
• Only one client wanting a lock is ‘woken’ when the lock is released by a different client
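
A sketch of picking the node to watch, so that each waiter watches only its immediate predecessor instead of the whole parent. It assumes a started `KazooClient` and sequence-suffixed child names; the function names and the `wake` callback are hypothetical.

```python
def node_to_watch(children, my_node):
    """Return the child just ahead of my_node, or None if my_node is first."""
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    index = ordered.index(my_node)
    return ordered[index - 1] if index > 0 else None


def wait_for_predecessor(client, parent, predecessor, wake):
    """Set a one-shot watch on the predecessor node.

    Returns False if the predecessor is already gone, in which case the
    caller should re-read the children instead of waiting.
    """
    stat = client.exists(parent + "/" + predecessor,
                         watch=lambda event: wake())
    return stat is not None
```

Passing `threading.Event().set` as `wake` would let a waiting thread block on the event until the predecessor changes or is deleted.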

ROBUST CODE TAKES EFFORT
• What happens when a server fails?
• What happens when the client fails?
• What happens when we don’t know if the server has failed?

FAILURE WILL HAPPEN
• Fail fast, fail completely.
• Session expiration is a good time to sys.exit
• Always include jitter (kazoo includes jitter on its connection and command retry operations)
• Consider what exceptions can occur in any code relying on a distributed system
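
A sketch of two of these points for application code: retry delays with jitter, and failing fast on session expiration. The backoff function is an illustration with made-up parameter names, not kazoo's internal retry logic. The listener relies on kazoo reporting connection states as the strings "CONNECTED", "SUSPENDED" and "LOST", and would be registered with `client.add_listener(session_listener)`.

```python
import random
import sys
import threading


def backoff_delays(attempts, base=0.1, factor=2.0, max_delay=30.0,
                   jitter=0.5, rng=random):
    """Yield exponentially growing sleep times, each with random jitter added."""
    delay = base
    for _ in range(attempts):
        capped = min(delay, max_delay)
        # Add up to `jitter` * delay of random extra time so that clients
        # which failed together do not all retry at the same instant.
        yield capped + rng.uniform(0, capped * jitter)
        delay *= factor


session_lost = threading.Event()


def session_listener(state):
    """Connection listener: only records the loss, it must not block."""
    if state == "LOST":
        session_lost.set()


def exit_if_session_lost():
    """Call from the main thread, for example once per work loop iteration."""
    if session_lost.is_set():
        sys.exit("ZooKeeper session expired")
```

The exit happens in the main thread because kazoo invokes listeners on its own connection thread, where `sys.exit` would only end that thread.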

• Distributed systems are hard
• Use existing battle-proven tools (ZooKeeper, Kazoo)
• Always consider everything that can fail, and how
• Be wary of tools that don’t tell you how they fail
• Read Kyle Kingsbury’s Jepsen posts to see examples of systems failing: http://aphyr.com/tags/jepsen
