Building Stateful Systems with Akka Cluster Sharding
Presented By:
Hugh Mckee
Himanshu Gupta
Anjali Sharma

Before we start…
1. Please post your questions in the Q&A section, and raise your hand after
the webinar to discuss them with the experts.
2. The session is being recorded, and we will share the recording with you afterwards.
3. We will soon send a follow-up mail with all links and downloads.

About the Speakers
Hugh Mckee
Developer Advocate at Lightbend
● Speaker and advocate for development of Reactive Cloud Native Systems
Himanshu Gupta
Akka Expert and Sr. Lead Consultant at Knoldus Inc.
● Speaker for Fast Data Systems and Reactive Application Engineer.
Anjali Sharma
Software Consultant at Knoldus Inc.
● Developer Engineer specialised in Scala, Akka and Spark.

Agenda
What is Cluster Sharding?
Understanding Entity/Shard Ids
Sharding Example
What are Stateful Systems?
How to use Stateful Actors?
No more Blocking
Passivation

About Knoldus
Product Engineering for Innovative Organizations
Keeping your business competitive & future-ready with extremely well-engineered systems
through the unwavering pursuit of emerging technology, high-quality engineers,
processes, and practices

Knoldus Practice Areas
● Reactive Products: Microservices & API
● Enterprise Data Program: Data Lake
● Artificial Intelligence: Machine Learning, Data Science, Deep Learning
● Blockchain
● Fast Data
● Agile Transformation
● Reactive UI/UX
● Test Automation Practice
● Reactive DevOps
● Product Engineering

Knoldus Global Presence
● 10+ years of profitable growth
● 175+ engineers: Reactive products, Fast Data strategy, AI
● 4 offices: Toronto, Chicago, Singapore, India
● 20+ multi-year global customers

About Lightbend
Lightbend empowers organizations to quickly implement any digitally transformative business
strategy—no matter how ambitious, challenging or innovative.
We take care of the architectural hurdles and back-end complexity of building globally distributed,
cloud-native application environments. Lightbend enables development teams with the technology
and expertise required to build applications that support business critical decisions. That’s why Global
2000 enterprises turn to us.
Unleash the full power of the cloud with Lightbend.

What is Cluster Sharding?
Sharding:
● The term sharding means partitioning.
● It is a technique, used mostly by databases, to improve
elasticity and resiliency.

Database Sharding:
● Records are distributed across nodes using a shard key
or a partition key.
● A router directs requests to the appropriate
shard or partition.
● Even after sharding, the database can still become a bottleneck.

Akka Cluster Sharding:
● The Akka toolkit provides cluster sharding as a way to
introduce sharding into your application.
● Instead of distributing database records across a
cluster, we are going to distribute actors across the
cluster.
● Each actor is then going to act as a consistency
boundary, for the data that it manages.

Components of Cluster Sharding
Entities:
● The basic unit in Akka Cluster Sharding is an actor
called an entity.
● There is only one entity per entity ID in the cluster.
● Messages are addressed to the entity ID and processed
by the entity. This allows the entity to act as a single
source of truth, acting as a consistency boundary for
the data that it manages.

Shards:
● Entities are distributed in shards.
● Each shard manages a number of entities and creates
entity actors on demand.
● Each shard has a unique ID. Mapping entities to a
shard ID is how we control the distribution.

Shard Region:
● Shards are distributed across shard regions. Each
shard region contains a number of shards.
● For a type of entity, there is usually one shard region per
JVM.
● A shard region looks up the location of an entity's shard
the first time it needs it, when it does not already know the
location, and then forwards messages to the
appropriate node, region, and entity.

Shard Coordinator:
● The shard coordinator is responsible for managing shards.
It is a cluster singleton.
● It is responsible for ensuring that the system knows
where to send messages addressed to a specific entity.
● It decides which shard lives in which region,
and therefore on which node.
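How the coordinator and the shard regions divide the work can be sketched in plain Scala. This is a toy model, not the Akka API: the `Coordinator` and `Region` classes, the node names, and the "fewest shards first" allocation rule are all assumptions made for illustration.

```scala
object ShardingModelSketch {
  // Hypothetical coordinator: decides which region hosts each shard.
  // Assumed rule: a new shard goes to the region currently hosting the fewest shards.
  final class Coordinator(regions: Seq[String]) {
    private var allocation = Map.empty[String, String] // shard ID -> region
    def regionFor(shardId: String): String =
      allocation.getOrElse(shardId, {
        val load = allocation.values.groupBy(identity).map { case (r, s) => r -> s.size }
        val chosen = regions.minBy(r => load.getOrElse(r, 0))
        allocation += shardId -> chosen
        chosen
      })
  }

  // Hypothetical shard region: asks the coordinator once per shard,
  // then routes from its own cache.
  final class Region(coordinator: Coordinator) {
    private var known = Map.empty[String, String]
    var lookups = 0 // number of times the coordinator was consulted
    def route(shardId: String): String =
      known.getOrElse(shardId, {
        lookups += 1
        val region = coordinator.regionFor(shardId)
        known += shardId -> region
        region
      })
  }
}
```

Routing the same shard ID twice consults the coordinator only once, which is the "looks up the location the first time" behaviour described above.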

Understanding Entity ID
● Entity IDs are used to uniquely identify each entity.
● They are used to create the name of the actor and hence must be unique across the entire
cluster.
● Entity Id Extractors are used to process each incoming message and separate it into an
entity id and a message to be passed to the entity actor.
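import akka.cluster.sharding.ShardRegion

case class MyMessage(entityID: String, message: String)
// Splits each incoming message into (entity ID, message passed to the entity actor).
val idExtractor: ShardRegion.ExtractEntityId = {
  case MyMessage(id, message) => (id, message)
}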
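To see what the extractor does with a message, here is a sketch that runs without Akka. The type aliases are local stand-ins that assume an entity ID extractor is a partial function from any message to an (entity ID, message) pair; `EntityIdSketch` is a hypothetical name.

```scala
object EntityIdSketch {
  // Local stand-ins for the ShardRegion types (assumed shapes, no Akka dependency).
  type EntityId = String
  type Msg = Any
  type ExtractEntityId = PartialFunction[Msg, (EntityId, Msg)]

  final case class MyMessage(entityID: String, message: String)

  // Same extractor as in the slide: the entity receives only the inner message.
  val idExtractor: ExtractEntityId = {
    case MyMessage(id, message) => (id, message)
  }
}
```

Because the extractor is a partial function, a message it has no case for is simply not defined for it, so `idExtractor.isDefinedAt(...)` can be used to check whether a message is routable.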

Understanding Shard ID
val shardIdExtractor: ShardRegion.ExtractShardId = {
case MyMessage(id, _) =>
(Math.abs(id.hashCode % totalShards)).toString
}
● Shard IDs are used to identify shards.
● Entities are mapped to a shard ID.
● An extractor function processes each incoming message and produces a shard ID.
● Best practice is to aim for roughly 10 shards per node.
● When selecting a shard ID and writing an extractor, it is important to consider how the
shards will be balanced.
● A poor sharding strategy produces hotspots, which result in an uneven workload.
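One way to check how a shard ID formula balances is to run it over a sample of entity IDs and count entities per shard. The sketch below reuses the hash-modulo formula from the extractor above as a plain function; the `CART-n` IDs, the 10,000 sample size, and the 30 shards (for example 3 nodes at 10 shards each) are assumptions for illustration.

```scala
object ShardBalanceSketch {
  // Same formula as the shard ID extractor, written as a plain function.
  def shardId(entityId: String, totalShards: Int): String =
    Math.abs(entityId.hashCode % totalShards).toString

  // Count how many entities land on each shard.
  def distribution(entityIds: Seq[String], totalShards: Int): Map[String, Int] =
    entityIds.groupBy(shardId(_, totalShards)).map { case (shard, ids) => shard -> ids.size }

  // Hypothetical workload: 10,000 cart IDs spread over 30 shards.
  def report(): String = {
    val ids = (1 to 10000).map(i => s"CART-$i")
    val counts = distribution(ids, totalShards = 30)
    s"shards used: ${counts.size}, min: ${counts.values.min}, max: ${counts.values.max}"
  }
}
```

A large gap between the minimum and maximum counts would point to the kind of hotspot the last bullet warns about.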

Sharding Example
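import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings}

val shards = ClusterSharding(myActorSystem).start(
  "shardedActors",                         // type name of the entity
  MyShardedActor.props(),                  // Props used to create entity actors
  ClusterShardingSettings(myActorSystem),
  idExtractor,
  shardIdExtractor
)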
● ClusterSharding.start is called on each node that will host shards.
● The code above returns an ActorRef, which is the reference to
the local shard region.
● To send a message, we take the shard region's ActorRef and send it the
message type the extractors expect.
● Messages are always sent to the entities through the local shard region.
shards ! MyMessage(entityId, someMessage)

First, stateless systems
[Diagram: CART-1234 held as temporary hot state in the application and as cold state in the database]
1. Retrieve state
2. Change state
3. Save state
4. Forget state
Contention handled by the database
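The four-step stateless cycle can be sketched as follows. This is a minimal illustration, assuming a cart is a list of item names and using an in-memory map as a stand-in for the database; `StatelessCartSketch` and `addItem` are hypothetical names.

```scala
import scala.collection.mutable

object StatelessCartSketch {
  // Stand-in for the database holding the cold state (cart ID -> items).
  val db = mutable.Map.empty[String, Vector[String]]

  def addItem(cartId: String, item: String): Unit = {
    val cart = db.getOrElse(cartId, Vector.empty) // 1. Retrieve state
    val updated = cart :+ item                    // 2. Change state
    db.update(cartId, updated)                    // 3. Save state
    // 4. Forget state: nothing is kept in memory once this method returns.
  }
}
```

Every request pays for a full read and a full write, and two concurrent requests for the same cart have to be reconciled by the database.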

Stateful systems
[Diagram: CART-1234 held as hot state in the application and as cold state in the database]
● Retrieve state on 1st access
● Save incremental state changes
[Diagram: CART-1234 instances behind a load balancer]
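A stateful counterpart can be sketched like this: the state is read once, on first access, and afterwards only each change is written. The in-memory `db`, the `CartEntity` class, and the local `entities` map are assumptions for illustration; in a cluster, making sure every request for a cart ID reaches the one entity that owns it is the job of cluster sharding, for which the local map is only a stand-in.

```scala
import scala.collection.mutable

object StatefulCartSketch {
  // Stand-in for the database: an append-only list of changes per cart (cold state).
  val db = mutable.Map.empty[String, mutable.ArrayBuffer[String]]
  var dbReads = 0 // counts full state loads, to show they happen once per entity

  // One in-memory entity per cart ID; it owns the hot state.
  final class CartEntity(cartId: String) {
    private var items: Vector[String] = { // retrieve state on first access
      dbReads += 1
      db.get(cartId).map(_.toVector).getOrElse(Vector.empty)
    }
    def addItem(item: String): Unit = {
      db.getOrElseUpdate(cartId, mutable.ArrayBuffer.empty[String]) += item // save only the change
      items = items :+ item
    }
    def current: Vector[String] = items
  }

  private val entities = mutable.Map.empty[String, CartEntity]
  def entityFor(cartId: String): CartEntity =
    entities.getOrElseUpdate(cartId, new CartEntity(cartId))
}
```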

How to use Stateful Actors
Akka Cluster Sharding
https://github.com/mckeeh3/akka-typed-java-cluster-sharding.git

No More Blocking
Why is blocking inside an actor bad?
Blocking inside an actor ties up a thread, which other
actors cannot use while it is blocked. This creates
resource contention.
Note: DB operations are generally the blocking operations in an application.
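The difference between the two styles can be sketched with standard Scala Futures. `saveToDb` is a hypothetical asynchronous DB call that only pretends to write a row; the 5 second timeout is likewise an assumption.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object NonBlockingSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical asynchronous DB call; it only pretends that one row was written.
  def saveToDb(cartId: String, item: String): Future[Int] = Future {
    1
  }

  // Blocking style: the calling thread is parked until the DB answers.
  def saveBlocking(cartId: String, item: String): Int =
    Await.result(saveToDb(cartId, item), 5.seconds)

  // Non-blocking style: return immediately and transform the result when it arrives.
  def saveNonBlocking(cartId: String, item: String): Future[String] =
    saveToDb(cartId, item).map(rows => s"saved $rows row(s) for $cartId")
}
```

Inside an actor, the usual non-blocking pattern is to have the Future's result delivered back to the actor as a message (for example with `pipeTo` in classic Akka or `pipeToSelf` in Akka Typed) rather than waiting for it.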

Non-Blocking Requires Extra Care
● The next message can’t be processed until processing of the previous message is complete.
● What should we do if a non-blocking operation fails?

Handling Non-Blocking Failures
● We need to be careful when updating data asynchronously in the DB.
● If the update fails, the in-memory state and the DB become inconsistent.

Handling Non-Blocking Failures (contd.)
● We should fail the actor so that it can restart itself.
● On restart the actor reloads its state from the DB, so the state is
consistent again.
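The "fail and reload" idea can be sketched without Akka. This is a simplified model under stated assumptions: `write` is a hypothetical DB call that returns a `Try` instead of a Future so the example stays deterministic, and a real actor would throw and let its supervisor restart it rather than resetting a field.

```scala
import scala.util.{Failure, Success, Try}

object FailureHandlingSketch {
  // Stand-in for the database; `failWrites` lets the example simulate an outage.
  var stored: Vector[String] = Vector.empty
  var failWrites = false

  def write(item: String): Try[Unit] =
    if (failWrites) Failure(new RuntimeException("DB unavailable"))
    else Success { stored = stored :+ item }

  final class CartEntity {
    private var items: Vector[String] = stored // state loaded from the DB on (re)start
    def current: Vector[String] = items

    def addItem(item: String): Unit = {
      items = items :+ item // in-memory update made before the write is confirmed
      write(item) match {
        case Success(_) => () // hot state and DB agree
        case Failure(_) => items = stored // "restart": drop hot state, reload from the DB
      }
    }
  }
}
```

After a failed write the entity's in-memory state matches the DB again, at the cost of losing the change that could not be saved.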

Passivation
● Keeping the state of all actors in memory is risky: it can fill memory
quickly and cause an OutOfMemoryError (OOM).
● Hence Akka Cluster Sharding provides a way to remove idle actors from
memory, known as Passivation.

How does Passivation work?
● Passivation works on a configurable time span.
● For every actor, the time it last processed a message is tracked.
● If an actor has not processed a message for the configured time span,
it is stopped and removed from memory.

How does Passivation work? (contd.)
● Since the actor was removed from memory, all its in-memory state was lost.
● As soon as the actor starts receiving messages again, its state is loaded back
from the DB.
Note: While the actor is not present in memory, its messages are stored in a
buffer, so they are not lost.
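The idle tracking described above can be sketched as a small table of "last message" timestamps. This is an illustrative model, not Akka's implementation: `EntityTable`, its method names, and the use of explicit millisecond timestamps are assumptions, and the limit mirrors the 120 second default mentioned in this deck.

```scala
import scala.collection.mutable

object PassivationSketch {
  // Assumed idle limit: 120 seconds, expressed in milliseconds.
  val idleLimitMillis: Long = 120000L

  final class EntityTable {
    // entity ID -> time (ms) at which it last processed a message
    private val lastMessageAt = mutable.Map.empty[String, Long]

    // Record that an entity processed a message at time `now`.
    def onMessage(entityId: String, now: Long): Unit =
      lastMessageAt.update(entityId, now)

    // Remove and return every entity that has been idle for at least the limit.
    def passivateIdle(now: Long): Set[String] = {
      val idle = lastMessageAt.collect { case (id, t) if now - t >= idleLimitMillis => id }.toSet
      lastMessageAt --= idle
      idle
    }

    def inMemory: Set[String] = lastMessageAt.keySet.toSet
  }
}
```

An entity removed this way would be recreated, and its state reloaded, the next time `onMessage` is called for its ID.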

Configuring Passivation
● Using the passivate-idle-entity-after setting, we can configure when entities will
passivate.
● By default its value is 120 seconds.

References
● Reactive Banking Sample Code
● Akka Cluster Sharding - Scala

www.knoldus.com
+(1) 647-467-4396
linkedin.com/company/knoldus
@knolspeak
Thank You!
Stay in Touch
