Building on quicksand microservices indicthreads

About Me
• Work at ThoughtWorks Pune
• DDD and Distributed Computing enthusiast
• A fan of Pat Helland
Twitter handle: @shripadagashe
Blog: https://shripad-agashe.github.io

Evolution of systems
Image source: https://commons.wikimedia.org/wiki/File:Front_Z9_2094.jpg
Vs
Image source: https://en.wikipedia.org/wiki/Solaris_Cluster#/media/File:Sun_Microsystems_Solaris_computer_cluster.jpg

Reliability is a steep curve
[Chart: Downtime vs % availability. Y-axis: downtime in seconds (0 to 900); x-axis: % availability (99, 99.9, 99.99, 99.999)]
% Availability   Per year        Per month       Per day
99               3.65 days       7.2 hours       14.4 minutes
99.9             8.76 hours      43.8 minutes    1.44 minutes
99.99            52 minutes      4.38 minutes    9 seconds
99.999           5.26 minutes    25.9 seconds    0.8 seconds
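The downtime figures follow from simple arithmetic on the unavailable fraction. The sketch below shows one way to compute them; the function name and the 365-day year are assumptions made here for illustration, and the slide's own figures are rounded.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Return the downtime budget in seconds for a period of the given length."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99, 99.999):
        per_year = allowed_downtime_seconds(pct, 365)  # assumes a 365-day year
        per_day = allowed_downtime_seconds(pct, 1)
        print(f"{pct}%: {per_year / 3600:.2f} hours/year, {per_day:.2f} seconds/day")
```

Each extra nine cuts the budget by a factor of ten, which is why the curve is so steep.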

And it has a price
[Chart: price against % availability (99, 99.9, 99.99, 99.999); y-axis ticks 0 to 1000, units not given]

Probability theory to the rescue
Joint probability of independent events
P(A ∩ B) = P(A) * P(B)
For 99% availability, i.e. 1% unavailability, the probability that 2 independent servers are both unavailable
= (0.01) * (0.01) = 0.0001, i.e. 99.99% availability for the pair

More on Probability
• Systems in series (every component must be up)
• Availability = P(A) * P(B)
• Systems in parallel (at least one component must be up)
• Availability = 1 - (1 - P(A)) * (1 - P(B))
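The two formulas can be checked with a few lines of code. This is a minimal sketch that assumes component failures are independent; the function names are made up for illustration.

```python
def series_availability(*availabilities: float) -> float:
    """Every component must be up, so multiply the availabilities."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def parallel_availability(*availabilities: float) -> float:
    """At least one component must be up: 1 minus the chance that all are down."""
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1 - availability
    return 1 - all_down


if __name__ == "__main__":
    print(series_availability(0.99, 0.99))    # two 99% systems chained together
    print(parallel_availability(0.99, 0.99))  # two 99% systems side by side
```

Chaining two 99% systems gives roughly 98% availability, while running them in parallel gives roughly 99.99%. Independence rarely holds fully in practice (shared power, network, or deployments), so the parallel figure is an upper bound.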

So architectural patterns evolve around it
[Diagram: two App boxes in front of one DB cylinder]
Box-cylinder architecture
Best practices
• App layer should be stateless
• Architecture should be layered

But the DB is still on a single machine
DB high availability is achieved via replication
• Active-Active replication
• Active-Passive replication
[Diagrams: Primary DB to Secondary DB with asynchronous replication; Primary DB to Secondary DB with synchronous replication]

Move to transaction model
[Sequence diagram: Client, In Memory, Disk. Steps: Write, Write, Commit, Write to Disk]
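One possible reading of the sequence above is that writes accumulate in memory and only reach disk at commit. The sketch below illustrates that reading; the class name, the JSON file format, and the temp-file rename are assumptions made here and are not taken from the slide.

```python
import json
import os
import tempfile


class BufferedTransaction:
    """Collects writes in memory and persists them only on commit (hypothetical)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.pending: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        # Writes touch memory only; nothing is durable yet.
        self.pending[key] = value

    def commit(self) -> None:
        # Merge pending writes into whatever is already on disk.
        data: dict[str, str] = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                data = json.load(f)
        data.update(self.pending)
        # Write a temp file and rename it, so a crash cannot leave a half-written file.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)
        self.pending.clear()


def demo() -> dict[str, str]:
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "store.json")
        tx = BufferedTransaction(path)
        tx.write("order-1", "placed")
        tx.write("order-2", "placed")
        assert not os.path.exists(path)  # nothing is on disk before commit
        tx.commit()
        with open(path) as f:
            return json.load(f)
```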

Enterprise organization
Image source: https://www.flickr.com/photos/mwichary/2356663850

Conway’s Law
Organization     IT Systems
Inventory        Inventory System
Sales            Sales System
Finance          Finance System
Fulfilment       Fulfilment System

Integration via DB
[Diagram: App 1 and App 2, each drawn as apps on top of a DB, connected at the DB]
What enabled it:
• 2 Phase Commit
• XA transaction

Possible alternative
[Diagram: App 1 and App 2, each drawn as apps on top of a DB, connected through a Service]
What enabled it:
• SOAP
• REST

Bouquets and brickbats
+
• Integration is simple
• Familiar to most developers
• Easier to reason about
-
• Any sync call adds to latency
• Sync calls expose the system to variations in the behavior of external systems

Possible alternative
[Diagram: App 1 and App 2, each drawn as apps on top of a DB, connected through Replication]
Replication patterns
• Via file
• Batch app for replication
• Event driven replication using message queues
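The event-driven pattern can be sketched with an in-process queue standing in for a message broker. Everything named here (the dictionaries acting as databases, `replicate`, `run_demo`) is hypothetical; a real system would use a durable queue between separate processes.

```python
import queue
import threading

STOP = None  # sentinel telling the replicator to shut down


def replicate(events: queue.Queue, target_db: dict) -> None:
    """Apply change events to the target DB as they arrive."""
    while True:
        event = events.get()
        if event is STOP:
            break
        key, value = event
        target_db[key] = value


def run_demo() -> tuple[dict, dict]:
    app1_db: dict = {}
    app2_db: dict = {}
    events: queue.Queue = queue.Queue()
    worker = threading.Thread(target=replicate, args=(events, app2_db))
    worker.start()

    def write_to_app1(key: str, value: str) -> None:
        app1_db[key] = value
        events.put((key, value))  # returns at once; App 1 never waits for App 2

    write_to_app1("order-1", "placed")
    write_to_app1("order-1", "shipped")
    write_to_app1("order-2", "placed")

    events.put(STOP)
    worker.join()  # only the demo waits; real replication keeps running
    return app1_db, app2_db
```

Until the replicator catches up, App 2 holds a stale view of App 1's data, which is the window of failure the following slides discuss.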

Bouquets and brickbats
+
• As there is no sync call, it adds no latency to the app
• As the systems are isolated, chances of failure propagation are minimal
• With Pub Sub, changes can be propagated to multiple subscribers with minimal additional work
-
• Integration may not be trivial
• Async propagation of data needs careful reasoning

Probabilistic Business Rules
• When we have asynchronous replication we have windows of failure that mean work may be lost or delayed.
• Distribution + Asynchrony → Probabilities of Enforcement
Source: http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf

Asynchrony and Truth
Image source: https://www.flickr.com/photos/stevenpisano/16595925953

Here comes Eventual Consistency
• Eventual consistency guarantees that a read returns some subset of the previous writes; eventually reads will return all writes.
• There is no guarantee of order
• There is no time bound on "eventual"
• It is a loosely defined term that guarantees very little. The application should tolerate any subset of writes, without any time guarantee
• Consistency is not a single concept but a spectrum
• On one end of the spectrum is strong consistency
• On the other end is eventual consistency

Eventual consistency through a simple example
Official scorekeeper:
    score = Read("visitors");
    Write("visitors", score + 1);
Umpire:
    if middle of 9th inning then
        vScore = Read("visitors");
        hScore = Read("home");
        if vScore < hScore
            end game;
Radio reporter:
    do {
        report vScore and hScore;
        sleep(30 minutes);
    }
Sportswriter:
    while not end of game {
        drink beer;
        smoke cigar;
    }
    go out to dinner;
    write article;
Statistician:
    wait for end of game;
    score = Read("home");
    stat = Read("season-runs");
    Write("season-runs", stat + score);
Stat watcher:
    stat = Read("season-runs");
    discuss stats with friends;

Guarantee               What a read sees
Strong Consistency      See all previous writes.
Eventual Consistency    See subset of previous writes.
Consistent Prefix       See initial sequence of writes.
Bounded Staleness       See all "old" writes.
Monotonic Reads         See increasing subset of writes.
Read My Writes          See all writes performed by reader.
Source: http://cacm.acm.org/magazines/2013/12/169945-replicated-data-consistency-explained-through-baseball/fulltext#F8
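To make three of these guarantees concrete, the sketch below enumerates which (visitors, home) scores a reader might see under each one. The write history and all function names are hypothetical, chosen here for illustration, and the model is deliberately simplified: eventual consistency is treated as "each key may independently show any value written so far, or the initial 0".

```python
from itertools import product

# Hypothetical write history for one game, in the order the writes happened.
WRITES = [("home", 1), ("visitors", 1), ("home", 2), ("home", 3),
          ("visitors", 2), ("home", 4), ("home", 5)]


def state_after(writes):
    """Return (visitors, home) after applying the given writes in order."""
    state = {"visitors": 0, "home": 0}
    for key, value in writes:
        state[key] = value
    return state["visitors"], state["home"]


def strong(writes):
    """Strong consistency: the reader sees all previous writes."""
    return {state_after(writes)}


def consistent_prefix(writes):
    """Consistent prefix: the reader sees some initial sequence of the writes."""
    return {state_after(writes[:n]) for n in range(len(writes) + 1)}


def eventual(writes):
    """Eventual consistency: each key may reflect any subset of the writes."""
    visitors = {0} | {value for key, value in writes if key == "visitors"}
    home = {0} | {value for key, value in writes if key == "home"}
    return set(product(visitors, home))


if __name__ == "__main__":
    print(sorted(strong(WRITES)))
    print(sorted(consistent_prefix(WRITES)))
    print(len(eventual(WRITES)), "possible scores under eventual consistency")
```

A score such as (2, 0) never existed during this game, yet it is a legal read under eventual consistency. That is the kind of result the scorekeeper cannot tolerate and the stat watcher may not care about.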

Not everyone needs the same thing
• Different roles often have different tolerances for stale information
• Trading some correctness for availability can bring in more revenue
• The trade-off is often driven by business value

What's the risk appetite?
• Consistency is often a cost of doing business
• The major point is that availability (and its cousins offline and latency-reduction) may be traded off with classic notions of consistency. This tradeoff may frequently be applied across many different aspects at many levels of granularity within a single application.
• Locally clear a check if the face value is less than $10,000. If it exceeds $10,000, double check with all the replicas to make sure it clears
• Schedule the shipment of a “Harry Potter” book based on a local opinion of the inventory. In contrast, the one and only one Gutenberg bible requires strict coordination!
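The check-clearing rule amounts to a risk threshold that decides between a local guess and full coordination. A minimal sketch follows; modelling each replica as a function that returns a boolean is an assumption made here, as is treating a check of exactly $10,000 as one that needs coordination.

```python
from typing import Callable, Iterable

LOCAL_CLEARING_LIMIT = 10_000  # dollars, the threshold named on the slide

Replica = Callable[[float], bool]  # hypothetical: True if this replica would clear the check


def clear_check(face_value: float, local_replica: Replica,
                all_replicas: Iterable[Replica]) -> bool:
    """Clear small checks on local knowledge; coordinate on large ones."""
    if face_value < LOCAL_CLEARING_LIMIT:
        # Low risk: act on the local guess and apologize later if it was wrong.
        return local_replica(face_value)
    # High risk: every replica must agree before the check clears.
    return all(replica(face_value) for replica in all_replicas)
```

The threshold is a business decision rather than a technical one: raising it buys availability and lower latency at the price of more frequent apologies.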

Memories, Guesses, Apologies
• The idea is that everything is done locally with a subset of the global knowledge.
• You know what you know when an action is performed. Since you have only a subset of the knowledge, your actions are really only guesses.
• When your knowledge as a replica increases, you may have an “Oh, crap!” moment.

More on Apologies
• Every business has to be ready for apologies.
• Consider a case where the only book in inventory is scheduled for delivery.
• In preparing the book for shipment, it is run over by the forklift in the warehouse.
• So, correct software notwithstanding, you will still need to apologize

How to apologize
• First of all, recognize whether you need to apologize
• Identify promises that could not be completed
• A unique identifier across systems becomes critical for identifying failures
• Typically the mistakes are identified during reconciliation
• Based on severity, an apology can be handled by the system based on rules, or directed to a human
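These steps can be sketched as a reconciliation pass keyed on a shared identifier. The `Promise` type, the use of the amount as a proxy for severity, and the 100.0 limit are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promise:
    promise_id: str  # unique identifier shared across systems
    amount: float    # used here as a stand-in for severity


def find_broken_promises(promised: list[Promise], fulfilled_ids: set[str]) -> list[Promise]:
    """Reconciliation: promises one system made that the other never fulfilled."""
    return [p for p in promised if p.promise_id not in fulfilled_ids]


def route_apology(promise: Promise, auto_limit: float = 100.0) -> str:
    """Small failures are handled by rule; larger ones go to a person."""
    return "automatic" if promise.amount <= auto_limit else "human"


if __name__ == "__main__":
    promised = [Promise("order-1", 20.0), Promise("order-2", 950.0), Promise("order-3", 45.0)]
    fulfilled = {"order-3"}
    for broken in find_broken_promises(promised, fulfilled):
        print(broken.promise_id, route_apology(broken))
```

Without the shared `promise_id`, the two systems would have no reliable way to match a promise to its outcome.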

What inhibits business trade-offs
• The layering of an arbitrary application atop a storage subsystem inhibits reordering (and also apologies)
• Logical delete vs actual delete of row
• Only when commutative operations are used can we achieve the desired loose coupling.
• Application operations can be commutative

Commutative Business Transactions
• Order insensitive logic
• Valid account creation: create a dummy account first and attach the customer to it later
• Credit to account
• Visibility of business operations
• Logical deletion of record
We need to partner with people like him
Image source: https://commons.wikimedia.org/wiki/File:Jackie_Stewart_2011_British_Grand_Prix.jpg

So what's in it for you
• The techniques explained require business people to be sympathetic to IT
• Business has to align with IT and see IT as a competitive advantage
• A "developing with us" rather than "developing for us" mentality

Key takeaway
Image source: https://commons.wikimedia.org/wiki/File:Jackie_Stewart_2011_British_Grand_Prix.jpg
• Move from a DB centric view of consistency to an application centric view of consistency
• Carefully make trade-offs in IT systems to limit losses and increase upside
• Look for inspiration in business practices developed for a world without instant information
