The Unicorn Project and The Five Ideals (older: see notes for newer version)

@RealGeneKim
Gene Kim
The Unicorn Project
And The Five Ideals

● Nearly 3 years, 1,600 hours of work
● Publication date: November 26
● Wanted to capture the heroic journeys of the DevOps Enterprise community

There’s Never Been A Better Time for Infrastructure and Operations

The Five Ideals
1. Locality and Simplicity
2. Focus, Flow, and Joy
3. Improvement of Daily Work
4. Psychological Safety
5. Customer Focus

Ideal #1: Locality and Simplicity

The Birth And Death Of Etsy Sprouter
● A story about teams of engineers implementing changes
  - 2008: Devs and DBAs
  - 2009: Devs, DBAs, and the Sprouter team
  - 2010: Devs

Lesson: The Organization And The Architecture Of Our Software Must Be Congruent

Lead Time = 9 months
Source: Damon Edwards (@damonedwards)

Architecture Enables Teams To…
● …make large-scale changes to the design of its system without the permission of someone outside the team, or depending on other teams
● …complete its work without fine-grained communication and coordination with people outside the team
● …deploy and release its product or service on demand, independently of other services the product or service depends upon
● …do most of its testing on demand, without requiring an integrated test environment
● …perform deployments during normal business hours with negligible downtime
Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report

The First Ideal: Organization
● Ideal: any team can independently develop, test, and deploy value to the customer
● Not Ideal: to deploy value to the customer, every team must coordinate with tens of other teams, and any of them can prevent it

The First Ideal: A Measure
● Bus Factor
● Lunch Factor

How Many People Do You Need To Feed?
● A two-pizza team
● Everyone in the building
● Scheduling lunch with 43 different people
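The slides name these two measures without defining them precisely. Below is a minimal sketch under loudly stated assumptions: the `Org` model, the sample people and modules, and both definitions are hypothetical. Bus factor is taken as the fewest people who know any one module; lunch factor as the number of people outside your team you must coordinate with (the "43 different people") to ship a change.

```python
from dataclasses import dataclass


@dataclass
class Org:
    # Hypothetical inputs: who can change each module, and which team each person is on.
    module_owners: dict[str, set[str]]
    team_of: dict[str, str]


def bus_factor(org: Org) -> int:
    """Simplified bus factor: the fewest people who know any single module.

    A result of 1 means losing one person leaves some module nobody understands.
    """
    return min(len(owners) for owners in org.module_owners.values())


def lunch_factor(org: Org, team: str, modules_touched: set[str]) -> int:
    """Hypothetical lunch factor: how many people outside `team` must be
    consulted (and fed lunch) to change every module in `modules_touched`."""
    outsiders: set[str] = set()
    for module in modules_touched:
        owners = org.module_owners[module]
        # If someone on our own team can already change this module, no outside help is needed.
        if any(org.team_of[person] == team for person in owners):
            continue
        outsiders |= owners
    return len(outsiders)


org = Org(
    module_owners={"checkout": {"ana", "bo"}, "billing": {"cy"}, "search": {"dee", "eli"}},
    team_of={"ana": "web", "bo": "web", "cy": "payments", "dee": "search", "eli": "search"},
)
print(bus_factor(org))                                              # 1 (only cy knows billing)
print(lunch_factor(org, "web", {"checkout"}))                       # 0
print(lunch_factor(org, "web", {"checkout", "billing", "search"}))  # 3 (cy, dee, eli)
```

In this toy model, the Ideal from the organization slides corresponds to a lunch factor of zero for the changes a team usually makes, and a bus factor above one for every module.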

The First Ideal: Organization
● Ideal: every team has the expertise, capability, and authority to satisfy customer needs
● Not Ideal: in order to satisfy customer needs, every team must escalate up two levels (and over two, and down two)

The First Ideal: Code
● Ideal: anyone can implement what they need by looking at one file or module, and make the needed change
  - Kubernetes sidecars
● Not Ideal: to make your needed change, you have to understand all the files and modules

Hotel Wi-Fi Story

Ideal #2: Focus, Flow, and Joy

As Your Ambassador From Dev
● For decades, I self-identified as an Ops person…
● Two years ago, I started to self-identify as a Dev
  - Clojure / ClojureScript
  - LISP, functional programming, immutability
  - 3,000 lines of Objective-C -> 1,500 lines of TypeScript/React -> 500 lines of ClojureScript
● Development is so fun, and these days, you can do miraculous things with so little effort

Why Functional Programming
● The famous French anthropologist Claude Lévi-Strauss would ask of certain tools, ‘is it good to think with?’
● Core FP concepts
  - Immutability
  - Pure functions
  - Composability
● Championed by Haskell and OCaml; popularized by Clojure, Erlang, Elm, and Elixir
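The talk's own examples are in Clojure. As a language-neutral sketch (Python here; the `Order` type, the 10% discount, and the 8% tax are made-up assumptions for illustration), the three core concepts look like this:

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable


# Immutability: a frozen dataclass cannot be mutated after creation;
# "changing" it means building a new value with dataclasses.replace.
@dataclass(frozen=True)
class Order:
    subtotal_cents: int
    discount_cents: int = 0
    tax_cents: int = 0


# Pure functions: the result depends only on the argument, and nothing else changes.
def apply_discount(order: Order) -> Order:
    # Hypothetical rule: 10% off the subtotal.
    return replace(order, discount_cents=order.subtotal_cents // 10)


def apply_tax(order: Order) -> Order:
    # Hypothetical rule: 8% tax on the discounted amount.
    taxable = order.subtotal_cents - order.discount_cents
    return replace(order, tax_cents=taxable * 8 // 100)


# Composability: small functions chain into bigger ones (applied left to right).
def pipe(*steps: Callable[[Order], Order]) -> Callable[[Order], Order]:
    return lambda order: reduce(lambda acc, step: step(acc), steps, order)


price_order = pipe(apply_discount, apply_tax)

original = Order(subtotal_cents=10_000)
priced = price_order(original)
print(priced)    # Order(subtotal_cents=10000, discount_cents=1000, tax_cents=720)
print(original)  # Order(subtotal_cents=10000, discount_cents=0, tax_cents=0)
```

Because `original` is never modified, each step can be tested in isolation and reordered or reused without hidden interactions.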

Never Have I Valued Infrastructure More
● Things I detest now
  - Everything outside of my application
  - Connecting anything to anything
  - Secrets management
  - Bash
  - YAML
  - Patching
  - Building Kubernetes deployment files (mostly by Googling)
  - Figuring out why my cloud costs are so high

Interestingly, It Portends The Future Of Ops
● Core concepts
  - Immutability
  - Pure functions
  - Composability
● Look at…
  - Docker, Docker Compose
  - Kubernetes
  - Kubernetes sidecars
  - Event streams: Apache Kafka

The Second Ideal: Focus and Flow
● Ideal: your energy and time are focused on solving the business problem, and you’re having fun
● Not Ideal: all your time is spent trying to solve problems you don’t even want to solve (e.g., YAML files, Makefiles and spaces in filenames, bash)

Two Types Of Learning
● Procedural Learning
● One-shot Learning

● Ideally, I can implement my business feature in one place, and I have all the expertise I need to implement it

The Value Of Platforms
● Enable developer productivity
  - Self-service
  - On-demand
  - Immediacy and fast feedback
  - Focus and flow
  - Joy
● Monitoring, deployment, environment creation, security scans, orchestration…

“bash: the disease you die with, but don’t die of…”
Jeffrey Snover
Technical Fellow, Microsoft
@jsnover

The Second Ideal: Focus and Flow
● Ideal: trunk-based development
● Not Ideal: five days of merging, with 50 people in conference rooms

Ideal #3: Improvement of Daily Work

Not Ideal
“In manufacturing, the absence of effective feedback often contributes to major quality and safety problems. In one well-documented case at the General Motors Fremont manufacturing plant, there were no effective procedures in place to detect problems during the assembly process, nor were there explicit procedures on what to do when problems were found.
“As a result, there were instances of engines being put in backward, cars missing steering wheels or tires, and cars even having to be towed off the assembly line because they wouldn’t start.”
Source: DevOps Handbook

Ideal
Create as much feedback in our system, from as many areas in our system, sooner, faster, and cheaper, with as much clarity between cause and effect as possible.
Why? Because the more assumptions we can invalidate, the more we learn, improving our ability to fix problems and innovate.
Source: DevOps Handbook

How many times is the andon cord pulled in a typical day at a Toyota manufacturing plant?
3,500 times per day
Source: http://www.gembapantarei.com/2008/04/how_many_times_do_you_pull_the_andon_cord_each_day.html

Fast Push To Market
[Chart: Debts & Risks, Features, Quality, Defects]

Fast Push To Market (Continued)
[Chart: Features, Defects, Quality, Debts & Risks]
● Defect fixing dominates work
● Site reliability tanks
● Slower and slower velocity
● Customers leave
● Morale plunges
● Devs leave because everything is hard

Who hasn’t felt this?
You hire a bunch of developers, but you still can’t ship the features you promised…
…and maybe you even have the feeling that things are slowing down…
Source: https://twitter.com/johncutlefish/status/1046169469268111361

Source: The Unicorn Project (2019)

Near Death Experiences
● eBay (1999)
● Microsoft (2002): Bill Gates memo
● Google (2005): Automated testing culture
● Amazon (2004): Jeff Bezos memo
● Twitter (2008)
● LinkedIn (2009)
● Etsy (2009)

2002 Microsoft Security Standdown
● Famously, after Bill Gates’ Trustworthy Computing memo, Microsoft required every product group to freeze feature work
Source: https://www.wired.com/2002/01/bill-gates-trustworthy-computing/

The Feature Freeze / Standdown
[Chart: Debt, Features, Quality, Defects]

Quote from Marty Cagan, from his book Inspired:
“The deal [between product owners and] engineering goes like this: Product management takes 20% of the team’s capacity right off the top and gives this to engineering to spend as they see fit. They might use it to rewrite, re-architect, or re-factor problematic parts of the code base…whatever they believe is necessary to avoid ever having to come to the team and say, ‘we need to stop and rewrite [all our code].’ If you’re in really bad shape today, you might need to make this 30% or even more of the resources. However, I get nervous when I find teams that think they can get away with much less than 20%.”
Cagan notes that when organizations do not pay their “20% tax,” technical debt will increase to the point where an organization inevitably spends all of its cycles paying down technical debt. At some point, the services become so fragile that feature delivery grinds to a halt because all the engineers are working on reliability issues or working around problems.

Source: Satya Nadella, CEO, Microsoft (@satyanadella)

First Ideal
● Ideal: 3-5% of developers are dedicated to improving developer productivity
● Not Ideal: developer productivity is assigned to summer interns and “people not good enough to be developers”

The Third Ideal: Improvement
● Not Ideal: no one cares if someone breaks the build, or checks in code that breaks our tests
● Ideal: when someone breaks our build or our tests, fixing it becomes the most important work of the moment

The Third Ideal: Improvement
● Not Ideal: when someone needs a peer review, that person has to wait until someone else frees up
● Ideal: whatever I’m working on, if someone needs a peer review, I drop whatever I’m doing to help
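Both Third Ideal slides describe a priority order for a team's work: a broken build preempts everything, and a pending peer review preempts one's own feature work. A minimal sketch of that ordering as a priority queue follows; the item kinds, their ranks, and the task names are hypothetical assumptions, not a process prescribed by the talk.

```python
import heapq
import itertools

# Hypothetical priorities (lower number = more urgent): a broken build stops the line,
# a pending peer review comes next, and planned feature work waits.
PRIORITY = {"broken_build": 0, "peer_review": 1, "feature": 2}


class WorkQueue:
    """Hypothetical team work queue that puts fixing the build ahead of everything."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        # Monotonic counter breaks ties so items of equal priority stay first-in, first-out.
        self._counter = itertools.count()

    def add(self, kind: str, description: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), description))

    def next_item(self) -> str:
        _, _, description = heapq.heappop(self._heap)
        return description


queue = WorkQueue()
queue.add("feature", "add search filters")
queue.add("peer_review", "review teammate's PR")
queue.add("feature", "update onboarding flow")
queue.add("broken_build", "fix failing CI on main")

order = [queue.next_item() for _ in range(4)]
print(order)
```

The build fix comes out first even though it was added last, which is the "most important work of the moment" behavior the Ideal describes.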

"Automated tests transform fear into boredom."
-- Eran Messeri, Google
Google Dev And Ops (2013)
● 15,000 engineers, working on 4,000+ projects
● All code is checked into one source tree (billions of files!)
● 5,500 code commits/day
● 75 million test cases are run daily
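For a sense of scale, the two daily figures can be divided. This is only an illustrative average that assumes the test runs can be attributed evenly to the day's commits; the slide does not say how tests are triggered.

```latex
\frac{75{,}000{,}000\ \text{test cases/day}}{5{,}500\ \text{commits/day}} \approx 13{,}600\ \text{test cases per commit}
```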

Ideal #4: Psychological Safety

One Of The Highest Predictors Of Performance
Source: Typology Of Organizational Culture (Westrum, 2004)

Google: Project Aristotle, Oxygen, re:Work
Source: https://rework.withgoogle.com/blog/five-keys-to-a-successful-google-team/

Great Practices Enabled
● Blameless post-mortems
● Chaos Monkeys

Inject Failures Often
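As a toy illustration of injecting failures on purpose, the simulation below repeatedly terminates a random instance and counts how often the service goes down. It is not Netflix's Chaos Monkey, whose real implementation and configuration differ; the fleet model, the `min_healthy` rule, and the instance names are assumptions.

```python
import random


def chaos_round(fleet: set[str], rng: random.Random, kill_count: int = 1) -> set[str]:
    """Randomly terminate `kill_count` instances and return the survivors."""
    victims = set(rng.sample(sorted(fleet), k=min(kill_count, len(fleet))))
    return fleet - victims


def service_is_up(survivors: set[str], min_healthy: int) -> bool:
    # Hypothetical health rule: the service stays up while enough replicas remain.
    return len(survivors) >= min_healthy


def run_experiment(replicas: int, min_healthy: int, rounds: int, seed: int = 42) -> int:
    """Run independent chaos rounds, each starting from the full fleet; return the outage count."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    fleet = {f"node-{i}" for i in range(replicas)}
    outages = 0
    for _ in range(rounds):
        survivors = chaos_round(fleet, rng)
        if not service_is_up(survivors, min_healthy):
            outages += 1
    return outages


print(run_experiment(replicas=3, min_healthy=2, rounds=100))  # 0: losing one of three is survivable
print(run_experiment(replicas=1, min_healthy=1, rounds=10))   # 10: a single replica fails every round
```

Running rounds like these routinely, rather than waiting for a real outage, exposes single points of failure (here, the one-replica fleet) while the stakes are low.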

You Don’t Choose Chaos Monkey…
Chaos Monkey Chooses You

The 2014 AWS Reboot
“When we got the news about the emergency EC2 reboots, our jaws dropped. When we got the list of how many Cassandra nodes would be affected, I felt ill.
“Then I remembered all the Chaos Monkey exercises we’ve gone through. My reaction was, ‘Bring it on!’”
– Christos Kalantzis
Netflix Cloud DB Engineering
Source: http://techblog.netflix.com/2014/10/a-state-of-xen-chaos-monkey-cassandra.html

The 2014 AWS Reboot
“Out of our 2700+ production Cassandra nodes, 218 were rebooted. 22 Cassandra nodes did not reboot successfully.
“Netflix customers experienced no downtime that weekend.”
– Bruce Wong
Netflix Chaos Engineering
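Putting the quoted figures in proportion (using 2,700 for "2700+", so the first and last ratios are upper bounds):

```latex
\frac{218}{2700} \approx 8.1\%\ \text{of nodes rebooted}, \qquad
\frac{22}{218} \approx 10.1\%\ \text{of rebooted nodes failed}, \qquad
\frac{22}{2700} \approx 0.8\%\ \text{of the fleet lost}
```

Roughly one in ten rebooted nodes did not come back, yet the quote reports no customer downtime.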

Netflix and Service Catalog

Tom Limoncelli Quote (@yesthattom)

Ideal #5: Customer Focus

DevOps Is For The Unicorns… And The Horses, Too

DevOps Enterprise: Lessons Learned
● In 2018, we held the fifth year of the DevOps Enterprise Summit, a conference for horses, by horses
● Over the years, we’ve had over 200 leaders from:
  - Capital One, KeyBank, Barclays, GE Capital, ING Bank, Fidelity, PNC, ADP, BofA, Western Union, BBVA
  - Nationwide Insurance, Zurich Insurance, Hiscox, Aviva, LV=
  - Walmart, Nordstrom, Target, Macy’s, Marks and Spencer
  - Nike, Adidas, Sherwin-Williams
  - Verizon, Telstra, T-Mobile, Orange, CSG
  - Raytheon, Lockheed Martin, Northrop Grumman, CSRA, Jaguar Land Rover
  - Disney, Ticketmaster, NBC/Universal
  - Kaiser Permanente
  - US Citizenship & Immigration Services, UK HM Revenue & Customs, DISA Forge.mil, NZ Ministry of Social Development, UK Department for Work and Pensions, US Joint Warfare Analysis Center
  - Amazon Prime Now, CA, Compuware, Google Search, IBM, Micro Focus, Microsoft, SAP

Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report

Leadership Matters
● Teams with the least reported transformational leadership behaviors (the bottom third) were half as likely to be high IT performers
● Leaders cannot do it alone! Teams with the top 10% of reported transformational leadership behaviors performed no better than the median
Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report

The Fifth Ideal: Focus On The Customer
● Not Ideal: functional silo managers behave like union leaders, as opposed to business leaders
● Ideal: functional silo managers make decisions based on what the customer values, and help ensure their teams have the skills to thrive in the long term

The Fifth Ideal: Focus On The Customer
● Core vs. Context

Why Do I Think This Is Important?

● Publication date: November 26
● Excerpts will be released in the next three weeks

Want To Learn More?
To receive this presentation and the following:
● Announcement and upcoming excerpts from The Unicorn Project
● Eight excerpts from the Beyond The Phoenix Project audio series with John Willis
● The 140-page excerpt of The DevOps Handbook
● The 140-page excerpt of The Phoenix Project
● Videos and slides from DevOps Enterprise 2014-2019
● A one-hour excerpt of The Phoenix Project audiobook
Just pick up your phone, and send an email:
To: realgenekim@SendYourSlides.com
Subject: devops
