Session ID:
Gene Kim
The Unicorn Project
And The Five Ideals
2013 2016 2017
  Nearly 3 years, 1600 hours of work
 Publication date: November 26
 Wanted to capture the heroic
journeys of the DevOps
Enterprise community
There’s Never Been A Better Time
for Infrastructure and Operations
The Five Ideals
1. Locality and Simplicity
2. Focus, Flow, and Joy
3. Improvement of Daily Work
4. Psychological Safety
5. Customer Focus
Ideal #1:
Locality and Simplicity
The Birth And Death Of Etsy Sprouter
  A story about teams of engineers implementing changes
 2008: Devs and DBAs
 2009: Devs and DBAs and Sprouter team
 2010: Devs
The Organization and
The Architecture Of Our Software
Must Be Congruent
Lead Time = 9 months
Source: Damon Edwards (@damonedwards)
Architecture Enables Teams To…
 …make large scale changes to the design of its system without the
permission of someone outside the team, or depending on other teams
 ...complete its work without fine-grained communication and
coordination with people outside the team
 ...deploy and release its product or service on demand, independently
of other services the product or service depends upon
 ...do most of its testing on demand, without requiring an integrated
test environment
  ...perform deployments during normal business hours with negligible downtime
Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report
The First Ideal: Organization
 Ideal: any team can independently develop, test,
and deploy value to the customer
 Not Ideal: to deploy value to the customer, every
team must coordinate with tens of other teams,
and any of them can prevent it
The First Ideal: A Measure
 Bus Factor
 Lunch Factor
How Many People Do You Need To Feed?
 Two pizza team
 Feeding everyone in the building
 Schedule lunch with 43 different people
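As a rough sketch of how the two measures above could be computed (an illustration, not something from the talk): the bus factor is taken here as the size of the smallest group of people whose loss would leave some task with nobody able to do it, and the lunch factor as the number of other teams that own something a change touches. The function names, the task and ownership maps, and every name except Brent are hypothetical.

```python
def bus_factor(people_by_task: dict[str, set[str]]) -> int:
    """Fewest people who must disappear before some task has nobody left.

    Assumption: the project halts as soon as any one task is unstaffed,
    so the answer is the size of the smallest group covering a task.
    """
    if not people_by_task:
        return 0
    return min(len(people) for people in people_by_task.values())


def lunch_factor(components_touched: list[str],
                 owner_of: dict[str, str],
                 my_team: str) -> int:
    """Number of other teams that own a component the change touches."""
    owners = {owner_of[component] for component in components_touched}
    return len(owners - {my_team})


# Hypothetical data: every outage needs Brent, so the bus factor is 1.
people_by_task = {
    "fix outages": {"Brent"},
    "run deployments": {"Brent", "Alex"},
}
print(bus_factor(people_by_task))  # 1

# Hypothetical ownership map for a change made by "team-a".
owner_of = {"checkout": "team-a", "db-schema": "dba", "load-balancer": "netops"}
change = ["checkout", "db-schema", "load-balancer"]
print(lunch_factor(change, owner_of, "team-a"))  # 2
```

A change that stays inside components owned by the team itself gives a lunch factor of 0, which is the "Ideal" case described for the First Ideal.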
The First Ideal: Organization
 Ideal: every team has the expertise, capability
and authority to satisfy customer needs
 Not Ideal: in order to satisfy customer needs,
every team must escalate up two levels (and over
two, and down two)
The First Ideal: Code
 Ideal: anyone can implement what they need by
looking at one file or module, and make the
needed change
- Kubernetes sidecars
 Not Ideal: to make your needed change, you
have to understand all the files and modules
Hotel Wi-Fi Story
Ideal #2:
Focus, Flow, and Joy
As Your Ambassador From Dev
 For decades, I self-identified as an Ops person…
  2 years ago, I started to self-identify as Dev
 Clojure / ClojureScript
 LISP, functional programming, immutability
  3000 lines of Objective-C -> 1500 lines of TypeScript/React ->
500 lines of ClojureScript
 Development is so fun, and these days, you can do
miraculous things with so little effort
Why Functional Programming
 The famous French philosopher Claude Lévi-Strauss
would say of certain tools, ‘is it good to think with?’
 Core FP concepts
 Immutability
 Pure functions
 Composability
  Pioneered by Haskell and OCaml. Popularized by
Clojure, Erlang, Elm, Elixir
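The three core concepts listed above can be illustrated with a small sketch. It is written in Python rather than any of the languages named on the slide, and the Order type, apply_discount, total_cents, ten_percent_off and compose are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace, FrozenInstanceError
from functools import reduce


@dataclass(frozen=True)  # frozen=True: fields cannot be reassigned after creation
class Order:
    subtotal_cents: int
    discount_cents: int = 0


def apply_discount(order: Order, percent: int) -> Order:
    """Pure function: builds a new Order and leaves its input untouched."""
    return replace(order, discount_cents=order.subtotal_cents * percent // 100)


def total_cents(order: Order) -> int:
    """Pure function: the result depends only on the argument."""
    return order.subtotal_cents - order.discount_cents


def compose(*funcs):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, func: func(acc), funcs, value)


def ten_percent_off(order: Order) -> Order:
    return apply_discount(order, 10)


# Composability: small functions are chained into a larger one.
checkout = compose(ten_percent_off, total_cents)

original = Order(subtotal_cents=20000)
print(checkout(original))       # 18000
print(original.discount_cents)  # 0, the original value was never mutated

try:
    original.discount_cents = 500  # immutability: assignment is rejected
except FrozenInstanceError:
    print("Order is immutable")
```

Because neither function changes its input, the same Order can be passed through checkout any number of times and the result stays the same.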
Never Have I Valued Infrastructure More
 Things I detest now
 Everything outside of my application
  Connecting anything to anything
 Secrets management
  Bash
  YAML
  Patching
  Building Kubernetes deployment files (mostly by Googling)
 Why my cloud costs are so high
Interestingly, It Portends Future Of Ops
 Core concepts
 Immutability
 Pure functions
 Composability
 Look at…
 Docker, Docker Compose
 Kubernetes
 Kubernetes sidecars
 Event streams: Apache Kafka
The Second Ideal: Focus and Flow
  Ideal: your energy and time are focused on solving
the business problem, and you’re having fun
 Not Ideal: all your time is spent trying to solve
problems you don’t even want to solve (e.g.,
YAML files, Makefile and spaces in filenames, bash)
Two Types Of Learning
● Procedural Learning
● One-shot Learning
 Ideally, I can implement my business feature in
one place, and I have all the expertise I need to
implement it
The Value Of Platforms
 Enable developer productivity
 Self-service
 On-demand
 Immediacy and fast feedback
 Focus and flow
 Joy
 Monitoring, deployment, environment creation,
security scans, orchestration…
 “bash: the disease you die with, but don’t die of…”
Jeffrey Snover
Technical Fellow, Microsoft
The Second Ideal: Focus and Flow
 Ideal: trunk based development
 Not Ideal: 5 days merging, 50 people in
conference rooms
Ideal #3:
Improvement of Daily Work
Not Ideal
“In manufacturing, the absence of effective feedback often
contributes to major quality and safety problems. In one well-documented
case at the General Motors Fremont manufacturing
plant, there were no effective procedures in place to detect
problems during the assembly process, nor were there explicit
procedures on what to do when problems were found.
“As a result, there were instances of engines being put in
backward, cars missing steering wheels or tires, and cars even
having to be towed off the assembly line because they wouldn’t start.”
Source: DevOps Handbook
Create as much feedback in our system, from as
many areas in our system, sooner, faster, and
cheaper, with as much clarity between cause and effect.
Why? Because the more assumptions we can
invalidate, the more we learn, improving our ability
to fix problems and innovate.
Source: DevOps Handbook
How many times per day is the andon cord
pulled in a typical day at a Toyota
manufacturing plant?
3,500 times per day
Source: http://www.gembapantarei.com/2008/04/how_many_times_do_you_pull_the_andon_cord_each_day.html
Fast Push To Market
Debts & Risks
Fast Push To Market — Continued
Defect fixing dominates work
Site reliability tanks
Slower and slower velocity
Customers leave
Morale plunges
Devs leave because everything is hard
Debts & Risks
Source: https://twitter.com/johncutlefish/status/1046169469268111361
Who hasn’t felt this?
You hire a bunch of developers, but you
still can’t ship the features you promised…
…and maybe you even have the feeling
that things are slowing down…
Source: The Unicorn Project (2019)
Near Death Experiences
● eBay (1999)
● Microsoft (2002): Bill Gates memo
● Google (2005): Automated testing culture
● Amazon (2004): Jeff Bezos memo
● Twitter (2008)
● LinkedIn (2009)
● Etsy (2009)
2002 Microsoft Security Standdown
 Famously, Microsoft after
SQL Slammer required
every product group to
freeze feature
Source: https://www.wired.com/2002/01/bill-gates-trustworthy-computing/
The Feature Freeze / Standdown
Quote from Marty Cagan from his book Inspired
The deal [between product owners and] engineering goes like this: Product
management takes 20% of the team’s capacity right off the top and gives this to
engineering to spend as they see fit. They might use it to rewrite, re-architect, or
re-factor problematic parts of the code base…​whatever they believe is necessary
to avoid ever having to come to the team and say, ‘we need to stop and rewrite [all
our code].’ If you’re in really bad shape today, you might need to make this 30% or
even more of the resources. However, I get nervous when I find teams that think
they can get away with much less than 20%.
Cagan notes that when organizations do not pay their “20% tax,” technical debt
will increase to the point where an organization inevitably spends all of its cycles
paying down technical debt. At some point, the services become so fragile that
feature delivery grinds to a halt because all the engineers are working on reliability
issues or working around problems.
Source: Satya Nadella, CEO, Microsoft (@satyanadella)
First Ideal
 Ideal: 3-5% of developers dedicated to improving
developer productivity
 Not ideal: assigned to summer interns and
“people not good enough to be developers”
The Third Ideal: Improvement
 Not Ideal: No one cares if someone breaks the
build, or checks in code that breaks our tests
 Ideal: When someone breaks our build or our
tests, fixing it becomes the most important work
of the moment
The Third Ideal: Improvement
 Not ideal: When someone needs a peer review,
that person has to wait until someone else frees up
 Ideal: Whatever I’m working on, if someone
needs a peer review, I drop whatever I’m doing to help
"Automated tests transform fear into boredom."
-- Eran Messeri, Google
Google Dev And Ops (2013)
 15,000 engineers, working on 4,000+ projects
 All code is checked into one source tree
(billions of files!)
 5,500 code commits/day
 75 million test cases are run daily
Ideal #4:
Psychological Safety
One Of The Highest Predictors Of Performance
Source: Typology Of Organizational Culture (Westrum, 2004)
Google: Project Aristotle, Oxygen, re:Work
Source: https://rework.withgoogle.com/blog/five-keys-to-a-successful-google-team/
Great Practices Enabled
 Blameless post-mortems
 Chaos Monkeys
Inject Failures Often
You Don’t Choose Chaos Monkey…
Chaos Monkey Chooses You
The 2014 AWS Reboot
“When we got the news about the emergency EC2
reboots, our jaws dropped. When we got the list of
how many Cassandra nodes would be affected, I
felt ill.
“Then I remembered all the Chaos Monkey
exercises we’ve gone through. My reaction
was, ‘Bring it on!’”
– Christos Kalantzis
Netflix Cloud DB Engineering
Source: http://techblog.netflix.com/2014/10/a-state-of-xen-chaos-monkey-cassandra.html
The 2014 AWS Reboot
“Out of our 2700+ production Cassandra nodes,
218 were rebooted. 22 Cassandra nodes did not
reboot successfully.
“Netflix customers experienced no downtime that weekend.”
– Bruce Wong
Netflix Chaos Engineering
Netflix and Service Catalog
Tom Limoncelli Quote (@yesthattom)
Ideal #5:
Customer Focus
DevOps Is For The Unicorns…
...And The Horses, Too
DevOps Enterprise: Lessons Learned
 In 2018, we’ll hold the fifth year of the DevOps Enterprise Summit, a
conference for horses, by horses
 Over the years, we’ve had over 200 leaders from:
 Capital One, KeyBank, Barclays, GE Capital, ING Bank, Fidelity, PNC, ADP, BofA,
Western Union, BBVA
 Nationwide Insurance, Zurich Insurance, Hiscox, Aviva, LV=
 Walmart, Nordstrom, Target, Macy’s, Marks and Spencer
 Nike, Adidas, Sherwin Williams
 Verizon, Telstra, T-Mobile, Orange, CSG
 Raytheon, Lockheed Martin, Northrop Grumman, CSRA, Jaguar Land Rover
 Disney, Ticketmaster, NBC/Universal
 Kaiser Permanente
 US Citizenship & Immigration Services, UK HM Revenue Collection, DISA Forge.mil, NZ
Ministry of Social Development, UK Welfare and Pensions, US Joint Warfare Analysis Center
 Amazon PrimeNow, CA, Compuware, Google Search, IBM, MicroFocus, Microsoft, SAP
Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report
Leadership Matters
 Teams with the least reported transformational
leadership behaviors (the bottom-third) were one-
half as likely to be high IT performers
 Leaders cannot do it alone! Teams with the top
10% of reported transformational leadership
behaviors performed no better than the median
Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report
The Fifth Ideal: Focus On The Customer
 Not ideal: Functional silo managers behave like
union leaders, as opposed to business leaders
 Ideal: Functional silo managers make decisions
based on what the customer values, and help
ensure their teams have the skills to thrive in the
long term
The Fifth Ideal: Focus On The Customer
 Core vs. Context
Why Do I Think This Is Important?
 Publication date: November 26
 Excerpts will be released in the
next three weeks
Want More Learn More?
To receive this presentation and the following:
 Announcement and upcoming excerpts from
The Unicorn Project
 Eight excerpts from Beyond The Phoenix Project audio
series w/John Willis
 The 140 page excerpt of The DevOps Handbook
 The 140 page excerpt of The Phoenix Project
 Videos and slides from DevOps Enterprise 2014-2019
 One hour excerpt of The Phoenix Project audiobook
Just pick up your phone, and send an email:
To: realgenekim@SendYourSlides.com
Subject: devops


The Unicorn Project and The Five Ideals (older: see notes for newer version)

  • 1. @RealGeneKim Session ID: Gene Kim The Unicorn Project And The Five Ideals
  • 3. @RealGeneKim  Nearly 3 years, 1600 hours of work  Publication date: November 26  Wanted to capture the heroic journeys of the DevOps Enterprise community
  • 4. @RealGeneKim There’s Never Been A Better Time for Infrastructure and Operations
  • 5. @RealGeneKim The Five Ideals 1. Locality and Simplicity 2. Focus, Flow, and Joy 3. Improvement of Daily Work 4. Psychological Safety 5. Customer Focus
  • 7. @RealGeneKim The Birth And Death Of Etsy Sprouter  A story about teams of engineers implementing changes  2008: Devs and DBAs  2009: Devs and DBAs and Sprouter team  2010: Devs
  • 8. @RealGeneKim Lesson: The Organization and The Architecture Of Our Software Must Be Congruent
  • 9. @RealGeneKim Lead Time = 9 months Source: Damon Edwards (@damonedwards)
  • 10. @RealGeneKim Architecture Enables Teams To…  …make large scale changes to the design of its system without the permission of someone outside the team, or depending on other teams  ...complete its work without fine-grained communication and coordination with people outside the team  ...deploy and release its product or service on demand, independently of other services the product or service depends upon  ...do most of its testing on demand, without requiring an integrated test environment  ...perform deployments during normal business hours with negligible downtime Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report
  • 11. @RealGeneKim The First Ideal: Organization  Ideal: any team can independently develop, test, and deploy value to the customer  Not Ideal: to deploy value to the customer, every team must coordinate with tens of other teams, and any of them can prevent it
  • 12. @RealGeneKim The First Ideal: A Measure  Bus Factor  Lunch Factor
  • 13. @RealGeneKim How Many People Do You Need To Feed?  Two pizza team  Feeding everyone in the building  Schedule lunch with 43 different people
  • 14. @RealGeneKim The First Ideal: Organization  Ideal: every team has the expertise, capability and authority to satisfy customer needs  Not Ideal: in order to satisfy customer needs, every team must escalate up two levels (and over two, and down two)
  • 15. @RealGeneKim The First Ideal: Code  Ideal: anyone can implement what they need by looking at one file or module, and make the needed change - Kubernetes sidecars  Not Ideal: to make your needed change, you have to understand all the files and modules
  • 18. @RealGeneKim As Your Ambassador From Dev  For decades, I self-identified as an Ops person…  2 years ago, I started to self-identify as Dev  Clojure / ClojureScript  LISP, functional programming, immutability  3000 lines of Objective-C -> 1500 lines of TypeScript/React -> 500 lines of ClojureScript  Development is so fun, and these days, you can do miraculous things with so little effort
  • 19. @RealGeneKim Why Functional Programming  The famous French philosopher Claude Lévi-Strauss would say of certain tools, ‘is it good to think with?’  Core FP concepts  Immutability  Pure functions  Composability  Pioneered by Haskell and OCaml. Popularized by Clojure, Erlang, Elm, Elixir
  • 20. @RealGeneKim Never Have I Valued Infrastructure More  Things I detest now  Everything outside of my application  Connecting anything to anything  Secrets management  Bash  YAML  Patching  Building Kubernetes deployment files (mostly by Googling)  Why my cloud costs are so high
  • 21. @RealGeneKim Interestingly, It Portends Future Of Ops  Core concepts  Immutability  Pure functions  Composability  Look at…  Docker, Docker Compose  Kubernetes  Kubernetes sidecars  Event streams: Apache Kafka
  • 22. @RealGeneKim The Second Ideal: Focus and Flow  Ideal: your energy and time are focused on solving the business problem, and you’re having fun  Not Ideal: all your time is spent trying to solve problems you don’t even want to solve (e.g., YAML files, Makefile and spaces in filenames, bash)
  • 23. @RealGeneKim Two Types Of Learning ● Procedural Learning ● One-shot Learning
  • 24. @RealGeneKim  Ideally, I can implement my business feature in one place, and I have all the expertise I need to implement it
  • 28. @RealGeneKim The Value Of Platforms  Enable developer productivity  Self-service  On-demand  Immediacy and fast feedback  Focus and flow  Joy  Monitoring, deployment, environment creation, security scans, orchestration…
  • 29. @RealGeneKim  “bash: the disease you die with, but don’t die of…” Jeffrey Snover Technical Fellow, Microsoft @jsnover
  • 30. @RealGeneKim The Second Ideal: Focus and Flow  Ideal: trunk based development  Not Ideal: 5 days merging, 50 people in conference rooms
  • 33. @RealGeneKim Not Ideal “In manufacturing, the absence of effective feedback often contributes to major quality and safety problems. In one well-documented case at the General Motors Fremont manufacturing plant, there were no effective procedures in place to detect problems during the assembly process, nor were there explicit procedures on what to do when problems were found. “As a result, there were instances of engines being put in backward, cars missing steering wheels or tires, and cars even having to be towed off the assembly line because they wouldn’t start.” Source: DevOps Handbook
  • 34. @RealGeneKim Create as much feedback in our system, from as many areas in our system, sooner, faster, and cheaper, with as much clarity between cause and effect. Why? Because the more assumptions we can invalidate, the more we learn, improving our ability to fix problems and innovate. Source: DevOps Handbook Ideal
  • 36. @RealGeneKim How many times per day is the andon cord pulled in a typical day at a Toyota manufacturing plant? 3,500 times per day Source: http://www.gembapantarei.com/2008/04/how_many_times_do_you_pull_the_andon_cord_each_day.html
  • 37. @RealGeneKim Fast Push To Market Debts & Risks Features Quality Defects
  • 38. @RealGeneKim Fast Push To Market — Continued Features Defects Defect fixing dominates work Site reliability tanks Slower and slower velocity Customers leave Morale plunges Devs leave because everything is hard Quality Debts & Risks
  • 39. @RealGeneKim Source: https://twitter.com/johncutlefish/status/1046169469268111361 Who hasn’t felt this? You hire a bunch of developers, but you still can’t ship the features you promised… …and maybe you even have the feeling that things are slowing down…
  • 41. @RealGeneKim Near Death Experiences ● eBay (1999) ● Microsoft (2002): Bill Gates memo ● Google (2005): Automated testing culture ● Amazon (2004): Jeff Bezos memo ● Twitter (2008) ● LinkedIn (2009) ● Etsy (2009)
  • 42. @RealGeneKim 2002 Microsoft Security Standdown  Famously, Microsoft after SQL Slammer required every product group to freeze feature Source: https://www.wired.com/2002/01/bill-gates-trustworthy-computing/
  • 43. @RealGeneKim The Feature Freeze / Standdown Debt Features Quality Defects Features
  • 45. @RealGeneKim Quote from Marty Cagan from his book Inspired The deal [between product owners and] engineering goes like this: Product management takes 20% of the team’s capacity right off the top and gives this to engineering to spend as they see fit. They might use it to rewrite, re-architect, or re-factor problematic parts of the code base…​whatever they believe is necessary to avoid ever having to come to the team and say, ‘we need to stop and rewrite [all our code].’ If you’re in really bad shape today, you might need to make this 30% or even more of the resources. However, I get nervous when I find teams that think they can get away with much less than 20%. Cagan notes that when organizations do not pay their “20% tax,” technical debt will increase to the point where an organization inevitably spends all of its cycles paying down technical debt. At some point, the services become so fragile that feature delivery grinds to a halt because all the engineers are working on reliability issues or working around problems.
  • 46. @RealGeneKim Source: Satya Nadella, CEO, Microsoft (@satyanadella)
  • 47. @RealGeneKim First Ideal  Ideal: 3-5% of developers dedicated to improving developer productivity  Not ideal: assigned to summer interns and “people not good enough to be developers”
  • 48. @RealGeneKim The Third Ideal: Improvement  Not Ideal: No one cares if someone breaks the build, or checks in code that breaks our tests  Ideal: When someone breaks our build or our tests, fixing it becomes the most important work of the moment
  • 49. @RealGeneKim The Third Ideal: Improvement  Not ideal: When someone needs a peer review, that person has to wait until someone else frees up  Ideal: Whatever I’m working on, if someone needs a peer review, I drop whatever I’m doing to help
  • 50. @RealGeneKim "Automated tests transform fear into boredom." -- Eran Messeri, Google Google Dev And Ops (2013)  15,000 engineers, working on 4,000+ projects  All code is checked into one source tree (billions of files!)  5,500 code commits/day  75 million test cases are run daily
  • 54. @RealGeneKim One Of The Highest Predictors Of Performance Source: Typology Of Organizational Culture (Westrum, 2004)
  • 55. @RealGeneKim One Of The Highest Predictors Of Performance Source: Typology Of Organizational Culture (Westrum, 2004)
  • 56. @RealGeneKim One Of The Highest Predictors Of Performance Source: Typology Of Organizational Culture (Westrum, 2004)
  • 57. @RealGeneKim Google: Project Aristotle, Oxygen, re:Work Source: https://rework.withgoogle.com/blog/five-keys-to-a-successful-google-team/
  • 61. @RealGeneKim Great Practices Enabled  Blameless post-mortems  Chaos Monkeys
  • 64. @RealGeneKim You Don’t Choose Chaos Monkey… Chaos Monkey Chooses You
  • 65. @RealGeneKim The 2014 AWS Reboot “When we got the news about the emergency EC2 reboots, our jaws dropped. When we got the list of how many Cassandra nodes would be affected, I felt ill. “Then I remembered all the Chaos Monkey exercises we’ve gone through. My reaction was, ‘Bring it on!’” – Christos Kalantzis Netflix Cloud DB Engineering Source: http://techblog.netflix.com/2014/10/a-state-of-xen-chaos-monkey-cassandra.html
  • 66. @RealGeneKim The 2014 AWS Reboot “Out of our 2700+ production Cassandra nodes, 218 were rebooted. 22 Cassandra nodes did not reboot successfully. “Netflix customers experienced no downtime that weekend.” – Bruce Wong Netflix Chaos Engineering
  • 70. @RealGeneKim Session ID: DevOps Is For The Unicorns… ...And The Horses, Too
  • 71. @RealGeneKim DevOps Enterprise: Lessons Learned  In 2018, we’ll hold the fifth year of the DevOps Enterprise Summit, a conference for horses, by horses  Over the years, we’ve had over 200 leaders from:  Capital One, KeyBank, Barclays, GE Capital, ING Bank, Fidelity, PNC, ADP, BofA, Western Union, BBVA  Nationwide Insurance, Zurich Insurance, Hiscox, Aviva, LV=  Walmart, Nordstrom, Target, Macy’s, Marks and Spencer  Nike, Adidas, Sherwin Williams  Verizon, Telstra, T-Mobile, Orange, CSG  Raytheon, Lockheed Martin, Northrop Grumman, CSRA, Jaguar Land Rover  Disney, Ticketmaster, NBC/Universal  Kaiser Permanente  US Citizenship & Immigration Services, UK HM Revenue Collection, DISA Forge.mil, NZ Ministry of Social Development, UK Welfare and Pensions, US Joint Warfare Analysis Center  Amazon PrimeNow, CA, Compuware, Google Search, IBM, MicroFocus, Microsoft, SAP
  • 73. @RealGeneKim Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report
  • 74. @RealGeneKim Leadership Matters  Teams with the least reported transformational leadership behaviors (the bottom-third) were one- half as likely to be high IT performers  Leaders cannot do it alone! Teams with the top 10% of reported transformational leadership behaviors performed no better than the median Source: Puppet/DORA: 2017 State Of DevOps Report: https://puppet.com/resources/whitepaper/state-of-devops-report
  • 75. @RealGeneKim The Fifth Ideal: Focus On The Customer  Not ideal: Functional silo managers behave like union leaders, as opposed to business leaders  Ideal: Functional silo managers make decisions based on what the customer values, and help ensure their teams have the skills to thrive in the long term
  • 77. @RealGeneKim The Fifth Ideal: Focus On The Customer  Core vs. Context
  • 78. @RealGeneKim Why Do I Think This Is Important?
  • 81. @RealGeneKim  Publication date: November 26  Excerpts will be released in the next three weeks
  • 84. @RealGeneKim Want More Learn More? To receive this presentation and the following:  Announcement and upcoming excerpts from The Unicorn Project  Eight excerpts from Beyond The Phoenix Project audio series w/John Willis  The 140 page excerpt of The DevOps Handbook  The 140 page excerpt of The Phoenix Project  Videos and slides from DevOps Enterprise 2014-2019  One hour excerpt of The Phoenix Project audiobook Just pick up your phone, and send an email: To: realgenekim@SendYourSlides.com Subject: devops

Editor's Notes

  1. Book is redshirts from Star Trek, The A-Team, Hogan's Heroes, and the movie Brazil. 20 years: self-identified as an Ops person
  2. Bus factor is the number of people that need to be hit by a bus before your project comes to a screeching halt. In TPP, we had a bus factor of 1: Brent. Because every outage required Brent, and every major work item required Brent. If Brent got hit by a bus, the company was legitimately at risk of going out of business. In the Unicorn Project, I love the concept of the lunch factor. How many people do you need to take out to lunch? Amazon has the notion of a two pizza team. No team should be larger than can be fed by two pizzas. They can independently develop, test, and deploy value to the customer. No need to take anyone out to lunch. However, in most organizations, to make a small change, everything is so tightly coupled together, you have to take everyone out to lunch. It’s not two pizzas, it’s multiple truckloads of pizzas.
  3. We used the most powerful analytical tool to generate this graph: not SPSS, R, Tableau, PLA Sim. We used pivot tables in Excel.
  4. April 22, 2011
  5. [ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out. Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
  6. Source: http://biobreak.wordpress.com/2010/10/07/games-evangelism-dos-and-donts/
  7. Who are they auditing? IT operations. I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people. Memory leak? No problem, we’ll do hourly reboots until you figure that out. Who here is from IT operations? Bad day: Not as prepared for the audit as they thought. Spending 30% of their time scrambling, generating presentations for auditors. Or an outage, and the developer is adamant that they didn’t make the change; they’re saying, “it must be the security guys, they’re always causing outages”. Or, there are 50 systems behind the load balancer, and six systems are acting funny: what’s different, and who made them different? Or every server is like a snowflake, each having its own personality. We as Tripwire practitioners can help them make sure changes are made visible, authorized, deployed completely and accurately, and find differences. Create and enforce a culture of change management and causality.
  8. Source: Flickr: birdsandanchors
  9. Source: RyanJLane