
SRE Foundation V1 - 0 - Value Added Resources 11 - 2019


SRE Foundation Course: Value Added Resources 

 
 
This document provides links to articles and videos related to the Site Reliability Engineering 
(SRE) course from the DevOps Institute. This information is provided to enhance your 
understanding of SRE-related concepts and terms and is not examinable. Of course, there 
is a wealth of other videos, blogs and case studies on the web. 
We welcome suggestions for additions. 
 
 

Videos Featured in the Course 


 
Module  Title & Description  Link 

1: SRE Principles & Practices  ‘What's the Difference Between DevOps and SRE?’ with Seth Vargo and Liz Fong-Jones of Google (05:10)  https://youtu.be/uTEL8Ff1Zvk

2: Service Level Objectives & Error Budgets  ‘Risk and Error Budgets’ with Seth Vargo and Liz Fong-Jones of Google (06:17)  https://youtu.be/y2ILKr8kCJU

3: Reducing Toil  ‘Pragmatic Automation’ with Max Luebbe of GCP (04:45)  https://www.youtube.com/watch?v=oDcjAcFTFC0&t=0m56s

4: Monitoring & Service Level Indicators  ‘SLI & Reliability Deep-Dive’ with David N. Blank-Edelman of Microsoft (08:35)  https://www.youtube.com/watch?v=1iMo3SkdQqQ

5: SRE Tools & Automation  ‘Ironies of Automation: A Comedy in Three Parts’ with Tanner Lund of Microsoft (18:32)  https://www.youtube.com/watch?v=U3ubcoNzx9k

6: Anti-Fragility & Learning from Failure  ‘Sloth, a Tool for Inducing Network Failures’ with Preetha Appan of Indeed.com (04:45)  https://www.usenix.org/conference/srecon17americas/program/presentation/appan

7: Organizational Impact of SRE  ‘A History of SRE at Uber’ with Rick Boone of Uber (06:24)  https://www.youtube.com/watch?v=qJnS-EfIIIE

8: SRE, Other Frameworks, Trends  ‘A Look at ITIL4 & SRE’ with Jayne Groll of DevOps Institute (11:25)  https://dev.tube/video/vFyPXIsUEhE

© DevOps Institute unless otherwise stated 1 


 
 
SRE Reports 
 
Report Name  Writers/Publishers  Link 
2019 SRE Report  Catchpoint  http://pages.catchpoint.com/SRE-Report-2019.html

What is SRE?  Kurt Andersen & Craig Sebenik from O’Reilly Media  https://www.oreilly.com/library/view/what-is-sre/9781492054429/
 
 

SRE Articles 
 
Article Title & Author  Relevant Module  Link 
‘Which Factors Affect Software Projects Maintenance Cost More?’ by Sayed Mehdi Hejazi Dehaghani and Nafiseh Hajrahimi  1: SRE Principles & Practices  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610582/

‘Measuring and Evaluating Service Level Objectives (SLOs)’ by Serhat Can  1: SRE Principles & Practices  https://medium.com/@serhatcan/measuring-and-evaluating-service-level-objectives-slos-84b0dc740a0a

‘Bloomberg Bets Big on SREs’ by Michael Rembetsy  1: SRE Principles & Practices  https://www.techatbloomberg.com/blog/bloomberg-bets-big-on-sres/

‘Site Reliability Engineering at Bloomberg’ by Stig Sorensen  1: SRE Principles & Practices  https://player.fm/series/devops-chat/site-reliability-engineering-sre-bloomberg-w-stig-Sorenson

‘What It Means To Be A Site Reliability Engineer’ by Molly Struve  1: SRE Principles & Practices  https://dev.to/molly_struve/what-it-means-to-be-a-site-reliability-engineer-32ki

‘Error Budgets – Practical Implementation’ by Yaroslav Molochko  2: SLOs & Error Budgets  https://www.slideshare.net/yaroslavmolochko/implementing-error-budgets-125400822

‘How to Avoid the 5 SRE Implementation Traps that Catch Even the Best Teams’ by Lyon Wong  2: SLOs & Error Budgets  https://thenewstack.io/how-to-avoid-the-5-sre-implementation-traps-that-catch-even-the-best-teams/

 
‘Site Reliability Engineering: DevOps 2.0’ by Saba Anees  2: SLOs & Error Budgets  https://www.appdynamics.com/blog/engineering/site-reliability-engineering-devops-2-0/

‘Getting Started with Site Reliability Engineering’ by Jennifer Petoff  2: SLOs & Error Budgets  https://www.devops.talksplus.com/wp-content/themes/dotc/2019_Melbourne/presentations/Getting%20Started%20with%20Site%20Reliability%20Engineering%20(Jennifer%20Petoff%20DOTC%20Deck).pdf

‘Invent More, Toil Less’ by Betsy Beyer, Brendan Gleason, Dave O’Connor and Vivek Rau  3: Reducing Toil  https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45765.pdf

‘SRE Lessons: Continuously Optimize to Reduce Toil’ by Damon Edwards  3: Reducing Toil  https://www.rundeck.com/blog/sre-lessons-continuously-optimize-to-reduce-toil

‘Toil: Finally a Name For a Problem We've All Felt’ by Damon Edwards  3: Reducing Toil  https://www.rundeck.com/blog/toil-finally-a-name-for-a-problem

‘Site Reliability Engineering (SRE): A Simple Overview’ by Mac Slocum  3: Reducing Toil  https://www.oreilly.com/ideas/site-reliability-engineering-sre-a-simple-overview

‘What Is SRE?’ by Craig Sebenik & Kurt Andersen  3: Reducing Toil  https://www.oreilly.com/library/view/what-is-sre/9781492054429/

‘Is It Worth the Time?’ by xkcd  3: Reducing Toil  https://imgs.xkcd.com/comics/is_it_worth_the_time.png

‘An Engineer’s Guide To SLA, SLO, and SLI’ by Ram Iyengar  4: Monitoring & Service Level Indicators  https://plumbr.io/blog/monitoring/an-engineers-guide-to-sla-slo-and-sli

‘Service Level Indicators in Practice’ by Stephen Thorne  4: Monitoring & Service Level Indicators  https://medium.com/@jerub/service-level-indicators-in-practice-6a1125e24bee

‘Stop Using Nagios (So It Can Die Peacefully)’ by Andy Sykes  4: Monitoring & Service Level Indicators  http://www.slideshare.net/superdupersheep/stop-using-nagios-so-it-can-die-peacefully

‘Why Does (My) Monitoring Suck?’ by Todd Palino  4: Monitoring & Service Level Indicators  https://www.usenix.org/conference/srecon19asia/presentation/palino-monitoring

 
‘Observability — A 3-Year Retrospective’ by Charity Majors  4: Monitoring & Service Level Indicators  https://thenewstack.io/observability-a-3-year-retrospective/

‘Monitoring and Observability — What’s the Difference and Why Does It Matter?’ by Peter Waterhouse  4: Monitoring & Service Level Indicators  https://thenewstack.io/monitoring-and-observability-whats-the-difference-and-why-does-it-matter/

‘3 Ways to Reduce Alert Noise in Monitoring’ by Christina DiSomma  4: Monitoring & Service Level Indicators  https://www.metricly.com/3-ways-reduce-alert-noise/

‘Observability and Understanding the Operational Ramifications of a System’ by Charity Majors  4: Monitoring & Service Level Indicators  https://www.infoq.com/articles/charity-majors-observability-failure/

‘Run a Service Level Indicator (SLI) workshop’ by GDS  4: Monitoring & Service Level Indicators  https://gds-way.cloudapps.digital/standards/slis.html

‘The Evolution of Automation at Google’ by Niall Murphy  5: SRE Tools & Automation  https://landing.google.com/sre/sre-book/chapters/automation-at-google/

‘SRE at the Department for Work and Pensions’ by various  5: SRE Tools & Automation  https://dwpdigital.blog.gov.uk/category/site-reliability-engineering-sre/

‘Measuring and Evaluating Service Level Objectives (SLOs)’ by Serhat Can  5: SRE Tools & Automation  https://www.atlassian.com/blog/opsgenie/measuring-and-evaluating-service-level-objectives

‘Best NoSQL Databases 2019’  5: SRE Tools & Automation  https://www.improgrammer.net/most-popular-nosql-database/

‘On-Call Tools to Support a DevOps Culture’ by Dan Holloran  5: SRE Tools & Automation  https://victorops.com/blog/devops-on-call-tools-to-support-culture

‘Awesome Site Reliability Engineering Tools’ by Raghu Chinnannan  5: SRE Tools & Automation  https://github.com/squadcastHQ/awesome-sre-tools

‘Security & Compliance’ by Ansible  5: SRE Tools & Automation  https://www.ansible.com/use-cases/security-and-compliance

‘Secure Coding Best Practices’ by OWASP  5: SRE Tools & Automation  https://www.owasp.org/images/0/08/OWASP_SCP_Quick_Reference_Guide_v2.pdf

‘Testing in Production, the safe way’ by Cindy Sridharan  5: SRE Tools & Automation  https://medium.com/@copyconstruct/testing-in-production-the-safe-way-18ca102d0ef1

 
‘Amazon Andon Cord: What it is and how to react’ by Velentin Bayard  5: SRE Tools & Automation  https://blueboard.io/resources/amazon-andon-cord/

‘DevOps Tools Landscape’ by GitLab  5: SRE Tools & Automation  https://about.gitlab.com/devops-tools/

‘Measure Efficiency, Effectiveness, and Culture to Optimize DevOps Transformations’ by IT Revolution  6: Antifragility & Learning from Failure  http://devopsenterprise.io/media/DOES_forum_metrics_102015.pdf

‘Tracking Every Release’ by Mike Brittain  6: Antifragility & Learning from Failure  https://codeascraft.com/2010/12/08/track-every-release/

‘A recovery point objective (RPO)’ by Margaret Rouse  6: Antifragility & Learning from Failure  https://whatis.techtarget.com/definition/recovery-point-objective-RPO

‘The Learning Organization’ by Andrew Shafer  6: Antifragility & Learning from Failure  https://www.slideshare.net/littleidea/the-learning-organization-modev

‘The Three Ways: The Principles Underpinning DevOps’ by Gene Kim  6: Antifragility & Learning from Failure  https://itrevolution.com/the-three-ways-principles-underpinning-devops/

‘A Typology of Organizational Cultures’ by R Westrum  6: Antifragility & Learning from Failure  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1765804/pdf/v013p0ii22.pdf

‘Do You Want Your Cloud Solutions to Succeed? Start with Embracing Failures!’ by Arman Kamran  6: Antifragility & Learning from Failure  https://medium.com/@armankamran/do-you-want-your-cloud-solutions-to-succeed-start-with-embracing-failures-8f5f40b57a64

‘The Cost of IT Downtime’ by Michael Copeland  7: Organizational Impact of SRE  https://www.the20.com/blog/the-cost-of-it-downtime/

‘How SRE teams are organized, and how to get started’ by Matt Brown  7: Organizational Impact of SRE  https://cloud.google.com/blog/products/devops-sre/how-sre-teams-are-organized-and-how-to-get-started

‘Kubernetes Up & Running’ by Brendan Burns, Joe Beda & Kelsey Hightower  7: Organizational Impact of SRE  https://clouddamcdnprodep.azureedge.net/gdc/gdckTlBtc/original

‘Blameless PostMortems and a Just Culture’ by John Allspaw  7: Organizational Impact of SRE  https://codeascraft.com/2012/05/22/blameless-postmortems/

‘The Prime Directive’ by Norm Kerth  7: Organizational Impact of SRE  https://retrospectivewiki.org/index.php?title=The_Prime_Directive

‘Creating Antifragile Systems: Site Reliability Engineering for the Enterprise’ by Contino  7: Organizational Impact of SRE  https://www.contino.io/files/Enterprise-Site-Reliability-Engineering-Contino.pdf

 
‘Scaling SRE Organizations: The journey from 1 to many teams’ by Gustavo Franco  7: Organizational Impact of SRE  https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_franco.pdf

‘The Convergence of DevOps’ by John Willis  8: SRE, Other Frameworks, Trends  http://itrevolution.com/the-convergence-of-devops/

‘Site Reliability Engineer (SRE) Roles and Responsibilities’ by Dan Holloran  8: SRE, Other Frameworks, Trends  https://victorops.com/blog/site-reliability-engineer-sre-roles-and-responsibilities

‘How ITIL4 and SRE align with DevOps’ by Jayne Groll  8: SRE, Other Frameworks, Trends  https://techbeacon.com/enterprise-it/how-itil4-sre-align-devops

‘Future of Reliability Engineering’ by Michael Kehoe  8: SRE, Other Frameworks, Trends  https://michael-kehoe.io/tags/future-of-sre/

‘An Introduction to Database Reliability’ by Mackenzie Clark  8: SRE, Other Frameworks, Trends  https://softwareengineeringdaily.com/2018/10/16/an-introduction-to-database-reliability/

‘Stop the Arguments: ITIL v4 and SRE and DevOps All Are Transformation Aids’  8: SRE, Other Frameworks, Trends  https://devopsinstitute.com/2019/11/05/stop-the-arguments-itil-v4-and-sre-and-devops-all-are-transformation-aids/
 
 
Websites 
 
Title  Link 
Usenix  https://www.usenix.org  

Honeycomb  https://www.honeycomb.io/  

Player FM – DevOps Chat  https://player.fm/series/devops-chat  

SRE Weekly  https://sreweekly.com/ 

Netflix  https://github.com/Netflix  

Downdetector  https://downdetector.co.uk  
 
 

 
 
 
 
 
 
SRE Blogs  
 
Blog  Link 
AppDynamics Blog  https://www.appdynamics.com/blog 

Atlassian Blog  https://www.atlassian.com/blog  

Prometheus Blog  https://prometheus.io/blog/ 

Rundeck Blog  https://www.rundeck.com/blog  

Tech At Bloomberg  https://www.techatbloomberg.com/blog 

VictorOps Blog  https://victorops.com/blog 


 
 

Additional Videos of Interest  


 
Relevant Module  Title  Link 
2: SLOs & Error Budgets  ‘SLOs for Data-Intensive Services’ with Yoann Fouquet (23:47)  https://www.youtube.com/watch?v=ZdguHXglT8M&feature=youtu.be

2: SLOs & Error Budgets  ‘Latency SLOs Done Right’ with Heinrich Hartmann (27:12)  https://www.youtube.com/watch?v=ycsc2kCaJxM&feature=youtu.be

4: Monitoring & Service Level Indicators  ‘Building a Scalable Monitoring System’ with Molly Struve (26:48)  https://www.youtube.com/watch?v=vl1ecpFohZQ&feature=youtu.be
 
 
 

SRE Books 
 
Title  Author  Link 
Site Reliability Engineering  Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy  https://landing.google.com/sre/sre-book/toc/index.html

The Site Reliability Workbook  Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara and Stephen Thorne  https://landing.google.com/sre/workbook/toc/
 

 
Facts and Fallacies of Software Engineering  Robert L. Glass  https://www.amazon.com/Facts-Fallacies-Software-Engineering-Robert/dp/0321117425

Chaos Engineering  Ali Basiri, Nora Jones, Aaron Blohowiak, Lorin Hochstein, Casey Rosenthal  https://www.oreilly.com/library/view/chaos-engineering/9781491988459/
 

 
Case Stories Featured in the Course 
 
Company  Module  Link 
Accenture  3: Reducing Toil  https://techbeacon.com/devops/how-accenture-retrofitted-site-reliability-engineering

Bloomberg  1: SRE Principles & Practices
● https://player.fm/series/devops-chat/site-reliability-engineering-sre-bloomberg-w-stig-sorenson
● https://www.techatbloomberg.com/blog/bloomberg-bets-big-on-sres/
● https://www.ca.com/us/modern-software-factory/content/outsmarting-outages-bloomberg-banks-on-sre-for-reliability.html

Evernote  2: SLOs & Error Budgets  https://landing.google.com/sre/workbook/chapters/slo-engineering-case-studies/

Home Depot  2: SLOs & Error Budgets  https://landing.google.com/sre/workbook/chapters/slo-engineering-case-studies/

Netflix  6: Antifragility and Learning from Failure  https://github.com/Netflix/SimianArmy

Sage Group  7: Organizational Impact of SRE  https://www.meetup.com/DevOpsNorthEast/events/262263231/

Standard Chartered  5: SRE Tools & Automation  https://www.youtube.com/watch?v=d5IMvK0YHTg

Trivago  4: Monitoring & SLIs  https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=11&cad=rja&uact=8&ved=2ahUKEwj4m6HJ9qXjAhU_QEEAHX6-CgQQFjAKegQIABAC&url=http%3A%2F%2Fpages.catchpoint.com%2Frs%2F005-RHC-551%2Fimages%2FCatchpoint-Case-Study-Trivago.pdf&usg=AOvVaw3UUcZ1bqtes0_99EYcBZSm

VictorOps (Splunk)  8: SRE, Other Frameworks, Trends  https://victorops.com/blog/site-reliability-engineer-sre-roles-and-responsibilities
 
