Turn the lights on
Creating a culture of ownership and trust
With Visibility & transparency
Shai Peretz
About myself:
- Technologist, DevOps culture advocate
- Held various technology management positions (Outbrain, Cyota,
Shopping.com among others)
- Piano player, mostly Jazz
- Amateur photographer
- Co-founded a Waldorf education (Anthroposophy) school in Tel Aviv
So, what is a culture of Ownership?
You’ve built it, you run it!
- You understand best how it is built, and what can go
wrong, hence:
- Take responsibility for getting your code to production
- Create the tests and monitor them
- Set monitoring and alert thresholds
- Decide what is critical and what isn’t
- Say what actions to take when something goes wrong
- Receive the critical alerts and act upon them
- Automate any possible action, so that it will not wake you
up next time:)
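
The ownership steps above (set thresholds, decide what is critical, say what action to take, automate where possible) can be sketched as a small alert policy. This is only an illustration: `AlertRule`, `restart_worker` and `handle` are hypothetical names, not part of any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlertRule:
    metric: str                     # name of the metric being watched
    threshold: float                # alert fires when the value exceeds this
    critical: bool                  # critical alerts page the owner
    remediation: Optional[Callable[[], str]] = None  # automated action, if any


def restart_worker() -> str:
    # Hypothetical automated action; a real one would call your orchestrator.
    return "worker restarted"


def handle(rule: AlertRule, value: float) -> str:
    """Decide what happens for one metric reading under one rule."""
    if value <= rule.threshold:
        return "ok"
    if rule.remediation is not None:
        # Automated fix first, so that nobody gets woken up for it.
        return "auto: " + rule.remediation()
    return "page owner" if rule.critical else "open ticket"


print(handle(AlertRule("queue_depth", 1000, True, restart_worker), 5000))
print(handle(AlertRule("error_rate", 0.05, True), 0.2))
```

The point of the design is that the team owning the service writes the rule, the severity and the remediation together, rather than handing them to someone else.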
What is a culture of Ownership?
- Nothing is ‘not my problem’:
- First verify that all my systems are working as they should, then look
elsewhere
- Be transparent and don’t hide your mistakes
- Culture of cooperation and helping others (production party:)
What is a culture of Ownership?
Learn from your mistakes:
- Document your actions (automatically)
- Blameless take-ins (post mortem):
- Led by the event manager, as close as you can to the event
- Include all stakeholders
- 5 whys methodology
- Create tasks with due dates and priority
- Go back and check that tasks are done
Celebrate failure -
it is the best opportunity to learn!
So you have built a new service…
It is a really great app, smart and useful, people love it.
It is responsive, using all the latest technology and buzzwords.
It is communicating with a dozen other services via efficient APIs
It is collecting tons of data, processing and moving it via the latest message queues,
and storing it in several data stores
It grabs the data back using a smart search engine and oops…
The user is getting an error. What happened?
Something went wrong...
Is it the search engine?
Or maybe the data store?
Or maybe a problem with the network?
Or maybe a broken API call?
Or the reverse proxy is down?
Or...
You are in the dark.
Can someone please turn the lights on?
Well, you can start looking for the problem with a flashlight...
But you only have 5 seconds...
That’s because you have committed to a 99.95% SLA, and you have used most of
your allowed downtime already:(
And your system is complicated...
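
To make the downtime budget concrete, here is a minimal sketch of the arithmetic. The 30-day measurement window is an assumption (the slides do not say how the SLA period is defined), and the function names are illustrative.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Total downtime the SLA permits over the period, in minutes."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def remaining_budget_seconds(sla_percent: float, used_minutes: float,
                             period_days: int = 30) -> float:
    """Downtime still available after some of the budget has been used."""
    budget = allowed_downtime_minutes(sla_percent, period_days)
    return (budget - used_minutes) * 60


# 99.95% over an assumed 30-day window allows roughly 21.6 minutes of downtime.
print(round(allowed_downtime_minutes(99.95), 2))
# Over 365 days the same SLA allows roughly 262.8 minutes.
print(round(allowed_downtime_minutes(99.95, period_days=365), 2))
```

With a budget this small, a single slow investigation can consume the whole month's allowance, which is why time to detect matters as much as time to repair.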
So let’s monitor and log everything!
We now have:
3 million application metrics per minute +
1 million system metrics per minute +
750,000 log lines per minute
75 different dashboards rotating on six 65” monitors.
Is that enough light?
No. That’s too much light.
Which still leaves us in the dark.
What we need is some filters:
That’s better:)
Now we can see some details...
Yet, we can’t find where the problem is...
That’s because we have too much information. Well, at least for a human being.
Why don’t we let machines deal with it?
That’s exactly what they are built for*:
- Process tons of information in fractions of a second
- Correlate data from many different sources
- Analyze the data, search for anomalies
- Act upon it automatically
(or at least notify someone)
* Assuming the humans who programmed those machines did a good job:)
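
As one illustration of "analyze the data, search for anomalies", here is a minimal rolling z-score sketch. It is not the method used in the talk, just a simple technique chosen for the example; the window size and threshold are arbitrary assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, z_threshold=3.0):
    """Return indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


latency_ms = [100, 102, 99, 101, 100, 98, 101, 500, 100, 102]
print(find_anomalies(latency_ms))  # prints [7]
```

Note one limitation of this simple approach: once a spike enters the window, it inflates the standard deviation and can mask later anomalies until it ages out.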
Great visibility:
Encourages prevention - helps prevent problems before
they occur, by forcing you to consider most possible
problems in advance
Enables automatic self-healing when possible
And if not - gives us a laser pointer to where
the problem is, in a timely manner (near real time), and allows
us to fix it quickly (and automate for next time:)
So what tools should we use?
It doesn't really matter.
Well, it actually does:)
Choose the set of tools that fits your organization and that you are comfortable with,
as long as you get good coverage of:
- Automatic testing (visibility of the build & deploy pipeline)
- Infrastructure/system metrics and logs
- Application level metrics and logs
- External (user experience) monitoring
- Prediction and anomaly detection
Select tools that you trust and make their availability a first priority!
Benefits of good visibility
Enabler for Agile and DevOps culture -
easier to take responsibility, better communication
Drives quality up (both code and infrastructure)
Improves MTTD, MTTR and MTTS (better SLA)
Reduces frustration and improves productivity
Helps to achieve business goals
Now, who volunteers to monitor the monitoring system?
Monitoring the monitoring system
No alerts. All dashboards are green. Does it really mean all is good?
Not necessarily…
You have to verify:
- Set up another layer of independent monitoring, outside your network
- Create ‘positive’ checks that confirm the system is up
If you don’t trust your monitoring system, it is useless!
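
A ‘positive’ check can be sketched as a heartbeat watcher: instead of waiting for an alert, an independent process expects a regular "I am alive" signal and treats silence as a failure. The class and parameter names below are hypothetical, and timestamps are passed in explicitly so the logic is easy to test.

```python
import time
from typing import Optional


class HeartbeatWatcher:
    """Treats the absence of a recent heartbeat as a failure."""

    def __init__(self, max_silence_seconds: float):
        self.max_silence_seconds = max_silence_seconds
        self.last_seen: Optional[float] = None

    def beat(self, now: Optional[float] = None) -> None:
        # Called each time the monitoring system reports that it is alive.
        self.last_seen = time.time() if now is None else now

    def is_healthy(self, now: Optional[float] = None) -> bool:
        # No heartbeat ever received counts as unhealthy, not as "all green".
        if self.last_seen is None:
            return False
        current = time.time() if now is None else now
        return current - self.last_seen <= self.max_silence_seconds


watcher = HeartbeatWatcher(max_silence_seconds=60)
watcher.beat(now=100)
print(watcher.is_healthy(now=150))  # prints True
print(watcher.is_healthy(now=200))  # prints False
```

Running a watcher like this outside your own network covers the case where the monitoring system and the thing it monitors fail together.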
Ownership + Transparency => Trust
Bring facts to your discussions
Take ownership of your stuff
Share your mistakes
Don’t blame others
When trust exists, people are more cooperative and open to learn =>
problems are fixed faster and rarely repeat themselves
Transparency
Status pages (if done properly):
- Can save a lot of time while troubleshooting a problem
- Increase transparency, build trust
- Should be automated wherever possible
- Use multi-level pages - different levels of detail for engineering, business and
customers
Share your plans and progress
- Especially when you have delays...
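
The multi-level status page idea above can be sketched as one set of component states rendered differently per audience. The `ComponentStatus` fields, the audience names and the wording of the messages are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComponentStatus:
    name: str
    healthy: bool
    detail: str            # technical detail, shown to engineering only
    customer_facing: bool  # whether customers are affected by this component


def render_status(components: List[ComponentStatus], audience: str) -> List[str]:
    """Render the same underlying status at a level of detail per audience."""
    if audience == "engineering":
        return [f"{c.name}: {'UP' if c.healthy else 'DOWN'} ({c.detail})"
                for c in components]
    if audience == "business":
        return [f"{c.name}: {'UP' if c.healthy else 'DOWN'}" for c in components]
    if audience == "customers":
        visible = [c for c in components if c.customer_facing]
        ok = all(c.healthy for c in visible)
        return ["All systems operational" if ok else "Some services are degraded"]
    raise ValueError(f"unknown audience: {audience}")


components = [
    ComponentStatus("search", True, "p99 latency 120 ms", True),
    ComponentStatus("queue", False, "consumer lag 40k messages", False),
]
for line in render_status(components, "engineering"):
    print(line)
```

Generating every view from the same data keeps the pages consistent with each other and makes them easy to automate.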
How transparent should it be?
My rule of thumb - open up everything that will not hurt your organization
In order to be able to do so:
- People need to respect confidentiality
- People should have effective filters as to what is relevant for them
Trust and Transparency: this is a fragile circle, very easy to break!
Impact of good visibility and transparency
Visibility
Transparency
Responsibility
Ownership
Communication
Quality
Frustration
Fatigue
MTTD
MTTR
MTTS
Uptime
SLA
Revenue
Customer satisfaction
Employee satisfaction
Thank you for listening:)
shai.peretz@gmail.com
