Epic Failures in DevSecOps
VOLUME 2
Provided courtesy of Sonatype
Copyright © 2020-2021 Sonatype
ISBN: 9798600197497
Imprint: DevSecOps Days Press
Publisher:
DevSecOps Days Press
48 Wall Street, 5th Floor
New York, NY 10005
www.devsecopsdays.com
Epic Failures in
DevSecOps
VOLUME 2
JANUARY 2020
Introduction
Another year, another nine failures. That’s just the way it goes when you ask to
hear stories about Epic Failures. Everybody’s got one.
Last year, in 2019, when we published the first Epic Failures book, we had no idea
whether anyone would be interested. Thousands of downloads later, with literally
thousands of printed books being handed out at conferences, we now know the
truth: The community loves failure. At least hearing about other people’s failures.
Volume 2 builds on the momentum created by Volume 1. The quality of this set
of stories is outstanding, starting with “Question Everything” by the DevSecOps
Contrarian herself, Eliza May Austin. Larry Maccherone, “Shift Left, Not S#!T
Left,” runs through a list of six ways to fail on a project, while “Making Everyone
Visible In Tech” by Jaclyn Damiano tells a personal story of life transformation
through tech. Sladjana Jovanovic and Bill McArthur from TD Bank take on the
Herculean task of breaking down silos to create a collaborative environment in
“From Silos to Communities.” “What Not to Do When Rolling Out DevSecOps
and Other Learnings” comes from Jasmine James in her work as an IT Manager
at Delta Airlines. Marc Cluet takes on cultural transformation in “Cultural
Approaches to Transformations: Staying Safe and Healthy,” while Ryan
Lockard focuses on “The Seven Deadly Sins of DevSecOps.” Chris Riley asks us
the meme-inducing question, “Did You Try Turning it Off and On?” when exam-
ining simple bugs that have debilitating outcomes, and we finish with IT sceptic
Rob England and Dr. Cherry Vu encouraging us to “Kill the Restructure.”
One of the motivations for publishing a book like this is for you, the reader, to be
able to identify with the stories. It is our hope that you will see yourself some-
where within these pages and know you did not screw things up on your own,
that you are not unique when it comes to hosing a project, and sometimes you can
sit back and laugh after it’s all over. It’s important to be able to step back and get
perspective when something goes wrong — and even better — to let others know
what happened.
I’m proud to give these authors the chance to tell their stories and hope it encour-
ages you to let others know about your Epic Failures. Who knows? You might end
up telling your story in Volume 3 of Epic Failures in DevSecOps.
Mark Miller
Founder and Editor, Epic Failures in DevSecOps
Co-founder, All Day DevOps
Senior Storyteller, Sonatype
CHAPTER 1
Question Everything
by Eliza May Austin
The biggest challenge with making changes is making those changes stick. I think
that’s only one half of the problem. As a security professional, I’m concerned that
some things “stick” too well, becoming unchallengeable, infallible, and immovable.
They become the new paradigm and anyone who doesn’t “get it” is a heretic. That’s
how it was for DevOps ten years ago, and it appears to be the case for DevSecOps today.
Don’t get me wrong, I believe passionately in security and I can see the advantages of a
DevSecOps approach. It’s just that what I see organisations do on the ground does not
always line up with what I believe DevSecOps should be.
In the early days of DevOps there was a saying: “I wouldn’t want my life support
system developed by DevOps.” That made sense. You can’t “fail fast and roll back”
a person who died because of bugs in the software. This lesson has much to teach
us in DevSecOps too. There are times when DevSecOps is the right thing to do.
There are times when, despite the name, DevSecOps has delivered some horri-
bly insecure code. We need to get to grips with this before someone outside of
Development, Security, or Operations loses faith in us.
Some of you will come from environments where DevSecOps is working perfectly
— you have been indoctrinated by the religion, drunk the Kool-Aid, and see little
room left for improvement. I’d ask you to reflect a little. What are you bringing
to the table? Is it Dev, Sec, or Ops? I would bet my favourite pair of heels that
you’re not a security professional. Despite what the name will lead you to believe,
DevSecOps is little more than DevOps if you’re doing it well. It is, more often than
not, nothing more than just another label.
Part of being a security professional involves asking the difficult questions and chal-
lenging the status quo, the trends, processes, and hierarchy. The aim of the security
professional is to see the wider picture, not concentrate solely on oneself. Overall,
the mindset should be: “How can we support the business in XYZ without compro-
mising the security and the integrity of the project or topic in hand?” I’m willing to
put myself in the firing line, voice my observations, and hopefully carve out room
for improvement. I believe in a future for DevSecOps, but that means I’m willing to
take the time to rip it up with the aim of improving it. I see the biggest epic failure of
DevSecOps as its failure to actually help security, and this will be my running theme.
So I ask you: Is this all hype? Is this just today’s buzzword, trend, or paradigm?
What does successful DevSecOps look like? And is it too early to measure success?
Fast forward a few more years and security professionals became more than just
the geeks who look after the firewall. Security teams became the norm in the big
corporates. Infrastructure was secured, architecture improved, and ultimately
code was scrutinised. All too often the vulnerabilities found could be fixed with a
minor change to the source code — source code that security professionals, quite
rightly, have no authority to edit. This led security professionals to “educate”
developers on how to rectify the code. The challenge was that these security “fixes”
were seen as an afterthought and the same issues would arise again and again. The
security professionals became increasingly frustrated and poked at the developers.
Despite needing one another and being unable to fulfill the other’s role, a fric-
tion-driven divide drenched in insecurities and intellectual masturbation ensued.
In much the same way Ops “muscled in” on Dev when code became increasingly
unsupportable, Sec began to engage more closely with DevOps.
DevSecOps is, in my mind, an unnatural, somewhat forced, but very much needed
evolution of a faster and more dynamic digital world. We need to utilise the
knowledge of security professionals in the development lifecycle to make appli-
cations and systems more secure. We need to bring in the knowledge of IT Ops to
make products useful and distributable. Fractured communication channels need
repair. They need strengthening and DevSecOps needs to become a thing — and to
some degree it has — but simply using the terminology doesn’t mean the process
is actually working.
pen-test reports on their own, when simply having a conversation would help
them get things done in a quarter of the time. But they aren’t alone.
I believe the noise around the skill shortage has created a complacency issue in
SOCs, where all too many people lack ingenuity or the ability to understand the
attacker’s tactics, techniques, and procedures, and instead rely solely on tools. Blue
Teamers often see an ego problem in the offensive security side so they don’t bother
to engage. On the flip side, Red Teamers see nothing but “ticket monkeys” when
they look at a SOC (and sometimes they are right). Ultimately, two fundamental
technical teams that would produce magic if they worked together, instead become
completely disengaged with each other. Throw into the mix GRC being labelled as
“pencil pushers” who fail to make the distinction between being compliant and
being secure, and you have a further disenfranchised security industry. Security
isn’t going to simply jump onboard the “Happy-Happy-DevSecOps-Rainbow-Culture
Train” without proof that it works, and quite rightly so.
SCARED SILENT
Here is another example of the lack of respect shown to security functions by
their peers within cyber security. An MSSP I worked with/for had a large SOC,
staffed with varying skill sets, and a separate team carrying the title “security
engineering.” This team managed the tools that the SOC used and deployed the
products that the MSSP sold to its customers. Because the SOC was using these
tools, along with data from client environments, every day, it was up to them to
make recommendations to the engineering team for refining those tools.
Although it is common sense that this is the way to run the refinement process,
all of the junior members of the SOC were too scared to propose refinements to
the engineering team. And the Senior Analysts who suggested refinements were
often dismissed by the Security Engineering team. I remember a heated altercation
where I was told that, as a Senior Analyst, I had no place telling an engineer how
to refine the tools — the very tools I was using every day! That says it all, doesn’t it?
Essentially, the talent needed for the Sec in DevSecOps is currently working in a
cultural minefield. An easy integration into these foreign lands, where a whole
new cultural adaptation is needed, is not going to happen without a little coaxing
and a lot of guidance.
NOT MY PROBLEM
A company I consulted at via an MSSP pushed out an application on an externally
facing platform, and it was DDoSed within minutes. When questioned how the
application was sent into the world in such a poor and insecure state, the response
was, “well, it went through the DevSecOps team!” This added another layer of ani-
mosity between DevSecOps and other teams. The DevSecOps team, by definition,
should not be allowing poorly configured systems to enter production, and they
certainly should not be seen as an alternative to a security team, especially if there
already is a security team. What a way to add in more bureaucracy!
mastery in and of themselves. What on earth are we doing calling this a single job
role? Who are we trying to kid? A part of me thinks it’s a money-saving exercise,
getting one person to do the job of three and calling it something fancy. This is
not what DevSecOps is about.
Business Implementation
A study conducted by the managed services provider Claranet states that 88%
of UK businesses have adopted DevSecOps, or plan to within the next 2-5 years,
yet only 19% are confident about security integration. Here again is the giant
pink elephant in the room. Essentially this survey tells us that companies have
adopted DevOps but called it DevSecOps, right? When we take into account that,
by design, DevOps outpaces security and business processes, this is just going to
further feed problems. Are we just knowingly putting business and personal data
at an increased risk of compromise, while claiming to be doing the opposite?
When I come across people who are DevSecOps Engineers, DevSecOps Analysts,
or even DevSecOps Evangelists (WTF is that?), they have a very aggressive, protec-
tive stance about DevSecOps, as if merely questioning its fabric is a direct attack
on them. This tells me the fabric from which their role is built is a fragile one at
best. I’m becoming increasingly concerned at the rate with which people are label-
ling themselves DevSecOps Experts and how that will impact the movement.
Self-proclaimed experts make me nervous.
The meetups and conferences I attend are usually security-focused; however, I
do make a conscious and deliberate effort to attend Dev / DevSecOps and general
tech events to expose myself to different ideas. I try not to remain in a bubble of
self-imposed egotistical solitude.
When I ask security professionals their opinion on DevSecOps, and granted this
is who I speak with most often, I am met with responses that appear to match
the same distinct narrative: “I can’t say much about it really, I don’t work in
DevSecOps,” or, “I don’t have much involvement in that department, so I don’t
know.” Essentially DevSecOps is viewed as a completely different department
from security. Not good.
That Podcast
I was invited to participate in an All Day DevOps podcast with Mark Miller, and
we had a light-hearted discussion about DevSecOps. From this, I was labelled the
“DevSecOps Contrarian.” I went on making the assumption that everyone knew
DevSecOps was failing, but Mark was adamant that I was mistaken; DevSecOps
is working and he wanted to prove it to me. We’d both shared a prime example of
communication failure from our individual silos. Yet, I knew it wasn’t working
and he knew that it was. After the podcast however we were met with feedback
that further validated my critique. Many people felt relieved that someone finally
came out and said it.
BURN-OUT IN SECURITY
Burn-out is a huge issue in cyber security. I don’t know many people who have
been in the industry longer than five years and have not been completely burnt
out by it. My theory on this is that it’s because things are constantly evolving. You
will never master security because, tomorrow, there will be a new development
that you need to learn the security around (whether it’s offence or defence is irrel-
evant). You’re never going to feel a sense of satisfaction that you know your craft
completely. This is probably why it’s well known that rates of imposter syndrome
in security are shockingly high.
There isn’t a solid way to quantitatively measure security success.
If a SOC analyst is responding to incidents, how many are enough to consider it
a successful day? What we deem as success in security, in its rawest form, is “not
being compromised.” But how does such a vague end goal end in individual job
satisfaction? Has a threat hunter who didn’t find anything actually failed? What
if they feel like a failure, so they just keep hunting and hunting and hunting and
when they were supposed to go home at 6:00 PM they are still there at 1:00 AM sift-
ing through network traffic to try and find something to legitimise themselves?
This happens. In forensics, I once couldn’t find evidence of malice to prove or
disprove a theory. And although I couldn’t find it, I couldn’t stop looking, because
I didn’t know whether that was because I was bad at my job or because it wasn’t
there. I was determined to prove I wasn’t rubbish, so I just kept going. That really
isn’t sustainable.
Security doesn’t have a tangible endpoint, where “enough is done now” and “secu-
rity is complete.” With developers, the product is complete when it’s functional
and deemed to be so. With Ops, the project is complete when it’s rolled out success-
fully and everyone has signed off on it. Security is complete when… well, never.
We never reach a point when security is done.
There are over two thousand security tools on the market, and in any one
environment you can be expected to know how to use up to 100 of them, maybe a
dozen day to day. Simply not knowing is often treated as unacceptable. Personally,
I believe there is no such thing as a stupid question. It’s stupid people who don’t
question anything. But this isn’t the norm; the security industry talks the talk on
mental-health awareness, imposter syndrome, and collaboration, but there is still
so much stigma attached to admitting you don’t know something. In my opinion,
this is an obvious reason for the lack of new blood in the sector.
OVERWORKED
There’s a systemic cultural problem in service providers. MSSPs, even the big
names, are known to over-promise and under-deliver. Sales people, service man-
agers, and C-level hotshots make a career out of overzealous but well-intentioned
customer engagement. But when the over-promised inundation of work needs to
be carried out, fixing the bad practices of the business, and carrying the heavy
weight of its reputation, falls on the technologist’s shoulders.
I can’t speak for everyone in security (and I don’t try to!), but this is certainly my
view. Consider all of the issues that plague the security community, and ask your-
self why a security professional who is overworked, nervous, and working under
high expectations as BAU would take on another, ill-tested set of responsibilities
when it’s not even a requirement imposed by most employers. Well, they wouldn’t,
would they?
Trending
Things are not all bad, though. DevSecOps has become mainstream. It is so fash-
ionable that DevSecOps has become a pick-up line!
Intern and didn’t really know what DevSecOps was. But he claimed, “It’s all the
rage!” before offering me his bong. It wasn’t the most romantic situation, so, I
must admit, I ended up leaving. It still makes me laugh that he thought “being a
DevSecOps Engineer” was a good chat-up line. I really wanted to get that story into
this book — it’s a great example of DevSecOps being hyped up to such an exagger-
ated level that people use it as a chat-up line with women they have never met.
Reviewing your own code and threat modelling your own application is like tak-
ing yourself to court to sue yourself. It’s likely that a pen-tester will have a great
deal more tricks to utilise in testing. It’s great for a tester to sit with the developers
upfront to correctly understand how the application is intended to be used. This is
a great example of how DevSecOps can truly make a difference.
So is it all Fluff?
Do I think it’s all fluff? No, no I certainly don’t. DevSecOps is a reaction to
what was, and still fundamentally is, a disjointed mix of technological fields.
DevSecOps is a framework I’d encourage any company which produces techni-
cal products or relies heavily on developers to adopt. I’m simply asking you to
question what and how you adopt it. I don’t want to tell you what to think, I want
to challenge you to please think. Let’s ask the difficult questions, speak the truth,
and get DevSecOps working for good.
Acknowledgments
Did anyone actually read my whole rant? In that case I’d like to thank you for
doing so. I’d like to thank Mark Miller and the team at DevSecOps Days for invit-
ing me to contribute to this book. It is both jarring and flattering to be invited to
co-exist in print with such talented and interesting people. I’d like to thank the
proofreaders, who no doubt had a puzzling experience trying to fix my horrific
grammar and sentencing issues. I honestly spell checked it but sometimes my
spellings are so bad even Google gets confused.
REFERENCES
www.vanillaplus.com/2019/03/14/45749-businesses-open-security-risks-failing-integrate-DevSecOps-says-claranet-research
CHAPTER 2
Shift Left,
Not S#!T Left
by Larry Maccherone
Actually, there is a better analogy than the pendulum. Have you ever witnessed
a first-time boat driver? A boat responds more slowly to steering input than
the first-time boat driver expects, so their journey is a series of over-correc-
tions, much like our Goldilocks-seeking pendulum swings. However, while the
pendulum is always covering the same ground, the boat is constantly zig-zag-
ging over new waters. That’s the situation in software engineering. While
RUP was about people and processes, its focus was on large organizations and
the approach included the micro-definition of dozens of roles. Agile also has
a lot to say about people and process, but more so, it is about product-market
fit and its focus is on development teams you can feed with two pizzas. Also,
it envisions only three roles, the Product Owner, the Scrum Master, and the
Team, filled by “T-shaped” resources. Similarly, while both XP and DevOps
address engineering practices, XP was more about code craftsmanship by a
small number of developers, while DevOps, and especially its cousin, Site
Reliability Engineering (SRE), has more to say about how a larger organization
coordinates to build and operate larger systems.
Within each of these big movements are fractal layers of zig-zag. Agile encom-
passes Scrum, Crystal, SAFe, etc. Also, an interleaving third-order fractal occurs for
the unique journey of each organization. This brings us to the focus of this chapter
— the story of Comcast’s journey towards DevSecOps. That journey was origi-
nally labeled with the phrase “shift left.” However, those early efforts often felt
more like, as Jeff Williams, founder of OWASP and Contrast Security, likes to say,
“s#!t left” to the development teams whose behavior the initiative was seeking to
change. Like the new boat driver, we had over-steered and had to learn from those
early “epic failures” to get to the place we are now. Before I get into the details, let
me first back up and set the stage.
I knew there was a better way because over a decade before I started at Comcast, I
was part of an initiative to try to counter this bolt-on security mindset across the
entire software development industry. At the time, I was the Software Assurance
Director of Carnegie Mellon’s CyLab. In that role, I co-led the launch of the Build-
Security-In initiative along with industry thought-leader Gary McGraw, and
Noopur Davis, a scientist at the Software Engineering Institute (also housed at
Carnegie Mellon). However, we failed to significantly move the needle at the time.
There were a number of headwinds that we were fighting.
» The security organization had the responsibility for security and thus the
budget for change initiatives. Build-Security-In was all about getting develop-
ment teams to take ownership of the problem but they didn’t have the budget
to do much.
» There is a profound lack of trust and thus little cooperation between security
and development. Given the choice between an approach that requires influ-
ence over development team behavior and something they can do on their
own, without the cooperation of the development team, the security group
will choose the latter every time.
The Agile movement had broken down the silos between development, QA, and
product management so that development teams felt empowered to take more
ownership of the end-to-end delivery of value.
However, the real game-changer came in the form of the DevOps movement.
DevOps gives both the motivation for security teams to change AND the means for
development teams to take ownership. The motivation is driven by the fact that
no amount of productivity improvement is going to enable throw-it-over-the-wall
security specialist groups to keep up with the speed of DevOps teams; and DevOps
automation is the means by which development teams can implement robust
security validation in the CI/CD pipeline.
So, when Noopur Davis, my Build-Security-In co-founder, started her new job as
CISO at Comcast, she asked me to join her and give our Build-Security-In idea
another shot in this new fertile ground. That was almost four years ago and I
believe we’re in a great place now, but that doesn’t mean we didn’t have some epic
failures along the way.
The first big push of our emerging DevSecOps initiative was to encourage the use
of security scanning tools in the form of desktop scans and IDE plugins. Our tool
selection effort prioritized this functionality highly. After an evaluation, selec-
tion, and a purchasing cycle, we were ready to start rolling it out. We went on a
roadshow with the vendors to eight different locations where Comcast employs
developers and gave our pitch to over 1,500 developers in the first round. We
allocated expensive tool licenses to many of them, but only 190 developers did an
initial login. Out of that 190, an even smaller number actually ran their first scan,
and only a handful got into a pattern of regular scanning.
We looked at the few pockets of teams who were consistently running the scans.
These teams had installed the scan as part of their team’s CI/CD pipeline. They
were interrupting the pipeline when the current pipeline run had vulnerabilities
that exceeded the team’s current policy threshold.
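For readers who want to picture the mechanism, here is a minimal sketch of such a policy gate as a pipeline step, written in Python. Everything in it is illustrative rather than what Comcast actually ran: it assumes a hypothetical scanner that writes its findings to a findings.json file with a severity field on each finding, and the thresholds are made up.

    # policy_gate.py -- break the pipeline run when scan findings exceed the
    # team's policy. Illustrative sketch; the findings.json layout, field names,
    # and thresholds are hypothetical.
    import json
    import sys
    from collections import Counter

    # A team-level policy: the maximum number of findings allowed per severity.
    POLICY = {"critical": 0, "high": 0, "medium": 5}

    def main(path="findings.json"):
        with open(path) as fh:
            findings = json.load(fh)  # expected: a list of {"severity": ...} objects
        counts = Counter(item["severity"].lower() for item in findings)
        violations = {sev: counts[sev] for sev, limit in POLICY.items()
                      if counts[sev] > limit}
        if violations:
            print(f"Policy violated: {violations} (limits: {POLICY})")
            return 1  # a non-zero exit code turns the pipeline light red
        print("Scan findings are within policy.")
        return 0

    if __name__ == "__main__":
        sys.exit(main(*sys.argv[1:]))

The point of the pattern is the exit code: because the whole team sees the red pipeline run, individual developers have a reason to run the same check on their desktops before they push.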
The surprising insight about these teams was that, even though the scans were
already being run for them in their team’s pipeline, they actually had the highest
desktop scanning usage. Nobody wanted their code to be the reason the pipeline
light turned red, so they were more inclined to check their work before submitting
it to the pipeline.
We had shifted too far left too fast by expecting individual developers to change
their behavior without the intermediate step of team-level reinforcement pro-
vided by a CI/CD pipeline integration. You can think of this as a form of Chris
Roberts’, “I will fail. We will succeed.”2 You can also think of this as “s#!tting left”
on the individual developers by expecting them to change behavior by their own
willpower without giving them team-level reinforcement.
A related epic failure was that we chose tools that were optimized for desktop and
IDE usage. Some of these tools also had great CI/CD integration and team-level
functionality, but not all of them did, so we had to change our tool mix.
LESSON A
A desktop scan is much more likely to occur when the developer knows that
a team-level scan will be highly visible. Developer behavior is much easier to
change when it’s reinforced by highly visible team norms.
LESSON B
Optimize tool selection for those that enable rapid and easy integration into
the CI/CD pipeline rather than IDE integration.
Like any large organization that has grown through acquisitions over the years, we
have a number of relatively isolated parts of the organization, each with a number
of two-pizza development teams and each with its own local security directors
and security specialists. Three of those groups volunteered to help pilot a more
consistent approach to running automated security scans. The idea was to identify
the repositories where the source code was stored and run the scans for the indi-
vidual two-pizza teams in a pipeline that a cybersecurity group maintained. We
had three completely different results from the three experiments.
1. The smallest org had only five development teams and they had by far the
most success with this approach. However, when we looked at why, it became
obvious that the success was partially a function of the relatively small size of the
organization combined with the superpowers of that local security director. She
made sure her teams got near-immediate feedback on the scan results and she had
a knack for cajoling the teams all the way to resolution of the findings. This suc-
cess was going to be hard to duplicate in larger orgs, even in the unlikely event that
we could find a local director with similar superpowers for each of them, and the
vast majority of the rest of Comcast’s development teams had no such local security
director or specialists.
We weren’t going to mess with the success of this group but at the same time, we
didn’t think it was a good approach to apply more broadly. This reluctance was
reinforced by the fact that our larger central cybersecurity group often complained
that 60-80% of their time was spent cajoling reluctant development teams.
2. The largest org also had the largest staff of security specialists and dedicated
project management resources. They quickly came up with a project plan
to get scans running for the 150 or so applications in their org. They got manage-
ment to put scan coverage metrics on their team’s quarterly goals, and they started
executing the plan.
This rollout was about 30% done when we noticed a huge problem. While the
scans were being run consistently, negative findings had essentially zero resolu-
tion. Comcast gets almost no value out of knowing exactly how many vulnerabili-
ties we have. The scale for the information value of such data is 0, a few, and many.
More information than that is just gilding. The real value only comes from the risk
reduction of removing vulnerabilities from our products. We needed the scans to
drive the rapid resolution of the findings, and we weren’t seeing that. We immedi-
ately put the brakes on the rollout plan (a more difficult task than you might imag-
ine, considering the rewards associated with hitting pre-set quarterly goals), but
we controlled the licenses, so we eventually got them to listen.
We told them that the license spigot would not be turned back on until we started
to see a healthy resolution curve. They shifted most of their local security special-
ist resources towards resolution of scan findings. This involved cajoling individual
teams with whom they had about as good a relationship as the central cybersecu-
rity group did with their development teams. We started to see a shallow resolu-
tion curve but cajoling was taking up 60-80% of their effort. It was too little result
from too much effort. Perhaps more telling was the fact that the folks in that local
security specialist group were not enjoying this cajoling work. One described it as
“soul-crushing.” A few of them left and more threatened. This sub-org was recently
subject to a reorg, and it remains to be seen whether that will lead to improvement or make
this problem worse.
We were left with no choice but to conclude that this too was not a sustainable,
repeatable approach. In fact, it felt very similar to the bolt-on, throw-it-over-the-
wall, prelude-epic-failure that we started with. The only distinction was that the
scans were automatically run on a more frequent cadence rather than occasionally
run by a security specialist’s hand. It was a distinction without a difference.
3. The third group was our proverbial Goldilocks “just right.” Rather than
stand up an out-of-band CI/CD pipeline, they integrated the cybersecurity scan-
ning tools into their org-level-maintained CI/CD tool as a template that could be
applied to projects. This meant that the feedback came to the team in the exact
same form as other failing tests.
A vulnerability became just a kind of defect. Resolving them became just another
aspect of their low-defect mindset. After the initial set of findings were burned
down, teams stayed at zero vulnerabilities.
LESSON C
Rather than stand up an independent pipeline, create recipes and templates
to enable easy CI/CD integration of cybersecurity scanning tools into the
team’s existing pipeline and defect resolution process so vulnerability feed-
back is responded to like any other defect feedback.
LESSON D
Insist that the development team own the problem of security for their prod-
uct and accept that security’s role is not one of gatekeeper, but rather tool
smith and advisor.
LESSON E
Create a culture where doing the right thing at all times is more important
than hitting quarterly metric goals, even if that means someone misses theirs. This
might mean you have to get a policy changed, an exception issued, or go up
against the boss or the legal department.
» Less than half of the teams we were working with at the time were serious
about their use of CI/CD tools.
» There were about a dozen different CI/CD tools in use around the company
and no single tool had more than 18% internal market share.
» The most common pattern was for each team to stand up their own CI/CD
tool, which meant the tools were inconsistently configured, often with wide-open
ports, unpatched servers, and no access control, adding to the cybersecurity
risk rather than reducing it. Via port scanning (a minimal sketch of that kind
of sweep follows this list), we were able to determine that there were over 100
different Jenkins servers on our network. We would have to visit each of them
to do this right.
» Since there was no central, well-supported CI/CD tooling, following Lesson C,
which says to focus on central recipes and templates, was much harder. That
was a lot more work to do for 12 CI/CD tools than it would have been to do for
one or even a few.
» When we came across teams with immature DevOps practices, there was no
group to whom we could refer them for help maturing. We filled in where we
could, but it wasn’t our mandate.
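The sketch promised in the list above: a rough sweep for hosts that answer like Jenkins on its default HTTP port. The subnet, port, and timeout are placeholders rather than details of Comcast’s network; the only Jenkins-specific assumption is that a Jenkins instance normally advertises itself with an X-Jenkins response header.

    # find_jenkins.py -- sweep a subnet for hosts that look like Jenkins servers.
    # Illustrative sketch: the subnet, port, and timeout are placeholders.
    import ipaddress
    from urllib.error import URLError
    from urllib.request import urlopen

    SUBNET = "10.0.0.0/24"   # hypothetical internal range
    PORT = 8080              # Jenkins' default HTTP port

    def looks_like_jenkins(host, port=PORT, timeout=1.0):
        try:
            with urlopen(f"http://{host}:{port}/login", timeout=timeout) as resp:
                # Jenkins normally reports its version in the X-Jenkins header.
                return resp.headers.get("X-Jenkins") is not None
        except (URLError, OSError):
            return False

    if __name__ == "__main__":
        hits = [str(ip) for ip in ipaddress.ip_network(SUBNET).hosts()
                if looks_like_jenkins(str(ip))]
        print(f"Probable Jenkins servers found: {len(hits)}")
        for host in hits:
            print(" ", host)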
This wasn’t going to scale and I started telling my management that the lack of
DevOps maturity in the organization was the most serious limiting factor. We
could get less mature teams to submit their applications for pen testing and
threat modeling. We could continue our rollout of ever-improving cybersecurity
training, but we could never achieve the high levels of quality and security with
automation that we were able to accomplish with high maturity teams.
When any of my staff had to work with these not-quite-DevOps teams, they were
disappointed because they knew how much easier and more effective our efforts
would be if they’d only had a CI/CD pipeline.
In the spirit of not letting a good crisis go to waste, Noopur and I put the growing
influence of our DevSecOps program behind an ongoing effort to standardize as
much as possible around a well-supported multi-tenant CI/CD tool and DevOps
practice adoption initiative. We helped write the budget proposal and we built
first-class support for the chosen CI/CD tool in the form of reusable resources
that any team could add to their pipeline with a few lines of YAML. We now have
a multi-million dollar per year DevOps transformation program that keeps our
multi-tenant CI/CD tools up and running and is assisting development teams with
white-glove pipeline implementations very similar to our own. Teams who have
onboarded to that DevOps program now have a robust base upon which to imple-
ment our security scanning tools and practices. Such teams are now fast tracked
through our DevSecOps program.
LESSON F
Create different tracks for different team personas. Fast track those with
mature DevOps practices. Invest the bare minimum to provide traditional
application security support for teams with lower DevOps maturity, but
insist that they get on the path to DevOps maturity before you do more... and
connect those teams with the right group to start on this DevOps journey.
» The pull request provides a central place where all feedback can be made vis-
ible and recorded for later reference. The CI/CD tool provides a similar place
for visibility and record-keeping but it will be missing code reviews and other
forms of feedback. We have teams that use one CI/CD instance for building
and testing and a different instance for deploying. Other teams use SaaS ser-
vices for purposes like test coverage visualization, accessibility testing, DAST,
etc., that are separate from their CI/CD pipeline, although often triggered by
the same GitHub event. You can aggregate the output of all such feedback in
the pull request.
The subplot in this section didn’t involve oversteering like the other sections,
but rather, we realized that we were understeering and would benefit by steering
further left in this regard.
This ties in nicely to the story I told earlier about the sub-org in Epic Failure #2,
Experiment #2. Once we steered our DevSecOps initiative hard towards pull
requests it became part of our coaching practice. I was leading a coaching session
for the org-local cybersecurity team for that sub-org when the cybersecurity leader
told me that my approach wouldn’t work because he didn’t think the teams in his
org used pull requests, although he was uncertain what a pull request was. One
of that leader’s direct reports corrected him on the spot saying that many of their
client teams were in fact using them and at least one had explicitly asked for the
scan results to tie into them.
The desire to get out of the “soul crushing” cajoling role was at war with this
org-local cybersecurity team’s unwillingness to relinquish some control and trust
the development team to own the security of their products. It is still to be seen
if they will steer further left in this way but we now have them thinking about it,
and they know that the rest of the organization is steering further left.
LESSON G
If you don’t know what a pull request is, LEARN, and then encourage all
teams to use them for all code that needs to be merged into the product. Make
sure the feedback from every cybersecurity scanning tool shows up in the
status checks for the pull request.
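As one way to wire that up, here is a hedged sketch that reports a scan verdict as a commit status on the pull request’s head commit, using GitHub’s commit status endpoint (POST /repos/{owner}/{repo}/statuses/{sha}). The repository name, context string, and environment variables are invented for the example; a real integration would normally live inside the pipeline tooling itself.

    # report_scan_status.py -- publish a scan verdict as a commit status so it
    # shows up in the pull request's checks. Names and variables are illustrative.
    import json
    import os
    import urllib.request

    def post_status(owner, repo, sha, state, description,
                    context="security/dependency-scan"):
        """state must be one of: success, failure, error, pending."""
        url = f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}"
        body = json.dumps({
            "state": state,
            "description": description,  # keep this short
            "context": context,
        }).encode()
        req = urllib.request.Request(
            url,
            data=body,
            headers={
                "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
                "Accept": "application/vnd.github+json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status

    if __name__ == "__main__":
        # A pipeline would typically export the commit SHA and the scan verdict.
        verdict = "failure" if int(os.environ.get("CRITICAL_FINDINGS", "0")) else "success"
        post_status("example-org", "example-app", os.environ["GIT_COMMIT"],
                    verdict, "Dependency scan results")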
So, we steered the boat right and temporarily took secrets management out of our
shortlist of critical practices while we stood up robust self-service support for our
chosen secrets vault. Then we steered left again and started expecting all teams to
stop storing secrets in source code repositories.
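What “stop storing secrets in source code” can look like as an automated check is sketched below: a simple pass over the repository for obvious credential patterns, run as a pipeline or pre-merge step. The patterns are a small illustrative subset rather than a real detector, and in practice a team would more likely adopt an existing secret-scanning tool; the point is only that the policy becomes enforceable once the self-service vault exists.

    # secret_check.py -- flag likely hard-coded credentials before code is merged.
    # The regexes below are a small illustrative subset, not a complete detector.
    import pathlib
    import re
    import sys

    PATTERNS = {
        "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
        "private key header": re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
        "password assignment": re.compile(r"password\s*=\s*['\"][^'\"]{8,}['\"]", re.I),
    }

    def scan(root="."):
        findings = []
        for path in pathlib.Path(root).rglob("*"):
            if path.is_dir() or ".git" in path.parts:
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for name, pattern in PATTERNS.items():
                for match in pattern.finditer(text):
                    line = text[:match.start()].count("\n") + 1
                    findings.append(f"{path}:{line}: possible {name}")
        return findings

    if __name__ == "__main__":
        hits = scan(*sys.argv[1:])
        for hit in hits:
            print(hit)
        sys.exit(1 if hits else 0)  # a non-zero exit fails the pipeline step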
LESSON H
Make it “click here to get started” easy before you insist that teams comply
with a particular policy.
Security people: “Those darn developers are cranking out crap that’s going to get us
hacked!”
Developers: “Security is nothing but an obstacle. They don’t understand that we have lots
of other concerns and the only ‘help’ they provide is to browbeat us.”
Trust = (Credibility × Empathy) / Apparent Self-Interest
Where:
» Credibility = How well you actually know what you are talking about
» Empathy = How much you show that you care about someone else’s interests
» Apparent Self-Interest = How apparent it is that your words and actions are
in your own interest
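To make the shape of the relationship concrete, here is a toy calculation in Python with invented 1-to-10 ratings; the only point is that showing empathy and reducing apparent self-interest move trust far more than stacking up credibility alone.

    # trust.py -- toy illustration of the trust relationship described above.
    # The 1-to-10 ratings are invented for the example.
    def trust(credibility, empathy, apparent_self_interest):
        return (credibility * empathy) / apparent_self_interest

    # A highly credible security team that shows little empathy and obvious self-interest:
    print(trust(credibility=9, empathy=2, apparent_self_interest=6))   # 3.0
    # The same credibility, but visible care for the dev team's goals and far less
    # apparent self-interest:
    print(trust(credibility=9, empathy=6, apparent_self_interest=2))   # 27.0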
I have been using some form of the above formula since someone I worked with
at Rally Software shared something similar with me. Until recently, I didn’t know
where it originally came from but I’ve since learned that it’s the original work of
Charlie Green at TrustedAdvisor.4 Many thanks after the fact to Charlie.
While this part of the story doesn’t involve any oversteering, it definitely is an
example of how slow it is to turn a boat. Also, it is a form of s#!tting left ironically
by not trusting the folks we should be shifting left towards. The boat has turned
and we now see much higher levels of trust going both ways.
LESSON I
Build trust between security and development by following the Trust
Algorithm (shameless plug).
These are deja-vu moments for me because they are the exact same reaction I got
from software quality assurance (QA) leaders after my talks at the beginning of the
Agile movement.
The Agile movement involved breaking down silos and moving the responsibility
for quality and product management to the development team. At the beginning
of the Agile movement, the QA folks refused to believe that the development team
could ever be trusted to confirm the quality of their products and that a separate
QA organization would always be necessary. Fast forward 15 years to now —
very few organizations still have a dedicated QA department. The folks formerly
employed by those departments either learned to code and joined a development
team as “T-shaped resources” with QA as the vertical part of the “T,” or they found
another job. Similarly, with the DevOps movement, dedicated operations depart-
ments are being evolved into site reliability engineering (SRE) teams and/or more
and more of their responsibility is being shifted to DevOps teams.
Another aspect of the current Dev[Sec]Ops movement that reminds me of the old
QA/Agile story is around automation. When developers took ownership of QA,
they automated previously manual testing. With the DevOps movement, devel-
opers are taking ownership of how their products behave in production and they
are automating all deployment and operations tasks. Security is joining the party
with the start of the DevSecOps sub-movement and security specialists will need
to pick up CI/CD automation experience to stay relevant.
I’m reminded of a book, “Who Moved My Cheese?” that’s often given to about-to-
be-let-go employees whose skills and mindset have become obsolete. It’s a story
about two mice: one who quickly adjusts to the changing environment when
his cheese is moved, and another who stagnates in place for a long time after. The
DevSecOps movement of today is at roughly the same place that the Agile move-
ment was 15 years ago and the DevOps movement was 3-5 years ago.
This time security specialists are the ones whose cheese is about to be moved.
You can either dismiss what I’m saying as kumbaya that is not relevant to your
organization or you can start to get yourself and your organization ready for this
emerging DevSecOps movement.
About Larry Maccherone
Larry Maccherone is an industry-recognized thought leader on DevSecOps, Lean/
Agile, and Analytics. He currently leads the DevSecOps transformation at Comcast.
Previously, Larry led the Insights product line at Rally Software where he published
the largest ever study correlating development team practices with performance.
Before Rally, Larry worked at Carnegie Mellon with the Software Engineering
Institute (SEI) and CyLab for seven years conducting research on cybersecurity and
software engineering. While there, he co-led the launch of the DHS-funded Build-
Security-In initiative. He has also served as Principal Investigator for the NSA’s Code
Assessment Methodology Project, on the Advisory Board for IARPA’s STONESOUP
program, and as the Department of Energy’s Los Alamos National Labs Fellow.
REFERENCES
1. www.zdnet.com/article/anatomy-of-the-target-data-breach-missed-
opportunities-and-lessons-learned
2. www.devsecopsdays.com/articles/epic-failures-in-devsecops-book-download
3. www.devsecopsdays.com/articles/trust-algorithm-applied-to-devsecops
4. trustedadvisor.com/articles/the-trust-equation-a-primer
CHAPTER 3
Making Everyone
Visible in Tech
by Jaclyn Damiano
The research supports this: Ryan Carson, CEO of Treehouse, says: “Most companies
have a significant challenge recruiting and retaining a diverse set of employees,
particularly women in technology. For our team to match the diversity of America,
we’d need 13.4% Black, 1.3% Native American, 18.1% Latinx, and 50% women
employees.” Today, 7% of the high tech sector workforce is Black, and 8% is
Hispanic. Depending on what source you read, between 20-36% of the high tech
sector is female. Most research also states only 18% of engineering graduates are
female. The problem intensifies as you look up the hierarchy. In the U.S. top 1,000
companies by revenue, only 19% of CIOs are women.
Those of us who work in tech need only to look around during a meeting to be
confronted with an all-too-real illustration of these numbers. Sometimes I look
around an office building and check out the conference rooms. Who is sitting
around the table? If I see only men, I have an odd compulsion to run into the room
screaming “Wait, you’re missing a gender!” Gender is only the beginning of what it
means to have a diverse team.
The research on diversity is clear: Diversity isn’t just about being nice or noble.
Diverse teams build better products, which increase company revenue. In today’s
Eighteen months ago, I was quietly lamenting this issue. Then something hap-
pened that completely changed my career, and honestly my life. An epic failure
occurred on the team I lead.
At my company, one of the teams I lead hosts a Tech Day three times a year. These
Tech Days mimic a DevOps Day format. We bring in external luminaries to speak,
and we ask our internal employees to speak and share their stories of successes
and failures. These events have two distinct goals: building community and
energizing our engineers. In mid-2018, we sent out a Call For Papers (CFP) to 1,000
engineers. Guess how many women responded to this call?
Well, if you guessed zero (or you’ve heard me previously bellow about this during
a public talk), you’re right. We failed. We failed to have a diverse agenda.
» The Bravery Deficit: Reshma Saujani, Founder & CEO of Girls Who Code,
worries about society “raising our girls to be perfect and our boys to be brave.”
Her point is so spot on. If we don’t encourage our girls to take risks, climb the
tree, and fall, they’ll never experience failure and recovery, a critical skill in
business.
» Women fail because they don’t begin: Harvard Business Review published
a study stating: “Men apply to jobs if they meet 60% of the qualifications.
Women apply for jobs only if they meet 100% of the qualifications.”
Frighteningly, statistics are similar when it comes to promotions.
Why don’t we see women in the workplace? Another Harvard Business Review
article, “Why Women Stay Out of the Spotlight at Work,” states that women stay
out of the spotlight for three main reasons: avoiding backlash in the workplace,
finding professional authenticity, and parenthood pressures. In other words,
women are afraid of being called out as bitchy; HBR states “Most women rejected
the executive, self-promoting leadership style in favor of a mission-oriented, com-
munal style.” And, balancing parenthood and working is freaking hard.
I learned a lot through my research that helped me understand more about myself,
which was great. However, what I learned didn’t solve my problem at work. I still
had a hard time getting women to volunteer to speak. So, I did the only thing I
thought would help: I turned our failure of having a male-driven agenda and my
subsequent research into a talk. I added some tactical things that everyone can do
to help mitigate the situations outlined above:
» Women need to sit at the table during a meeting. Do not call in, do not sit
along the perimeter of the room. Claim your voice. If you absolutely need to
be remote, make sure you share your expertise, don’t stay quiet.
» Women need to help each other. Introduce yourself, be there for someone else
coming up the chain.
» Women need to speak at conferences. Please, for the love of God: a conference
made up entirely of any one kind of person is super boring.
At our next Tech Day, I gave this talk in an ignite style. I called it “Making Women
Visible” and delivered it in September-ish 2018. As it happened, during this par-
ticular DevOps Day, one of our external luminaries was Dominica DeGrandis, a
personal hero of mine. Her book, “Making Work Visible,” is required reading for
anyone on my team, and obviously, my talk title was a riff on her book title. She
complimented me on my talk and encouraged me to keep speaking.
Then, two days later, our team flew home from DOES. As our plane landed in
Newark my mentor and boss at the time looked at me and said “That talk really
resonated with people. So what are you going to do about it? The community is
looking to you.” Oh, dear God. I could either do something about this, or I could
just keep talking about it, perpetuating inaction. I’m here writing a chapter, so you
could probably guess that I have a bias for action...
Then like magic, the light bulb went on. I was at the Grace Hopper Celebration
in 2018. Mariana Costa-Chica, who formed an organization called Laboratoria
in South America, was honored for her work in diversifying tech. There weren’t
enough women in tech in Peru, so she went out to underserved communities to
find women who had not just a passion for technology, but had the aptitude to
make a great developer. She brought successful candidates into a boot camp and
Her talk resonated with me because of where I grew up. I didn’t know what a soft-
ware engineer was until I went to college. And we all know…
Bucknell changed my life. I met friends that made me feel safe and accepted. I
studied topics that were interesting and engaging. I took my first economics class
and discovered this was the lens through which the world made sense to me. (My
friend Emily explained that this is how I’d “know” my major when I found it. She
was eternally right. Now she’s an astrophysicist.) In any case, I fit at Bucknell,
academically.
Why am I divulging all of this to you, dear reader? A couple of reasons. First, I have
a fear that someone may think I’m a privileged person who has a savior complex. I
worry that people think that I am going into communities that I feel sorry for, try-
ing to save them. As fortunate as it is that I’m on this side of the story, I feel like I
could have easily been one of the people that Mariana helped lift up. Second, some
people get squeamish when I talk about working alongside people who didn’t go
to college or who may not have ever had a role model who worked in an office. It’s
at these points in conversations that I’ve started to have to speak up and tell them
about where and how I grew up.
In summary — at the end of the day, I have a responsibility to do what I can to give
others a chance. To give people an opportunity. There are too many smart people
in crappy situations that need a door. They’ll walk through that door if they want
it — trust me, I’ve seen it.
» Bring the apprentices in and pay them a living wage during their learning.
» Co-locate the apprentices and teach them how to code. Use multiple modal-
ities of instruction: online, in-person, 1:1, and group coaching. Eventually,
deploy apprentices to line-of-business technical teams to code in the compa-
ny’s repositories.
Now, when I talk about diversity, I realize it’s not just about gender and race. It’s
about a lot of things: socioeconomic status, religion, age, body size/shape, sexual
orientation, people with different abilities, cognitive differences, etc. The list goes
on and is deep and broad. When it was time to put a call out for applicants, we
marketed to all people in hopes that we’d end up with a diverse set of candidates.
It worked. Our apprentices are all different, brilliant human beings who ended up,
in large part, becoming awesome engineers. More on that later.
I needed money and headcount to get this program running. I called it Project
Athena (yes, the Goddess of Wisdom; I’ve had enough of “Project Hercules”); my
brief experience leading a marketing and communications team taught me how
important it was to brand this project. Simply stated, the goal of Athena is to
enable people from underserved and underrepresented communities to grow into
awesome technologists to build products our customers love. Project Athena pro-
vides an actionable plan to create a new labor pipeline of qualified, diverse people
to work in technology.
I put the idea on paper, and by mid-November 2018, I was pitching it to my boss.
I expected him to say it was crazy. The idea was too big. It wasn’t the right time. It
was too expensive. Instead, he said, “this is a good idea.” Excuse me, what? Nope.
It’s not too expensive. Yep. It is the right time. The idea was big but good. Wow.
During this time, my company was going through some major changes. Selling
a plan like this is hard; selling it when things are changing was harder. These
changes, though, presented a great opportunity. We had to hire a lot of people in
a short amount of time. Ever spin up a job requisition in tech? Finding the right
candidate can take months. Finding many right candidates all at once is super
difficult. If we had a program to create a talent pipeline where it did not exist, that
would help. We could train people the way we wanted, and then plant them as seeds
of change in our organization.
After my boss encouraged me, I got on our senior vice president’s calendar. I was
waiting for him to kill the idea. But he didn’t. He thought it was a good one. He
scheduled a meeting with our CIOs in January. I created a pitch deck, outlining
the costs and benefits. I told them how I thought this could help with our open
requisitions, and how we could diversify our staff. They listened. They asked lots
of questions. They asked me to work with HR and get alignment. I was terrified. I
had no relationship with our HR leader, and I knew that this was an “outside the
box” program that would require a leap of faith. At large enterprises, we have a lot
of responsibilities, and sometimes we get too afraid to jump. On top of that, some
HR employees may construe this kind of activity to be a threat to their role in
talent acquisition. I’m happy to report my HR partners jumped with me, and were
enormously supportive. They kept me sane (and employed) during this project.
Jeepers, the things they deal with! We have it easy with systems. They’re dealing
with the most complex system ever created: people. Respect.
In January, I also took the stage in NYC at a public DevOps Days event. I proposed
Project Athena to the audience. At the time, I wasn’t sure my company would
approve the initiative. I wanted to get an external read on how it would play. The
result was one that I’ve come to rely on from this community: nothing but love
and support.
Between January and March of 2019, I met with anyone at my company that
would talk to me about Project Athena, hundreds of people. I reached out to
wildly senior people (C-Level, the board of directors) in my organization to pitch
the idea. I learned something super important: The busiest, most senior people
always make the time to talk to people. I remember crying when I got a note back
from one of the most senior executives in our company, expressing his support
for the program. I mean, he’s a genius and he took the time to respond to me? A
girl from Scranton who worked in the cafeteria during college? How’d it come to
be that I had a voice someone was interested in hearing? It’s moments like these
that I hear Panic! At The Disco running through my head: “Hey look, ma, I made
it…” It’s just an email, woman. Get it together. I will tell you, through this period
of time I thought the program was dead about a dozen times. Jennifer Wood from
London DOES 2019 talked about the “abyss of despair.” Man, I was in that abyss a
lot during this pitch period, but, as Jennifer points out, things always turn around.
Whenever I put the plan aside because I was discouraged, I’d get a call or a meet-
ing or a word of encouragement that made me pick it back up. Finally, it was time
to get a go/no-go on this thing.
I’m going to pause here and talk about the enormous lift my team made between
April and May. We had to run all language about the program, the application
forms, and the job descriptions through HR and Legal. We had to learn the onboarding
process and get 40 people through things like background checks within a short
amount of time. We needed to get 40 computers, logins, and badges. We had last-
minute paperwork issues. Who is funding this? What is the budget code? The
what? I promise, I couldn’t believe the power of a one-word email from my SVP:
“Approved.” He later told me he just needed to give me sign-offs and then get out
of my way. I’ll never forget that. It’s a sign of a true leader. Trust your people. Give
them the support they need, then get out of their way.
It Really Happened
During the last week of May, we had our orientation/kick-off celebration and all
of our apprentices joined us on site. I was numb. I expected to be elated. I think I
was partially paralyzed with fear. What were we doing? What if this failed? We
immediately needed to put all of our self doubt aside. On our first day in Texas,
we realized that our apprentices had needs we weren’t anticipating. Some of our
apprentices were facing Maslow’s Hierarchy of Needs issues. We had a few people
who were houseless. More were food insecure. Some didn’t have cars. It was at
this moment we realized this work wasn’t just work, it was, as one of my CIOs
would call it, noble work. Few of us on staff had dealt with these issues. We did
the best we could, but most of the time we were figuring it out as we went along.
We created food drawers in each location that the apprentices had access to for
nourishment. We tried to come up with solutions to seemingly intractable prob-
lems (bus rides three hours each way to the office, medications that could only
be distributed at the shelter during work hours, etc.), when the apprentices just
needed someone to listen. It’s not a stretch to say we learned more from them than
they did from us.
On June 3rd, we co-located these candidates in our Dojos. Dojos are immersive
learning centers, permanent physical spaces where onsite coaches work with
employee teams to upskill. A full stack team comes into the Dojo with their
backlog, and we work with them to do product discovery and teach them modern
engineering practices. We leveraged our Dojo model, and assigned coaches to
apprentices at a 1:10 ratio. The apprentices spent most of their day learning from
a full-stack JavaScript online curriculum especially designed for people with no
background in tech. This learning was augmented significantly by the coaches.
The coaches did weekly 1:1s with each apprentice. By having each apprentice talk
them through their code, the coaches could get a good understanding of where the
apprentices needed extra help. The coaches would then do reinforcement learning
sessions, Katas, etc. They were working night and day to stay ahead of the appren-
tices in the curriculum.
The Dojo learning continued through the end of September. At this point, we
wanted the apprentices to be placed in line-of-business technology to get their
hands dirty with real coding repositories. We held a job fair, and made apprentice
placements. The Dojo felt lonely again. The coaches supported the apprentices
while they were working with the teams.
What We Learned
Holy creepers! We are responsible for the livelihood of 40 people. We weren’t
used to working in an environment where someone’s life depended on what we
did. This was the first time many of my staff saw the realities of poverty. It was also
the first time many of them couldn’t “fix” something. We all learned that some-
times what we needed to do was just listen, because fixing it wasn’t something we
could do.
Crying is okay. Even at work. Even if you’re typically a tough chick or dude.
Before Athena, I had only cried at work three times in 18 years. Now we have a
joke in the Dojo that we cry weekly. Tears of joy and frustration, all of the feelings.
Not knowing what is coming next sucks, but #itsgonnabefine. More often
than not, we had no idea how the heck we were going to get 40 people with no tech
background to be beginner developers in six months. We didn’t know SO much.
But what I did know, what experience has taught me, is that very few things are not
fixable. My team will tell you that whenever we faced a big challenge, the words that
would come across my lips were “it’s gonna be fine.” Because, honestly, it had to be.
In October of 2019, #itsgonnabefine became a laptop sticker for the staff and me.
Because we were braver, we were stronger and we knew we could do this.
We did more good than harm. We were super self-critical throughout the pro-
gram. Could we teach more effectively? Could we give them more support? 10%
of our original apprentices dropped out of the apprenticeship. We couldn’t make
it work for everyone. However, it did work for the 36 people who graduated from
the program and now had experience working in IT at a large enterprise. 80%
of the 36 graduates converted to full-time employment at our company. I like to
think that we’ve changed the lives of those people and their children.
There’s some irony in my story. Here I am trying to help educate women about
feeling like an imposter, like somehow they don’t deserve to be successful, yet
how many times in this narrative did my insecurities come out? Once you see
it in black and white, it cannot be unseen. I try to carry that with me. I am more
educated now, but not immune to falling victim to the confidence gap.
What’s Next?
We have funding for another cohort in 2020. We are looking at how to self-fund
through governmental grants specifically earmarked for software development
apprenticeships. We are also considering which other areas of our company could
use apprenticeship as a viable model.
About Jaclyn Damiano
Jaclyn is an Associate Director at one of the world’s leading telecommunica-
tions companies. Jacki leads enterprise-wide initiatives aimed at creating a
culture that passionately embraces modern engineering and organizational
practices. Previously, Jacki held roles as a business analyst and project manager at
Accenture, AllianceBernstein, and Goldman Sachs. Her passions include Product
Management, Enabling Cultural Change at Scale, and Diversity and Inclusion.
Jacki is an active member of the DevOps community and regularly delivers talks
on digital transformation and diversity at tech conferences. She proudly earned
her B.A. in Economics from Bucknell University.
Acknowledgments
Mark, thank you for the honor of contributing to this book — I am so grateful to
know you! Also, thank you to the editors and everyone who made sense of my
writing. Most of all, thanks to my kids and husband who support (most of) my
crazy ideas.
REFERENCES
» hbr.org/2018/11/how-my-company-created-an-apprenticeship-program-to-help-diversify-tech
» www.techrepublic.com/article/5-eye-opening-statistics-about-minorities-in-tech
» www.cio.com/article/3267945/deloitte-report-has-some-good-news-about-women-in-technology.html
» www.mckinsey.com/business-functions/organization/our-insights/why-diversity-matters
CHAPTER 4
From Silos to
Communities
by Sladjana Jovanovic
and
Bill McArthur
I am a technology executive with years of experience fighting the silo effect that
plagues organizations of all sizes. At TD Bank, where I work, we also faced this
dilemma. I have learned that when we break down the barriers to communication
and collaboration, we thrive as humans and as organizations.
We call our software engineering tribes “pods.” Other companies call theirs
“two-pizza teams” or “squads.” We are all skilled at developing secure, reliable,
functional software and that’s what we love to do. When integrating our mas-
terpieces with those of the other tribes, we confirm what we already know: our
software is a piece of art and their software sucks. A battle develops through which
we learn what drove our decisions and assumptions.
You are probably thinking that this doesn’t feel that hard to solve. After all, agile
software development was invented to organically drive collaboration. Further,
one of its main principles is to break a problem down into smaller pieces. But ask
yourselves, are we good at stitching the solution back together? When we are
developing software, do we think of it as a piece of a much larger whole? I was a
software engineer for many years and happily worked in silos without even being
aware of it.
The specific tribe that I was now a part of was two years into replacing a 20-year-old
web-based monolith with a RESTful architecture supported by an Angular UI. The
new Beta version went live, and about 90% of our active customers had started using
it. We were flying! And, that was a good and important thing because our tribe had
something to prove. In the fall of 2014, a yearly ranking by a major Canadian news-
paper gave us an average rating, and said that we were stuck in the ’90s.
Following this report, we put a lot of effort into the transformation of our platform
and we were one of TD’s first agile factories to get it right. We talked about the
customer all the time. Our tech team didn’t write a line of code without thinking
of the impact on our users. To improve our customers’ experience, we regularly
conducted surveys and incorporated sprint retrospectives and feedback sessions.
We continuously pushed out improvements in small increments. We also struck
the right balance between the product and the platform. This was the basis of the
strong collaboration within our tribe. It made a lot of sense to me, and in a matter
of days, I felt I belonged to this tribe.
After the introduction of the new platform, we began to encounter some bumps.
There were quite a few users logging into the legacy platform and we were spend-
ing a lot of energy supporting it, instead of fully focusing on improving the new
platform. It turned out that many users still preferred the old, familiar, stable, and
clunky interface over the shiny star we were so proud of. Transition to the new
interface generated many questions such as “where the heck did they move my
button?!” On the infrastructure side, the interaction between the new front-end
platform, which was installed on TD’s cloud, and the downstream components,
which continued to sit on the physical boxes, required some optimization.
Furthermore, the downstream services were mostly shared and managed by dif-
ferent teams, and coordinating the changes was challenging. All of this ended up
lowering our customer experience index.
So how did I swim through this? On the first day of my new job, my new boss sug-
gested that I get to know Bill McArthur. I had known Bill from one of my previous
jobs at TD. Smarter than anyone else and a brilliant Software Engineer, Bill had
assembled the best development team I had ever worked with at TD. When you
saw Bill’s team iterating through agile software development, you would have
thought that you were at a tech startup.
So why were we having issues? While I found Bill’s team innovative and creative, I
learned that some found them difficult to work with. I was also interested in learn-
ing how we were improving the platform to ensure it was resilient to the changes.
I sat down with Bill and asked if he had a technical roadmap. He looked at me
completely puzzled. He said he felt that our business leaders were more focused
on getting the new functionality deployed for our customers than in knowing
how the technology platform was being evolved. This is where my enlightenment
started.
Bill had this mystery pod, which was iteratively designing the platform to support
the business requirements that the feature-pods were going to be working on
next. It was no surprise that our feature-pods were so efficient: their work was
mostly focused on enabling user interface (UI) features, adding finishing touches
of logic and finalizing integration with the underlying services that were already
prepared.
While I learned that the team was very efficient in building new business capa-
bilities, the fact that we had not talked about resiliency with our business leaders
worried me.
Why did resiliency matter so much? After all, we were quick to remediate any
bump we encountered. We got to know the downstream tribes better and the
collaboration with them was improving. The problem was that our remediation
efforts were reactive. By not being proactive, we did not understand how resilient
the platform truly was.
It was June 2016, and the United Kingdom was heading into the referendum to
decide whether to stay in or leave the European Union. On June 23 that year, in
the event known as Brexit, most people in the U.K. voted to leave the E.U., which
had the potential to create a lot of uncertainty and a negative economic impact on
Europe and the rest of the world. The next day, as people started reacting to this
news, a new bar was set around the globe for usage and volumes on many systems,
including ours.
Based on these early learnings, platform resiliency became and has remained one
of TD’s top priorities. It also became a daily topic with my business partners. We
agreed to make a transformational shift in how we operate and achieve resiliency
through collaboration.
One of our main screens had a capacity issue, causing a performance degradation.
This revealed a need to improve the resiliency of the entire platform, not just the
application.
It was very interesting to learn how intertwined the team and platform resiliency
were, and that improving the overall resiliency required us to focus on people, as
well as on the product that we were building. As with any idea, to see it through
sustainably and at scale, we needed to implement a new operational structure. We
called it the “Resiliency Playbook.”
Resiliency Rules
Leadership is about rallying the team around common goals. Making the goals
clear and achievable is a critical first step to success in this journey. While our
ultimate goal was to improve collaboration and resiliency, we knew that both
were hard to measure and that they had many layers to them. Instead, the goal
we set for the team was to achieve 99.99% availability. This meant that the
application would be available and functional for our customers 99.99% of the
time. We knew that the only way to get there was by focusing on collaboration
and resiliency, which led us to transform how we operated.
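For context, a 99.99% target leaves a very small downtime budget. A minimal sketch of the arithmetic (in Python, purely illustrative and not part of TD’s tooling):

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
def downtime_budget_minutes(availability: float, period_hours: float = 365 * 24) -> float:
    """Minutes of allowed downtime over the period for a given availability."""
    return period_hours * 60 * (1 - availability)

if __name__ == "__main__":
    for target in (0.999, 0.9999):
        print(f"{target:.2%}: ~{downtime_budget_minutes(target):.0f} minutes of downtime per year")
```

In other words, “four nines” allows roughly 53 minutes of downtime across an entire year, which is why the goal forced the operational changes described next.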
To improve transparency, the first step was to redefine the platform to be the
entire ecosystem of TD and vendor services that the application relied upon. We
mapped all services and their dependencies end-to-end and analyzed how they
were configured, how critical they were to the customer and how they interacted
with one another. We then assessed the gaps in the ecosystem that stood in the
way of achieving high availability. Finally, we printed the platform map with the
highlighted resiliency scores in poster format and shared it broadly. The impact
was astonishing. The poster showed up on many office walls. Whenever we
discussed any of the platform components, our business leaders would first look
up the map to understand what we were referring to. This way our conversations
quickly moved from “Why?” (the issue was happening) to “How?” (to solve it). For
the first time in my career, the technology platform my team supported was not a
black box to anyone!
Considering that we already had a mature agile operational model, we were able
to efficiently drive improvements in resiliency and it wasn’t long before our
customers started responding positively to our efforts. This made us even more
determined to keep going.
Tackling collaboration was the next focus point. We came up with the concept of
“one team,” and we invited all TD teams who supported the components of the
ecosystem to join. We collocated with several of those teams, conducted regular
“Scrum of Scrums” sessions, shared ideas, and talked about the platform as some-
thing that we were all building and supporting together. Our vendors became
“partners” and we made them a part of the “one team” too. We expanded our tech-
nology roadmap across the platform ecosystem.
To share our learnings, we published a Resiliency Playbook, which over the next
few years, got adopted by many other teams and platforms in TD.
And finally, throughout this process, there emerged a diverse and collaborative
tribe. In 2018, that same major Canadian newspaper that had rated us so poorly four
years earlier now placed us as joint winners and best in class. We were labelled as
blazing a trail of fast-paced innovation — a remarkable turnaround.
Collaboration was the key next step in maturing innovation and resiliency. The
insights we have gained through collaboration with other teams, industry experts
and our customers have broadened our views. This has made us better able to
understand and predict the outcomes of our actions under the rule of resiliency,
while understanding the importance of purposeful innovation and how to encour-
age it at every level.
We have traded silos for a community of creative and collaborative teams who are
building the TD of the future.
Acknowledgments
I would like to thank Bill McArthur and our WebBroker team, who taught me the
importance of collaboration and platform resiliency, and Leo Salom, whose leader-
ship has motivated me to lead creatively.
Acknowledgments
I would like to thank Sladjana Jovanovic for inviting me to help her tell our story,
and Mark Miller for giving us the opportunity and forum to tell it. I would also
like to thank all of those that were part of the journey that our story tells - from
every developer on the team, all the way to our senior executives Jeff Martin, Salim
Jivraj, Richard Wilks and Paul Clark who empowered us to be successful. Lastly,
I’d like to thank my wife, who during the timeframe of our story, was as under-
standing and supportive as she has always been.
CHAPTER 5
What Not to Do When Rolling Out
DevSecOps and Other Learnings
by Jasmine James
Although this position was my first experience with DevSecOps, I had more
experiences later in my career. In the latter portions of this chapter I will meld the
stories together to create a narrative that captures the full essence of all DevSecOps
failures. Each failure contains a lesson and improvement point that is captured at
the end of the chapter.
Some themes emerged that happened to make an acronym (AEIOU) that can be
easily remembered. I’ve used this acronym in multiple environments for deciding
whether the tool was appropriate for implementation.
A — APPLICABILITY
As soon as you recognize a need, it can be simple to search and find a tool that
serves your purpose. One important question to ask yourself is, “How can this tool
be applied to the current environment to achieve enterprise goals?” The selected
tools for the DevSecOps environment should directly reference defined enterprise
goals for Development, Security, and Operations process improvements. Ideally,
the tool should contribute to the improvement of new or existing measures
within the competency area.
E — ENTERPRISE READINESS
As enterprises of multiple sizes adopt tooling, the availability of infrastructure resil-
ience and support can be an important consideration. Companies that are heavily
regulated or have critical operations and processes should consider tool readiness.
As tools come from many points of origin these days — open source, commercial
off the shelf, and SaaS (to name a few) — companies should consider the availability
and support. There will be a dependency introduced to the system based on the tools
you choose.
I — INTEGRATION
If there are existing tools in the organization that can be leveraged as a part of
the DevSecOps ecosystem, the points of integration must be considered carefully.
Given the robust solutions that are available in the tooling landscape, the existing
tool can often be replaced in part or completely by a new tool to avoid an integra-
tion contention. When new tools are integrated within an ecosystem, it is import-
ant to keep the integrations loosely coupled in case the tool is replaced in the future.
O — OVERHEAD
As new tools are introduced into the environment, there are sure to be administra-
tion and governance implications on owning teams. When evaluating tools, map
the roles and responsibilities within these two areas with the owning teams. If this
is completed across the organization and all stakeholders are included, the imple-
mentation and adoption of tools will be far smoother.
U — USEFULNESS
As tools for DevSecOps are evaluated, it is important to recognize if the tool can be
leveraged at the time of implementation. There are many situations in which tools
are brought into the ecosystem when other dependent systems are not yet avail-
able. When considering the goals of a lean organization, it is important not to introduce
waste into the system by bringing in even the appropriate tools at the wrong time.
Whoa! Who knew that deciding what tools to utilize would be such an involved
process? One of the great things about having some guidelines to evaluate tools was
the ability to stack tool capabilities against each other within the many categories.
This structured process ensured that all attributes were scrutinized consistently. For
many of the tools, additional attributes were defined so that the common compo-
nents within competing tools could be ranked. Although rigorous, this process was
well worth it and resulted in a clear direction as to which tools could enable the
organization the best. Having been very deeply involved in the evaluation, I was
now well exposed to the capabilities of the tools that were chosen. This put me in a
great position to lead the actual implementation of the tools, and with the help of
my team, start to drive adoption for new development activities.
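The AEIOU review lends itself naturally to a weighted scoring matrix. The sketch below is a hypothetical illustration of how competing tools could be ranked across the five categories; the tool names, weights, and scores are invented, not the ones used in this evaluation:

```python
# Hypothetical AEIOU scoring matrix: rank candidate tools on a 1-5 scale per category.
WEIGHTS = {
    "applicability": 0.30,
    "enterprise_readiness": 0.20,
    "integration": 0.20,
    "overhead": 0.15,
    "usefulness": 0.15,
}

candidates = {
    "Tool A": {"applicability": 4, "enterprise_readiness": 5, "integration": 3,
               "overhead": 2, "usefulness": 4},
    "Tool B": {"applicability": 5, "enterprise_readiness": 3, "integration": 4,
               "overhead": 4, "usefulness": 5},
}

def weighted_score(scores: dict) -> float:
    """Combine per-category scores using the agreed weights."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

for name, scores in sorted(candidates.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
```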
In order to ensure that policies and standards were being established as a part of
the roll out of DevSecOps, we established a cadence with the individual teams to
share information and considerations that should be included in the other areas.
As we found during the rollout of new concepts, the standard methods for convey-
ing information work in some cases. However, there will be concepts that are best
applied through other means of learning such as experiential or immersive deliv-
ery. Some companies, including ours, have created a place for this type of learning
that is known as a “Dojo.”
It is also very important to ensure that teams are aware when their development
practice is failing. Thankfully there are many ways that this information can be
shared with the team by using APIs to expose the information available in each respec-
tive DevSecOps tool within your organization. Whether you choose to expose
it through periodic reporting or continuously, in the form of a dashboard, the
important thing is to empower the team with information.
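As an illustration of that last point, a team-facing summary can be a few dozen lines of glue code once the tool exposes an API. The sketch below pulls open findings from a hypothetical scanner endpoint and prints a per-team count; the URL, token, and response shape are assumptions, not any specific product’s API:

```python
# Minimal sketch: summarize open findings per team from a hypothetical scanner API.
import os
from collections import Counter

import requests  # assumes the requests package is available

API_URL = "https://scanner.example.internal/api/findings"  # hypothetical endpoint

def open_findings_by_team() -> Counter:
    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['SCANNER_TOKEN']}"},
        params={"status": "open"},
        timeout=30,
    )
    response.raise_for_status()
    # Assumed response shape: [{"team": "payments", "severity": "high", ...}, ...]
    return Counter(finding["team"] for finding in response.json())

if __name__ == "__main__":
    for team, count in open_findings_by_team().most_common():
        print(f"{team}: {count} open findings")
```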
Summary
No transformation commences without challenges. Organizations come in many
sizes, shapes, and with varying levels of maturity, so it is paramount to remain
flexible as you introduce any portion of DevSecOps. Being a leader in many of
these efforts has taught me that, although you might think you know the answer
to the problem, there is a good chance that the problem you think you see is not
the root problem. Dig deep to truly understand the intricacies of the organization
before attempting to deliver a DevSecOps solution. Even when you think you’ve
reached a level of understanding of the people and practices within your space,
continue to listen to those who will be impacted most by the change you’re incit-
ing. The people that will live in this new world, their thoughts and opinions, along
with the enterprise goals to be met, matter most.
About Jasmine James
Jasmine currently lives in Atlanta, GA and loves to attend conferences and commu-
nity events for all-things tech, especially around DevOps and Cloud. Sometimes
she even speaks or writes articles about her journey within DevOps and her career.
Want to see what she’s up to? Find her on LinkedIn where she posts regularly.
Acknowledgments
A special thanks to Mark Miller and Derek Weeks from the Sonatype team for this
opportunity to share. I have avoided many pitfalls as a result of reading or hearing
about others’ experiences, so I am happy to be able to share my own journey
and learnings. A gigantic thanks also to my leader Keanen Wold for support as I
contribute to multiple efforts within the DevOps community, and to my team for
sharing their experiences with me. Finally — a special thanks to the proofreaders
and copywriters that helped refine my story!
CHAPTER 6
Cultural Approaches
to Transformations
Staying Safe and Healthy
by Marc Cluet
I wrote this chapter of the book not to tell personal tales, but as a quick guide to
exploring the minefield of mistakes that most cultural transformations make.
Cultural transformations are not easy or fast, and under most circumstances, they
can be nerve-wracking.
I have dedicated a good part of the last six years to helping organisations transform
their culture and ways of working. I’ve done everything from a full DevOps transfor-
mation to adding SRE security assessments to a simple migration to the Cloud (with
all the changes that this represents). I embrace agile, scrum, Kanban and all the
things that make modern IT companies faster and stronger at their game.
During these journeys I have found all kinds of madness, mystery, trouble, and mis-
takes — some of the stories funnier than others. I will walk you through my approach
and share some of the best stories, or at least the funnier ones (fun not guaranteed).
Expectations about what DevOps and DevSecOps are, and what they can provide, span
a wild spectrum. Expectation handling becomes a wizardly art form, where
everything is on fire, and not in a good, “look-at-that-fire-so-beautiful” way.
Transformation Expectations
Whenever you are exploring DevOps and DevSecOps, you can start to understand
the situation you signed up for by just listening to some of the answers you get to
your questions. Some of them will sound vaguely like these:
» “Yeah, we are already doing it as we have Jenkins.”
» “I have a Palo Alto appliance connected to the cloud, we are secure.”
If you haven’t heard any of these, you might have a false sense of confidence. Any
cultural transformation is complex; one attached to a technical transformation
is even more so. Companies decide to transform for technical or market reasons,
which means that not only will you have a complex technical and cultural trans-
formation, but also a very strict timeline in which to deliver it.
The first thing to understand is what the company really expects out of the trans-
formation. Companies are driven by people, and sometimes people make decisions
with little information at hand. It reminds me of when I purchased my first car — I
thought I was making the right decision but I realise now that I didn’t know much,
and the choice I made was more from a gut instinct than anything else.
In order to help them buy a better car, it’s important to understand the pain they
are experiencing. Is it market pressure? Money bleeding? Not having anyone in IT
who is good at Cobol? It’s your task to find out, and while you are doing it remem-
ber: avoid judgement. We all have awkward moments we don’t speak of, and you
will be going through their awkward moments.
Once we have an initial idea of the situation and what the real problems are,
remember that DevOps is all about People, Culture, and Tools — in that order. You
need to categorise problems and understand how they correlate and interact.
Prepare a transformation roadmap, and as with any roadmap, make sure that
objectives are well defined in the near horizon, with broader and more generic
objectives defined as you move further forward in time.
Keep communications frequent, and ensure a proper feedback loop so the plan can
be adapted as events develop.
I’ve found myself in several projects where every single step of the transformation was
not only decidedly risk-averse, but also subject to decisions made by a huge committee,
where every single part of the company had a voice. I’ve even found myself in
projects with weekly tracking and decision-making sessions for next steps.
Leadership can, and sometimes will, use this as an excuse to not commit to any
change, in fear of losing people. The inherent question then is how to move
There is an inherent danger with trying to adapt to this new way of writing
antifragile, microservice-based software. By doing things too quickly or by using
a team that only knows how to write monolithic software, you can end up in a
position where a team has created a “microservice monolith” with such strong
couplings that you lose most of the advantages of separating functionality in
smaller bits of software.
Neither end of the spectrum is beneficial for the long-term flexibility and happi-
ness of the engineering team. I’ve seen companies struggle heavily with this. It is
a problem that if not resolved properly can bring severe, long-term pain. It’s like
trying to go on an extreme fattening diet or an extreme weight-loss diet: neither
will be good for you.
If you give complete liberty to the engineering teams, they will come up with
several different ways of solving the same problem. It is creativity at its finest
and best, but at the same time, if you plan to carry forward every single different
method, it will likely have to be supported and maintained by someone with a less
positive view of the world.
This will normally lead to a situation where, while there is initially all kinds
of investment and excitement about the rosy future, you end up rediscovering
the large overhead of maintaining your own software — and the headaches this
involves.
Continuous Integration is the most well known and already covered by tooling;
there are prominent tools and a big community around it.
But, a lot of companies will try to use the same hammer for Continuous Delivery
even though Continuous Delivery requires a completely different set of tools. It
Rearranging teams in a way that matches your desired end state will reduce effi-
ciency and velocity in the short term, but it will accelerate the adoption of new
methodologies, which in the long term will pay back tenfold.
If we push change too quickly on people, the processes might suffer when they’re
not ready for that new pace. There’s no advantage to changing tools too quickly,
without reviewing processes or upskilling people.
The rubber band theory (which I coined, no trademark, use away!) likens People,
Process, and Tools to levers which are connected by rubber bands. By pushing one
lever and not the others, you create tension on the other two which can become
too much and snap the rubber band. We must aspire to move all three levers at the
same pace to keep tension to a minimum.
Cloud computing and any kind of public computing should always be considered
a zero-trust zone. Security should always be involved from the beginning, even
if you’re having initial thoughts about an idea in the shower (well, maybe not
there…). They should always be there, helping, collaborating, and ensuring that
the application is coded defensively with security in mind. Don’t wait until the
last minute to add security — that will be too late.
If you give full control and liberty to the teams, some of the economy of repetition
and automation can be lost. At the same time, not giving enough liberty will limit
the innovation and speed of delivery, which is about as great as driving an amaz-
ing car in bumper-to-bumper traffic.
One solution I normally suggest is to adopt the blackbox theory, which creates
domain-exclusive zones — but, all the boundary zones need to be agreed on across
the organization. One of the ways this manifests is an approach of “APIs every-
where,” in which all applications expose their functionality as API calls. Blackbox
theory can also be applied to monitoring and other areas where there is a predom-
inant advantage of having a common framework to ensure correlation of all the
data and a smooth end-to-end journey, as microservices can easily make the
experience feel inconsistent if the teams are too loosely coordinated. This can also be a
deliberate design decision, as AWS demonstrates with its services.
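One way to picture the blackbox approach is as a small contract that every service must expose at its boundary, while whatever happens inside the box stays the team’s business. A minimal sketch, with the contract contents entirely invented for illustration:

```python
# Minimal sketch of an agreed service boundary: internals stay a black box,
# but every service exposes the same small, organization-wide contract.
from abc import ABC, abstractmethod
from typing import Dict

class ServiceBoundary(ABC):
    """Boundary contract agreed across the organization."""

    @abstractmethod
    def handle(self, request: Dict) -> Dict:
        """Expose functionality as an API call ('APIs everywhere')."""

    @abstractmethod
    def health(self) -> Dict:
        """Standard health payload so central monitoring can correlate across teams."""

class PaymentsService(ServiceBoundary):  # hypothetical team-owned implementation
    def handle(self, request: Dict) -> Dict:
        return {"status": "accepted", "id": request.get("id")}

    def health(self) -> Dict:
        return {"service": "payments", "status": "ok"}
```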
As I was starting to interview different teams and understand the real desire behind
the transformation itself, there was a game being played between some of the lead-
ership in this company and account managers from (quite big) Cloud providers. The
result of that was that the whole strategy of Cloud migration would change almost
on a daily basis, depending on who made a more compelling offer for their services.
This company had a Cloud transformation committee that was meeting weekly,
and it would make good recommendations, but Leadership would ignore them
most of the time, until eventually the committee was disbanded.
We had to take a more radically direct approach to this, first going through all the
teams and understanding their pain, their issues, and what motivated and drove
them. This took the best part of three months.
Once this initial phase was finished, we came up with a boundary framework,
something that could be used to define inputs and outputs for all the different
services. This would help establish a centralised monitoring and security stream
and allow a full view of the system. We also encouraged centralising some of the
deliverables that were not central to the team’s “raison d’être,” like how to spin up
containers, maintain a container repository, or security scan new components.
In Conclusion
Transformation is a very hard business — if it was easy it would not have fostered
a whole industry of its own. There is a lot to read out there, and a lot to learn from
going through the pains of driving a transformation yourself. Remember to set the
right expectations and take a proper investigative approach — usually people under-
stand the problems but that does not mean they fully understand the underlying
issues. Make sure you have the right support from Leadership and stakeholders.
Be very aware of where things can go wrong, as you will have to correct things
quickly and smoothly. And always try to accomplish a successful transformation
with open communication, setting the right expectations and timelines.
I hope to have inspired you enough to have a try at this — it is certainly more chal-
lenging than Engineering, at least for me! Safe journey.
About Marc Cluet
Marc has contributed code to several projects including Puppet, mcollective, Juju,
and cloud-init, and helped create MAAS. He loves solving complex infrastructure
problems and applying solid and repeatable solutions. He is also an expert in build-
ing up agile engineering teams.
Acknowledgments
For their inspiration and collaboration, I would like to thank Mark Miller,
Derek Weeks, Matt Saunders, Jack Moore, Alex Dover, Jordi Duran, Tony
Chapman, Stephen Thair, Paul Heywood, Dominique Top, Nic Ferrier,
Avleen Vig, Juan Negron, Robert Collins, Michael Man, Chuck Short, Ivan Pedrazas,
Chris Turvil, and Marcus Maxwell. Thanks for leading the way.
I would also like to thank the reviewers for all their amazing feedback: Alek Kras,
Binyam Asfaw, Baptiste Durant-Bret and Ai V.
CHAPTER 7
The Seven Deadly Sins
of DevSecOps
by Ryan Lockard
In my consulting job with Contino, we partner with the world’s largest regulated
enterprises and try to help them with their cloud journey and adoption of DevOps
practices. We place a strong focus on making security a major part of both the
cloud and the operating culture of these organizations that partner with us. This
level of access gives me a pretty unique and validated understanding of the indus-
try’s security posture. While I am not going to be discussing any of my clients
(past or present) in this chapter, I will be pulling from my 20 years in the industry
seeing DevSecOps applied, or more commonly misapplied.
As much as I no longer follow the dogma, a lot of the teaching is still very much
part of who I am.
It is said in Roman Catholicism that each of the seven deadly sins is uniquely bad.
Any time one of these sins is committed, we must confess it and do all that
we can to not transgress again. Applying the DevSecOps context, each of the fail-
ures discussed in this chapter is an opportunity to reflect, inspect and improve
our own DevSecOps practices every day.
Pride
Change is hard. In my everyday conversations as a DevOps consultant, I speak to
many organizations and leaders across the globe. Some of these are customers, oth-
ers potential customers and some just folks asking for insights. The one common
theme throughout is that the desire to change the operating model and technical
posture of the organization is not nearly strong enough to overcome the forces
that resist the change.
One of the clearest markers for limited change I can point to comes from the 2019
Sonatype DevSecOps Community Survey. When asked how teams are informed
of DevSecOps issues, the results are broken into two groups: respondents with
Elite DevOps Practices and those without. Among the two groups, you see fairly
similar results, with the security team as the most frequent source of information,
broadcast emails second, manager/boss third, and customers (really?! ffs) fourth.
But the other response is where the interesting bit lies. The chasm between those
teams with elite DevOps practices that were informed by tooling (63%) and
those without elite practices (39%) is significant.
Now, does that mean there is limited or no security scanning tooling in those not
being informed by the tools? I would wager a guess not. Having spent 17 years in
highly regulated enterprises myself, before moving to consulting, I have seen first
hand (and have the scars to prove it) that tooling is not the issue here.
But automated tooling can cause problems if it does not clearly support the goals
of the organization, or the mindset required to configure or interpret the tooling
outputs is lacking such that the tools are rendered useless. One blatant example
was a team I assumed leadership over that had a security scan that emailed you
every 30 minutes if there were no issues found. Read that last sentence again.
Right, you’re back now. So if there was a security issue, you would not get the
email. This pattern literally created a noise channel. Within a week of being there
I was added to the alert emails, and in under an hour questions were asked. It took
some time for me to find someone that could intelligently talk about the use of
this alert since most had created an email filter to stave off the annoyance.
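The inverse of that anti-pattern is simple to express: stay silent when there is nothing to act on, and speak up only when there is. A minimal sketch, with the finding shape and the notify function as stand-in assumptions:

```python
# Alert on findings, not on their absence: the opposite of the noisy 30-minute email.
from dataclasses import dataclass
from typing import List

@dataclass
class Finding:
    severity: str
    title: str

def notify(message: str) -> None:
    # Stand-in for an email or chat integration; hypothetical.
    print(f"ALERT: {message}")

def handle_scan_result(findings: List[Finding]) -> None:
    actionable = [f for f in findings if f.severity in {"critical", "high"}]
    if actionable:  # only generate noise when there is a signal
        notify(f"{len(actionable)} high-severity findings: "
               + ", ".join(f.title for f in actionable))
    # otherwise: no email, no filter rule, no noise channel

if __name__ == "__main__":
    handle_scan_result([])                                     # silence
    handle_scan_result([Finding("high", "outdated library")])  # one alert
```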
This point is further validated in another finding from the Sonatype report.
Greed
Have you ever seen “Hoarders?” It’s a reality TV show that dives into the story of
a person that exhibits the attributes of the compulsive hoarding mental illness.
They deep dive into the life of someone who has packed their house with a life-
time of collections, newspapers, jars, dolls, or whatever else they have come across.
It’s a portrait of the imprisonment of the afflicted person. Despite a strong desire
to break the cycle, they can’t do it without outside help. The show then attempts
to help by bringing in loved ones and professionals to course correct the disorder.
Usually with happy results.
Most teams I have worked with in my career are brownfield teams. The software
code they work in is usually older than five years. It is the aggregation of multiple
languages and a hodgepodge of libraries — open, closed and custom sourced. Too
many times, these monolithic systems have little to no automated unit or func-
tional tests. Sometimes there will be superficial security scans, but pretty poor by
modern standards. These are code hoarders. Admittedly, the teams themselves are
not greedy. They did not desire to have this much debt. By several means (often
organizations that value output over outcomes or devaluation of SOLID princi-
ples) these systems have been created by a nameless cadre that likely have never
written code in their life, nor do they express empathy for the danger of working
in such a codebase. I affix the greed to that group.
48%, or nearly half, of the non-elite DevOps organizations have no standards for
how open source or third-party libraries are used in their ecosystems. Given there
are no standards, it is not a massive leap to assume they have no f*&#ing clue
which libraries are even in the codebase. In the Wrath section, the concept of the
Software Bill of Materials is further discussed, but this concept is a massive secu-
rity vector that should be applied in these regressive organizations.
With this degree of code sprawl, we should look at the cost of security breaches to
quantify the risk in such code bases. Globally, the average cost per security breach
in 2019 was $3.92M. That cost balloons to $8.19M if you live in the United States.
In 2019 it took an average of 279 days for a security breach to be discovered by the
impacted organization. With a delay like that, it is interesting to understand how
the cost of that breach hits the organization’s financials. As you’d expect, the largest
impact is felt in the first year, 67%. In the second year, 22% of the financial impact
of the security breach hits the company’s books, and then there is an additional 11%
that lingers for years three and longer.
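Putting those percentages against the averages quoted above gives a rough multi-year picture. The figures are the cited averages; the arithmetic below is simply a worked example:

```python
# Rough amortization of an average U.S. breach cost across years (illustrative only).
AVERAGE_US_BREACH_COST = 8.19e6  # average cost cited above, in dollars

IMPACT_PROFILE = {"year 1": 0.67, "year 2": 0.22, "years 3+": 0.11}

for period, share in IMPACT_PROFILE.items():
    print(f"{period}: ${AVERAGE_US_BREACH_COST * share / 1e6:.2f}M")
# Prints roughly: year 1: $5.49M, year 2: $1.80M, years 3+: $0.90M
```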
If you are familiar with American Baseball, you may recognize why I call this the
Bobby Bonilla contract of DevSecOps. But the New York Mets only signed that
contract once (they signed the infamous contract in 2001 and it pays $1.5M per year
till 2035). Unfortunately for us, this is a recurring cost for each and every security
breach that is realized. If you are involved in security, data, release management,
product management or any number of other areas and intend to continue to be
gainfully employed, it is imperative to stop being greedy with your code, and start
building a mindful security roadmap.
Lust
Who doesn’t have a plan for utilizing Kubernetes yet? Anyone?
Unless you’ve been living under a rock for the last two years you know that
Kubernetes has been garnering a lot of attention in the technology space.
Kubernetes is an open-source container-orchestration system for automating
application deployment, scaling, and management. It was originally designed by
Google, and is now maintained by the Cloud Native Computing Foundation. It
aims to provide a “platform for automating deployment, scaling, and operations of
application containers across clusters of hosts.”
Gluttony
Working in the federal space seems exceptionally difficult. I have gotten close,
working on a Federal Information Security Management Act (FISMA) compliant
project once or twice in my career, but for the most part have tactically missed out
on working for long periods in the federal space. I like to think that those who
do work in the federal space have our federal systems secure and locked down,
but as we have learned through the public record of what has happened with
Edward Snowden and Wikileaks, even the most hardened systems are vulnerable.
But what is unique about the U.S. government is that FISMA requires the Office
of Management and Budget (OMB) to report annually its findings on incidents
around cybersecurity.
Let’s break this down a little. At the meta layer, it’s natural to think the U.S.
government will face cybersecurity attacks. Every black hat, from nuisance to bad
actor, likely sees the U.S. government system as its white whale, and for some rea-
son decides to occasionally throw a harpoon at that whale. But what is interesting
here is the increase in incidents. As a cynic, I am quick to assume the bureaucracy
of the federal government is likely not allowing for real-time threat detection,
vulnerability remediation or artificial intelligence to derive typical patterns so
that one can detect atypical patterns around access or data query. And sadly, my
black cynical heart is proven right. Let’s deconstruct the five “major” incidents (all
disclosed in the public report):
“On May 11, 2017, the Acting CISO of DHS’s Office of the Inspector General (OIG)
reported to DHS officials that the sensitive personal information of 246,167 DHS
employees had been discovered on the home computer of a DHS employee. An
additional 159,000 case files from the OIG’s investigative case management system
were also found.”
Say what? OK, but how critical could that data be? I mean, let’s not get ahead of
ourselves. This could be data like an employee’s start date that is less critical. But
no, the data that was compromised contained the names, Social Security num-
bers, dates of birth, positions, grades, and duty station data for the ~250K federal
employees. In layman’s terms, someone walked out of a secure, federal building
with a quarter million people’s most valuable PII (personally identifiable informa-
tion) on (presumably) a thumb drive, and it set off no alerts. According to the DHS,
they will be doing better. They are going to implement more security and data
access controls as well as look at usage patterns of those with access.
This data-retrieval tool was largely used for the application for federal student
aid. The compromise started in 2016. A group of hackers found the exposed data
and used it to manipulate the student aid system. They were able to file upwards
of 8,000 applications and fraudulently collect up to $30M in student aid money.
There were no automated systems in place to detect the fraud, there were no data
or endpoint scans in place to identify the open access and there were not even
manual checks in place to validate the funds were distributed to legitimate people.
This type of failure is of critical importance because it shows how asleep one can
be at the wheel. Even with the (sometimes) security theater of compliance such as
FISMA High, it is still possible to architect and release systems that expose PII that
can be acted upon to steal taxpayer money.
While the United States government has access to a high amount of taxpayer-
backed funding, it seems to overlook the most basic of security patterns available.
It appears to feast upon the learned behaviors and operating models
of the past. This feasting has bred complacency, and from the complacency has
emerged issues. Thankfully, there is a recent ray of hope. As of the 2020 National
Defense Authorization Act (2020 NDAA), there is now a requirement for the
Secretary of Defense to establish the use of agile DevOps development practices as
an alternative for the Joint Strike Fighter Autonomic Logistics Information System.
So, while the gluttony of the past typically predicts the gluttony of the future, it
is refreshing to see a bill passed that turns the federal eye toward more
modern security approaches.
Wrath
In May 2017, I was working for a large enterprise in downtown Philadelphia and
running my own development team for a much larger line of business.
As is the case with large corporations, we all directly reported to a Senior Vice
President (SVP) who worked for the Chief Information Officer. This organization
did not have a CTO, so the SVP was effectively the CTO. I had worked for him for
about four years at this point, and we had a “trust but verify” relationship; it was
bi-directional.
The team went from one of the worst performing groups in the organization to a
pinnacle of engineering both from a resiliency and from a developer experience
perspective. The SVP himself even noted to me a few times how impressed he was
with how this team had rounded out over the years.
On this particular morning, I saw that SVP frantic. I’d seen it before. While he
was typically mild-mannered and much more reserved than the typical person, he
would have his dynamic moments. He was bolting across the 18th floor. I hadn’t
even reached my desk, I was still wearing my jacket and backpack. When he was
like this, I knew it was best to avoid him. In spite of my best efforts, there was a
momentary locking of the eyes, and he made a beeline for me. He looked at me
and said, “Oh thank god, your team is a .Net team. You are fine….” My curiosity
now piqued, I asked what he was talking about. At that moment, he was the first
to tell me about the yet-to-be-named attack now known as WannaCry. After get-
ting him to slow down a little, I asked for a little more detail. I got the five-cent tour of
an executive’s understanding of a ransomware attack that he had just been briefed
on 30 minutes prior. Neither his explanation nor his assertion that we were fine
because we were a “.Net shop” resonated with me. To make matters worse, I then
was told “Oh, you’re also in the cloud. You are fine.” At that point, I knew I had to
scrap my morning plans and really unpack whatever the hell was really going on.
After 5-7 minutes of Googling I knew I was not, as he said, fine.
Over the course of my time with that team, I had done a few smart things — I also
had my share of bonehead moves there, too. One of those smart things was to
hire a couple of really, really smart people. One of those people made an effort to
start an inventory of all of the software we had running on our systems. Today, I
know this practice is called a Software Bill of Materials (SBOM), but at the time
we just called it the “inventory.” I pulled together a SWAT team to assess which
items in the inventory would be most susceptible to the Struts vulnerabilities,
and which would be the next most vulnerable. We then used a set of tools to do
vulnerability scans (I believe we were principally using masscan and nmap at that
time). Sure enough, we found vulnerabilities. I made sure we followed procedures
to remediate the very few incidents we found, and we continued to scan less
susceptible areas.
That spring day, a few poignant lessons were inadvertently taught to me:
» Never, ever, go on word of mouth from anyone as to the security of your sys-
tem. If the code doesn’t say you are safe, you are not safe. Even when the code
says you are good, trust but verify.
» Too few organizations and teams really know all of the software running
on their systems. Today there are a few tools on the market that automate a
lot of the vulnerability detection or the versioning of libraries (such as the
Nexus Repository service), but the only way for these tools to be effective is by
creating a security-first and zero-trust mindset in the engineering culture of
the organization.
» Software Bills of Materials (SBOMs) are baller. If you don’t have one for your
project, you really should create one; a minimal sketch of the idea follows this list.
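The inventory idea itself is not sophisticated, which is part of its power. Here is a minimal sketch of the kind of check it enables, assuming you already hold a flat list of components with versions and a feed of known-bad versions (both shapes invented for illustration):

```python
# Minimal SBOM-style check: flag inventory components that match known-vulnerable versions.
from typing import Dict, List, Set, Tuple

# Component name -> versions deployed somewhere in the estate (hypothetical inventory).
inventory: Dict[str, Set[str]] = {
    "apache-struts": {"2.3.31", "2.5.10"},
    "openssl": {"1.1.1k"},
}

# Component name -> versions with known vulnerabilities (hypothetical advisory feed).
advisories: Dict[str, Set[str]] = {
    "apache-struts": {"2.3.31", "2.3.32", "2.5.10", "2.5.10.1"},
}

def vulnerable_components(inv: Dict[str, Set[str]],
                          adv: Dict[str, Set[str]]) -> List[Tuple[str, Set[str]]]:
    """Return (component, vulnerable versions) pairs present in the inventory."""
    hits = []
    for name, versions in inv.items():
        bad = versions & adv.get(name, set())
        if bad:
            hits.append((name, bad))
    return hits

for name, versions in vulnerable_components(inventory, advisories):
    print(f"{name}: vulnerable versions in the estate: {sorted(versions)}")
```

With a list like that in hand, the triage described above becomes a sort rather than a scramble.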
Sloth
Back in 2006, I was an ambitious, corporate, technical project manager. Five years
out of school, with a job that I hoped to have within the decade, I was living the
dream. I was working for a global enterprise that was fully invested in the Oracle
e-Business Suite. Nearly all of my work was focused on that core ERP and asso-
ciated applications. The only work I did outside of Oracle customization and main-
tenance was related to the management and migration of servers and data centers.
That year I was asked to lead one of the projects associated with a major upgrade
of the core Oracle eBusiness Suite. We were aiming to upgrade the software from
version 11.5.9 to 11.5.10.2. This was a version skip, which we regression-tested and
feature-tested for months to ensure all critical business systems would function
fine after the upgrade.
We had (and I was responsible for) a massive Microsoft Project Plan. It was a thing
of glory — hundreds of lines, loads of start-to-finish and finish-to-start dependen-
cies. Tasks were allocated and milestones were defined. What it lacked in accuracy
it made up for in pageantry. Speaking of pageantry, the status reports and status
meetings were the Belle of the Ball! Weekly we’d meet and review a two-slide
status report. It only shaded amber when we wanted to make a team move faster
than they were, but held strong at green for the majority of the project. The entire
project took about six months from start to finish. Each month, waterfalling into
the next.
When I woke the next morning, I made a fresh pot of coffee, made sure the turkey
for Thanksgiving went in the oven, and got back to jamming on my trusty IBM
Thinkpad. Much to my delight, the upgrade was still rocking along to plan!
Around noon, I remember us doing the smoke tests, which comprised some sim-
ple features, transactions and an innocuous data comparison. That data test just
ensured the data that was in the tablespace prior to the 11.5.10.2 upgrade was still
there post. We had run this at least three to four times prior in the lower environ-
ments, and each time it passed. This time, in production, as my turkey cooked and my
family was showing up to start watching football and eat, something unique hap-
pened. It failed. There was less data in the new tablespaces than in the old. During
the upgrade, we lost data. But that cannot be. We tested this, a number of times.
We followed the documented migration path. We did all the right things, damnit!
The next six hours are honestly a blur. I know we had a hotline open with Oracle
support. I know I had calls coming into my (super sweet) Nextel mobile from
folks whose names I had only seen high up on the org chart, and never spoken
to live before. I also know that we made the call to back out the upgrade and
use the Oracle migration utility again, to try to reinstall 11.5.10.2, hoping
a step was missed and a fresh reinstall would fix the data issue. Once that call
was made, we signed a fictitious contract to power through at least 12 hours of
new work, er — re-work. At this point, I had a family eating in my dining room,
discussing all they were thankful for in that year, while I was in my basement
pacing and tearing out my 26-year-old hair (yes, I did have hair at that time). As
this was my project, I was under the microscope. I had to make a call to the CIO
every 30 minutes at the top and the bottom of the hour to give an update, even
if the update was “no update.” It was brutal. I remember forcing myself to stand
because I was afraid I would fall asleep if I sat down. At this point, I had effec-
tively slept 1.5 hours in the past 36. And there was no sign that a real sleep was
coming anytime soon. As Thursday switched over to Friday, we finished the rein-
stallation. We ran the smoke tests again, and all I wanted to see was a different
result than we had seen the day before. Sadly, I got my wish. No, we didn’t find
out that the data issue had gone away; we found out it had gotten worse. More data was missing
this time. FFS.
If things were tense before, we just went into the North Korea level of tension.
Finger pointing and blame took over what I had previously thought were some
of the top technical minds I had met to date. We went back to the Oracle support
team and found ourselves in a priority status that we had not heard of before,
some super high “drop all other shit” status level. We spoke to several core Oracle
support and engineering leads. At this point, we spoke to a gentleman named
Mohammad based in Egypt. He was an on-call support person who happened to be
pulled in. He was far from the senior folks we were all locked in on, hoping and praying
they would find a solution to this situation. Mohammad dug into the logs, asked
for a copy of our backed-up data and disappeared. As we thrashed on the Oracle
support call, and slung half-brained theories on what we could do via instant mes-
senger, Mohammad hammered out a script. He seemed to have found a small issue
somewhere between our data and the migration scripts that likely would cause a
race condition, so he added a patch to the script to handle the error.
After some interesting conversation about the validity of both Mohammad’s error
hypothesis and the effectiveness of his script, we agreed to re-install, a third time,
using this patched script. Given the time required to install the whole upgrade,
we made the decision to split the team, so we could install simultaneously in pro-
duction and in the non-production staging environment. Given the time required
to install in staging, then again in production, we would have breached the drop
dead time for the upgrade to be completed, without impacting critical business
process scheduled for Sunday evening such as customer billing and end of month
batch processing. As the fresh install was coming to an end, my sheer exhaustion
was shifting to optimism. I was starting to blindly trust we were heading to a
happy ending; after all, we had never failed on a major release. In the non-produc-
tion staging environment, we completed the install faster, because it had far less
data than production. We ran the smoke tests and everything passed. But that
meant very little since everything had passed there prior to Mohammad’s patch.
Some time later, production was re-installed with the patched script. On a call
with five people, I remember waiting to hear the result of the smoke test live. Each
test slowly came back as passing. Then we reached the data comparison test. It
was likely a five-minute test at most, but it felt like an eternity. And then my lead
DBA spoke in a faint accented voice and said, “it passed.” In a somber, monotone
voice, he said the two words I wanted to hear most in the world! We then started to
mobilize everyone. The functional testers, the business team, the CIO (of course)
and then the compliance team to let them know we had cleared the blockage. Or
so we thought.
The directive was clear: reverse the upgrade. Restore the RMAN backup. Reset the entire production environment back to the state it was in on Wednesday evening, the end of the business day in California. We completed and tested the backout on schedule and allowed the business-critical processes to run Sunday evening. We failed to upgrade production.
Compliance as code was, at best, in its infancy in 2006. It certainly was not a reality
at the enterprise at which I worked. Modern build/test/deploy pipelines are being built to allow compliance checks, from the simplest to the most extreme, to be represented as injectable rules. Monitoring systems provide a feedback loop to
the health of the change and the overall system in near real time. Had this been in
place at my company in 2006 for this upgrade, we could have very quickly asserted
the Sarbanes-Oxley compliance checkpoints. But it wasn’t, so we couldn’t. But here
we are, 14 years later.
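To make "compliance as code" a little more concrete, here is a minimal, hypothetical sketch of what an injectable compliance rule set behind a pipeline gate might look like, written in Python. The rule names, the change-record fields, and the idea of gating a deploy on them are illustrative assumptions, not the checks my employer used in 2006 or the API of any particular framework.

    # Hypothetical compliance-as-code gate: a deploy is blocked unless every
    # injectable rule passes against the change record. Fields and rules are
    # illustrative only.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple


    @dataclass
    class ChangeRecord:
        """A much-simplified stand-in for a change/release ticket."""
        change_id: str
        has_rollback_plan: bool
        approved_by_compliance: bool
        smoke_tests_passed: bool
        data_integrity_checked: bool


    # Each rule is a (name, predicate) pair that a pipeline could inject.
    Rule = Tuple[str, Callable[[ChangeRecord], bool]]

    SOX_STYLE_RULES: List[Rule] = [
        ("rollback plan documented", lambda c: c.has_rollback_plan),
        ("compliance sign-off recorded", lambda c: c.approved_by_compliance),
        ("smoke tests green", lambda c: c.smoke_tests_passed),
        ("data comparison check passed", lambda c: c.data_integrity_checked),
    ]


    def failed_checks(change: ChangeRecord, rules: List[Rule]) -> List[str]:
        """Return the names of failed rules; an empty list means the gate is open."""
        return [name for name, check in rules if not check(change)]


    if __name__ == "__main__":
        change = ChangeRecord(
            change_id="CHG-2006-1123",
            has_rollback_plan=True,
            approved_by_compliance=False,  # the checkpoint we could not assert in 2006
            smoke_tests_passed=True,
            data_integrity_checked=False,
        )
        failures = failed_checks(change, SOX_STYLE_RULES)
        if failures:
            raise SystemExit(f"Deploy blocked, failed checks: {failures}")
        print("All compliance checks passed; deploy may proceed.")

Had something even this crude existed as a gate in our 2006 pipeline, the question "can we assert the checkpoints?" would have been answered in seconds rather than argued about over a holiday weekend.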
I now consult for large enterprises on how to better enable healthy systems and build elite teams. Very few of these enterprises have ANY compliance as code framework in place! There is an army of vendors and tools that will line up to offer you the blue pill (or is it the red?) to solve this with a license, but to me, the epic failure for almost all enterprises is the resistance to partnering the compliance organization with the engineering organization. There is a visceral distance in some organizations between these groups that prevents doing what is right for the company, the users, and the intent of the regulation itself. As the new, digital-first cohort of leaders comes to boardrooms and executive meetings, it is imperative not only to address this divide, but to close it ASAP.
Envy
Perhaps the most egregious of all DevSecOps sins is the belief that tooling will
save you. Mark Miller and I have had friendly discussions about this in the
past. One of my core beliefs is that security must be embedded into the culture
of any successful team/organization. Too often, security leaders fall into the
trap of looking at white papers or marketing materials from vendors about
how their peers or competition are securing their systems with various tools.
Using simple correlation, you can leap to the conclusion that a team’s security
posture is a result of the tooling. And, trust me, the tools are the commodity. The
bespoke item in secure teams is the security mindset and capabilities. There is a saying: "Treat your keys like underwear: never let anyone see them, and change
them often.” This speaks to my point. Be aware of best practices and be rigorous
enough to stay one step ahead of the bad actors, rather than obsessing with what
your contemporaries are doing.
Sure, be alert and aware of what is going on in the market. Stay current with NIST
policy changes. Watch which commercial tools are coming into the market to
secure the most recent tech. But first and foremost, build out a DevSecOps mindset
and community of practice within your walls.
One of the largest US financial services companies, well known for its progressive, early adoption of the cloud and its modern take on DevSecOps, had a unique position on internal upskilling during its rapid climb up the technology ladder. It created an in-house university. This group has a remit to "put the employees in the driver's seat," but the reality is that it is shifting capabilities such as security, automation, quality, and provisioning left by making an active investment in the capability uplift of the team, and attaching that to annual incentives. So if there is one group to be envious of, it is the one that actively invests in
security capabilities and DevOps best practices within itself.
Acknowledgments
Thank you to my beautiful wife, Emma. You stay at home so I can travel the world
pretending to be a smart guy.
CHAPTER 8
Did You Try Turning It Off and On?
by Chris Riley
It all began with the CEO being unable to send an email. Back in the day, Lotus
Notes was the king, and applications ran on hunks of steel, not virtual machines.
One day, at one of the world's largest tech companies, the inability of approximately 13,000 employees to send and receive email was the catalyst for sheer
panic. The culprit was a light switch. Names have been changed to protect the
mostly innocent.
Bill was on call. It was 1995, and the best notification method, at that time, for
being on-call was a brick of a pager. Bill’s company was not willing to invest in
cell phones. Email, in this case, was out of the question. It was a calm week, and a
smooth Friday. Bill finished his week by doing the rounds at the data center after
racking a new box for a DB2 backend. When Bill’s work was done he was eager to
head out and to start the weekend with a hot date. Shortly after he had picked up
his date, he received a page. His date thought it was a drug deal, but he knew it was
a different sort of trouble. Apparently the executive assistant of the CEO called the
director of IT, who looked at an on-call spreadsheet naming Bill as the victim. This
was not the typical process. Usually there were many more hoops, but at the time,
the key stakeholder was able to escalate the situation with ease. The best option
for Bill was to pull over, locate the nearest phone booth, and call in. He was greeted
with, “Bill, the CEO can’t send emails, what’s going on?”
In 1995, the criteria for being a Jr. IT Manager were as simple as taking one computer science course and showing an aptitude. Bill enjoyed his computer science class, and he was ready to enter the real world of IT. At this time there was also a massive move from mainframe applications to client-server. Little did Bill know that he was going to get the keys to a massive datacenter, previously a hardware assembly plant, where the mainframe-to-client-server migration had just taken place and everyone was flying just a little blind.
If you are a Jr. IT manager like Bill, and get this call, everything puckers up. There
is no way not to feel guilt and pressure to get things solved. During this time, the
blameless culture did not exist and outages nearly always had an internal victim
with long-term consequences.
Back to the datacenter Bill went, and fast. When he got there to check the Lotus
server, he noticed that it had recently restarted and was in the process of booting
up. Problem solved? Not the definition of self healing most hoped for, but it did
explain the outage, and hopefully the restart was the solution. A moderately
equipped Lotus server at the time took a good 30 minutes to be up and available.
After 30 minutes, email was up and running, but the root cause was not yet clear.
After some digging in the logs, there was no clear indication of what was going on. There was no flood of errors just prior to the reboot. Bill did not want to lose sight of what was most important at that time: his date! He decided that this was all he could do for now and chalked it up to some fluke reboot. This is where an experienced IT manager would have known that things were just beginning.
Bill made a note to troubleshoot next week and got back on the road. Fifteen
minutes later, another page. Instead of finding a phone booth, he went straight to
the datacenter and picked up the phone, “Bill, it’s down again.” What was going
on? Not only was the Lotus server rebooting again, so was every machine on that particular rack, and none of them showed obvious errors to justify the reboot. With no indication that the problem was the server itself, and with the other servers on the rack restarting as well, all that was left was power. Bill turned to look at the wall, and there it was. The light switch right by the door, with a slip of blue painter's tape dangling in defeat. That switch was the bug and the tape a very poor safety mechanism. That particular rack was mistakenly on a switched outlet intended for extra lighting, and Bill had flipped the switch. Twice.
When the next week rolled around, this particular case had high visibility, and the post mortem was a key asset in figuring out a long-term solution: don't use switched outlets, and invest in auxiliary power.
While not permanent, the short-term fix was a mound of tape on the light switch and a sign that said, "DO NOT TURN OFF." The runbook for future incidents, in this case, would be to turn the power back on. It was so simple. But the impact of the lost emails on a multinational company was tremendous, not trivial. Not only was it a threat to the business operation, it was a clear path for any would-be saboteur.
Bill was lucky that no drives were lost and everything came back up after a hard
power cycle. But this incident left him feeling stupid, left the IT organization
embarrassed, and an entire company with lost emails. The level of redundancy that we have today did not exist back then, so a day's worth of lost emails created many issues. It was an undue risk, with potential revenue loss, compliance exposure, and lost efficiency.
There is clearly a major flaw in your infrastructure if someone can take it down
with the flip of a switch. Where was the UPS? In Bill’s case this was a physical
switch, but many switches that are as simple to toggle and as widely exposed still
exist in every tier of an application.
I enjoy this story for so many reasons. First, at Bill's expense, it's hilarious. Second, because it's retro. We know now that no datacenter would ever be set up this way, but back then data centers were just places to store and plug things in. What nerd doesn't like to talk about the good ol' days? The story also highlights the big impact of seemingly innocuous problems. During those days of IT, things were more rudimentary than they are today, which made understanding what was going on much easier. Modern applications and infrastructure have become more complex, with many more moving parts. The ratio of a problem's simplicity to the magnitude of its impact is fun and worth exploring in modern-day instances.
Engineers love to talk about the KISS (Keep It Simple, Stupid) principle, but they rarely follow it, since engineers spend most of their time grappling with large problems. Thus, when something breaks, we expect it to be equally complex. The semicolon was the death of my development career building Optical Character Recognition (OCR) engines and Genetic Algorithm (GA) SDKs. The fact that I could prevent a compile of any code with a single missed character within large chunks of complex algorithms drove me nuts. I either had to be committed, or admit coding was not for me.
Turning things off and on seems funny, but it is still a common catalyst in many outages. We just use different language to describe it: rollbacks, re-routes, and flags. Many of the runbooks written today are scripts filled with methods for rolling back services. If it's not a rollback, it could literally be restarting an instance that is pegged, or cleaning up drive space. Essentially, the problem and the solution are often fairly basic. However, the impact of these basic issues can be dramatic, and ironically we may not even spend time on the root cause because it seems too trivial to be the answer.
Toggle Safely
Bill’s light switch is a basic toggle. Toggles have become increasingly more com-
plex in the application and IT world, and are found everywhere, including being
instrumented in code, and in scripted infrastructure. The virtual switches have
the same impact as the physical one.
There is no stopping the spread of toggles, nor should organizations aim to do so. Without the flexibility that smaller, switchable, and more portable chunks of code provide, we would be moving backwards on application features and velocity. The monolith is the exact definition of this; in Bill's case, the monolith included the server and the power it was connected to. That is it.
So get the idea of limiting toggles out of your mind and, in fact, be ready to increase application integrity and security by introducing more. The best way to frame a strategy for the modern digital switch problem is more testing, smarter design patterns, better monitoring, and better alerts.
Continuous Testing
Anticipate that toggles in the application will change, potentially even in production; this is something to acknowledge, not fight. The trick is being aware of when it happens, and of the potential impact of an off/on. Classically, when applications end up in production, the monitoring is limited to the basic traffic coming in and infrastructure uptime. While this increases awareness of the big buckets, such as the plug being pulled, it does not address the complexities of modern applications. That is why teams need to keep testing beyond deployment, extending test automation into production. There you can have a test suite that continuously pokes at the application in production, so that any inadvertent changes to code, rollbacks, feature flags, artifacts, or infrastructure are visible and their impact known. Because a test suite goes beyond up/down status, it is able to reach a more critical depth of understanding by checking the relationships between components and the functionality they support. Continuous testing should include:
» Contracts
Contracts define what services accept as input and what they output, in specific ways. If you are going to instrument a service that relates to another service, these contracts are essential to building your functionality. In test automation, make sure that as services evolve they continue to play nice with each other, and that if your contract has a flag/toggle/switch with an expected state, that state is actually delivered. Microservices very often have their own lifecycle, so in addition to not being able to communicate directly about functionality, teams are not always aware of where a service is in its latest incarnation. Nor should they need to be, or they would have to delay their functionality because a related service is not done with theirs.
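As a rough illustration of that idea, here is a minimal, hypothetical consumer-side contract check in Python. The wishlist service, its fields, and the flag name are invented for this sketch; in a real pipeline the stubbed call would hit the deployed service.

    # A minimal consumer-side contract test. The provider response is stubbed
    # so the sketch is self-contained; in production testing it would be a
    # real call to the running service.
    import unittest


    def fetch_wishlist_summary(user_id: str) -> dict:
        """Stand-in for an HTTP call to a hypothetical wishlist service."""
        return {
            "user_id": user_id,
            "items": [{"sku": "ABC-123", "movable_to_cart": True}],
            "feature_flags": {"wishlist_to_cart": True},
        }


    class WishlistContractTest(unittest.TestCase):
        def test_response_honours_the_contract(self):
            payload = fetch_wishlist_summary("user-42")
            # The cart service relies on these fields being present...
            self.assertIn("items", payload)
            self.assertIn("feature_flags", payload)
            # ...and on the toggle it depends on being in the expected state.
            self.assertTrue(
                payload["feature_flags"].get("wishlist_to_cart"),
                "wishlist_to_cart flag is off; cart checkout may break",
            )


    if __name__ == "__main__":
        unittest.main()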
If you are super brave, you can and should also include chaos tests. Flip those switches for funsies. Kill a pod, toggle a feature, roll back services, just to see what happens. It sounds scary, but chaos will happen anyway, and it can be generated automatically in a manageable and visible way.
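A chaos experiment does not have to be elaborate. The sketch below, with an invented flag store and health probe standing in for real systems, flips a toggle, checks whether the steady state held, and flips it back.

    # Tiny chaos experiment: flip a toggle, observe, restore. Everything here
    # (the flag store, the health probe) is a stand-in for real systems.
    import contextlib
    import time

    FLAGS = {"wishlist_to_cart": True}  # pretend flag store


    def cart_depends_on_wishlist() -> bool:
        return True  # the hidden coupling the experiment should expose


    def steady_state_ok() -> bool:
        """Stand-in health probe; in reality this would exercise checkout itself."""
        return FLAGS["wishlist_to_cart"] or not cart_depends_on_wishlist()


    @contextlib.contextmanager
    def flag_flipped(name: str):
        original = FLAGS[name]
        FLAGS[name] = not original
        try:
            yield
        finally:
            FLAGS[name] = original  # always restore, even if the checks blow up


    if __name__ == "__main__":
        assert steady_state_ok(), "system unhealthy before the experiment even started"
        with flag_flipped("wishlist_to_cart"):
            time.sleep(0.1)  # give a real system a moment to react
            healthy = steady_state_ok()
        if healthy:
            print("steady state held")
        else:
            print("steady state broke: we just found a kill-switch landmine")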
This is also because modern applications are not far removed from their infrastructure; with Kubernetes and containers they are basically one and the same. Design patterns impact the application at all tiers now. Organizations need to consider how they build their applications and standardize design best practices across the team. This standardization makes applications easier to test, easier to understand when things break, and easier to correlate the relationships between various components.
The goals we are trying to achieve with better design patterns are more isolation, mitigated failure impact, faster failure identification, and faster resolution. These are what the suggestions below are based on.
» Be Stateless and Event Based: Without state you don't necessarily have a switch to be flipped, or at least not one that is scoped to the entire application. I am not implying that stateless architectures are a bulletproof firewall, but they do limit the impact of bad requests and force a completeness in adoption and creation. Being stateless and/or event based also forces the services layer to be the lens for the entire application.
» Test Driven Development: Write your functional tests before you write the feature. The nice thing about contributing to a book like this is that I get to be prescriptive about things that I myself would not do. But this is also why no one should hire me as a developer. Test driven development feels like you are being punished before you have done anything wrong. I get it. But the feeling of success once you do complete the feature and it passes the unit test is tremendous. The big thing that you gain with test driven development practices is a buffer. Tactically, you are doing a better job of validating the features you write. But it also creates a buffer for the developer before they crank out new code. They will think about that code's impact on the overall code base and the surface areas where it might cause problems, be it functionality, potential vulnerabilities, or breaks in internal best practice. Unit tests should be seen as more than just their functional test value.
» Robust API Gateway: If you allow your services a lot of flexibility, but have your API gateway as an overall control plane for all communication, you can build safety nets into the API gateway to prevent basic changes from causing catastrophic events. It's almost a monolithic communication overlay on a non-monolithic application.
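To show one way such a safety net could be wired up, here is a small, hypothetical Python sketch in which the gateway knows which routes depend on which toggles and fails fast with a clear error, instead of letting the failure surface deep inside a downstream service. The route names, the flag, and the dependency table are all assumptions for illustration.

    # Hypothetical gateway-level safety net: declare which routes depend on
    # which toggles, and short-circuit with a clear error at the edge.
    from dataclasses import dataclass

    FLAGS = {"wishlist_to_cart": False}  # the switch someone just flipped off

    # Dependencies between routes and toggles, declared in one place at the edge.
    ROUTE_DEPENDENCIES = {
        "/cart/checkout": ["wishlist_to_cart"],
    }


    @dataclass
    class Response:
        status: int
        body: str


    def route(path: str) -> Response:
        """Refuse to forward traffic whose declared dependencies are switched off."""
        missing = [f for f in ROUTE_DEPENDENCIES.get(path, []) if not FLAGS.get(f)]
        if missing:
            # Fail fast with a message that names the real cause.
            return Response(503, f"{path} unavailable: depends on disabled flags {missing}")
        return Response(200, f"routed {path} to its backend")


    if __name__ == "__main__":
        print(route("/cart/checkout"))   # 503, and it names the disabled flag
        print(route("/wishlist/view"))   # 200, no declared dependencies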
While it’s not really a design pattern but just a good practice, development teams
should be continuously refactoring their code to remove outdated and unused
code blocks or validating that new code written does not duplicate something that
already exists. This is extremely important and good developer hygiene. Easier
said than done. This has to be systematized through the development team, and
it also has to include your test suite. In your test suite, not only are you wasting a
huge amount of time running tests on functionality that is no longer used, you are
exposing its existence.
Do not write a new function that is a ctl+v of another where you are just making
incremental changes. It may feel like it will take you longer to understand an older
function to update, than to just recreate it, but the risk of abandoning it and it
becoming an issue later is a far greater pain. Just don’t be lazy. Before diving into
new code, make sure it does not deprecate old. If it does, get rid of the old. Test
driven development can also help identify this, but it can’t force the habit, it has to
be standardized across the team.
Better Monitoring
For the longest time, application performance monitoring and infrastructure monitoring have been a process of record-and-move-on. At least move on until something breaks; much like documentation, monitoring was treated as a "set it and forget it until it's needed" practice. As development teams evolve into DevOps practices, they are seeing monitoring as a more proactive tool, and even a tool for self-healing.
In the world of cyber security, monitoring is pretty mature. Security Information and
Event Management (SIEM) systems are a set of well-known tooling and practices. The same practices fit within DevSecOps, but what is tremendously different is that the
end-points are not just servers and devices, they are snippets of code. The vulner-
abilities found in applications go deep. And the deeper they are the harder they
are to discover. So monitoring needs to be top down from client to app as well as
inside out from code to client. This is where modern monitoring tools come in.
Development teams need to understand two key things about monitoring their
applications.
» The tools are only as good as the information architecture that has been established for the data they ingest. Information architecture is how information is organized, from sources to tags, but also how it is consumed, from dashboards to individual access.
» The tools are only as good as their instrumentation. If you want full visibility, you need to instrument your monitoring from the code level up. I wish there were an uber agent that could pry into every nook and cranny of applications, but generally developers need to take monitoring to the next level by introducing specific tools that can monitor at the code and API level.
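Instrumenting "from the code level up" can start very small. The sketch below is a hypothetical example of code-level instrumentation in Python: a decorator that records timing and exceptions for whatever it wraps. The emit() function just prints; it stands in for whichever monitoring backend or agent you actually use.

    # Minimal code-level instrumentation: time every call and record every
    # exception, then hand the event to a (placeholder) exporter.
    import functools
    import time
    import traceback


    def emit(event: dict) -> None:
        print("MONITOR:", event)  # placeholder for a real exporter/agent


    def instrumented(func):
        """Record timing and exceptions for whatever the decorator wraps."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                emit({"fn": func.__name__, "ok": True,
                      "ms": round((time.perf_counter() - started) * 1000, 2)})
                return result
            except Exception as exc:
                emit({"fn": func.__name__, "ok": False, "error": repr(exc),
                      "ms": round((time.perf_counter() - started) * 1000, 2),
                      "trace": traceback.format_exc(limit=3)})
                raise  # the caller still sees the failure; the monitor just saw it first
        return wrapper


    @instrumented
    def add_wishlist_item_to_cart(sku: str) -> str:
        if not sku:
            raise ValueError("empty sku")
        return f"added {sku}"


    if __name__ == "__main__":
        add_wishlist_item_to_cart("ABC-123")
        try:
            add_wishlist_item_to_cart("")
        except ValueError:
            pass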
The basis of monitoring tools is the same: collect some data, show some data. But they have become more and more specialized in the way they collect and display information. Concepts like real-user monitoring (RUM) and exception monitoring have entered the market as more code-level monitoring approaches, as well as distributed tracing for more modern infrastructures. At the end of the day, a good monitoring tool can be manipulated into covering your entire stack, but finding a good vendor partner that can help you get there in the most comprehensive way is important.
The other trend with monitoring is the ability of monitoring tools to take on first-tier remediation. This is the start of self-healing. Tools can now run scripts to solve common issues, such as shutting down vulnerable services or doing a rollback. While this functionality is not a requirement, it soon will be, and it is good to plan for it.
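What first-tier remediation can look like in miniature: a hypothetical watcher that maps well-understood failure signatures to pre-approved runbook actions such as a rollback or a disk clean-up, and pages a human for anything it does not recognize. The signatures and actions here are invented for the sketch.

    # Hypothetical first-tier remediation: map known failure signatures to
    # safe, pre-approved runbook actions; anything unknown goes to a person.
    from typing import Callable, Dict


    def rollback_service(name: str) -> None:
        # Placeholder: in reality this would call your deploy or orchestration tooling.
        print(f"rolling back {name} to the previous release")


    def clean_tmp_space() -> None:
        print("cleaning up temporary files to free disk space")


    REMEDIATIONS: Dict[str, Callable[[], None]] = {
        "deploy_regression": lambda: rollback_service("cart-service"),
        "disk_full": clean_tmp_space,
    }


    def handle_alert(signature: str) -> bool:
        """Return True if a known remediation ran, False if a human is needed."""
        action = REMEDIATIONS.get(signature)
        if action is None:
            return False
        action()
        return True


    if __name__ == "__main__":
        for sig in ("disk_full", "mystery_meltdown"):
            handled = handle_alert(sig)
            print(f"{sig}: {'auto-remediated' if handled else 'page a human'}")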
105
Epic Failures in DevSecOps
Machine learning is also a new and common term thrown into the monitoring conversation. There is intelligence regularly being built into monitoring tools by some really smart data scientists. It is not the singularity and it's not magic. Where it is being deployed, it has very clear problem/solution value. Don't expect magic when you hear "machine learning," but realize there are tactical use cases, and it's better to focus on the use case than the term.
Terms like self-healing, machine learning, and distributed tracing are cool party tricks, and the techniques behind them also provide real value.
Better Alerts
Like monitoring, most alerting is treated as the process of barking, like Bill's page. Alerting is not just about being loud to the people who need to know. It has a lot to do with context, and everything mentioned above is a way to build more context. Continuous testing gives more coverage, better design patterns give more detail, and better monitoring gives humans greater visibility when something goes wrong. The next step is taking action.
The typical alerting scenario is this: you get the call because you are on a spreadsheet, you acknowledge something bad has happened, then you go call your favorite dev to help you fix it. There is a path of least resistance we establish, often at the burnout cost of our teammates. But modern incident response and management should get alerts to the right person based on intelligence, giving the responder good context to address an alert: things like access to a runbook, monitoring tool dashboards, and full traces. And then it should help them track and audit the entire process of remediation, or bring other subject matter experts into the firefight.
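As a hedged sketch of what "context" can mean mechanically, the Python below groups related alerts by rack and attaches a runbook and dashboard link before anyone gets paged. The hostnames, URLs, and grouping rule are placeholders, not any particular product's behavior.

    # Hypothetical alert enrichment: group related alerts and attach the
    # context a responder needs before the page goes out.
    from collections import defaultdict
    from dataclasses import dataclass
    from typing import List


    @dataclass
    class Alert:
        host: str
        rack: str
        message: str


    @dataclass
    class Page:
        summary: str
        related_hosts: List[str]
        runbook: str = "https://wiki.example.internal/runbooks/power-loss"
        dashboard: str = "https://monitoring.example.internal/d/rack-health"


    def enrich(alerts: List[Alert]) -> List[Page]:
        """Group related alerts and attach the context a responder needs."""
        by_rack = defaultdict(list)
        for alert in alerts:
            by_rack[alert.rack].append(alert)
        return [
            Page(
                summary=f"{len(group)} hosts unresponsive on {rack}: likely a shared cause (power?)",
                related_hosts=[a.host for a in group],
            )
            for rack, group in by_rack.items()
        ]


    if __name__ == "__main__":
        alerts = [
            Alert("lotus-mail-01", "rack-7", "host unreachable"),
            Alert("db2-backend-02", "rack-7", "host unreachable"),
        ]
        for page in enrich(alerts):
            print(page)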
Bill, in this case, was alone; he was junior and under a lot of pressure from his employer and his girlfriend. What modern tooling could have done for Bill was let him know that it was not just the Lotus server that went unresponsive, it was several machines, all on the same rack. Access to monitoring tools would have let him know that nothing strange was happening on any of the boxes up to the point of failure. This should have been sufficient for him to realize that power might have been the root cause. And in today's world of IT, the uninterruptible power supply (UPS) could also have thrown an alert, if they had used one.
Ah, the people problem, the most frustrating aspect of DevOps adoption. I rarely run into people who simply do not want to collaborate and are still on team silo. But no matter how well intentioned your team is, including you, when you get into the thick of application development, testing, delivery, and support, over-communicating is not a top priority. Our first inclination is to look to tools (is teamwork a thing?) to help; techies are notorious for secretly wanting to be robots and reducing everything to if/then statements. People, because of these crazy things called emotions and ego, are not willing to fit into an algorithm.
The best system I have found is to adapt Maslow's Hierarchy of Needs into a Hierarchy of Decision Needs, where no decision can be made until the previous need is met. The first filter for every decision is communication. Can you answer the following:
» Does this change impact a specific function/team that is out of your direct influence?
» If so, have you communicated the change to them?
» Do you know that the communication was actually received and understood?
The first question is easy to answer, except for the fact that you can be overly inclusive and make a case for every change impacting the entire team. You have to be discerning and always work backwards from the applications and services impacted. The last question is the hardest, because one of the biggest deterrents to long-term, effective communication is when someone believes they have already communicated effectively, but finds out later that the communication went nowhere and they have to repeat themselves.
The repetition problem. Most people hate to repeat themselves, and you should too. The more time you spend repeating yourself, the less time you spend on higher-
value tasks. But if someone does not listen to the message you deliver, then you will have no choice. Applications are developed in a very utilitarian world, and all that matters is the outcome.
So the first filter in the Hierarchy of Decision Needs is the hardest to pass and the most time consuming. Just like in documentation, the goal should be brevity. If the communication does not deliver clear value or insight, then it should not be given. This takes a lot of personality out of the equation, but personality is the erratic factor you are trying to control for. It should not hurt the character of your team or your culture, because communication outside of getting code shipped can stay casual.
ChatOps is a technology that has stepped in to help and has become a tremendous tool, as long as it does not just become a modern version of a broken system. The benefit of ChatOps is that the tools force fast and brief messages. The team needs to be skilled in crafting these messages and making sure they are clear. If they are effective at doing so, then the proportion of repeated information will drop naturally, so there is an incentive to do so. As ChatOps evolves, the addition of intelligent bots that retrieve historical communication based on new questions from the team could be tremendous in avoiding repetition.
If the team can act like robots when it is required to ship and manage applications, and as humans when it comes to collaborating, then the bi-modal operation can be super efficient. As more and more automation becomes available, automating those tasks means there is one less thing to have bad communication around. Don't let bad communication and the people problem become a justification to avoid DevOps and embrace the monolith.
We have not really advanced beyond the light switch. There are light switches all over our code and infrastructure. One of my new favorites is feature flags. This is a super powerful design pattern: feature flags are the ability to turn features on or off on demand. Sound familiar? An incident similar to the physical switch occurs when a feature flag that other features were later made dependent on
is flipped off.
An example in the real world is a new feature that gives users the ability to add items from a wishlist to a shopping cart. Once this feature was turned on, an element of it was tied to the cart itself, creating a direct relationship between the new feature and the existing one, the shopping cart service.
At some point the product team decided that this feature was not where they wanted it to be. It lacked adoption, and there were plans to make it much more robust by linking it to analytics that would drive users to move wishlist items to the cart as quickly as possible. When they decided on this, they leveraged the power of feature flags to turn the feature off in production. The problem was that the checkout button in the cart was now intimately linked to this feature, such that if there were any items in the cart that were currently, or had at some point been, on a user's wishlist, the user could no longer check out. Exceptions were collecting as users tried to check out, there was an abnormally high amount of cart abandonment, and support was overwhelmed with tickets. The first place the team checked was the backend, next the codebase for the shopping cart service, then all the checkout functions. But the culprit was as simple as a switch turned off.
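One hedged sketch of how that checkout path could have protected itself: treat the wishlist feature as optional and degrade gracefully when its flag is off, instead of failing the whole checkout. The flag name and functions below are invented for illustration, not the actual system in the story.

    # Hypothetical checkout path that treats the wishlist feature as optional:
    # if the flag is off, wishlist metadata is simply dropped instead of
    # breaking checkout for anyone whose cart ever touched a wishlist.
    FLAGS = {"wishlist_to_cart": False}  # the product team just turned the feature off


    def wishlist_metadata(sku: str) -> dict:
        # Would normally call the wishlist service; unsafe to rely on when the feature is off.
        return {"from_wishlist": True, "wishlist_id": "wl-001"}


    def checkout(cart: list) -> str:
        """Checkout treats wishlist data as optional enrichment, never a hard dependency."""
        enriched = []
        for sku in cart:
            entry = {"sku": sku}
            if FLAGS.get("wishlist_to_cart"):
                entry.update(wishlist_metadata(sku))  # only when the flag is on
            enriched.append(entry)
        return f"checked out {len(enriched)} items"


    if __name__ == "__main__":
        # Still works with the flag off; the worst case is a less enriched cart.
        print(checkout(["ABC-123", "XYZ-789"]))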
You could argue that in this case there was a flaw in deciding feature boundaries. You could also see how the sidecar pattern could have been a big help. But... woulda, shoulda. This is going to happen in your code base: kill-switch landmines. The answer is not to stop using feature flags, or to stop allowing them to be toggled in production. The answer is to know how to respond to issues, to have a mechanism that gives visibility into this kind of manual change, and to understand that the issue really could be that simple.
» First off, the KISS principle applies to incident response as well. The team should know not to overreact or assume the issue is something more complex than it actually is. Techies like to be problem solvers, and when the alarm bells go off we can often go into John Nash mode and overcomplicate the issue. Until you can validate that it isn't, leave the possibility open for the
issue to be trivial and basic. Establish a procedure for high-level sanity checks and go through it quickly; a minimal sketch of such a checklist follows this list. Is the power on? Is this an external access issue? For example, try pinging relevant services and running health checks. Ideally your monitoring tool will have done this for you already, as there is a good chance this is the source of the alert anyway.
» Next, establish an escalation policy that is meaningful and efficient, not just convenient. So many organizations get complacent, for understandable reasons, when an alert happens. They will call the one person who can fix everything, they will avoid reading runbooks, and they will address the symptoms without spending the time to identify the cause. In modern systems, when one thing breaks, often many other things do too. Once you acknowledge an alert, expect more. There is a good chance it's all related. It's important to triage the incident to understand the root cause, because often the root cause is not the initial system that screamed at you.
» Once the root cause is established, be laser focused on remediating that issue.
» When the systems come back up, you are not done. Often you can get things back up and running by failing over the affected services, but there could be other issues. The flood of errors can pile on for a while, and things like performance can be impacted. Understand your steady state, have a way of measuring it, and look for that steady state before you consider all things good.
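Here is the minimal sanity-check sketch referenced above: a hypothetical first-pass checklist run before anyone assumes the outage is exotic. The hosts, ports, and checks are placeholders; the point is that trivial causes get ruled out first, quickly and consistently.

    # A hypothetical first-pass sanity checklist. Hosts and ports are
    # placeholders; a monitoring tool would normally do this for you.
    import socket
    from typing import Callable, List, Tuple


    def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
        """Cheap 'is the power even on?' style check: can we open a socket at all?"""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False


    def dns_resolves(name: str) -> bool:
        try:
            socket.gethostbyname(name)
            return True
        except OSError:
            return False


    SANITY_CHECKS: List[Tuple[str, Callable[[], bool]]] = [
        ("mail server answers on SMTP", lambda: tcp_reachable("mail.example.internal", 25)),
        ("database answers on its port", lambda: tcp_reachable("db.example.internal", 50000)),
        ("internal DNS resolves", lambda: dns_resolves("intranet.example.internal")),
    ]


    def run_sanity_checks() -> None:
        for name, check in SANITY_CHECKS:
            status = "OK" if check() else "FAILED  <-- start here"
            print(f"[{status}] {name}")


    if __name__ == "__main__":
        run_sanity_checks()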
You might think the answer is simply "do not have switched outlets," i.e., don't have literal kill switches in your code. However trivial that seems, at scale, with massive code bases, it is actually pretty difficult to guarantee. Things will break. Having an understanding of how your team will approach it is important for building consistency in responding to issues and spotting trivial issues faster. It also matters so that the organization is not built on institutional knowledge alone, where any change in the team has a direct impact on the response rate and mean time to recovery.
We have made huge strides in automation, monitoring, and alerting since then, but there is more to come. Bill heard about the issue from the entire organization; the CEO's executive assistant in particular had a lot to say. This was not a time of blameless culture, and the infrastructure was set up in an unstable way.
Bill was on the hook. The aggressor of the Lotus outage now has an amazing career
as a … Sales Director. That’s right. And, “the date” later became his wife.
Ultimately, IT wasn’t for Bill, who now has an amazing career as a Sales Director of an
enterprise software company. That is how I learned his story, as he was training a team
on the very real day-to-day problems techies face. Bill can now talk with pure empathy
to any techie who is or was on call at some point and knows the feeling of business
critical systems being down where the solution is stupid obvious. Bill is now a tech
hero trying to mitigate and address these techie life challenges. Fortunately for Bill his
date did not only stick around after an annoying distraction to their evening, she is
now his wife of 20+ years. We all survive the major outages, but by being effective in
knowing how we will deal with them versus avoiding them, our lives, and the lives of
our team, will be better and applications more stable and secure.
Chris speaks and engages with end-users regularly in the areas of DevOps, SecOps,
and App Dev. He works for Splunk as a Tech Advocate and is a regular contributor
to industry blogs such as ContainerJournal.com, DevOps.com and Sweetcode.io.
He is also the host of the podcast, Developers Eating the World.
Chris obtained his Computer Science and Business degrees from Regis University
in Colorado and currently lives in Colorado with his wife and two daughters. He is
a fan of physics and psychology, and has an eclectic set of hobbies that range from
Genetic Algorithms (GA) to Mineral Collecting to LEGO.
Acknowledgments
Often times the best stories and knowledge transfer come from a stream of
consciousness. It takes an army, in my case especially, to make that stream sound
good. I want to thank all the copy editors, and practitioner proofreaders for tak-
ing the time to review and beautify my thoughts. I also want to thank a certain
coworker who made the meat of this story very real.
CHAPTER 9
Kill the Restructure
by Rob England and Dr. Cherry Vu
Let’s list some of the more dysfunctional approaches to advancing Agile culture:
» Big bang change.
Most of all: failure to change the management and governance. This is perhaps the
biggest issue of them all. Management is the lock on advancement. The primary
function of many middle managers is to control risk. They’re change-resistant by
nature. Add to that senior management who are blissfully unaware of their own
need to change, and advancement is going nowhere.
Complex System
Work is a complex system. It’s an organic soup of attitudes, beliefs, behaviours,
mood, vision, personalities... There are no crisp inputs and outputs, just energy
and activity in a network. You don’t know how to change it. Nobody knows, no
matter how much you pay them. Stop pretending that anybody knows what an
optimal organisational structure is until they’ve tried it. It is a patronising, even
arrogant, fallacy that anyone can know what a better structure is in advance. It’s
the nature of complexity. We can only experiment in increments. Structure must
be emergent, not imposed.
For Example
Look at DevOps as an example. The point of DevOps is to span silos, not to change
one set of silos into different ones. Changing from North/South slices into East/
West slices is still slices. We see too many enterprises assuming one of the first
steps of DevOps is a reorganisation. DevOps isn’t about org structures. You can
organise into functional technology silos with virtual product teams, or into prod-
uct teams with virtual technology functions and guilds. Either way it is a matrix.
Most legacy enterprises are organised into functional teams. Although the current
preference is for a product team instead and there is an argument that a team
should stay together over the long term, the reality is that the size of the team
will grow and shrink over time as the volume of change in the product varies.
Therefore only a core will be constant anyway — the teams need to be fluid. So
there is no downside to them being virtual teams taken from functional groups.
Moreover, the future is product teams brought together from multiple depart-
ments, not sourced exclusively from IT — someone from marketing, product
design, digital design, shadow IT teams, third party suppliers, vendors. All the
more reason for them to be virtual.
Get the DevOps working first, then a reorg may be an optimisation. When DevOps
makes your people realise they need to restructure, then it is time. Pull not push.
Restructuring
It is enervating when the goal of the restructure is explicitly “to be more agile,” but
it is done in this one big bang. The irony. Finite step transformations (from “as-is”
to “to-be”) are big bets. Agile is supposed to be about making small bets with min-
imum blast radius. A restructure is never that. If you’re truly agile, you will never
(seldom?) do a big-bang restructure again.
» And they turn one set of silos into a different set of silos.
OK, sure, they work sometimes. But if a reorg is successful, it's usually dumb luck and bloody toil. The consulting firms will only tell you about their successes: even a blind squirrel finds a few nuts. Some reorgs can be seen to have been done in a fairly successful and intentional way. It takes some very good and very strong people at many levels to pull it off.
But the exceptions just prove the rule; in fact, I doubt there are exceptions that truly "succeed." It's likely just too narrow a definition of success. Reorgs are
violent. They’re done to people against their will and by force. They have all sorts
of toxic cultural repercussions. This, at its heart, is my deepest objection to how
reorganisation is done. What are the medium term consequences for psycholog-
ical safety, for trust in management, and for real lasting change, when process
reengineering and organisational restructure is designed behind closed doors then
imposed by decree?
» As well there are ideas like servant manager, transformational leader, open
space, invitational leadership, promise theory, sustainability, and more...
They all aim for, as Jonathan Smart put it, “better value sooner, safer, happier.” We
simply call them the New Ways of Working and Managing.
Sound like an exaggeration? Let’s look at the actual European Renaissance, para-
phrased from Wikipedia (so it must be true):
» Overcoming our cognitive biases and defeating the myth of simple systems;
modelling how the world really works, not how we would like it to.
» Flipping the hierarchy, getting out of the way, bringing real work to the fore.
» It’s time we got more humanity back into work. People aren’t resources,
they’re colleagues.
» The world urgently needs higher productivity to weather the coming eco-
nomic storm.
Emergence
Organisational structure should emerge from the needs of the people, pulled by
them not pushed on them. It should happen in an agile way — constantly reflect-
ing, adjusting, and improving. Iterate, increment, experiment, explore. You can’t
make knowledge workers do anything. We have to stop doing change to people. If
they don’t want it, then that’s the problem to be addressed. You can’t force them to
believe something different, or even to act in a different way. You can force slaves,
manual workers, and clerical workers because you can see the output per person.
Taylorism.
And they sure won’t change in a matter of weeks or months. Humans aren’t
machines. We aren’t Human Resources to be manipulated and engineered. There is
a lot of deterministic thinking amongst consultants and thought leaders which we
find distasteful.
Rob England has owned his own consulting company, Two Hills, since 2005. Before that, he
had technical, management, and solutions roles for twenty years in the software
industry.
Rob is a contributor to The DevOps Handbook, and to ITIL (2011 Service Strategy
book, and minor contributions to two upcoming ITIL4 books), and a lead author
of VeriSM. Rob was awarded the inaugural New Zealand IT Service Management
Champion award for 2010 by itSMFnz, and made a Life Member in 2017.
Dr. Cherry Vu has helped business and public sector organisations develop their change
management capabilities. Cherry applies the most practical skills and instruments
to optimise their change outcomes with a goal of arming leaders, practitioners,
and change agents. Lately she has been immersing herself in the IT industry, bring-
ing a different perspective to help Rob with transformations.
Conclusion &
Acknowledgments
Conclusion
“My, you sure love a good story, don’t cha!”
—Mrs. Lovett, Sweeney Todd
The stories in this book are based upon real-world experience. Each author brings
a unique perspective to either a massive failure, or how to set yourself up to have
smaller failures than expected. The purpose of this type of information exchange
is twofold.
First, we want you to know you are not alone. As the DevOps/DevSecOps commu-
nities continue to grow and share their experience, it gives us a chance to start to
recognize patterns and develop processes for managing those patterns of failure.
Second, we want you to participate in these exchanges, either by telling your
personal journey in a presentation at All Day DevOps (alldaydevops.com), or by
participating in your regional user groups, meeting your peers, and recognizing
the patterns that might be unique in your specific area.
As a follow-up to this book, we've already started planning the next edition.
The thought is to supplement the stories with a regular online monthly
meeting to exchange stories with you and the Epic Failures authors. Justin
Miller has volunteered to help manage that initiative. Give him a call if you’re
interested in participating, or just want to talk about what you’re working on.
You can reach Justin at +1 (571) 336-2598, jmiller@sonatype.com.
If you enjoyed the book, we’d very much appreciate a 5-star review on Amazon. It
will encourage others to join in our community, and continue to “fail.” Failure is
progress. May you have great “progress” in your upcoming ventures.
Mark Miller
Founder and Editor, DevSecOps Days Press
Co-Founder, All Day DevOps
Senior Storyteller, Sonatype
Acknowledgments
This book is the work of eleven authors, but a lot went on behind the scenes to
make all the pieces work.
Thank you to the authors of Epic Failures, Volume 1 for being mentors and sup-
porters of this edition: Aubrey Stearns, DJ Schleen, Caroline Wong, Fabian Lim,
Chris Roberts, Chetan Conikee, Edwin Kwan, Stefan Streichsbier.
The support team who helped produce the book were indispensable in making sure the quality and presentation of the content was top notch: Alexis Del Duke and Nikki Mejer, copy editors; Melissa Schmidt, editing, formatting, and graphic design. A special
shout out to the 100+ community members who volunteered their time to proofread
the book and give invaluable feedback to the authors. Sincerely, thank you.
Finally, thank you to you, the reader, for contributing your valuable time to sup-
port the DevOps and DevSecOps Communities. We look forward to meeting you
in person at upcoming DevSecOps Days (devsecopsdays.com), and online for All
Day DevOps (alldaydevops.com).
You’re welcome to reach out to the authors for further discussion by following
them on Linkedin, Twitter and various community forums.