RISK MANAGEMENT FOR SOFTWARE PROJECTS
There is little to instruct software project managers on how to handle risk in a way that ensures the success of contingency planning and avoids crisis. This seven-step procedure describes how to identify risk factors, calculate their probability and effect on a project, and plan for and conduct risk management.
RICHARD FAIRLEY
Software Engineering Management Associates
IEEE SOFTWARE
Many software projects fail to deliver acceptable systems within schedule and budget. Many of these failures might have been avoided had the project team properly assessed and mitigated the risk factors, yet risk management is seldom applied as an explicit project-management activity.
One reason risk management is not
practiced is that very few guidelines
are available that offer a practical,
step-by-step approach to managing
risk. To address this deficiency, I have
created a seven-step process for risk
management that can be applied to all
types of software projects.
I base the process on several years
of work with numerous organizations
to identify and overcome risk factors
in software projects. My clients and I
have used a variety of risk-management techniques within the framework of the process. I describe one set
of techniques here, which incorporates
regression-based cost modeling, but
other techniques, such as decision theory, risk tables, and spiral process
models, are equally applicable.'
0740-7459/94 © 1994 IEEE

ELEMENTS OF RISK MANAGEMENT
The seven steps of my risk-management process are:

1. Identify risk factors. A risk is a potential problem; a problem is a risk that has materialized. Exactly when the transformation takes place is somewhat subjective. A schedule delay of one week might not be cause for concern, but a delay of one month could have serious consequences. The important thing is that all parties who may be affected by a schedule delay agree in advance on the point at which a risk will become a problem. That way, when the risk does become a problem, it is mitigated by the planned corrective actions. In identifying a risk, you must take care to distinguish symptoms from underlying risk factors. A potential schedule delay may in fact be a symptom of difficult technical issues or inadequate resources.

Whether you identify a situation as a risk or an opportunity depends on your point of view. Is the glass half full or half empty? Situations with high potential for failure often have the potential for high payback as well. Risk management is not the same as risk aversion. Competitive pressures and the demands of modern society require that you take risks to be successful.
2. Assess risk probabilities and effects
on the project. Because risk implies a
potential loss, you must estimate two
elements of a risk: the probability that
the risk will become a problem and
the effect the problem would have on
the project’s desired outcome. For
software projects, the desired outcome is an acceptable product delivered on time and within budget.
Factors that influence product acceptability include delivered functionality,
performance, resource use, safety,
reliability, versatility, ease of learning,
ease of use, and ease of modification.
Depending on the situation, failure
to meet one or more of these criteria
within the constraints of schedule and
budget can precipitate a crisis for the
developer, the customer, and/or the
user community. Thus, the primary
goal of risk management is to identify
and confront risk factors with enough
lead time to avoid a crisis.
The approach I describe here is to assess the probability of a risk by computing probability distributions for code size and complexity and to use them to determine the effect of limited target memory and execution time on overall project effort. I then use Monte Carlo simulation to compute the distribution of estimated project effort as a function of size, complexity, timing, and memory, using regression-based modeling.

This approach uses estimated effort as the metric to assess the impact of risk factors. Because effort is the primary cost factor for most software projects, you can use it as a measure of overall project cost, especially when using loaded salaries (burdened with facilities, computer time, and management, for example).

REGRESSION-BASED COST MODELING
You develop a regression-based cost model by collecting data from past projects for relationships of interest (like software size and required effort), deriving a regression equation, and incorporating additional cost factors to explain deviations of actual project costs from the costs predicted by the regression equation.

A commonly used approach to regression-based cost modeling is to derive a linear equation in the log-log domain (log Effort, E, as a linear slope-intercept function of log Size, S) that minimizes the residuals between the equation and the data points for actual projects. Transforming the linear equation, $\log E = \log a + b \log S$, from the log-log domain to the real domain gives you a power-law relationship of the form $E = a S^{b}$. Figure A illustrates this approach, where E is measured in person-months and S is measured in thousands of lines of source code (KLOC).

Figure A. Log-log plot of effort (log E) against size (log S) for past projects, with the regression line log E = log a + b log S.

As the figure shows, it is not unusual to observe wide scatter in actual project data, which indicates large variations between the effort predicted by the regression equation and the actual effort. Residual error is one measure of the variations. A large residual error means that factors in addition to size exert a strong influence on required effort. If size were a perfect predictor of effort, every data point in Figure A would lie on the line of the equation, and the residual error would be zero.

The next step in regression-based cost modeling is to identify the factors that cause variations between predicted and actual effort. We might, by examining our past projects, determine that 80 percent of the variation in required effort for projects of similar size and type can be explained by variations in stability of the requirements, familiarity of the development team with the application domain, and involvement of users during the development cycle. As illustrated in Table A, you can assign weighting factors to these variables to model their effects.
3. Develop strategies to mitigate identified risks. In general, a risk becomes a problem when the value of a quantitative metric crosses a predetermined threshold. For that reason, two essential parts of risk management are setting thresholds, beyond which some corrective action is required, and determining ahead of time what that corrective action will be. Without such planning, you quickly realize the truth in the answer to Fred Brooks' rhetorical question, "How does a project get to be a year late?" One day at a time.

Risk mitigation involves two types of strategies. Action planning addresses risks that can be mitigated by immediate response. To address the risk of insufficient experience with a new hardware architecture, for example, the action plan could provide for training the development team, hiring experienced personnel, or finding a consultant to work with the project team. Of course, you should not spend more on training or hiring than would be paid back in increased productivity. If you estimate that training and hiring can increase productivity by 10 percent, for example, you should not spend more than 10 percent of the project's personnel budget in this manner.
Contingency planning, on the other hand, addresses risks that require monitoring for some future response should the need arise. To mitigate the risk of late delivery by a hardware vendor, for example, the contingency plan could provide for monitoring the vendor's progress and developing a software emulator for the target machine. Of course, the risk of late hardware delivery must justify the added cost of preparing the contingency plan, monitoring the situation, and implementing the plan's actions. If the cost is justified, plan preparation and vendor monitoring might be implemented immediately, but the action to develop an emulator might be postponed until the risk of late delivery becomes a problem (that is, until the vendor's schedule slips beyond a predetermined threshold). This brings up the issue of sufficient lead time. When do you start to develop the emulator? The answer lies in analyzing the probability of late delivery. As that probability increases, the urgency of developing the emulator becomes greater.

REGRESSION-BASED COST MODELING (continued)
… Low requirements volatility, medium application experience, and high user involvement would result in an EAF of 0.64 (0.8 * 1.0 * 0.8). The former situation would require … percent more effort than the nominal estimate, while the latter would require 36 percent less effort than the nominal case.

Table A. Effort multipliers (Low, Medium, High) for the cost drivers requirements volatility, application experience, and user involvement.

Using effort multipliers to adjust an estimate implies that factors not accounted for in the model do not change from past projects to the one being estimated. For example, the model presented in Figure A and Table A does not incorporate factors such as personnel capabilities or stability of the development environment. If these factors should change, the corresponding impacts (positive or negative) must be incorporated into the estimate for a future project. Failure to do so increases risk.

Barry Boehm's Cocomo (Constructive Cost Model) is perhaps the best-known example of a regression-based cost model. Cocomo is based on data from 63 projects, collected by Boehm during the mid-to-late 1970s. He clustered the data into three groupings, which he called modes. He then derived two linear equations for each mode in the log-log domain: one for estimated effort as a function of software size and one for estimated development time … identified 15 cost drivers as … the observed varia…

Boehm illustrated, by example, how to construct a regression-based cost model; … of the model. The model does not work without recalibration to allow for differences between Boehm's environment and the environment of interest, however. When organizations use the equations and tables without doing so, the estimates may be seriously skewed. Cocomo equations and tables should not be used as published without recalibrating the model in the local environment.

Automation concerns. Several tools are available that automate regression-based cost modeling. One of the best tool sets, for versatility and ease of use, is from the Softstar Systems Company of Amherst, New Hampshire. The Softstar tools include a tool (Calico) for entering local project data and deriving regression equations tailored to the local environment, a tool (Dbedit) to edit the effort …
4. Monitor risk factors. You must monitor the values of risk metrics, taking care that the metrics data is objective, timely, and accurate. If metrics are based on subjective factors, your project will quickly be
reported as 90 percent complete and
remain there for many months. You
must avoid situations in which the
first 90 percent of the project takes
the first 90 percent of the schedule,
while the remaining 10 percent of the
project takes another 90 percent of
the schedule.
5. Invoke a contingency plan. A contingency plan is invoked when a quantitative risk indicator crosses a predetermined threshold. You may find it difficult to convince the affected parties that a serious problem has developed, especially in the early stages of a project. A typical response is to plan on catching up during the next reporting period, but most projects never catch up without the explicit, planned corrective actions of a contingency plan. You must also specify the duration of each contingency plan to avoid contingent actions of interminable duration. If the team cannot solve the problem within a specified period (typically one to two weeks), they must invoke a crisis-management plan.
6. Manage the crisis. Despite a team's best efforts, the contingency plan may fail, in which case the project enters crisis mode. There must be some plan for seeing a project through this phase, including allocating sufficient resources and specifying a drop-dead date, at which time management must reevaluate the project for more drastic corrective action (possibly major redirection or cancellation of the project).

7. Recover from a crisis. After a crisis, certain actions are required, such as rewarding personnel who have worked in burnout mode for an extended period and reevaluating cost and schedule in light of the drain on resources from managing the crisis.
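The regression-based cost modeling described in the REGRESSION-BASED COST MODELING box can be sketched as an ordinary least-squares fit in the log-log domain followed by an effort-adjustment step. This is a minimal Python illustration under stated assumptions, not the author's model or the Softstar tools: the function names `fit_power_law` and `estimate_effort` are hypothetical, the project history is synthetic (generated from E = 3S^1.2), and the multipliers in the last line are the 0.8, 1.0, 0.8 combination the box cites.

```python
import math


def fit_power_law(sizes, efforts):
    """Least-squares line log E = log a + b log S, returned as (a, b, residuals).

    sizes are in KLOC and efforts in person-months; residuals are in the
    log domain, so zero residuals mean every point lies on the fitted line.
    """
    if len(sizes) != len(efforts) or len(sizes) < 2:
        raise ValueError("need at least two (size, effort) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in the log-log domain
    log_a = y_bar - b * x_bar          # intercept in the log-log domain
    residuals = [y - (log_a + b * x) for x, y in zip(xs, ys)]
    return math.exp(log_a), b, residuals


def estimate_effort(a, b, size_kloc, multipliers=()):
    """Nominal effort a * S**b scaled by EAF, the product of the multipliers.

    An empty multipliers sequence means every cost driver is nominal (EAF = 1).
    """
    eaf = math.prod(multipliers)
    return a * size_kloc ** b * eaf


# Synthetic history generated from E = 3 * S**1.2 (illustration only).
history_sizes = [5, 10, 20, 40]
history_efforts = [3.0 * s ** 1.2 for s in history_sizes]
a, b, residuals = fit_power_law(history_sizes, history_efforts)
nominal = estimate_effort(a, b, 10)
adjusted = estimate_effort(a, b, 10, (0.8, 1.0, 0.8))   # EAF = 0.64
```

Because the synthetic points lie exactly on a power law, the fit recovers a ≈ 3 and b ≈ 1.2 with near-zero residuals; with real project data the residuals measure the scatter the box describes. Applying the 0.8, 1.0, 0.8 multipliers scales the nominal estimate by 0.64, the 36 percent reduction cited in the box.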
I illustrate these seven steps for a project to implement a telecommunications protocol. The project, which is actually a composite of several real projects, gave me the opportunity to explore key risk-management issues, such as the likelihood that an undesired situation might occur, the resulting effect of the risk situation, the cost of mitigating the risk, the degree of urgency in mitigation, and the lead time required to avoid a crisis.
Risk identification. I used a regression-based cost model to identify and assess the impact of risk factors on estimated project effort. The box on pp. 58-59 describes regression-based cost modeling in more detail, as well as some tools for automating it. For the telecom project, I used a regression-based cost model for real-time telecommunications systems on microprocessors, which I had developed for the client, using historical data from similar projects.
CASE STUDY
The project's goal was to implement a telecommunications protocol for a network gateway using a 10-MHz microprocessor with a 256-Kbyte memory. The project had several constraints that challenged risk management. The project team could not enlarge the memory because the processor was provided by the customer and its use was mandatory. The maximum execution time for message processing was 10 ms.

The regression equation I derived to relate effort to product size is

$$\text{Effort} = 7.6\,(\text{Size})^{b} \times \text{EAF} \qquad (1)$$

where Size is in KLOC, b is the fitted size exponent, and EAF is the effort-adjustment factor. EAF is the product of 15 cost factors taken from Barry Boehm's Cocomo model: required software reliability (Rely), ratio of database size to source-code size (Data), software complexity (Cplx), execution-time constraint on the target machine (Time), memory constraint on the target machine (Stor), volatility of the development machine and software (Virt), response time of the development environment (Turn), analyst capability (Acap), applications experience for the development team (Aexp), programmer capability (Pcap), team experience on the development environment (Vexp), team experience with the programming language (Lexp), use of modern programming practices (Modp), use of software tools (Tool), and required development schedule (Sced).
Using these cost drivers as a checklist for the telecom project, I identified five risk factors and added one
(Size):
+ Cplx. Effect of algorithmic complexity
+ Time. 10-ms timing constraint
+ Stor. 256K memory of the target processor
+ Vexp. Lack of experience with the target processor
+ Tool. Lack of adequate software tools for the target processor
+ Size. Uncertainty in estimated code size.

MAY 1994

These six factors are interrelated: If the algorithms are complex, code size is likely to increase; if size increases, more memory and execution time will be required. With more experience on the target processor architecture and with better software tools, the team might better control the code size, execution time, and memory requirements.

Probability and effects assessment. According to evidence from similar projects and some analysis, I estimated that the size of the telecom project's code would be no less than … KLOC and no more than 13 KLOC, with the most likely size being approximately 10 KLOC, as Figure 1a shows. Figure 1b is the probability-density function for the probable effect of algorithmic complexity (Cplx) on project effort. As the figure shows, I estimated the most likely impact to be 1.3, with a normal distribution of … . The function for Cplx models the impact that uncertainty in target-machine experience (Vexp) and lack of tools (Tool) will have on the ability to control the complexity of the program that implements the communication algorithms. I used these probability-density functions to derive a distribution of probable project effort, as the box on p. 62 describes.

Thus, the risk factors to be modeled are software size, algorithmic complexity, and the memory and execution-time constraints of the target machine. To assess the effect of uncertainty in size, complexity, execution time, and the memory constraint on the required effort, I constructed a probabilistic cost model and used Monte Carlo simulation. The simulation model has the form of the regression equation,

$$\text{Effort} = 7.6\,(\text{Size})^{b} \times \text{EAF}$$

where EAF is the product of Stor, Time, and Cplx, and where Size and Cplx are modeled by the probability distributions in Figure 1. Stor and Time are dependent on Size.

I determined values for Stor by first randomly selecting a value from the inverse probability distribution for Size. I then used a code-expansion factor of 16 (based on a ratio of 1 to 4 for source-to-object instructions and 1 to 4 for object instructions to object bytes), multiplied it by Size, and divided by 256K (the memory size) to get the percentage of memory used, as Equation 2 expresses.

Table 1. Values of Stor and Time taken from Cocomo
Memory used      Stor    Time used        Time
Less than 50%    1.00    Less than 50%    1.00
70%              1.06    70%              1.11
85%              1.21    85%              1.30
95%              1.56    95%              1.66
… This integral is the probability that x will be in a given range; for example, the probability that Size will be in the range of 10,000 to 12,000 lines of code is

$$P(10 \le \text{Size} < 12) = \int_{10}^{12} p(x)\,dx$$

where p(x) is the probability-density function in Figure 1a in the main text.

The inverse distribution function, $P^{-1}$, provides values of x that correspond to given values of P(x). Inverse probability-distribution functions are used in Monte Carlo simulation to compute values of x that correspond to randomly selected probability values, P(x). In practice, you can calculate $P^{-1}$ by table lookup for certain well-defined probability distributions (Z tables for normal distributions, for example) or by sampling techniques such as the Latin Hypercube sampling method.

Monte Carlo simulation is a technique for modeling probabilistic situations that are too complex to solve analytically. Probability distributions are specified for the input variables to the model. A random number generator is used to select independent sample points from the inverse probability distributions for each of the input variables. These sample values are used to compute one point on the specified output distribution(s). Repeating the process a few hundred to a few thousand times produces a histogram that approximates the resultant probability distributions; more repetitions give a closer approximation.

Until recently, Monte Carlo simulation was the province of specialists. Introduction of … and Macintosh-based simulation packages has made Monte Carlo … anyone who knows statistics and … . Two tools for Monte Carlo simulation are @Risk from Palisade Corp. of Los Angeles and Crystal Ball from Decisioneering Corp. of Denver, both of which run in conjunction with a spreadsheet. For the telecom project described in the main text, … to specify probability …

$$\text{Percentage of memory} = \frac{100 \times (16 \times \text{Size})}{256} \qquad (2)$$

For example, I determined that the percentage of memory used is 93.75 when Size is 15 KLOC. Table 1 shows values of Stor and Time taken from Cocomo. In the first two columns are the values of Stor for various percentages of use. From the table, I interpolated that Stor is approximately 1.55 when the percentage of memory is 93.75.

The last two columns of Table 1 show how execution time affects project effort. Time, which is also dependent on Size, is modeled as

$$\text{Percentage of time} = \frac{100 \times \left[\frac{1}{2} \times \frac{1}{3} \times (4 \times \text{Size})\right]}{10} \qquad (3)$$

where 1/2 is the average cycle time in microseconds for instruction processing on the target processor (five clock ticks at 10 MHz); 1/3 reflects the assumption that a third of the object code consists of instructions executed by the main timing loop, the remainder being data cells and exception-handling code; and 4 * Size converts source instructions to object instructions. With Size in KLOC, the bracketed product is the execution time of the main timing loop in milliseconds. I then divide this time by 10 ms (the timing constraint) to determine the percentage of time. The percentage of time is 100 when Size is 15 KLOC.

Although, as this analysis shows, the timing constraint dominates the memory constraint, I tracked both factors because the assumption used to derive the percentage-of-time equation (Time) was not certain and because both Stor and Time affect project effort. In reality, memory could become the dominant factor.
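Equations 2 and 3 and the Table 1 lookup can be sketched in a few lines of Python. The constants come from the text (16 bytes per source line, 256 Kbytes, a 0.5-microsecond cycle, a one-third loop share, a 10-ms limit); the function names are hypothetical, and both the straight-line interpolation between Table 1 rows and holding the 95 percent multiplier for use above 95 percent are assumptions, since the article does not say how it interpolated or extrapolated. Straight-line interpolation gives Stor ≈ 1.52 at 93.75 percent, slightly below the article's approximate 1.55.

```python
from bisect import bisect_left

BYTES_PER_SOURCE_LINE = 16    # 4 object instructions x 4 bytes each (text)
MEMORY_KBYTES = 256           # target memory (text)
CYCLE_TIME_US = 0.5           # five clock ticks at 10 MHz (text)
LOOP_SHARE = 1.0 / 3.0        # assumed share of object code in the timing loop (text)
TIME_LIMIT_MS = 10.0          # message-processing constraint (text)

# Table 1 rows as (percentage used, multiplier); at or below 50 percent -> 1.00.
STOR_TABLE = [(50.0, 1.00), (70.0, 1.06), (85.0, 1.21), (95.0, 1.56)]
TIME_TABLE = [(50.0, 1.00), (70.0, 1.11), (85.0, 1.30), (95.0, 1.66)]


def percent_memory(size_kloc):
    """Equation 2: percentage of the 256-Kbyte memory used."""
    return 100.0 * BYTES_PER_SOURCE_LINE * size_kloc / MEMORY_KBYTES


def percent_time(size_kloc):
    """Equation 3: microseconds x thousands of instructions gives milliseconds."""
    loop_ms = CYCLE_TIME_US * LOOP_SHARE * (4.0 * size_kloc)
    return 100.0 * loop_ms / TIME_LIMIT_MS


def lookup(table, pct):
    """Straight-line interpolation between Table 1 rows (an assumption).

    Values beyond the last row hold the 95 percent multiplier, which is also
    an assumption; Table 1 does not go past 95 percent.
    """
    xs = [x for x, _ in table]
    if pct <= xs[0]:
        return table[0][1]
    if pct >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, pct)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (pct - x0) / (x1 - x0)
```

For example, `lookup(STOR_TABLE, percent_memory(15))` evaluates 93.75 percent to about 1.516, and `lookup(TIME_TABLE, percent_time(15))` returns 1.66 because 15 KLOC uses the full 10-ms budget.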
To compute the probable effort for the telecom project, I used Monte Carlo simulation and the Crystal Ball simulation tool from Decisioneering Corp., which randomly selected data points from the inverse probability-distribution functions for Size and Cplx and used the value of Size along with Table 1 to determine values for Time and Stor. The tool then used the values of Size, Cplx, Time, and Stor in the regression equation to compute a point on the probability-density histogram for effort. The tool should repeat this computation at least a few hundred times to produce a reasonable approximation of the probability-density function for estimated effort.
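The simulation loop described above (sample Size and Cplx through inverse distribution functions, derive Stor and Time from Equations 2 and 3 and Table 1, then evaluate the regression equation) can be sketched without a spreadsheet tool. This is a hypothetical, self-contained illustration, not the author's Crystal Ball model: the triangular Size distribution (low 8, mode 10, high 15 KLOC), the standard deviation of 0.1 for Cplx, and the size exponent b = 1.0 are placeholder assumptions, so its dollar figures will not reproduce Table 2. The 7.6 coefficient, the Cplx mode of 1.3, and the $10,000 loaded rate come from the text.

```python
import math
import random
from bisect import bisect_left
from statistics import NormalDist

# Placeholder assumptions, not the article's parameters:
SIZE_LOW, SIZE_MODE, SIZE_HIGH = 8.0, 10.0, 15.0  # KLOC; triangular (8 is assumed)
CPLX_DIST = NormalDist(mu=1.3, sigma=0.1)         # sigma is assumed
COEFF, EXPONENT_B = 7.6, 1.0                      # 7.6 from Equation 1; b assumed
DOLLARS_PER_PERSON_MONTH = 10_000                 # loaded salary from the text

STOR_TABLE = [(50.0, 1.00), (70.0, 1.06), (85.0, 1.21), (95.0, 1.56)]
TIME_TABLE = [(50.0, 1.00), (70.0, 1.11), (85.0, 1.30), (95.0, 1.66)]


def lookup(table, pct):
    """Straight-line interpolation in Table 1, clamped at both ends."""
    xs = [x for x, _ in table]
    if pct <= xs[0]:
        return table[0][1]
    if pct >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, pct)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (pct - x0) / (x1 - x0)


def triangular_inverse_cdf(u, low, mode, high):
    """Map a uniform u in [0, 1) to a triangular variate (inverse transform)."""
    split = (mode - low) / (high - low)
    if u < split:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1.0 - u) * (high - low) * (high - mode))


def simulate_costs(trials=2000, seed=1):
    """One cost sample per trial: sample inputs, derive Stor/Time, apply Eq. 1."""
    rng = random.Random(seed)
    costs = []
    for _ in range(trials):
        size = triangular_inverse_cdf(rng.random(), SIZE_LOW, SIZE_MODE, SIZE_HIGH)
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < u < 1
        cplx = CPLX_DIST.inv_cdf(u)
        stor = lookup(STOR_TABLE, 100.0 * 16.0 * size / 256.0)               # Eq. 2
        time_mult = lookup(TIME_TABLE, 100.0 * (0.5 * (4.0 * size) / 3.0) / 10.0)  # Eq. 3
        effort = COEFF * size ** EXPONENT_B * (stor * time_mult * cplx)     # person-months
        costs.append(effort * DOLLARS_PER_PERSON_MONTH)
    return costs


def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]
```

Calling `percentile(simulate_costs(), 70)` returns the simulated cost at the 70th percentile. Raising `trials` tightens the histogram's approximation, as the Monte Carlo box notes; replacing the placeholder distributions with calibrated ones is what would make the numbers meaningful.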
Figure 2 shows the probable effort for the telecom project converted to dollars, because effort was the primary driver of this project's cost. The conversion factor was a loaded salary of $10,000 per person-month, loaded meaning that indirect and overhead costs are included. The right vertical axis indicates the actual number of times the tool computed a given cost. The left vertical axis indicates the probability of that cost occurring, as computed by the ratio of the number of occurrences to total occurrences.

The summation of probabilities up to any given dollar amount is the probability that the project can be completed for that amount of money or less. Table 2 presents some estimated costs and associated probabilities. For example, it is 70 percent probable that the project can be completed for $600,000 or less (60 person-months of effort at $10,000 per person-month). This cost might involve scheduling six people for 10 months or five people for 12 months.

As illustrated in Figure 2 and Table 2, low complexity and a small product size, with associated small values of Time and Stor, would result in low cost. If the product is large and complex, the resulting cost would be high.
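Summing the histogram's probabilities up to a dollar amount is equivalent to counting the fraction of simulated outcomes at or below that amount (an empirical cumulative distribution). A minimal sketch, assuming the simulation's outputs are available as a plain list of costs; the helper names are hypothetical.

```python
import math
from bisect import bisect_right


def probability_within(costs, budget):
    """Empirical CDF: share of simulated costs that are <= budget."""
    ordered = sorted(costs)
    return bisect_right(ordered, budget) / len(ordered)


def cost_at_probability(costs, probability):
    """Smallest simulated cost whose cumulative probability reaches the target."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    ordered = sorted(costs)
    rank = math.ceil(probability * len(ordered))
    return ordered[rank - 1]
```

With the article's simulation results, `probability_within(costs, 600_000)` would be about 0.70, matching the Table 2 example, and `cost_at_probability(costs, 0.70)` would return roughly $600K.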
The next issue to face is commitment to a schedule and budget. To distinguish estimates from commitments, I used the equation

$$\text{Commitment} = \text{Estimate} + \text{Contingency}$$

That is, the difference between estimate and commitment is the contingency reserve for the project. In this case, the contingency reserve is for dealing with the impact of uncertainty in source-code size and complexity, and the resulting effects of timing and memory constraints on estimated effort.

In one organization I work with, project teams and management routinely set their development schedules and budgets at 70 percent probability of success, but commit to their customers at 90 percent. The 20 percent difference is a contingency reserve for each project.

Table 2. Estimated costs and associated probabilities for the telecom project
Percentile    Cost
50th          $570K
70th          $600K
85th          $667K
95th          $76?K

Risk mitigation. Boehm recommends avoidance, transfer, and acceptance as potential risk-mitigation strategies. For the telecom project, avoidance techniques might be to buy more memory or a faster processor or to decline the project. Transfer techniques might include implementing the lowest layers of the communications protocol in hardware, placing the top levels of the protocol on a network server, or subcontracting the work to specialists in communication software. Acceptance techniques require that all affected parties (customers, users, managers, developers) publicly acknowledge the risk factors and accept them. They also involve preparing action, contingency, and crisis-management plans for the identified risks.

Action planning. To mitigate the risks of insufficient experience with the target processor, the project manager might provide training for the present staff or hire additional, more experienced personnel as consultants or staff. To deal with the lack of adequate software tools, the manager might acquire more effective tools and provide training. However, he or she would have to evaluate the risk caused by inadequate tools against the risk of insufficient knowledge of the replacement tools.

I used Boehm's Cocomo cost drivers to determine investment strategies for training, consultants, and tools. If training and consultants are expected to lower the effort multiplier for target-machine experience by 10 percent, six percent of this could be invested in training and consultants to produce a four percent savings in estimated project cost.

Another action plan is to investigate the possibility of buying more memory and/or a faster processor. For the telecom project, the existing processor and memory were provided by the customer and thus required (as in government-furnished equipment), although buying your way out of potential software problems with more and better hardware is sometimes a feasible alternative.

This solution might also involve buying some of the software rather than building it all. However, buying commercial off-the-shelf software is not without risk, especially if you are going to incorporate it into a larger system. The box on the facing page describes some of these risks.

The size and complexity of software in the telecom project were factors for which no immediate actions were apparent: the communication protocols were specified, the team had to use the specified hardware and algorithms, and they could not prioritize requirements and eliminate those that were desirable, but not essential.

Contingency planning. Contingency planning involves preparing a contingency plan, a crisis-management plan, and a crisis-recovery procedure. Contingency plans address the risks not addressed in the action plans. A crisis-management plan is the backup plan to be used if the contingency plan fails to solve a problem within a specified time. A crisis-recovery procedure is invoked when the crisis is over, whether the outcome is positive or negative.

The contingency plan for the telecom project is concerned with controlling the timing budget and memory use on the target processor. It takes into consideration that the probable source size truncates at 15 KLOC, which the code expansion factor of 16 dictates if the major timing loop is to execute in no more than 10 ms.

Thus, preparation of a contingency plan involves

+ Specifying the nature of the potential problem. For the telecom project this was the effect of memory size and execution time on project effort and schedule.
+ Considering alternative approaches. For the telecom project, these included building a prototype, using memory overlays, using a faster processor, buying more memory, or pursuing incremental development and monitoring the memory and execution-time budgets. Another approach that is usually considered is to eliminate unessential (desirable but not vital) requirements. However, there are no unessential requirements in a communications protocol.
+ Specifying constraints. For the telecom project, these were a memory size of 256 Kbytes, an execution time of 10 ms, and the mandatory use of the existing processor and memory.
+ Analyzing alternatives. Building a prototype would require that the team know how to scale up timing and memory requirements. Using memory overlays would have incurred an unacceptable penalty on execution time. Using a faster processor wasn't possible because use of the current processor was mandatory. Buying more memory wasn't feasible because the processor's address space was limited to 256 Kbytes.
+ Selecting an approach. Thus, only the last alternative was viable: pursue incremental development and monitor the allocated memory and timing budgets. To do this, the team had to partition the design into a series of builds, allocate memory and timing budgets to each build, and track actual versus budgeted amounts of time and memory for each demonstrated build as the product evolved. A contingency plan was to be invoked when the performance index for actual versus budgeted memory or execution time exceeded a predetermined threshold.
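Partitioning the memory and timing budgets across builds and computing a performance index (actual divided by budgeted) per build can be sketched as follows. This is a hypothetical illustration: the build names, the 3:2:1 weights, and the actual figures are invented; applying the same 10 percent reserve to memory is an assumption (the text states the 10 percent reserve for execution time); and splitting the 10-ms loop across builds assumes the per-build times add up.

```python
from dataclasses import dataclass


def allocate(total, reserve_fraction, weights):
    """Split the non-reserved part of a resource across builds by weight."""
    usable = total * (1.0 - reserve_fraction)
    weight_sum = float(sum(weights))
    return [usable * w / weight_sum for w in weights]


@dataclass
class BuildBudget:
    name: str
    memory_kb: float          # budgeted memory for this build
    time_ms: float            # budgeted main-loop time for this build

    def performance_indices(self, memory_used_kb, time_used_ms):
        """Actual / budgeted for memory and time; above 1.0 means an overrun."""
        return memory_used_kb / self.memory_kb, time_used_ms / self.time_ms


weights = [3, 2, 1]                       # hypothetical relative sizes of three builds
memory = allocate(256.0, 0.10, weights)   # Kbytes
timing = allocate(10.0, 0.10, weights)    # milliseconds
builds = [BuildBudget(f"build {i + 1}", m, t)
          for i, (m, t) in enumerate(zip(memory, timing))]
```

For instance, if the first build demonstrates 120 Kbytes and 4 ms against budgets of 115.2 Kbytes and 4.5 ms, its indices are about 1.04 for memory and 0.89 for time; the plan's threshold decides whether a value like 1.04 warrants action.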
In allocating the timing and memory budgets, the team held back a contingency reserve. According to Equations 2 and 3, a code size of 15 KLOC would result in 93.75 percent use of memory and 100 percent use of execution time. Backsolving Equation 3 showed that developers needed to limit the code size to 13.5 KLOC if they wished to hold 10 percent of the execution time in reserve.
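Because Equation 3 is linear in Size (about 6.67 percent of the time budget per KLOC), backsolving it reduces to a division. A minimal sketch with hypothetical function names that reproduces the 13.5-KLOC limit for a 10 percent time reserve:

```python
def percent_memory(size_kloc: float) -> float:
    """Equation 2: percentage of the 256-Kbyte memory used."""
    return 100.0 * 16.0 * size_kloc / 256.0


def percent_time(size_kloc: float) -> float:
    """Equation 3: main-loop time as a percentage of the 10-ms limit."""
    return 100.0 * (0.5 * (1.0 / 3.0) * (4.0 * size_kloc)) / 10.0


def max_size_for_time_reserve(reserve_fraction: float) -> float:
    """Backsolve Equation 3 for the largest Size that keeps the reserve intact.

    Equation 3 is linear in Size, so the inverse is a division by the
    percentage of time consumed per KLOC.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    target_pct = 100.0 * (1.0 - reserve_fraction)
    return target_pct / percent_time(1.0)
```

At the resulting 13.5 KLOC, Equation 2 gives 84.375 percent memory use against 90 percent time use, so under these equations timing remains the binding constraint.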
The next step was to form the contingency plan, which involves specifying
+ Risk factors. In the telecom project, these were the 10-ms timing constraint and the 256-Kbyte memory constraint.
+ Tracking methods. For the telecom project, these were weekly demonstrations of incremental builds and the monitoring of the memory and execution-time budgets.
+ Responsible parties. For the telecom project, two members of the project team were assigned to monitor performance indices and execute the contingency plan if necessary.
+ Thresholds. These are the conditions under which the contingency plan will be invoked. The threshold for the telecom project was a performance index greater than 1.1 for budgeted memory or budgeted execution time.
+ Resource authorizations. The responsible parties in the telecom project were to be allowed unlimited overtime for two weeks to solve the memory and/or execution-time problem.
+ Constraints. For the telecom project, the project manager specified that recovery efforts were not to affect the ongoing activities of other project personnel.
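The six plan elements map naturally onto a record. The sketch below is a hypothetical representation (the class and field names are illustrative, not from the article), populated with the telecom values listed above, with a trigger check that fires only when the index is greater than 1.1. The 1.12 test value echoes the 12 percent memory overrun the team eventually hit.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ContingencyPlan:
    """Hypothetical record of the six plan elements listed in the text."""
    risk_factors: Tuple[str, ...]
    tracking_methods: Tuple[str, ...]
    responsible_parties: int   # number of team members assigned to the plan
    threshold: float           # performance index above which the plan is invoked
    overtime_weeks: int        # resource authorization: unlimited overtime for this long
    constraints: Tuple[str, ...]

    def should_invoke(self, performance_index: float) -> bool:
        # The telecom plan triggers on an index strictly greater than the threshold.
        return performance_index > self.threshold


telecom_plan = ContingencyPlan(
    risk_factors=("10-ms timing constraint", "256-Kbyte memory constraint"),
    tracking_methods=("weekly demonstrations of incremental builds",
                      "monitoring of memory and execution-time budgets"),
    responsible_parties=2,
    threshold=1.1,
    overtime_weeks=2,
    constraints=("recovery must not affect other project personnel",),
)

for index in (1.05, 1.10, 1.12):
    print(f"index {index:.2f}: invoke plan? {telecom_plan.should_invoke(index)}")
```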
Two items in the contingency plan
are particularly important: the threshold for initiating the plan (10 percent overrun) and the time limit allotted to fix the problem (two weeks). Because 10 percent of the timing budget is to be withheld, a performance index for memory or time that exceeds 1.0 by less than 10 percent would still yield an acceptable system. A more conservative approach would have been to set the threshold at five percent (an index of 1.05), while retaining the same 10 percent contingency reserve.
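A minimal sketch of the monitoring check follows. It assumes the performance index is cumulative actual use divided by cumulative budgeted use over the builds demonstrated so far, and it flags the plan only when the index is greater than 1.1, as the telecom plan specifies. The weekly memory figures are invented for illustration and do not come from the telecom project.

```python
from itertools import accumulate

THRESHOLD = 1.1  # telecom plan: invoke when the index is greater than 1.1


def performance_indices(actual, budgeted):
    """Cumulative actual use divided by cumulative budget, one value per build."""
    return [a / b for a, b in zip(accumulate(actual), accumulate(budgeted))]


# Hypothetical memory figures (Kbytes) for four weekly builds, for illustration only.
budgeted_kb = [40, 40, 50, 50]
actual_kb = [42, 45, 55, 62]

indices = performance_indices(actual_kb, budgeted_kb)
for build, pi in enumerate(indices, start=1):
    action = "invoke contingency plan" if pi > THRESHOLD else "continue monitoring"
    print(f"build {build}: index {pi:.3f} -> {action}")
```

Because 10 percent of capacity is held in reserve, an index of 1.1 at the final build (when the cumulative budget equals the full 90 percent allocation) corresponds to 1.1 × 90 = 99 percent of capacity; the conservative 1.05 threshold would trigger at 94.5 percent.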
Risk monitoring and contingency planning. To compute the performance indices specified in the contingency plan, the responsible parties compared the actual amount of resources used (time or memory) to the budgeted amount for each incremental build using
or memory required to implement the current build, BA is the cumulative amount of time or memory budgeted for all builds up to and including the
Each weekly build adds functionality to the previous build, so the performance indices track overall growth of time and memory use as the im-
com project: design partitioning, allocation of resource budgets, incremental development, monitoring of budgeted vs. demonstrated values, and
track the timing and memory budgets for an evolving software product.
+ No source code. If you need to enhance the system, you may only have the object code. In most cases, vendors are understandably reluctant to provide source code. In the rare instances that they do, the code is usually so difficult to understand that it is very difficult to modify correctly.
+ Vendor failures or buyouts. What happens to your system if the vendor goes out of business or is bought out? In some cases, purchasers of COTS have had vendors place the source code in escrow, to be available should the vendor's business fail or be acquired by another company. Again, however, having the source code does not guarantee that anyone can understand it well enough to modify it.
COTS, of course, merely
Because software is not a physical entity, there are no physical laws or mathematical theories to guide the development of engineering models that will let us design software to
terms of traditional engineering parameters, also makes it impossible to scale the results of prototyping and simulation to a full-scale system. This inability, more than any other factor, differentiates software engineering
Crisis management. A crisis is a showstopper. All project effort and resources must be dedicated to resolving the situation. You can define some elements of crisis management, such as the responsible parties and drop-
occurs.
The elements of crisis management are to
+ Announce the problem. For the telecom project, a crisis was said to occur if the contingency plan failed to resolve the over-
after the team had implemented half the required functions, overrun the memory budget by 12 percent, and two weeks of contingency actions had
sources to solving the problem, including flying in two additional tar-
and two other team members stopped all other work to concentrate on the problem. The crisis team had access to all necessary resources, subject to the project manager's approval.
+ Update status frequently. The project team held daily 15-minute stand-
ing around the clock, including catering meals and providing sleeping facilities on site.
+ Have project personnel operate in burnout mode. The crisis team worked as many hours as were humanly possible. All other project personnel were on 24-hour call to assist them until the problem was solved.
+ Establish a drop-dead date. Efforts to resolve the problem were not to continue longer than 30 days. If the problem was not solved by then, marketing and upper management would reevaluate project feasibility. As it turned out, the team resolved the crisis before the 30-day deadline.
+ Clear out unessential personnel. Management requested that all personnel not assigned to the telecom project continue with normal work activities, as long as they did not interfere with the crisis team's work.
One of the most important steps in crisis management is to set a drop-dead date, because no one can sustain this kind of effort indefinitely. If the timing problem had not been fixed in 30 days, management would have stopped crisis mode and reconsidered earlier approaches that had been rejected because of project constraints, such as using a different processor or subcontracting the work to telecommunication specialists. They might also have considered moving the upper levels of the protocol to a network server, or even canceling the project altogether.
Crisis recovery. It is important to examine what went wrong, evaluate how the budget and schedule have been affected, and reward key crisis-management personnel. As part of crisis recovery, you should
+ Conduct a crisis postmortem. This gives you the opportunity to fix any systemic problems that may have precipitated the crisis and to document any lessons learned. For the telecom project, the postmortem revealed that the design was overly complex in a key area and that a simpler design would have yielded a smaller, faster program. The root cause was the team's overall lack of experience in designing software for the target processor.
+ Calculate cost to complete the project. It is important to know how the crisis has affected the project's budget and schedule. To determine this, I used a technique developed by Karen Pullen of Mitre Corp.,⁴ which involves multiplying the expected percentage of total effort for each type of work activity by the actual percentage of completion for that activity. Summing these products gave me the current percentage of project completion.
Table 3 shows the status of the telecom project after the crisis. Table 4 summarizes the effort distribution among activities for similar projects.
Project activity: degree of completeness
Design elements coded: 75 of 100 coded (75%)
Tested modules integrated: 20 of 100 integrated (20%)
Requirements tested: 4 of 30 tested (14%)
The information in Table 3 indicates an incremental development process; that is, each activity is progressing in parallel with the others. This is consistent with the approach the telecom project team took: build the product in stages and compare budgeted to actual memory and timing.
Had they taken a waterfall approach, they would have designed all the requirements before beginning coding and completed all coding before beginning acceptance testing. The disadvantage of the waterfall approach is that you don't know if you have an acceptable product until the end of the project. The team would have had to wait too long to find out if the software would fit in available memory and run within an acceptable time; this risk was unacceptable.
Tables 3 and 4 show that design, which accounts for 17 percent of the estimated project effort, was 90 percent complete; coding, which accounts for 26 percent of the effort, was 75 percent complete; and so on. Therefore, the project was 56 percent complete at crisis recovery.
$$90(0.17) + 75(0.26) + 50(0.35) + 20(0.10) + 14(0.12) \approx 56\ \text{percent}$$
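Pullen's technique is a weighted sum, so it is easy to check mechanically. The sketch below uses the effort shares and completion percentages from the worked equation; only design, coding, and integration are named in this section, so the other two activities carry placeholder labels. It also verifies that the effort shares sum to 1, a quick way to catch a misread weight.

```python
# Pullen-style percent complete: sum over activities of
# (expected share of total effort) x (percent complete).
# Only design, coding, and integration are named in this section; the other
# two activity labels are placeholders.


def percent_complete(activities: dict) -> float:
    total_share = sum(share for share, _ in activities.values())
    if abs(total_share - 1.0) > 1e-6:
        raise ValueError(f"effort shares sum to {total_share:.3f}, not 1.0")
    return sum(share * done for share, done in activities.values())


telecom_status = {
    # activity: (share of total effort, percent complete)
    "design": (0.17, 90),
    "coding": (0.26, 75),
    "activity C": (0.35, 50),   # placeholder name
    "integration": (0.10, 20),
    "activity E": (0.12, 14),   # placeholder name
}

print(f"project is {percent_complete(telecom_status):.0f} percent complete")
```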
From project data, I knew that 36 person-months of effort had been expended when the crisis occurred. Because that effort represented 56 percent of the total (36/0.56, or about 64 person-months), about 28 more person-months of effort would be required to complete the project, assuming the tasks completed were representative of the remaining tasks. However, the remaining work may be more or less difficult than the work already done, so this assumption must be checked for validity.
MAY 1994
Also, I knew that the team had expended six calendar months of a 10-month schedule, with a current staffing level of six people (36/6). Using six people, and assuming that effort to date was representative of future effort and that no further crises would arise, the project could be completed in another five months (28/6). This would result in an overall development cycle of 11 months (6+5), plus the time spent on preparing and executing contingency plans and managing the crisis. In the end, the 10-month project was completed in 12 months with 68 person-months of effort. Referring to Figure 2 and Table 2, we see that the project was completed at the 87th percentile of probable effort.
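The estimate-to-complete arithmetic above can be written out as a short function that makes its assumptions explicit: completed work is representative of remaining work, staffing stays at the current level, and remaining time is rounded up to whole months (a rounding choice made here to mirror the text's five months for 28/6). The function name is hypothetical, and like the text's 11-month figure, the result excludes time spent on contingency plans and crisis management.

```python
import math


def estimate_to_complete(effort_spent_pm: float, months_elapsed: int, fraction_complete: float):
    """Hypothetical helper: project remaining effort and schedule from progress to date."""
    staff = effort_spent_pm / months_elapsed              # current staffing level (36/6)
    total_effort = effort_spent_pm / fraction_complete    # assumes past work is representative
    remaining_effort = total_effort - effort_spent_pm
    months_remaining = math.ceil(remaining_effort / staff)  # round up to whole months
    return staff, remaining_effort, months_remaining, months_elapsed + months_remaining


staff, remaining, more_months, cycle = estimate_to_complete(36, 6, 0.56)
print(f"staff {staff:.0f}, remaining {remaining:.0f} person-months, "
      f"{more_months} more months, {cycle}-month cycle")
```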
+ Update plans, schedules, and work assignments. Time and resources have been expended on the contingency plan and crisis management, so the original project budget and schedule are likely invalid. For the telecom project, management added 12 person-months to the budget ($120,000) and extended the project schedule by two months. The contingency plan remained in effect but was not invoked again.
+ Compensate workers for extraordinary efforts. Bonuses and overtime pay are appropriate forms of compensation. However, there is no substitute for resting, regrouping, and recharging. This means time off. The amount of time depends on the level of stress encountered during the crisis. Project managers should factor in that time off when they replan project schedules and assignments. Each member of the telecom project's crisis team was given three days off to recover.
+ Formally recognize outstanding
performers and their families. This may include formal letters of commendation, accelerated promotions, and letters to the families of those who worked around the clock. Free dinners and weekend vacations are other ideas. For the telecom project's crisis team, management provided letters of appreciation and dinner certificates.
Many techniques can be used to implement the seven steps of risk management; I have illustrated one approach, but others are certainly possible. Risk management is an ongoing process, iterated continually throughout the life of a project: some potential problems never materialize; others materialize and are dealt with; new risks are identified and mitigation strategies are devised as necessary; and some potential problems submerge, only to resurface later. Following the risk-management procedures illustrated here can increase the probability that potential problems will be identified, confronted, and overcome before they become crises.
REFERENCES
1. B. Boehm, Tutorial: Software Risk Management, IEEE CS Press, Los Alamitos, Calif., 1989.
2. F. Brooks, The Mythical Man-Month, Addison-Wesley, Reading, Mass., 1975.
3. B. Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, N.J., 1981.
4. K. Pullen, "Uncertainty Analysis with Cocomo," Proc. Cocomo Users' Group, Software Eng. Institute, Pittsburgh, Pa., 1987.
Richard Fairley is the founder and principal associate of Software Engineering Management Associates, Inc. He is also a distinguished visiting professor of software engineering at Drexel University and has more than 20 years of experience as a university professor, lecturer, and consultant. His research interests are risk management, software systems engineering, project management, cost and schedule estimation, project planning and control, and process improvement.
Fairley received a BS from the University of Missouri and an MS from the University of New Mexico, both in electrical engineering, and a PhD in computer science from the University of California at Los Angeles.
Address questions about this article to Fairley at Software Engineering Management Assoc., PO Box 728, Woodland Park, CO 80866; fax (719) 687-6Wl.