
Risk management for software projects

2000, IEEE Software

There is little to instruct software project managers on how to handle risk in a way that ensures the success of contingency planning and avoids crisis. This seven-step procedure describes how to identify risk factors, calculate their probability and effect on a project, and plan for and conduct risk management.

RICHARD FAIRLEY, Software Engineering Management Associates

Many software projects fail to deliver acceptable systems within schedule and budget. Many of these failures might have been avoided had the project team properly assessed and mitigated the risk factors, yet risk management is seldom applied as an explicit project-management activity. One reason risk management is not practiced is that very few guidelines are available that offer a practical, step-by-step approach to managing risk.

To address this deficiency, I have created a seven-step process for risk management that can be applied to all types of software projects. I base the process on several years of work with numerous organizations to identify and overcome risk factors in software projects. My clients and I have used a variety of risk-management techniques within the framework of the process. I describe one set of techniques here, which incorporates regression-based cost modeling, but other techniques, such as decision theory, risk tables, and spiral process models, are equally applicable.

ELEMENTS OF RISK MANAGEMENT

The seven steps of my risk-management process are

1. Identify risk factors. A risk is a potential problem; a problem is a risk that has materialized. Exactly when the transformation takes place is somewhat subjective.
A schedule delay of one week might not be cause for concern, but a delay of one month could have serious consequences. The important thing is that all parties who may be affected by a schedule delay agree in advance on the point at which a risk will become a problem. That way, when the risk does become a problem, it is mitigated by the planned corrective actions.

In identifying a risk, you must take care to distinguish symptoms from underlying risk factors. A potential schedule delay may in fact be a symptom of difficult technical issues or inadequate resources.

Whether you identify a situation as a risk or an opportunity depends on your point of view. Is the glass half full or half empty? Situations with high potential for failure often have the potential for high payback as well. Risk management is not the same as risk aversion. Competitive pressures and the demands of modern society require that you take risks to be successful.

0740-7459/94 © 1994 IEEE

2. Assess risk probabilities and effects on the project. Because risk implies a potential loss, you must estimate two elements of a risk: the probability that the risk will become a problem and the effect the problem would have on the project's desired outcome. For software projects, the desired outcome is an acceptable product delivered on time and within budget. Factors that influence product acceptability include delivered functionality, performance, resource use, safety, reliability, versatility, ease of learning, ease of use, and ease of modification. Depending on the situation, failure to meet one or more of these criteria within the constraints of schedule and budget can precipitate a crisis for the developer, the customer, and/or the user community.
Thus, the primary goal of risk management is to identify and confront risk factors with enough lead time to avoid a crisis. The approach I describe here is to assess the probability of a risk by computing probability distributions for code size and complexity and use them to determine the effect of limited target memory and execution time on overall project effort. I then use Monte Carlo simulation to compute the distribution of estimated project effort as a function of size, complexity, timing, and memory, using regression-based modeling.

This approach uses estimated effort as the metric to assess the impact of risk factors. Because effort is the primary cost factor for most software projects, you can use it as a measure of overall project cost, especially when using loaded salaries (burdened with facilities, computer time, and management, for example).

REGRESSION-BASED COST MODELING

You develop a regression-based cost model by collecting data from past projects for relationships of interest (like software size and required effort), deriving a regression equation, and incorporating additional cost factors to explain deviations of actual project costs from the costs predicted by the regression equation.

A commonly used approach to regression-based cost modeling is to derive a linear equation in the log-log domain (log Effort, E, as a linear slope-intercept function of log Size, S) that minimizes the residuals between the equation and the data points for actual projects. Transforming the linear equation, $\log E = \log a + b \log S$, from the log-log domain to the real domain gives you an exponential relationship of the form $E = a S^{b}$.

[Figure A. Effort E (person-months, logarithmic axis) plotted against size S, with actual project data scattered around the regression line $\log E = \log a + b \log S$.]

Figure A illustrates this approach, where E is measured in person-months and S is measured in thousands of lines of source code (KLOC). As the figure shows, it is not unusual to observe wide scatter in actual project data, which indicates large variations between the effort predicted by the regression equation and the actual effort. Residual error is one measure of the variations. A large residual error means that factors in addition to size exert a strong influence on required effort. If size were a perfect predictor of effort, every data point in Figure A would lie on the line of the equation, and the residual error would be zero.

The next step in regression-based cost modeling is to identify the factors that cause variations between predicted and actual effort. We might, by examining our past projects, determine that 80 percent of the variation in required effort for projects of similar size and type can be explained by variations in stability of the requirements, familiarity of the development team with the application domain, and involvement of users during the development cycle. As illustrated in Table A, you can assign weighting factors to these variables to model their […] low requirements volatility, medium application experience, and high user involvement would result in a combined effort multiplier of 0.64 (0.8 * 1.0 * 0.8). The former situation would require […] percent more effort than the nominal estimate, while the latter would require 36 percent less effort than the nominal case.

[Table A. Effort multipliers at low, medium, and high levels for three cost drivers: requirements volatility, application experience, and user involvement. Most cell values are not recoverable.]

Using effort multipliers to adjust an estimate implies that factors not accounted for in the model do not change from past projects to the one being estimated. For example, the model presented in Figure A and Table A does not incorporate factors such as personnel capabilities or stability of the development environment. If these factors should change, the corresponding impacts (positive or negative) must be incorporated into the estimate for a future project. Failure to do so increases risk.

Barry Boehm's Cocomo (Constructive Cost Model) is perhaps the best known example of a regression-based cost model. Cocomo is based on data from 63 projects, collected by Boehm during the mid-to-late 1970s. He clustered the data into three groupings, which he called modes. He then derived two linear equations for each mode in the log-log domain: one equation for estimated effort as a function of software size and one for estimated development time […] He identified 15 cost drivers […] the observed variations.

Boehm illustrated, by example, how to construct a regression-based cost model; herein lies the value of the model. The model does not work without recalibration to allow for differences in Boehm's environment and the environment of interest, however. When organizations use the equations and tables without doing so, the estimates may be seriously skewed. Cocomo equations and tables should not be used as published without recalibrating the model in the local environment.

Automation concerns. Several tools are available that automate regression-based cost modeling. One of the best tool sets, for versatility and ease of use, is from the Softstar Systems Company of Amherst, New Hampshire. The Softstar tools include a tool (Calico) for entering local project data and deriving regression equations tailored to the local environment, a tool (Dbedit) to edit the effort […]

3. Develop strategies to mitigate identified risks. In general, a risk becomes a problem when the value of a quantitative metric crosses a predetermined threshold. For that reason, two essential parts of risk management are setting thresholds, beyond which some corrective action is required, and determining ahead of time what that corrective action will be. Without such planning, you quickly realize the truth in the answer to Fred Brooks' rhetorical question, "How does a project get to be a year late?" One day at a time.

Risk mitigation involves two types of strategies. Action planning addresses risks that can be mitigated by immediate response. To address the risk of insufficient experience with a new hardware architecture, for example, the action plan could provide for training the development team, hiring experienced personnel, or finding a consultant to work with the project team. Of course, you should not spend more on training or hiring than would be paid back in increased productivity. If you estimate that training and hiring can increase productivity by 10 percent, for example, you should not spend more than 10 percent of the project's personnel budget in this manner.

Contingency planning, on the other hand, addresses risks that require monitoring for some future response should the need arise. To mitigate the risk of late delivery by a hardware vendor, for example, the contingency plan could provide for monitoring the vendor's progress and developing a software emulator for the target machine. Of course, the risk of late hardware delivery must justify the added cost of preparing the contingency plan, monitoring the situation, and implementing the plan's actions.
If the cost is justified, plan preparation and vendor monitoring might be implemented immediately, but the action to develop an emulator might be postponed until the risk of late delivery became a problem (the vendor's schedule slipped beyond a predetermined threshold). This brings up the issue of sufficient lead time. When do you start to develop the emulator? The answer lies in analyzing the probability of late delivery. As that probability increases, the urgency of developing the emulator becomes greater.

4. Monitor risk factors. You must monitor the values of risk metrics, taking care that the metrics data is objective, timely, and accurate. If metrics are based on subjective factors, your project will quickly be reported as 90 percent complete and remain there for many months. You must avoid situations in which the first 90 percent of the project takes the first 90 percent of the schedule, while the remaining 10 percent of the project takes another 90 percent of the schedule.

5. Invoke a contingency plan. A contingency plan is invoked when a quantitative risk indicator crosses a predetermined threshold. You may find it difficult to convince the affected parties that a serious problem has developed, especially in the early stages of a project. A typical response is to plan on catching up during the next reporting period, but most projects never catch up without the explicit, planned corrective actions of a contingency plan. You must also specify the duration of each contingency plan to avoid contingent actions of interminable duration. If the team cannot solve the problem within a specified period (typically one to two weeks), they must invoke a crisis-management plan.

6. Manage the crisis. Despite a team's best efforts, the contingency plan may fail, in which case the project enters crisis mode.
There must be some plan for seeing a project through this phase, including allocating sufficient resources and specifying a drop-dead date, at which time management must reevaluate the project for more drastic corrective action (possibly major redirection or cancellation of the project).

7. Recover from a crisis. After a crisis, certain actions are required, such as rewarding personnel who have worked in burnout mode for an extended period and reevaluating cost and schedule in light of the drain on resources from managing the crisis.

CASE STUDY

I illustrate these seven steps for a project to implement a telecommunications protocol. The project, which is actually a composite of several real projects, gave me the opportunity to explore key risk-management issues, such as the likelihood that an undesired situation might occur, the resulting effect of the risk situation, the cost of mitigating the risk, the degree of urgency in mitigation, and the lead time required to avoid a crisis.

The project's goal was to implement a telecommunications protocol for a network gateway using a 10-MHz microprocessor with a 256-Kbyte memory. The project had several constraints that challenged risk management. The project team could not enlarge the memory because the processor was provided by the customer and its use was mandatory. The maximum execution time for message processing was 10 ms.

Risk identification. I used a regression-based cost model to identify and assess the impact of risk factors on estimated project effort. The box on pp. 58-59 describes regression-based cost modeling in more detail, as well as some tools for automating it. For the telecom project, I used a regression-based cost model for real-time telecommunications systems on microprocessors, which I had developed for the client, using historical data from similar projects. The regression equation I derived to relate effort to product size is

$$\text{Effort} = 7.6 \times (\text{Size})^{[\ldots]} \times \text{EAF}$$

where EAF is the effort-adjustment factor. EAF is the product of 15 cost factors taken from Barry Boehm's Cocomo model: required software reliability (Rely), ratio of database size to source-code size (Data), software complexity (Cplx), execution time constraint on the target machine (Time), memory constraint on the target machine (Stor), volatility of the development machine and software (Virt), response time of the development environment (Turn), analyst capability (Acap), applications experience for the development team (Aexp), programmer capability (Pcap), team experience on the development environment (Vexp), team experience with the programming language (Lexp), use of modern programming practices (Modp), use of software tools (Tool), and required development schedule (Sced).

Using these cost drivers as a checklist for the telecom project, I identified five risk factors and added one (Size):
+ Cplx. Effect of algorithmic complexity
+ Time. 10-ms timing constraint
+ Stor. 256K memory of the target processor
+ Vexp. Lack of experience with the target processor
+ Tool. Lack of adequate software tools for the target processor
+ Size. Uncertainty in estimated code size.

These six factors are interrelated: If the algorithms are complex, code size is likely to increase; if size increases, more memory and execution time will be required. With more experience on the target processor architecture and with better software tools, the team might better control the code size, execution time, and memory requirements.

Probability and effects assessment. According to evidence from similar projects and some analysis, I estimated that the size of the telecom project's code would be no less than […] KLOC and no more than 15 KLOC, with the most likely size being approximately 10 KLOC, as Figure 1a shows. Figure 1b is the probability-density function for the probable effect of algorithmic complexity (Cplx) on project effort. As the figure shows, I estimated the most likely impact to be 1.3, with a normal distribution […]. The function for Cplx models the impact that uncertainty in target-machine experience (Vexp) and lack of tools (Tool) will have on the ability to control the complexity of the program that implements the communication algorithms.

I used these probability-density functions to derive a distribution of probable project effort, as the box on p. 62 describes. Thus, the risk factors to be modeled are software size, algorithmic complexity, and the memory and execution-time constraints of the target machine.

[Box, p. 62]

[…] This integral is the probability that x will be in the range w to z; for example, the probability that Size will be in the range of 10,000 to 12,000 lines of code is

$$P(10 \le \text{Size} \le 12) = \int_{10}^{12} p(x)\,dx$$

where p(x) is the probability-density function in Figure 1a in the main text. The inverse distribution function, $P^{-1}(x)$, provides values of x that correspond to given values of P(x). Inverse probability-distribution functions are used in Monte Carlo simulation to compute values of x that correspond to randomly selected probability values, P(x). In practice, you can calculate $P^{-1}(x)$ by table lookup for certain well-defined probability distributions (Z tables for normal distributions, for example) or by sampling techniques such as the Latin Hypercube sampling method.

Monte Carlo simulation is a technique for modeling probabilistic situations that are too complex to solve analytically. Probability distributions are specified for the input variables to the model. A random number generator is used to select independent sample points from the inverse probability distributions for each of the input variables. These sample values are used to compute one point on the specified output distribution(s). Repeating the process a few hundred to a few thousand times produces a histogram that approximates the resultant probability distributions to any desired degree of accuracy.

Until recently, Monte Carlo simulation was the province of mainframe specialists. Introduction of PC-based and Macintosh-based simulation packages has made Monte Carlo […] anyone who knows statistics and PCs. Two tools for Monte Carlo simulation are @Risk from Palisade Corp. of Los Angeles and Crystal Ball from Decisioneering Corp. of Denver, both of which run in conjunction with a spreadsheet. For the telecom project described […]

To assess the effect of uncertainty in size, complexity, execution time, and the memory constraint on the required effort, I constructed a probabilistic cost model and used Monte Carlo simulation. The simulation model is of the form

$$\text{Effort} = 7.6 \times (\text{Size})^{[\ldots]} \times \text{EAF}$$

where EAF is the product of Stor, Time, and Cplx, and where Size and Cplx are modeled by the probability distributions in Figure 1. Stor and Time are dependent on Size. I determined values for Stor by first randomly selecting a value from the inverse probability distribution for Size. I then used a code-expansion factor of 16 (based on a ratio of 1 to 4 for source-to-object instructions and 1 to 4 for object instructions to object bytes), multiplied by Size, and divided by 256K (the memory size) to get the percentage of memory used. That is,

$$\text{Percentage of memory} = 100 \times \frac{16 \times \text{Size}}{256} \tag{2}$$

For example, I determined that the percentage of memory used is 93.75 when Size is 15 KLOC. Table 1 shows values of Stor and Time taken from Cocomo. In the first two columns are the values of Stor for various percentages of use. From the table, I interpolated that Stor is approximately 1.55 when the percentage of memory is 93.75.

Table 1. Effort multipliers for memory use (Stor) and execution-time use (Time), from Cocomo

Memory used      Stor    Time used        Time
Less than 50%    1.00    Less than 50%    1.00
70%              1.06    70%              1.11
85%              1.21    85%              1.30
95%              1.56    95%              1.66

The last two columns of Table 1 show how execution time affects project effort. Time, which is also dependent on Size, is modeled as

$$\text{Percentage of time} = 100 \times \frac{(1/2) \times (1/3) \times (4 \times \text{Size})}{10} \tag{3}$$

where 1/2 is the average instruction-processing time on the target processor (five clock ticks at 10 MHz, which is half a microsecond; with Size in KLOC the numerator therefore comes out in milliseconds); a third of the object instructions are executed by the main timing loop (an assumption) and the remainder are data cells and exception-handling code; and 4 * Size is the expansion factor from source instructions to object instructions. I then divide the resulting time by 10 ms (the timing constraint) to determine the percentage of time. The percentage of time is 100 when Size is 15 KLOC.
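As a worked check of equations 2 and 3, the short Python sketch below recomputes the two percentages for a 15-KLOC program. The function and parameter names are illustrative and do not come from the article; the constants are the ones stated in the text. The last function simply inverts equation 3 algebraically, which is an added convenience rather than part of the original model.

```python
def memory_pct(size_kloc, expansion=16, memory_kbytes=256):
    """Equation 2: percentage of target memory used.

    expansion is the source-line-to-object-byte factor (4 x 4 = 16).
    """
    return 100.0 * expansion * size_kloc / memory_kbytes


def time_pct(size_kloc, instr_time=0.5, src_to_obj=4, limit_ms=10.0):
    """Equation 3: percentage of the 10-ms timing budget used.

    One third of the object instructions are assumed to run in the
    main timing loop; instr_time is the 1/2 factor from the equation.
    """
    loop_instructions = src_to_obj * size_kloc / 3.0
    return 100.0 * (instr_time * loop_instructions) / limit_ms


def size_for_time_pct(pct, instr_time=0.5, src_to_obj=4, limit_ms=10.0):
    """Invert equation 3: code size (KLOC) that uses pct percent of the time budget."""
    return pct * limit_ms * 3.0 / (100.0 * instr_time * src_to_obj)


if __name__ == "__main__":
    print(memory_pct(15))         # memory use at 15 KLOC, in percent
    print(time_pct(15))           # timing use at 15 KLOC, in percent
    print(size_for_time_pct(90))  # size that leaves 10 percent of the time in reserve
```

Keeping the constants as keyword arguments makes it easy to see how sensitive the result is to the one-third timing-loop assumption, which the text itself flags as uncertain.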
Although, as this analysis shows, the timing constraint dominates the memory constraint, I tracked both factors because the assumption used to derive the percentage-of-time equation (Time) was not certain and because both Stor and Time affect project effort. In reality, memory could become the dominant factor.

To compute the probable effort for the telecom project, I used Monte Carlo simulation and the Crystal Ball simulation tool from Decisioneering Corp., which randomly selected data points from the inverse probability-distribution functions for Size and Cplx and used the value of Size along with Table 1 to determine values for Time and Stor. The tool then used the values of Size, Cplx, Time, and Stor in the regression equation to compute a point on the probability-density histogram for effort. The tool should repeat this computation at least a few hundred times to produce a reasonable approximation of the probability-density function for estimated effort.

Figure 2 shows the probable effort for the telecom project converted to dollars, because effort was the primary driver of this project's cost. The conversion factor was a loaded salary of $10,000 per person month, loaded meaning that indirect and overhead costs are included. The right vertical axis indicates the actual number of times the tool computed a given cost. The left vertical axis indicates the probability of that cost occurring, computed as the ratio of the number of occurrences to total occurrences. The summation of probabilities up to any given dollar amount is the probability that the project can be completed for that amount of money or less. Table 2 presents some estimated costs and associated probabilities. For example, it is 70 percent probable that the project can be completed for $600,000 or less (60 person months of effort at $10,000 per person month).
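The simulation loop just described can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a reconstruction of the Crystal Ball model: the triangular shape and the 6-KLOC lower bound for Size, the standard deviation of Cplx, the regression exponent of 1.1, and the use of linear interpolation in Table 1 are all placeholders chosen for the example, so the resulting percentiles should not be expected to match Table 2.

```python
import random

# Table 1 (Cocomo): percentage of resource used -> effort multiplier.
STOR_TABLE = [(50, 1.00), (70, 1.06), (85, 1.21), (95, 1.56)]
TIME_TABLE = [(50, 1.00), (70, 1.11), (85, 1.30), (95, 1.66)]


def multiplier(pct, table):
    """Linearly interpolate a multiplier (assumed method); clamp outside the table."""
    if pct <= table[0][0]:
        return table[0][1]
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if pct <= x1:
            return y0 + (y1 - y0) * (pct - x0) / (x1 - x0)
    return table[-1][1]


def simulate(n=2000, seed=1, a=7.6, b=1.1):
    """Return n sorted effort samples (person-months). b=1.1 is a placeholder."""
    rng = random.Random(seed)
    efforts = []
    for _ in range(n):
        size = rng.triangular(6.0, 15.0, 10.0)   # low, high, mode in KLOC (assumed)
        cplx = max(1.0, rng.gauss(1.3, 0.1))     # mean 1.3 from the text; sd assumed
        stor = multiplier(100.0 * 16 * size / 256, STOR_TABLE)           # equation 2
        time = multiplier(100.0 * (0.5 * 4 * size / 3) / 10, TIME_TABLE)  # equation 3
        efforts.append(a * size ** b * cplx * stor * time)
    return sorted(efforts)


def percentile(sorted_values, p):
    """Value at or below which roughly p percent of the samples fall."""
    idx = min(len(sorted_values) - 1, int(p / 100.0 * len(sorted_values)))
    return sorted_values[idx]


if __name__ == "__main__":
    samples = simulate()
    for p in (50, 70, 85, 95):
        dollars = percentile(samples, p) * 10_000  # loaded salary per person-month
        print(f"{p}th percentile: ${dollars:,.0f}")
```

Reading a cost off the sorted samples at a given percentile corresponds to summing the histogram probabilities up to that dollar amount, which is how the cumulative figures in Table 2 are interpreted.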
A cost of $600,000 might involve scheduling six people for 10 months or five people for 12 months. As illustrated in Figure 2 and Table 2, low complexity and a small product size, with associated small values of Time and Stor, would result in low cost. If the product is large and complex, the resulting cost would be high.

Table 2. Estimated project cost by percentile

Percentile    Cost
50th          $570K
70th          $600K
85th          $667K
95th          $76[?]K

The next issue to face is commitment to a schedule and budget. To distinguish estimates from commitments, I used the equation

Commitment = Estimate + Contingency

That is, the difference between estimate and commitment is the contingency reserve for the project. In this case, the contingency reserve is for dealing with the impact of uncertainty in source-code size and complexity, and the resulting effects of timing and memory constraints on estimated effort. In one organization I work with, project teams and management routinely set their development schedules and budgets at 70 percent probability of success, but commit to their customers at 90 percent. The 20 percent difference is a contingency reserve for each project.

Risk mitigation. Boehm recommends avoidance, transfer, and acceptance as potential risk-mitigation strategies. For the telecom project, avoidance techniques might be to buy more memory or a faster processor or to decline the project. Transfer techniques might include implementing the lowest layers of the communications protocol in hardware, placing the top levels of the protocol on a network server, or subcontracting the work to specialists in communication software. Acceptance techniques require that all affected parties (customers, users, managers, developers) publicly acknowledge the risk factors and accept them. They also involve preparing action, contingency, and crisis-management plans for the identified risks.

Action planning. To mitigate the risks of insufficient experience with the target processor, the project manager might provide training for the present staff or hire additional, more experienced personnel as consultants or staff. To deal with the lack of adequate software tools, the manager might acquire more effective tools and provide training. However, he or she would have to evaluate the risk caused by inadequate tools against the risk of insufficient knowledge of the replacement tools.

I used Boehm's Cocomo cost drivers to determine investment strategies for training, consultants, and tools. If training and consultants are expected to lower the effort multiplier for target-machine experience by 10 percent, six percent of this could be invested in training and consultants to produce a four percent savings in estimated project cost.

Another action plan is to investigate the possibility of buying more memory and/or a faster processor. For the telecom project, the existing processor and memory were provided by the customer and thus required (as in government-furnished equipment), although buying your way out of potential software problems with more and better hardware is sometimes a feasible alternative.

This solution might also involve buying some of the software rather than building it all. However, buying commercial off-the-shelf software is not without risk, especially if you are going to incorporate it into a larger system. The box on the facing page describes some of these risks.

The size and complexity of software in the telecom project were factors for which no immediate actions were apparent: the communication protocols were specified, the team had to use the specified hardware and algorithms, and they could not prioritize requirements and eliminate those that were desirable, but not essential.

Contingency planning. Contingency planning involves preparing a contingency plan, a crisis-management plan, and a crisis-recovery procedure. Contingency plans address the risks not addressed in the action plans. A crisis-management plan is the backup plan to be used if the contingency plan fails to solve a problem within a specified time. A crisis-recovery procedure is invoked when the crisis is over, whether the outcome is positive or negative.

The contingency plan for the telecom project is concerned with controlling the timing budget and memory use on the target processor. It takes into consideration that the probable source size truncates at 15 KLOC, which the code expansion factor of 16 dictates if the major timing loop is to execute in no more than 10 ms. Thus, preparation of a contingency plan involves
+ Specifying the nature of the potential problem. For the telecom project this was the effect of memory size and execution time on project effort and schedule.
+ Considering alternative approaches. For the telecom project, these included building a prototype, using memory overlays, using a faster processor, buying more memory, or pursuing incremental development and monitoring the memory and execution-time budgets. Another approach that is usually considered is to eliminate unessential (desirable but not vital) requirements. However, there are no unessential requirements in a communications protocol.
+ Specifying constraints. For the telecom project, these were a memory size of 256 Kbytes, an execution time of 10 ms, and the mandatory use of the existing processor and memory.
+ Analyzing alternatives. Building a prototype would require that the team know how to scale up timing and memory requirements. Using memory overlays would have incurred an unacceptable penalty on execution time. Using a faster processor wasn't possible because use of the current processor was mandatory. Buying more memory wasn't feasible because the processor's address space was limited to 256 Kbytes.
+ Selecting an approach. Thus, only the last alternative was viable: pursue incremental development and monitor the allocated memory and timing budgets.

To do this, the team had to partition the design into a series of builds, allocate memory and timing budgets to each build, and track actual versus budgeted amounts of time and memory for each demonstrated build as the product evolved. A contingency plan was to be invoked when the performance index for actual versus budgeted memory or execution time exceeded a predetermined threshold.

In allocating the timing and memory budgets, the team held back a contingency reserve. According to equations 2 and 3, a code size of 15 KLOC would result in 93.75 percent use of memory and 100 percent use of execution time. Backsolving equation 3 showed that developers needed to limit the code size to 13.5 KLOC if they wished to hold 10 percent of the execution time in reserve.

The next step was to form the contingency plan, which involves specifying
+ Risk factors. In the telecom project, these were the 10-ms timing constraint and the 256-Kbyte memory constraint.
+ Tracking methods. For the telecom project, these were weekly demonstrations of incremental builds and the monitoring of the memory and execution-time budgets.
+ Responsible parties.
For the telecom project, two members of the project team were assigned to monitor performance indices and execute the contingency plan if necessary.
+ Thresholds. The conditions under which the contingency plan would be invoked. The threshold for the telecom project was a performance index greater than 1.1 for budgeted memory or budgeted execution time.
+ Resource authorizations. The responsible parties in the telecom project were to be allowed unlimited overtime for two weeks to solve the memory and/or execution-time problem.
+ Constraints. For the telecom project, the project manager specified that recovery efforts were not to affect the ongoing activities of other project personnel.

Two items in the contingency plan are particularly important: the threshold for initiating the plan (10 percent overrun) and the time limit allotted to fix the problem (two weeks). Because 10 percent of the timing budget is to be withheld, exceeding the performance index for memory or time by less than 10 percent would still yield an acceptable system. A more conservative approach would have been to set the threshold at five percent, while retaining the same 10 percent contingency reserve.

Risk monitoring and contingency planning. To compute the performance indices specified in the contingency plan, the responsible parties compared the actual amount of resources used (time or memory) to the budgeted amount for each incremental build. The index is the ratio of the cumulative amount of time or memory required to implement the current build to BA, the cumulative amount of time or memory budgeted for all builds up to and including the current build.

Each weekly build adds functionality to the previous build, so the performance indices track overall growth of time and memory use as the implementation proceeds.

... the telecom project: design partitioning, allocation of resource budgets, incremental development, monitoring of budgeted vs. demonstrated values, and ... track the timing and memory budgets for an evolving software product. Because software is not a physical entity, there are no physical laws or mathematical theories to guide the development of engineering models that will let us design software to ... terms of traditional engineering parameters, also makes it impossible to scale the results of prototyping and simulation to a full-scale system. This inability, more than any other factor, differentiates software engineering ...

+ No source code. If you need to enhance the system, you may only have the object code. In most cases, vendors are understandably reluctant to provide source code. In the rare instances that they do, the code is usually so difficult to understand that it is very difficult to modify correctly.
+ Vendor failures or buyouts. What happens to your system if the vendor goes out of business or is bought out? In some cases, purchasers of COTS have had vendors place the source code in escrow, to be available should the vendor's business fail or be acquired by another company. Again, however, having the source code does not guarantee that anyone can understand it well enough to modify it.

Crisis management. A crisis is a show-stopper. All project effort and resources must be dedicated to resolving the situation. You can define some elements of crisis management, such as the responsible parties and drop-dead date, before a crisis occurs. The elements of crisis management are to
+ Announce the problem. For the telecom project, a crisis was said to occur if the contingency plan failed to resolve the overrun ... The crisis came after the team had implemented half the required functions, overrun the memory budget by 12 percent, and two weeks of contingency actions had not solved the problem.
+ ... and two other team members stopped all other work to concentrate on the problem. The crisis team had access to all necessary resources, subject to the project manager's approval.
+ Update status frequently. The project team held daily 15-minute stand-up meetings ...
+ ... resources to solving the problem, including flying in two additional ... around the clock, including catering meals and providing sleeping facilities on site.
+ Have project personnel operate in burnout mode. The crisis team worked as many hours as were humanly possible. All other project personnel were on 24-hour call to assist them until the problem was solved.
+ Establish a drop-dead date. Efforts to resolve the problem were not to continue longer than 30 days. If the problem was not solved by then, marketing and upper management would reevaluate project feasibility. As it turned out, the team resolved the crisis before the 30-day deadline.
+ Clear out unessential personnel. Management requested that all personnel not assigned to the telecom project continue with normal work activities, as long as they did not interfere with the crisis team's work.

One of the most important steps in crisis management is to set a drop-dead date because no one can sustain this kind of effort indefinitely. If the timing problem had not been fixed in 30 days, management would have stopped crisis mode and reconsidered earlier approaches that had been rejected because of project constraints, such as using a different processor or subcontracting the work to telecommunication specialists. They might also have considered moving the upper levels of the protocol to a network server, or even canceling the project altogether.

Crisis recovery. It is important to examine what went wrong, evaluate how the budget and schedule have been affected, and reward key crisis-management personnel. As part of crisis recovery, you should
+ Conduct a crisis postmortem. This gives you the opportunity to fix any systemic problems that may have precipitated the crisis and to document any lessons learned. For the telecom project, the postmortem revealed that the design was overly complex in a key area and that a simpler design would have yielded a smaller, faster program. The root cause was the team's overall lack of experience in designing software for the target processor.
+ Calculate cost to complete the project. It is important to know how the crisis has affected the project's budget and schedule. To determine this, I used a technique developed by Karen Pullen of Mitre Corp.,⁴ which involves multiplying the expected percentage of total effort for each type of work activity by the actual percentage of completion for each activity. This gave me the current percentage of project completion.

Table 3 shows the status of the telecom project after the crisis. Table 4 summarizes the effort distribution among activities for similar projects.

Table 3. Status of the telecom project after the crisis. Legible entries: design elements coded, 75 of 100 (75%); tested modules integrated, 20 of 100 (20%); requirements tested, 4 of 30 (14%).

Table 4. Effort distribution among activities for similar projects. Legible entries: coding, 26 percent; integration, 10 percent.

The information in Table 3 indicates an incremental development process; that is, each activity is progressing in parallel with the others. This is consistent with the approach the telecom project team took: Build the product in stages and compare budgeted to actual memory and timing. Had they taken a waterfall approach, they would have designed all the requirements before beginning coding and completed all coding before beginning acceptance testing. The disadvantage of the waterfall approach is that you don't know if you have an acceptable product until the end of the project. The team would have had to wait too long to find out if the software would fit in available memory and run within an acceptable time; this risk was unacceptable.

Tables 3 and 4 show that the project was 90 percent complete with 17 percent of the estimated project effort (design); 75 percent complete with 26 percent of the effort (coding), and so on. Therefore, the project was 56 percent complete at crisis recovery:

90(.17) + 75(.26) + 50(.35) + 20(.10) + 14(.12) = 56

From project data, I knew that 36 person-months of effort had been expended when the crisis occurred. Therefore, 28 person-months of effort would be required to complete the project (36/0.56 is about 64 person-months in total, less the 36 already expended), assuming the tasks completed were representative of the remaining tasks. However, the remaining work may be more or less difficult than the work already done, so this assumption must be checked for validity.

Also, I knew that the team had expended six calendar months of a 10-month schedule, with a current staffing level of six people (36/6). Using six people, and assuming that effort to date was representative of future effort and that no further crises would arise, the project could be completed in another five months (28/6). This would result in an overall development cycle of 11 months (6+5), plus the time spent on preparing and executing contingency plans and managing the crisis. In the end, the 10-month project was completed in 12 months with 68 person-months of effort. Referring to Figure 2
and Table 2, we see that the project was completed at the 87th percentile of probable effort.
+ Update plans, schedules, and work assignments. Time and resources have been expended on the contingency plan and crisis management, so the original project budget and schedule are likely invalid. For the telecom project, management added 12 person-months to the budget ($120,000) and extended the project schedule by two months. The contingency plan remained in effect but was not invoked again.
+ Compensate workers for extraordinary efforts. Bonuses and overtime pay are appropriate forms of compensation. However, there is no substitute for resting, regrouping, and recharging. This means time off. The amount of time depends on the level of stress encountered during the crisis. Project managers should factor in that time off when they replan project schedules and assignments. Each member of the telecom project's crisis team was given three days off to recover.
+ Formally recognize outstanding performers and their families. This may include formal letters of commendation, accelerated promotions, and letters to the families of those who worked around the clock. Free dinners and weekend vacations are other ideas. For the telecom project's crisis team, management provided letters of appreciation and dinner certificates.

Many techniques can be used to implement the seven steps of risk management. I have illustrated one approach. Others are certainly possible. Risk management is an ongoing process continually iterated throughout the life of a project; some potential problems never materialize; others materialize and are dealt with; new risks are identified and mitigation strategies are devised as necessary; and some potential problems submerge, only to resurface later. Following the risk-management procedures illustrated here can increase the probability that potential problems will be identified, confronted, and overcome before they become crisis situations.

REFERENCES
1. B. Boehm, Tutorial: Software Risk Management, IEEE CS Press, Los Alamitos, Calif., 1980.
2. F. Brooks, The Mythical Man-Month, Addison-Wesley, Reading, Mass., 1975.
3. B. Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, N.J., 1981.
4. K. Pullen, "Uncertainty Analysis with Cocomo," Proc. Cocomo Users Group, Software Eng. Institute, Pittsburgh, Pa., 1987.

Richard Fairley is the founder and principal associate of Software Engineering Management Associates, Inc.
He is also a distinguished visiting professor of software engineering at Drexel University and has more than 20 years experience as university professor, lecturer, and consultant. His research interests are risk management, software systems engineering, project management, cost and schedule estimation, project planning and control, and process improvement.
Fairley received a BS from the University of Missouri and an MS from the University of New Mexico, both in electrical engineering, and a PhD in computer science from the University of California at Los Angeles.
Address questions about this article to Fairley at Software Engineering Management Assoc., PO Box 728, Woodland Park, CO 80866; fax (719) 687-6Wl.
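
As a worked illustration of the risk-monitoring step described in the article, the following Python sketch computes a performance index as cumulative actual use divided by cumulative budget and compares it with the 1.1 threshold from the telecom contingency plan. The ratio form is an assumption consistent with the 10 percent overrun threshold. The `Build` record, the function names, and the weekly memory figures are hypothetical and are not taken from the project data.

```python
from dataclasses import dataclass

# Threshold quoted in the article: invoke the contingency plan above 1.1.
THRESHOLD = 1.1


@dataclass
class Build:
    """One incremental build (hypothetical record layout)."""
    budgeted: float  # time or memory budgeted for this build
    actual: float    # time or memory measured at the weekly demonstration


def performance_index(builds):
    """Cumulative actual use divided by cumulative budget (assumed form)."""
    total_budget = sum(b.budgeted for b in builds)
    total_actual = sum(b.actual for b in builds)
    if total_budget <= 0:
        raise ValueError("cumulative budget must be positive")
    return total_actual / total_budget


def contingency_needed(builds, threshold=THRESHOLD):
    """True when the index exceeds the agreed threshold."""
    return performance_index(builds) > threshold


# Hypothetical weekly memory figures in Kbytes (illustration only).
memory_builds = [Build(40, 42), Build(40, 44), Build(40, 48.4)]

if __name__ == "__main__":
    for week in range(1, len(memory_builds) + 1):
        so_far = memory_builds[:week]
        print(week, round(performance_index(so_far), 3), contingency_needed(so_far))
```

With these made-up figures the index stays under the threshold for the first two builds and crosses it on the third, which is the point at which the responsible parties would invoke the contingency plan.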
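
The cost-to-complete arithmetic from the crisis-recovery step can also be checked with a short Python sketch. It weights each activity's percentage of completion by its expected share of total effort, using the figures quoted in the text (17 percent of effort for design, 26 percent for coding, and so on), then derives the remaining effort and schedule from the 36 person-months already expended. The function names are hypothetical, and the calculation assumes, as the article does, that completed work is representative of the remaining work.

```python
# (percent complete, expected share of total effort) for each activity,
# in the order used in the article's weighted sum.
ACTIVITIES = [(90, 0.17), (75, 0.26), (50, 0.35), (20, 0.10), (14, 0.12)]


def percent_complete(activities):
    """Weighted percentage of project completion."""
    return sum(done * share for done, share in activities)


def effort_to_complete(spent, pct_complete):
    """Remaining effort, assuming past effort is representative."""
    if not 0 < pct_complete <= 100:
        raise ValueError("percent complete must be in (0, 100]")
    total = spent / (pct_complete / 100.0)
    return total - spent


pct = percent_complete(ACTIVITIES)          # about 56 percent
remaining = effort_to_complete(36.0, pct)   # about 28 person-months
months = remaining / 6                      # six people on the team

print(round(pct), round(remaining), round(months))  # prints: 56 28 5
```

The rounded results match the figures in the text: 56 percent complete, 28 person-months remaining, and about five more calendar months with six people.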