Structural Redocumentation: A Case Study

KENNY WONG, SCOTT R. TILLEY, HAUSI A. MULLER, and MARGARET-ANNE D. STOREY, University of Victoria

Programmers have become part historian, part detective, and part clairvoyant.
- Thomas A. Corbi

Most software documentation typically describes the program at the algorithm and data-structure level. For large legacy systems, understanding the system's architecture is more important. The authors propose a method of reverse engineering through redocumentation that promises to extend the useful life of large systems.

0740-7459/94/$04.00 © 1994 IEEE

Legacy software systems require a different approach to software documentation than has traditionally been used. In understanding large, evolving software systems, structural redocumentation through reverse engineering plays a key role.

Reconstructing and effectively redocumenting the design of existing software systems is even more difficult than initial design. Recognizing abstractions in real-world systems is as crucial as designing adequate abstractions for new ones. This is especially true for legacy software written 10 to 25 years ago, which is often in poor condition because of prolonged, sometimes dramatic (even traumatic) maintenance.

More than 50 percent of software-evolution work is devoted to program understanding, a task in which documentation traditionally serves an important role. Yet documentation needs differ significantly for software systems of vastly different scales, say 1,000 lines versus one million lines. Most software documentation is in-the-small, because it typically describes the program at the algorithm and data-structure level. For large legacy systems, understanding the structural aspects of the system's architecture is more important than understanding any single algorithmic component.
JANUARY 1995

STRUCTURAL ANALYSIS

Software engineers and technical managers responsible for maintaining such systems find program understanding especially problematic. The documentation that exists for these systems usually describes isolated parts but not the overall architecture. Moreover, the documentation is often scattered throughout the system and on different media. Thus maintenance personnel must explore the low-level source code and piece together disparate information to form high-level structural models. Manually creating just one such architectural document is always arduous. Creating the necessary documents to describe the architecture from multiple viewpoints is often impossible. Yet this is exactly the sort of in-the-large documentation needed to expose the structure of large software systems.

Software structure is the collection of artifacts used by software engineers when forming mental models of software systems. These artifacts include software components such as procedures, modules, interfaces, and subsystems; dependencies among components such as client-supplier, inheritance, and control-flow; and attributes such as component type, interface size, and interconnection strength. A system's structure is the organization and interaction of these artifacts.[1]

Reverse engineering reconstructs structural models by identifying the system's current components, discovering their dependencies, and generating abstractions to manage complexity. These models provide insights that can then improve subsequent development, ease maintenance and reengineering, and aid project management.

Structural redocumentation is reverse-engineering the architectural aspects of software. As a result, the overall gestalt of the subject system can be derived, and some of its architectural design information can be recaptured.
In addition, structural redocumentation does not involve physically restructuring the code, although this might be a desirable outcome of a subsequent reengineering phase.

At the University of Victoria, we have developed the Rigi environment, which focuses on the architectural aspects of program understanding. It supports a method for identifying, building, and documenting layered subsystem hierarchies. Critical to Rigi's usability is its ability to store and retrieve views, which are snapshots of reverse-engineering states. These views are used to transfer information about the abstractions to software engineers. The box "The Rigi Reverse-Engineering Tool" contains a detailed description of Rigi's features.

Early experience has shown we can produce views that are compatible with the mental models used by the subject software's maintainers. These maintainers benefited from the documentation produced by the Rigi system because the views
+ made concrete the logical software structure previously held only in their minds;
+ highlighted critical areas of the software structure that needed more attention, such as central components that have many incident dependencies;
+ provided an objective basis for discussion and software maintenance because the views are based on the actual source code, unlike system documentation that often becomes out-of-date as the source code evolves; and
+ verified that the system's software structure was at least understandable to an experienced analyst from the outside.

The reverse-engineering approach developed under the Rigi project has been successfully applied to several real-world software systems. These include a physician's patient-record information system written in Cobol, a particle-accelerator control program written in C, and numerous Unix utilities. However, before the case study reported here, the largest program analyzed using the Rigi methodology was about 120,000 lines of code.
While such a program is reasonably large in an academic setting, it is not exceptional for commercial legacy software. Not until we undertook the challenge of redocumenting SQL/DS did we begin to validate our approach.

IEEE SOFTWARE

REDOCUMENTATION PROCESS

Rigi consists of
+ Rigireverse, a parsing system that supports common imperative programming languages such as C and Cobol, and a parser for LaTeX to analyze documentation;
+ Rigiserver, a repository to store the information extracted from the source code; and
+ Rigiedit, an interactive, window-oriented graph editor to manipulate program representations.

In Rigi, the first phase of structural redocumentation is automatic and involves parsing the source code of the legacy system and storing the extracted artifacts in the repository. This produces a flat resource-flow graph of the software. Software maintainers can use this graph to represent the structural dependencies of interest, such as function calls and data accesses.

To manage the complexity, the second phase involves human pattern-recognition skills and features language-independent subsystem-composition techniques to generate multiple, layered hierarchies for higher level abstractions.[2] For example, the analyst can cluster functions into subsystems according to business rules, personnel assignments, or accepted principles of software modularity, providing the multiple, alternative perspectives needed for maintaining the software.

Subsystem composition is a recursive process of grouping building blocks such as data types, procedures, and other components into composite subsystems, to help manage the complexity of understanding the software structure. The composition criterion depends on the application.
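The two phases described above can be pictured with a small sketch. It is not Rigi's implementation: the class name `ResourceFlowGraph`, the `compose` method, and every function name in the sample data are hypothetical. It assumes a flat graph of typed arcs as the output of the first phase, and shows one composition step of the second phase, in which an analyst-chosen group of nodes is collapsed into a subsystem and its arcs are lifted into composite arcs.

```python
from collections import defaultdict


class ResourceFlowGraph:
    """Flat graph of software artifacts with typed dependency arcs."""

    def __init__(self):
        self.arcs = set()                  # (source, target, arc_type)
        self.children = defaultdict(set)   # subsystem name -> member nodes

    def add_arc(self, source, target, arc_type):
        self.arcs.add((source, target, arc_type))

    def compose(self, name, members):
        """Collapse `members` into subsystem `name` and lift their arcs."""
        members = set(members)
        self.children[name] = members
        lifted = set()
        for source, target, arc_type in self.arcs:
            new_source = name if source in members else source
            new_target = name if target in members else target
            if new_source == new_target == name:
                continue  # the dependency is now internal to the subsystem
            lifted.add((new_source, new_target, arc_type))
        self.arcs = lifted


# Hypothetical artifacts, for illustration only.
graph = ResourceFlowGraph()
graph.add_arc("parse_query", "build_plan", "call")
graph.add_arc("build_plan", "cost_model", "call")
graph.add_arc("build_plan", "catalog", "data")
graph.add_arc("run_plan", "catalog", "data")

# The analyst, not the tool, decides which nodes belong together.
graph.compose("optimizer", ["build_plan", "cost_model"])
for arc in sorted(graph.arcs):
    print(arc)
```

Because a subsystem is itself a node after composition, calling `compose` again with subsystem names as members yields the layered hierarchy; the grouping criterion stays entirely in the analyst's hands.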
For program understanding, the process is guided by partitioning the resource-flow graph according to established modularity principles such as low coupling and strong cohesion. Exact interfaces and modularity quality are used to evaluate the generated software hierarchies. Evolved programs often contain lengthy import lists that contain many more declarations than really necessary. A subsystem's exact interface lists only the resources, such as functions and data types, actually provided, required, or used internally by the modules that comprise that subsystem.

THE RIGI REVERSE-ENGINEERING TOOL

We designed Rigi to reverse-engineer large legacy software. Our research has focused on the following:

Dynamic views. Rigi presents structural documentation using a collection of views. A view is a bundle of visual and textual frames that contains, among other things, resource-flow graphs, subsystem-hierarchy diagrams, projections, exact interfaces, and annotations. A view is similar to a database view and is a dynamic snapshot that reflects the current reverse-engineering state. Because views are ultimately based on the underlying source code, they remain up-to-date.

Flexibility. Because program understanding involves many facets and applications, it is wise to make the approach as flexible as possible for use in many domains. Most reverse-engineering tools provide a fixed set of extraction, selection, filtering, organization, documentation, and representation techniques. We provide a scripting language that lets analysts customize, combine, and automate these activities in novel ways. For example, analysts have used this language to express or access additional software metrics, clustering strategies, and graph-layout algorithms.

Human input. There is a trade-off in program-understanding environments between what can be automated and what should (or must) be left to humans. The best solution lies in a combination of the two. The Rigi approach relies heavily on the experience of the analyst using it, who makes all the important decisions. For example, as the software engineer forms subsystems on the basis of various high-level criteria, the Rigi system can offer selection and search algorithms on the basis of aspects such as graph connectivity and component and dependency type, and can provide statistics, such as exact interfaces between subsystems and graph-quality metrics. Nevertheless, the process is synergistic because the analyst also learns and discovers interesting relationships by exploring software systems using the environment. We advocate a hands-on approach to reverse engineering to help transfer the constructed abstractions into the minds of the software engineers.

Multiple views. Because the user is in charge, the subsystem-composition process can be based on diverse criteria, such as business rules, tax laws, requirements, and other semantic information. These alternative, independent decompositions may exist simultaneously under the structural representation supported by Rigi. Views can accurately capture coexisting architectural decompositions, providing many perspectives for later inspection. In effect, multiple, virtual representations of the software's architecture can be created, manipulated, and saved.

Scalability. To deal effectively with legacy software, we must train program-understanding tools and methods on large, multimillion-line source codes. Techniques that work on toy projects often do not scale up. Our current scalability objective is to analyze systems of up to five million lines of code.

Customizable user interface. Providing a single interface to a general toolkit poses the persistent problem, "What is a good user interface for this application?" Our solution was to let the tool user, rather than the tool builder, provide the answer. Analysts can customize Rigi's user interface to suit their own personal taste and needs, while still working within the common look-and-feel imposed by the window manager.

LEGACY SYSTEM

The Structured Query Language/Data System (SQL/DS) is a large, relational database-management system based on a research prototype developed by the IBM Almaden Research Center in the mid-1970s. It has undergone numerous revisions since the first release in 1982. Originally written in PL/I to run on IBM's VM, SQL/DS is now over two million lines of PL/AS code and runs on several IBM operating systems, including VM and VSE. The hardware base for all SQL/DS versions is the IBM System/370 architecture, but the current release supports several IBM-only operating-system platforms, including VM/XA SP and VSE/SP.

PL/AS is a proprietary IBM systems-programming language that is PL/I-like and allows embedded System/370 assembler. Simultaneous support of SQL/DS for multiple releases on multiple operating systems requires multipath code maintenance, increasing the difficulty for its maintainers.

SQL/DS consists of about 1,300 compilation units, roughly split into three large systems and several smaller ones. Because of its size and complex evolution, no individual alone can comprehend the entire program. Developers are forced to specialize in a particular component, even though the various components interact. Existing program documentation is also a problem; there is too much to maintain and keep current with the source code and too much to read and digest.

SQL/DS is a typical legacy software system: successful, mature, and supporting a large customer base as it is adapted to new environments and grows in functionality.
Our main goals in applying Rigi to this system were
+ to ease software maintenance by providing up-to-date, high-level perspectives of the operational system structure,
+ to transfer this information effectively into the minds of the maintainers,
+ to test the applicability of Rigi to industrial legacy systems, and
+ to promote the use of program-understanding technology within IBM.

Extending Rigi. Because SQL/DS is written in a proprietary language, most commercial off-the-shelf analysis tools are unsuitable. This presented us with an enticing and rare opportunity to exercise our approach on a classic industrial legacy system and provided an excellent test of whether the Rigi tool and method would scale up. Before we could perform the analysis, however, we had to enhance some aspects of our tool to respond to the unique challenges posed by SQL/DS: its proprietary implementation language, its size and complexity, and its application domain.

Our initial analysis of the SQL/DS code exposed some shortcomings in Rigi. The sheer amount of code compelled us to change all three components of the environment, especially the Rigiedit graph editor, which was augmented with a scripting language. We also added a PL/AS language subsystem to Rigireverse and increased the memory limit of Rigiserver. The Rigireverse parsing system, once it was augmented with a PL/AS interface, successfully processed the entire source code in an incremental manner. The Rigiserver repository was able to handle the large graph structures produced by the parser.

Managing scale. The Rigireverse parsing system comprises several subsystems, one for each supported language. Users can specify which artifacts to extract and in what level of detail. For example, an option selects whether the parser should extract calls to system routines. The ability to pinpoint specific software subsets helps us limit the flood of information and irrelevant detail that would have emerged had we tried to extract everything completely.

Storing entire abstract syntax trees for such a large system would require several hundred megabytes of storage. While this level of detail may be necessary for tasks such as control-flow analyses or code optimization, it is not necessary for understanding and redocumenting the software architecture. For program understanding, it is important to build abstractions that emphasize important themes and suppress irrelevant details.

Deciding what to include and what to ignore is an art. When we decided to ignore intraprocedural details, we significantly reduced the repository size for the multimillion-line SQL/DS program, which made a big difference when loading the data into the Rigiedit graph editor and manipulating the graph interactively. For example, the generated database for SQL/DS is under two megabytes.

Proprietary language. After deciding what information to extract from the source code, we added a new parsing subsystem to Rigireverse to handle PL/AS. Parsing PL/AS is problematic because the language is context-sensitive. However, it was not necessary for us to parse the entire language completely. Because our interest was in the high-level architecture, we focused on extracting only external procedure definitions and calls, even though PL/AS supports nested procedures. For our initial experiments, we extracted neither intramodule procedure calls nor procedure calls to library routines or built-in functions.
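The selective extraction described above can be sketched as follows. This is a hedged illustration in Python, not the extraction machinery used in the case study. The two regular expressions are assumptions modeled loosely on the label-and-PROC and CALL statements visible in the Figure 1 fragment; they are not a PL/AS grammar. The function name `extract_calls`, the `ignored` parameter, and the sample source text (including `HELPER` and `SYSLOG`) are hypothetical.

```python
import re

# Rough patterns inferred from the Figure 1 fragment (assumptions only).
PROC_DEF = re.compile(r"^\s*(\w+)\s*:\s*PROC\b", re.MULTILINE)
CALL = re.compile(r"\bCALL\s+(\w+)")


def extract_calls(source, ignored=frozenset()):
    """Return (external procedure, sorted external callees) for one unit.

    The first procedure defined in the unit is treated as the external
    one. Calls to procedures defined in the same unit (intramodule calls)
    and calls to names in `ignored` (for example library routines) are
    dropped, mirroring the filtering described in the text.
    """
    defined = PROC_DEF.findall(source)
    if not defined:
        return None, []
    local = set(defined)
    callees = {
        name
        for name in CALL.findall(source)
        if name not in local and name not in ignored
    }
    return defined[0], sorted(callees)


# Invented compilation unit with one nested procedure.
sample = """
EXAMPLE: PROC(FOO,BAR);
  CALL ARIXEDB(FETCH);
  CALL HELPER;
  CALL SYSLOG(MSG);
HELPER: PROC;
  CALL ARIXETR(CODE);
END HELPER;
END EXAMPLE;
"""

print(extract_calls(sample, ignored={"SYSLOG"}))
```

Everything below the level of "which external procedure calls which" is discarded, which is what keeps the extracted graph small for a system of this size.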
The easiest way to add support for PL/AS within Rigi was to extract the relevant information from the SQL/DS source code with a collection of Unix csh, awk, and sed scripts, translate this information to its skeletal representation in C, and feed the result into the existing C parser. In this way, Rigireverse was isolated from most of the changes. In addition, by extracting just a subset of the available information in the PL/AS source, we immediately reduced one of the problems of scale. For example, a 400,000-line subsystem of SQL/DS reduces to only 2,000 lines of C. Figure 1 shows a sample PL/AS code fragment and its C equivalent, for Rigi's purposes.

PL/AS source:

    EXAMPLE: PROC(FOO,BAR) OPTIONS(ID('EXAMPLE PLAS'),REENTRANT);
    @EJECT ASM;
    @MACINCL SYSLIB(MEMBER1);
    DCL VERSION FIXED(15) CONSTANT(7);
    CALL ARIXEDB(FETCH,RSIBASEP,RSIRSSRC);
    IF RDATRACI >= '1' THEN
    DO;
        CALL ARIXETR(SCORDSCD);
        GEN CODE( DC "6715');
    END;
    ALLDONE: RETURN;
    END EXAMPLE;

C skeleton produced by plas2c:

    /* File example.c created by plas2c */
    /* from file example.plas on */
    /* Mon Jan 24 13:08:14 PST 1994. */

    void EXAMPLE()
    {
        ARIXEDB();
        ARIXETR();
        return;
    }

Figure 1. A PL/AS code fragment and its C representation. Using Unix scripts to create skeletal representations in C of the original SQL/DS subsystems greatly simplified Rigi's analysis of the software.

The automatic parsing of all 1,303 modules of SQL/DS into Rigi took about 2.75 hours and required roughly 10 Mbytes of virtual memory on an IBM RISC System/6000 M375. The resultant database uses 1.6 Mbytes of disk space.

Editor interface. The graph editor, Rigiedit, is the heart of our reverse-engineering environment. As such, it is extremely important that Rigiedit be easy to use and interactive. One challenge in editing graphical representations for programs such as SQL/DS is managing visual complexity. We changed the editor so that the screen is not redrawn every time a single graph operation is carried out. Graphs with more than 1,000 nodes and arcs required efficient refreshing to avoid degrading interactive response time. Thus we tuned the user interface to avoid unnecessary redrawing by
+ redesigning it to let the user batch a sequence of operations without redrawing and
+ allowing the user to specify when to update the window.

Scripting layer. A more dramatic change was needed in one of the philosophies underlying our approach. We have always felt that a semiautomatic reverse-engineering environment is better than a fully automatic one, because human cognitive abilities are still much more powerful and flexible than hardwired algorithms. In essence, the user should always be in control. However, many of the operations performed during the initial decomposition of the SQL/DS code were repetitive and could be automated. The user would still be in charge of accepting, rejecting, or modifying automatically generated subsystem decompositions, but decomposing the graph interactively with the editor could be made easier.

These observations led us to introduce a scripting layer into the editor. Previously, the graph editor consisted of two tightly coupled subsystems: the user interface and the editor core itself. All editing and selection operations were intermingled with operations for manipulating the user interface, such as window size and menu selection. We separated the user interface from the graph editor and added a transparent intermediate layer to make the environment programmable.

Instead of writing yet another command language, we used the Tool Command Language (Tcl).[3] Tcl provides an extendable core language and was specifically written to be embedded into interactive windowing applications. It is application-independent and provides two complementary interfaces: a textual interface to users who issue Tcl commands and a procedural interface to the host application. In the new Rigiedit, the Tcl interpreter sits between the graphical user interface and the editor core. This integration process is more fully described elsewhere.[4,5]

Domain knowledge. Program understanding takes place within the context of a specific application domain. Aspects of the domain that affect reverse engineering include artifact representation, application semantics, and environmental concerns. We initially analyzed SQL/DS without using any domain-dependent knowledge. However, it quickly became clear that to use the extracted information effectively we had to take advantage of existing informal application-specific domain knowledge.

Our reverse-engineering approach adapts to new application domains through scripting. This lets users write customized routines for common activities such as artifact extraction, graph presentation, and object search and selection. This makes the system domain-retargetable. For the SQL/DS source code, we created a library of specific scripts to aid in our analysis. One such script is shown in Figure 2. It is used to construct an initial decomposition of the subsystems of SQL/DS on the basis of existing documentation and the current physical modularization.

    global recursionlimit
    set recursionlimit 2

    # Create a subsystem decomposition of SQL/DS
    # based on product code naming convention
    proc decompose_sql { {prefix "ARI"} } {
        build_subsystems $prefix 1
    }

    # Mutually recursive procedures to build
    # the decomposition ...
    proc build_subsystems { substring counter } {
        global recursionlimit
        if { $counter > $recursionlimit } {
            rcl_select_none
            return
        }
        set char ""
        set Zchar ""
        scan "A" "%c" char
        scan "Z" "%c" Zchar
        while { $char <= $Zchar } {
            set string [format "$substring%c" $char]
            create_subsystem "$string" $counter
            incr char
        }
    }

    proc create_subsystem { name counter } {
        rcl_select_none
        set numnodes [rcl_select_grep $name]
        if { $numnodes > 1 } {
            set parentwindownum [rcl_win_get_id]
            rcl_collapse
            rcl_node_rename $name
            set windownum [rcl_open_select]
            build_subsystems $name [expr $counter + 1]
            rcl_select_none
            rcl_win_close $windownum
            rcl_win_select $parentwindownum
        }
    }

Figure 2. A domain-dependent script for SQL/DS that constructs an initial subsystem decomposition on the basis of existing documentation and the current physical modularization.

DEVELOPER FEEDBACK

The target audiences for our case study were the SQL/DS product-development and management teams. The study was guided, in part, by the need to produce results directly applicable to them. An increased emphasis on product quality mandated a different approach to software maintenance than had previously been used.

For the individual subsystems, we proceeded to analyze their intermodule dependencies, summarizing them in a set of views depicting different architectural perspectives. We then presented these views to the development teams in a series of carefully designed one-hour demonstrations. These demonstrations let us exhibit the structural views of the subsystems pertinent to a particular development group, let the audience interact with the software structures using the views as starting points, and let individual developers create new views on the fly to reflect and record specific domain knowledge. The case study involved three major cycles of experiments and corresponding meetings with the developers for feedback.

Call graph. For our first experiment, we generated a view of the entire call graph without considering any SQL/DS-specific domain knowledge. The result was not as encouraging as we would have liked. The developers did not recognize the abstractions we generated, making it difficult for them to give us constructive feedback. This reaffirmed our belief that successful reverse engineering must do more than manipulate system representations independent of their domain; the results must add value for its customers. Informal information and application-specific knowledge provided by existing documentation and expert developers are rich veins of data that should be mined whenever possible.

Naming conventions. Our second experiment used product-naming conventions and existing physical modularizations to construct another set of views. The prefix Ari is the unique code for the SQL/DS product, hence all module names begin with these three letters. The fourth letter in each module's name represents the physical subsystem to which the artifact belongs, the fifth letter represents further subsystem refinement, and so on. Rigi captured this information using a series of scripts and views, resulting in decompositions the developers readily recognized.

The developers valued these views because they established a common ground for further discussions and analysis. Moreover, the developers could use them to verify their system documentation as well as to suggest where our decomposition did not fully conform to their mental models.

Figure 3 shows one view created using Rigi and displayed in Rigiedit. The view contains three windows, each with icons representing different components of the software system. Arcs connecting icons represent resource-flow relationships between components, such as a procedure call. The arcs are typed and give rise to a multilayered semantic network representing the many different dependencies within the system.
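The naming-convention decomposition described under "Naming conventions" can also be sketched outside the graph editor. The following Python is a rough analogue of the Figure 2 script under stated assumptions, not a translation of it: Rigi works on selections in the editor, whereas this sketch works on a plain list of names. The function name `decompose` is hypothetical, its `limit` parameter plays the role of `recursionlimit` in Figure 2, and every module name other than ARIXEDB and ARIXETR (which appear in Figure 1) is invented for illustration.

```python
from collections import defaultdict


def decompose(names, prefix="ARI", depth=1, limit=2):
    """Nest module names by the letter that follows `prefix`.

    A group with more than one member becomes a subsystem and is refined
    by one more letter, until `depth` exceeds `limit`. A module that is
    alone in its group stays a plain module (mapped to None). All names
    are assumed to start with `prefix`.
    """
    if depth > limit:
        return sorted(names)
    groups = defaultdict(list)
    for name in names:
        groups[name[: len(prefix) + 1]].append(name)
    result = {}
    for key, members in sorted(groups.items()):
        if len(members) > 1:
            result[key] = decompose(members, key, depth + 1, limit)
        else:
            result[members[0]] = None
    return result


# ARIXEDB and ARIXETR come from Figure 1; the other names are made up.
modules = ["ARIXEDB", "ARIXETR", "ARIXOAA", "ARIYAAA", "ARIYBBB", "ARIZAAA"]
tree = decompose(modules)
print(tree)
```

With `limit=2` the sketch produces two levels of subsystems, one keyed on the fourth letter and one on the fifth, which is the depth the Figure 2 script reaches with `recursionlimit` set to 2.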
In Figure 3, the top window shows a portion of all SQL/DS low-level modules. For clarity, the arcs have been filtered from the top window. Each module is represented by a module icon. The lower left window shows a high-level view of the major subsystems, each depicted by a subsystem icon. The arcs in this window represent composite dependencies between the major subsystems. The hierarchical subsystem structure within the highlighted Arix component in this window is expanded, filtered, and presented in the bottom-right overview window.

Figure 3. A view of SQL/DS and the Arix subsystem. The top window shows a portion of all SQL/DS low-level modules, the lower left window shows a high-level view of the major subsystems, and the highlighted Arix component appears expanded and filtered in the lower right window.

Subsystem in detail. Our third experiment decomposed the Arix relational data subsystem. This subsystem is roughly one million lines and consists of nine physical subsubsystems. With the help of an SQL/DS developer, we decomposed Arix alternatively into four logical subsubsystems according to the distinct development teams responsible for maintaining them. In this viewpoint, our criterion for subsystem composition was personnel assignment to the software, as follows:
+ a runtime-access generator;
+ optimizer pass one and two;
+ path-selection optimizer; and
+ executive, interpreter, and authorization.

We then further decomposed the Arixo path-selection-optimizer subsubsystem. The developer in charge of this subsubsystem had created her own diagrams of its structure, based on product-development logbooks and the mental model she had formed from her maintenance experience. A view was easily created using Rigi to portray this mental model. More important, an alternative view was created, based on the actual structure as reflected by the current source code.

Figure 4 shows these two views of Arixo. The window on the left contains the maintainer's view and the window on the right the newly constructed view. The maintainer's view reflects a left-to-right ordering of the major product components according to functional design layers. The new view reflects a top-down control flow based on actual information extracted from the source code.

This second view presented a somewhat different perspective that conflicted with existing architectural documentation in some respects. However, because it was constructed using automatically extracted information, it was a more accurate representation of the subsubsystem's "operational" architecture. The maintainer was able to form a more accurate mental model using a combination of information gained from maintenance experience and information extracted from the code. The two perspectives can be unified in a single Rigi view.

LESSONS LEARNED

While our prepared views did not uncover each developer's exact mental model of the system, the audience readily recognized the structures presented. There were two main reasons for this. First, these developers knew their subsystems intimately. Second and more important, the views represented the correct level of abstraction. Most satisfying for us was when the developers used their individual knowledge to design additional views to reflect their personal mental models more closely. Each developer could emphasize components of particular interest and filter out irrelevant information.

Could we have achieved the same result without tool support? Possibly, but only with much greater effort and much less functionality. While it is true that system-description diagrams have been used for years, recreating and updating them for legacy systems of this magnitude would be ponderous, if not impossible. Moreover, our semiautomatic approach lets several such documents be created and maintained simultaneously.
These system-level documents are always up-to-date because they are based on the underlying source code. Script-based decompositions capture domain-specific knowledge and can be generated quickly. For example, it took two days to semiautomatically create a decomposition using Rigi, but only minutes to automatically produce one via a prepared script. Either method would be much faster and use the analyst's time and effort more effectively than would a manual process of reading program listings and consulting volumes of possibly out-of-date system documentation.

Figure 4. Two views of the path-selection optimizer subsystem. The window on the left contains a view constructed from the maintainer's mental model. The window on the right contains an alternative view based on the actual structure as reflected by the current source code.

Our analysis of the SQL/DS source code has proven valuable to its developers and to our own research. It has shown that our methodology scales up to the million-lines-of-code range and that the views produced during structural redocumentation aid in understanding such legacy software. Furthermore, we have shown that we can provide businesses with valuable information about the architecture of their legacy systems. Experienced software developers who are familiar with the system can design additional views in Rigi according to multiple, alternative perspectives.
These views yield charts and graphs that can ease software maintenance, thereby saving the company time and money, as well as extending the useful life of its legacy system.

Further analysis of SQL/DS is underway. One of our research associates is using Reasoning Systems' Software Refinery to parse PL/AS and export more fine-grained relationships among procedures and variables. Another is processing the source-code listings produced by the compiler to extract additional relationships. Most of our work until now has focused on exploring control dependencies. However, the architecture of SQL/DS is based on global data structures manipulated by many different software modules. While the developer looks at code that is more than 90 percent control logic, the compiler sees more than 90 percent as data-structure declarations, placed in shared %include files. It will be fruitful to investigate this aspect of the system.

We are currently designing and developing a more ambitious reverse-engineering environment based on our eight years' experience with Rigi. This new environment involves three universities and IBM as an industrial partner, with collaboration as the main theme. McGill University is extending the structural pattern-matching capabilities of Rigi to support syntactic, semantic, functional, and behavioral search patterns. The University of Toronto is building a more flexible repository for storing software artifacts, pattern-matching rules, and software-engineering knowledge. The University of Victoria is making the Rigi system more extensible by enhancing the scripting language, improving the user interface, and providing a method for modeling the domain of discourse. Preliminary results indicate that an extensible but integrated toolkit is required to support the multifaceted analysis necessary to understand legacy software systems.

ACKNOWLEDGMENTS

We thank Michael Whitney and Brian Corrie for their work on the Rigi system. Without their efforts, much of our analysis would not have been possible. We thank the anonymous referees for their constructive comments. We also thank the SQL/DS group members at IBM for their participation in the case study and the IBM Centre for Advanced Studies for inviting us to take part in this research. In particular, we thank Jacob Slonim, director of CAS, for his guidance and support. This work was supported in part by the British Columbia Advanced Systems Institute, the IBM Software Solutions Toronto Laboratory Centre for Advanced Studies, the IRIS Federal Centres of Excellence, the Natural Sciences and Engineering Research Council of Canada, the Science Council of British Columbia, and the University of Victoria.

REFERENCES

1. H. L. Ossher, “A Mechanism for Specifying the Structure of Large, Layered Systems,” in Research Directions in Object-Oriented Programming, B.D. Shriver and P. Wegner, eds., MIT Press, Cambridge, Mass., 1987, pp. 219-252.
2. H. A. Müller et al., “A Reverse Engineering Approach to Subsystem Structure Identification,” J. Software Maintenance: Research and Practice, Dec. 1993, pp. 181-204.
3. J. K. Ousterhout, An Introduction to Tcl and Tk, Addison-Wesley, Reading, Mass., 1994.
4. S. R. Tilley et al., “Domain-Retargetable Reverse Engineering,” Proc. 1993 Int’l Conf. Software Maintenance, IEEE CS Press, Los Alamitos, Calif., Sept. 1993, pp. 142-151.
5. S. R. Tilley et al., “Programmable Reverse Engineering,” Int’l J. Software Eng. and Knowledge Eng., Dec. 1994.

Kenny Wong is a PhD candidate in computer science at the University of Victoria. He has worked at the IBM Centre for Advanced Studies with the program-understanding group. His research interests include program understanding, runtime analysis, user interfaces, object-oriented programming, and software design. Wong received a BSc and MSc in computer science from the University of Victoria. He is a member of ACM, Usenix, and the Planetary Society.

Scott R. Tilley is currently on leave from the IBM Software Solutions Toronto Laboratory and is a PhD candidate in computer science at the University of Victoria. He has authored a book on home computing. His research interests include end-user programming, hypertext, program understanding, reverse engineering, and user interfaces. Tilley received a BCompSci in digital systems from Concordia University, Montreal, and an MSc in computer science from the University of Victoria. He is a member of the IEEE and ACM.

Hausi A. Müller is an associate professor of computer science at the University of Victoria. He has worked as a software engineer for Brown, Boveri & Cie in Switzerland and with the program-understanding group in the Toronto Laboratory at the IBM Centre for Advanced Studies. His research interests include software engineering, software analysis, reverse engineering, reengineering, programming-in-the-large, software metrics, and computational geometry.
Müller received a PhD in computer science from Rice University. He is a member of the editorial board of IEEE Transactions on Software Engineering and was program cochair of the International Conference on Software Maintenance in 1994.

Margaret-Anne D. Storey is a PhD student in computer science at Simon Fraser University and the University of Victoria. Her research interests include software engineering and user interfaces, focusing on software visualization. Storey received a BSc in computer science from the University of Victoria. She is a member of the IEEE.

Address questions about this article to Hausi Müller at the Department of Computer Science, University of Victoria, Box 3055, Victoria, BC, Canada V8W 3P6, hausi@csr.uvic.ca.