
Recovering traceability links in software artifact management systems using information retrieval methods

Published: 01 September 2007

Abstract

The main drawback of existing software artifact management systems is the lack of automatic or semi-automatic traceability link generation and maintenance. We have improved an artifact management system with a traceability recovery tool based on Latent Semantic Indexing (LSI), an information retrieval technique. We have assessed LSI to identify the strengths and limitations of using information retrieval techniques for traceability recovery, and this assessment revealed the need for an incremental approach. The method and the tool have been evaluated during the development of seventeen software projects involving about 150 students. We observed that although tools based on information retrieval provide useful support for identifying traceability links during software development, they are still far from supporting a complete semi-automatic recovery of all links. The results of our experience have also shown that such tools can help to identify quality problems in the textual descriptions of traced artifacts.


Published In

ACM Transactions on Software Engineering and Methodology, Volume 16, Issue 4
September 2007
117 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/1276933

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Software artifact management
  2. impact analysis
  3. latent semantic indexing
  4. traceability management

Qualifiers

  • Article
