Abstract
Traceability link recovery (TLR) is an important and costly software task that requires humans to establish relationships between source and target artifact sets within the same project. Previous research has proposed establishing traceability links with machine learning approaches. However, current machine learning approaches are hard to apply to projects that have no existing traceability links, because training an effective predictive model requires humans to label too many links. To reduce this manual effort, we propose a new TLR approach based on active learning (AL), called the AL-based approach. We evaluate the AL-based approach on seven commonly used traceability datasets and compare it with an information retrieval based approach and a state-of-the-art machine learning approach. The results indicate that the AL-based approach outperforms the other two approaches in terms of F-score.
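To make the idea of labeling fewer links concrete, the following is a minimal sketch of pool-based active learning, not the authors' implementation. The abstract does not state the classifier, the features, or the query strategy, so everything here is an assumption: each candidate link is reduced to a single similarity score, the model is a simple threshold, the query strategy is uncertainty sampling (ask the human about the link closest to the current threshold), and the names `fit_threshold`, `active_learn`, and `f_score` are hypothetical.

```python
from statistics import mean


def fit_threshold(labeled):
    """Toy model (assumption): the decision threshold is the midpoint between
    the mean score of labeled links and the mean score of labeled non-links."""
    pos = [s for s, y in labeled if y]
    neg = [s for s, y in labeled if not y]
    return (mean(pos) + mean(neg)) / 2


def active_learn(scores, oracle, seed, budget):
    """Pool-based active learning with uncertainty sampling.

    scores: one similarity score per candidate link (hypothetical feature)
    oracle: function index -> bool, standing in for the human annotator
    seed:   initially labeled indices; must include one link and one non-link
    budget: number of extra labels the human is asked for
    """
    labeled = {i: oracle(i) for i in seed}
    for _ in range(budget):
        t = fit_threshold([(scores[i], y) for i, y in labeled.items()])
        unlabeled = [i for i in range(len(scores)) if i not in labeled]
        if not unlabeled:
            break
        # Query the candidate the current model is least certain about.
        q = min(unlabeled, key=lambda i: abs(scores[i] - t))
        labeled[q] = oracle(q)
    t = fit_threshold([(scores[i], y) for i, y in labeled.items()])
    return t, labeled


def f_score(predicted, actual):
    """Harmonic mean of precision and recall for boolean predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up data for illustration only.
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.60, 0.80, 0.90, 0.95]
truth = [s >= 0.5 for s in scores]
threshold, labeled = active_learn(scores, lambda i: truth[i], seed=[0, 9], budget=3)
predicted = [s >= threshold for s in scores]
print(len(labeled), "labels used, F-score:", f_score(predicted, truth))
```

In this toy run the human labels 5 of the 10 candidate links, and the queries concentrate near the decision boundary, which is where a label changes the model most. A realistic setup would replace the threshold model with a trained classifier over many features.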
Additional information
Project supported by the National Natural Science Foundation of China (No. 61772270), the National Key Research and Development Project of China (Nos. 2016YFB1000802 and 2018YFB1003902), and the Funding of the Key Laboratory of Safety-Critical Software, China (No. 1015-XCA1816403)
Contributors
Tian-bao DU designed the research. Guo-hua SHEN drafted the manuscript. Zhi-qiu HUANG, Yao-shen YU, and De-xiang WU helped organize the manuscript. Tian-bao DU revised and finalized the paper.
Compliance with ethics guidelines
Tian-bao DU, Guo-hua SHEN, Zhi-qiu HUANG, Yao-shen YU, and De-xiang WU declare that they have no conflict of interest.
About this article
Cite this article
Du, Tb., Shen, Gh., Huang, Zq. et al. Automatic traceability link recovery via active learning. Front Inform Technol Electron Eng 21, 1217–1225 (2020). https://doi.org/10.1631/FITEE.1900222
DOI: https://doi.org/10.1631/FITEE.1900222