Journal Publications by Markus Borg
Change Impact Analysis (CIA) during software evolution of safety-critical systems is a labor-intensive task. Several authors have proposed tool support for CIA, but very few tools were evaluated in industry. We present a case study on ImpRec, a Recommendation System for Software Engineering (RSSE) tailored for CIA at a process automation company. ImpRec builds on assisted tracing, using information retrieval solutions and mining software repositories to recommend development artifacts potentially impacted when resolving incoming issue reports. In contrast to the majority of tools for automated CIA, ImpRec explicitly targets development artifacts that are not source code. We evaluate ImpRec in a two-phase study. First, we measure the correctness of ImpRec's recommendations by a simulation based on 12 years' worth of issue reports in the company. Second, we assess the utility of working with ImpRec by deploying the RSSE in two development teams on different continents. The results suggest that ImpRec presents about 40% of the true impact among the top-10 recommendations. Furthermore, user log analysis indicates that ImpRec can support CIA in industry, and developers acknowledge the value of ImpRec in interviews. In conclusion, our findings show the potential of reusing traceability associated with developers' past activities in an RSSE.
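ImpRec's underlying idea, retrieving past issue reports similar to a new one and recommending the artifacts they were linked to, can be sketched in a few lines. The snippet below is a minimal illustration, not the actual ImpRec implementation: the TF-IDF retrieval, the toy data, and the summed-similarity scoring are all simplifying assumptions.

```python
# Minimal sketch of an ImpRec-style recommender (not the actual tool):
# retrieve past issue reports similar to a new one, then recommend the
# non-code artifacts that were linked to those past issues.
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical history: resolved issue reports and the artifacts they impacted.
past_issues = [
    "watchdog timeout when restarting controller node",
    "HMI freezes during alarm burst on operator panel",
    "controller restart loses alarm configuration",
]
impacted = [
    ["SafetyReq-12", "TestSpec-7"],
    ["HMI-Design-3", "TestSpec-21"],
    ["SafetyReq-12", "HMI-Design-3"],
]

def recommend(new_issue, k=2, top_n=10):
    """Score artifacts by the summed similarity of the k most similar issues."""
    vec = TfidfVectorizer()
    matrix = vec.fit_transform(past_issues + [new_issue])
    query, corpus = matrix[len(past_issues)], matrix[: len(past_issues)]
    sims = cosine_similarity(query, corpus).ravel()
    scores = defaultdict(float)
    for idx in sims.argsort()[::-1][:k]:
        for artifact in impacted[idx]:
            scores[artifact] += sims[idx]
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alarm configuration lost after controller restart"))
```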
[Context] It is an enigma that agile projects can succeed "without requirements" when weak requirements engineering is a known cause for project failures. While agile development projects often manage well without extensive requirements documentation, test cases are commonly viewed as requirements and detailed requirements are documented as test cases. [Objective] We have investigated this agile practice of using test cases as requirements to understand how test cases can support the main requirements activities, and how this practice varies. [Method] We performed an iterative case study at three companies and collected data through 14 interviews and 2 focus groups. [Results] The use of test cases as requirements poses both benefits and challenges when eliciting, validating, verifying, and managing requirements, and when used as a documented agreement. We have identified five variants of the test-cases-as-requirements practice, namely de facto, behaviour-driven, story-test driven, stand-alone strict, and stand-alone manual. The application of the practice varies concerning the time frame of requirements documentation, the requirements format, the extent to which the test cases are a machine-executable specification, and the use of tools that provide specific support for the practice of using test cases as requirements. [Conclusions] The findings provide empirical insight into how agile development projects manage and communicate requirements. The identified variants of the practice of using test cases as requirements can be used to perform in-depth investigations into agile requirements engineering. Practitioners can use the provided recommendations as a guide in designing and improving their agile requirements practices based on project characteristics such as the number of stakeholders and the rate of change.
Context. In many application domains, critical systems must comply with safety standards. This involves gathering safety evidence in the form of artefacts such as safety analyses, system specifications, and testing results. These artefacts can evolve during a system's lifecycle, creating a need for change impact analysis to guarantee that system safety and compliance are not jeopardised. Objective. We aim to provide new insights into how safety evidence change impact analysis is addressed in practice. The knowledge about this activity is limited despite the extensive research that has been conducted on change impact analysis and on safety evidence management. Method. We conducted an industrial survey on the circumstances under which safety evidence change impact analysis is addressed, the tool support used, and the challenges faced. Results. We obtained 97 valid responses representing 16 application domains, 28 countries, and 47 safety standards. The respondents had most often performed safety evidence change impact analysis during system development, from system specifications, and fully manually. No commercial change impact analysis tool was reported as used for all artefact types, and insufficient tool support was the most frequent challenge. Conclusion. The results suggest that the different artefact types used as safety evidence co-evolve. In addition, the evolution of safety cases should probably be better managed, the level of automation in safety evidence change impact analysis is low, and the state of the practice can benefit from over 20 improvement areas.
Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automated bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large-scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50% to 89% when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advise industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not relying solely on results from cross-validation when evaluating automated bug assignment.
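To make the Stacked Generalization setup concrete, the following sketch combines a few text classifiers with a logistic-regression combiner using scikit-learn. The level-0 classifiers, features, and toy data are assumptions for illustration; the study's actual configuration differs.

```python
# Sketch of Stacked Generalization (SG) for bug assignment: several level-0
# classifiers whose predictions are combined by a level-1 learner.
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: bug report texts and the teams that resolved them.
reports = ["null pointer in network stack", "GUI layout broken on resize",
           "memory leak in protocol parser", "button label truncated"]
teams = ["team-network", "team-ui", "team-network", "team-ui"]

stack = StackingClassifier(
    estimators=[("nb", MultinomialNB()),
                ("svm", LinearSVC()),
                ("tree", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(),
    cv=2,  # tiny toy data; use proper cross-validation on real training sets
)
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(reports, teams)
print(model.predict(["crash when parsing malformed packet"]))
```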
Empirical Software Engineering, Jun 27, 2015
Defect management is a central task in software maintenance. When a defect is reported, appropriate resources must be allocated to analyze and resolve the defect. An important issue in resource allocation is the estimation of Defect Resolution Time (DRT). Prior research has considered different approaches for DRT prediction, exploiting information retrieval techniques and similarity in textual defect descriptions. In this article, we investigate the potential of text clustering for DRT prediction. We build on a study published by Raja (2013), which demonstrated that clusters of similar defect reports had statistically significant differences in DRT. Raja's study also suggested that this difference between clusters could be used for DRT prediction. Our aims are twofold: first, to conceptually replicate Raja's study and to assess the repeatability of its results in different settings; second, to investigate the potential of textual clustering of issue reports for DRT prediction with a focus on accuracy. Using different data sets and a different text mining tool and clustering technique, we first conduct an independent replication of the original study. Then we design a fully automated prediction method based on clustering with a simulated test scenario to check the accuracy of our method. The results of our independent replication are comparable to those of the original study, and we confirm the initial findings regarding significant differences in DRT between clusters of defect reports. However, the simulated test scenario used to assess our prediction method yields poor results in terms of DRT prediction accuracy. Although our replication confirms the main finding from the original study, our attempt to use text clustering as the basis for DRT prediction did not achieve practically useful levels of accuracy.
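As a concrete illustration of the cluster-based prediction idea, the sketch below clusters defect descriptions and predicts the DRT of a new defect as the median DRT of its cluster. The data, the clustering algorithm, and the median-based predictor are illustrative assumptions, not the exact method of Raja's study or our replication.

```python
# Sketch of cluster-based Defect Resolution Time (DRT) prediction.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical closed defect reports with known resolution times (days).
texts = ["crash on startup", "typo in help text", "crash when saving file",
         "wrong translation in menu", "segfault on shutdown", "label misspelled"]
drt_days = np.array([14, 1, 21, 2, 30, 1])

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Predict the DRT of a new defect as the median DRT of its cluster.
medians = {c: np.median(drt_days[km.labels_ == c]) for c in set(km.labels_)}
new_report = vec.transform(["application crashes during save"])
print(medians[km.predict(new_report)[0]])
```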
Issue management, a central part of software maintenance, requires much effort for complex software systems. The continuous inflow of issue reports makes it hard for developers to stay on top of the situation, and the threatening information overload makes activities such as duplicate management, Issue Assignment (IA), and Change Impact Analysis (CIA) tedious and error-prone. Still, most practitioners work with tools that act as little more than issue containers. Machine Learning (ML) encompasses approaches that identify patterns or make predictions based on empirical data. While humans have a limited ability to work with big data, ML instead tends to improve the more training data is available. Consequently, we argue that the challenge of information overload in issue management appears to be particularly suitable for ML-based tool support. While others have initially explored the area, we develop two ML-based tools and evaluate them in proprietary software engineering contexts. We replicated [1] for five projects in two companies, and our automated IA obtains an accuracy matching the current manual processes. Thus, as our solution delivers instantaneous IA, an organization can potentially save considerable analysis effort. Moreover, for the most comprehensive of the five projects, we implemented automated CIA in the tool ImpRec [3]. We evaluated the tool in a longitudinal in situ study, i.e., deployment in two development teams in industry. Based on log analysis and complementary interviews using the QUPER model [2] for utility assessment, we conclude that ImpRec offered helpful support in the CIA task.
Empirical Software Engineering, 2014
Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e., trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicates that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing fewer than 500 artifacts. Our review identified a need for industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding both context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery.
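For readers unfamiliar with the technique, a minimal vector space model, the most common algebraic IR model among the mapped studies, can be sketched as follows; the artifacts and the similarity threshold are invented for illustration.

```python
# Sketch of algebraic IR-based trace recovery: rank candidate trace links
# between requirements and test cases by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = {"R1": "the pump shall stop on low water level",
                "R2": "the operator panel shall display alarm history"}
tests = {"T1": "verify pump stops when water level drops below limit",
         "T2": "verify alarm history is shown on the operator panel",
         "T3": "verify system boot time is under five seconds"}

vec = TfidfVectorizer()
matrix = vec.fit_transform(list(requirements.values()) + list(tests.values()))
sims = cosine_similarity(matrix[: len(requirements)], matrix[len(requirements):])

# Report candidate links above a (naively chosen) similarity threshold.
for i, rid in enumerate(requirements):
    for j, tid in enumerate(tests):
        if sims[i, j] > 0.2:
            print(f"{rid} -> {tid}: {sims[i, j]:.2f}")
```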
Empirical Software Engineering, 2014
Weak alignment of requirements engineering (RE) with verification and validation (VV) may lead to problems in delivering the required products in time with the right quality. For example, weak communication of requirements changes to testers may result in lack of verification of new requirements and incorrect verification of old, invalid requirements, leading to software quality problems, wasted effort, and delays. However, despite the serious implications of weak alignment, research and practice both tend to focus on either RE or VV rather than on the alignment of the two. We have performed a multi-unit case study to gain insight into issues around aligning RE and VV by interviewing 30 practitioners from 6 software-developing companies, involving 10 researchers in a flexible research process for case studies. The results describe current industry challenges and practices in aligning RE with VV, ranging from quality of the individual RE and VV activities, through tracing and tools, to change control and sharing a common understanding at strategy, goal, and design level. The study identified that human aspects, i.e., cooperation and communication, are central, and that requirements engineering practices are a critical basis for alignment. Further, the size of an organisation and its motivation for applying alignment practices, e.g., external enforcement of traceability, are variation factors that play a key role in achieving alignment. Our results provide a strategic roadmap for practitioners' improvement work to address alignment challenges. Furthermore, the study provides a foundation for continued research to improve the alignment of RE with VV.
Students working in groups is a commonly used method of instruction in higher education, popularized by the introduction of problem-based learning. As a result, management of small groups of people has become an important skill for teachers. The objective of our study is to investigate why conflicts arise in student groups at the Faculty of Engineering at Lund University and how teachers manage them. We have conducted an exploratory interdepartmental interview study on teachers' views on this matter, interviewing ten university teachers with different levels of seniority. Our results show that conflicts frequently arise in group work, most commonly caused by different levels of ambition among students. We also found that teachers prefer to work proactively to prevent conflicts and stress the students' own responsibility. Finally, we show that teachers at our faculty tend to avoid the more drastic conflict resolution strategies suggested by previous research. The outcome of our study could be used as input to future guidelines on conflict management in student groups.
Conference Papers by Markus Borg
Proc. of the 16th International Conference on Agile Software Development (XP)
It is a conundrum that agile projects can succeed 'without requirements' when weak requirements engineering is a known cause for project failures. While agile development projects often manage well without extensive requirements documentation, test cases are commonly used as requirements. We have investigated this agile practice at three companies in order to understand how test cases can fill the role of requirements. We performed a case study based on twelve interviews performed in a previous study. The findings include a range of benefits and challenges in using test cases for eliciting, validating, verifying, tracing, and managing requirements. In addition, we identified three scenarios for applying the practice, namely as a mature practice, as a de facto practice, and as part of an agile transition. The findings provide insights into how the role of requirements may be met in agile development, including challenges to consider.
Proceedings of LU:s femte högskolepedagogiska utvecklingskonferens
Students completing a Swedish Master's degree in engineering should have the knowledge and skills to independently solve engineering issues. This autonomy should be developed and demonstrated within the M.Sc. project course. But how can supervisors encourage independence? We have explored this in a case study through semi-structured interviews with students, supervisors, and examiners of two M.Sc. projects. We investigated their view of independence and how supervision relates to independence. The results identify areas relevant to independence, namely supervision roles and relationships, student characteristics, the M.Sc. process, and views on independence. The results confirm previous findings that students' knowledge of and motivation for the topic support independence. The supervisor's role is to guide and support through frequent peer-level discussions and to act as a discussion partner, while the student should have the main responsibility for the project. We conclude that it is important for supervisors to encourage students to take ownership of their M.Sc. projects and to design their own solutions, while providing the overall process and timelines.
Background. Test automation is a widely used technique to increase the efficiency of software testing. However, executing more test cases increases the effort required to analyze test results. At Qlik, automated tests run nightly for up to 20 development branches, each containing thousands of test cases, resulting in information overload. Aim. We therefore develop a tool that supports the analysis of test results. Method. We create NIOCAT, a tool that clusters similar test case failures, to help the analyst identify underlying causes. To evaluate the tool, experiments on manually created subsets of failed test cases representing different use cases are conducted, and a focus group meeting is held with test analysts at Qlik. Results. The case study shows that NIOCAT creates accurate clusters, in line with analyses performed by human analysts. Further, the potential time savings of our approach are confirmed by the participants in the focus group. Conclusions. NIOCAT provides a feasible complement to current automated testing practices at Qlik by reducing information overload.
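The core clustering step can be illustrated with a few lines of scikit-learn; this is a simplified stand-in under assumed data, not NIOCAT's actual implementation.

```python
# Sketch of grouping similar test case failures so an analyst can inspect
# one representative per cluster instead of every failure.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

failures = [
    "AssertionError: expected chart to render within 2000 ms",
    "TimeoutError: login page did not load",
    "AssertionError: chart rendering exceeded 2000 ms budget",
    "TimeoutError: dashboard page did not load",
]

X = TfidfVectorizer().fit_transform(failures).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
for label, message in sorted(zip(labels, failures)):
    print(label, message)
```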
Proc. of the 8th International Symposium on Empirical Software Engineering and Measurement, Sep 18, 2014
Context: Duplicate detection is a fundamental part of issue management. Systems able to predict whether a new defect report will be closed as a duplicate may decrease costs by limiting rework and collecting related pieces of information. Previous work relies on the textual content of the defect reports, often assuming that better results are obtained if the title is weighted as more important than the description. Method: We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using Apache Lucene for searching in the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects the accuracy. Results and conclusions: Our work shows the potential of using Lucene as a scalable solution for duplicate detection. Also, we show that Lucene obtains the best results when the defect report title is weighted three times higher than the description, a bigger difference than has previously been acknowledged.
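Lucene expresses the title weighting through field boosts; the sketch below approximates the same idea with a weighted combination of per-field TF-IDF similarities over invented reports. It illustrates the weighting scheme only, not the replication's actual Lucene setup.

```python
# Sketch: weight the title three times higher than the description when
# ranking existing defect reports as duplicate candidates for a new report.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

existing = [("app crashes on rotate", "rotating the device closes the app"),
            ("battery drains fast", "battery is empty within two hours of use")]
new = ("crash when rotating phone", "the app closes itself after rotation")

def field_sims(field):
    """Cosine similarity between the new report and all existing reports."""
    vec = TfidfVectorizer()
    m = vec.fit_transform([r[field] for r in existing] + [new[field]])
    return cosine_similarity(m[len(existing)], m[: len(existing)]).ravel()

TITLE_WEIGHT = 3  # the weighting that performed best in the study
scores = TITLE_WEIGHT * field_sims(0) + field_sims(1)
print("most likely duplicate:", existing[scores.argmax()][0])
```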
Proc. of the Euromicro Conference series on Software Engineering and Advanced Applications (SEAA), Aug 27, 2014
The popularity of Open Source Software (OSS) has increased the interest in using it in safety-critical applications. The aim of this study is to review research carried out on the usage of open source code in the development of safety-critical software and systems. We conducted a systematic mapping study through searches in library databases and manual identification of articles from open source conferences. We identified 22 studies about using open source software, mainly in the automotive, aerospace, medical, and nuclear domains. Only a few studies present complete safety systems released as OSS in full, and the most commonly used OSS functionalities are operating systems, imaging, control, and data management. Finally, most of the integrated OSS have mature code bases and a commit history of more than five years.
Proc. of the 7th International Symposium on Empirical Software Engineering and Measurement
Several researchers have proposed creating after-the-fact structure among software artifacts using trace recovery based on Information Retrieval (IR) approaches. Due to significant variation points in previous studies, results are not easily aggregated. We provide an initial overview picture of the outcome of previous evaluations. Based on a systematic mapping study, we perform a synthesis of published research. Our results show that there is no empirical evidence that any IR model outperforms another model consistently. We also observe a strong dependency between the Precision and Recall (P-R) values and the input datasets. Finally, our mapping of P-R values on the possible output space highlights the difficulty of recovering accurate trace links using naïve cut-off strategies. Thus, our work presents empirical evidence that confirms several previous claims on IR-based trace recovery and stresses the need for empirical evaluations beyond the basic P-R "race".
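To make the cut-off issue concrete, the sketch below computes precision and recall for a ranked candidate-link list at a constant cut-point N; the ranking and gold set are invented. Sweeping N shows why a naïve cut-off rarely yields both high precision and high recall.

```python
# Sketch: precision and recall of a ranked candidate trace-link list
# under a naive constant cut-point strategy.
ranked = ["R1-T1", "R1-T3", "R2-T2", "R1-T2", "R2-T1"]  # tool output, best first
gold = {"R1-T1", "R2-T2"}                               # correct trace links

def precision_recall_at(n):
    true_positives = len(set(ranked[:n]) & gold)
    return true_positives / n, true_positives / len(gold)

for n in range(1, len(ranked) + 1):
    precision, recall = precision_recall_at(n)
    print(f"cut-off {n}: precision={precision:.2f} recall={recall:.2f}")
```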
Proceedings of the 7th International Conference on Software Testing, Verification and Validation, Mar 31, 2014
Background: Test managers have to repeatedly select test cases for test activities during the evolution of large software systems. Researchers have widely studied automated test scoping, but have not fully investigated decision support with human interaction. We previously proposed the introduction of visual analytics for this purpose. Aim: In this empirical study we investigate how to design such decision support. Method: We explored the use of visual analytics using heat maps of historical test data for test scoping support by letting test managers evaluate prototype visualizations in three focus groups with in total nine industrial test experts. Results: All test managers in the study found the visual analytics useful for supporting test planning. However, our results show that different tasks and contexts require different types of visualizations. Conclusion: Important properties for test planning support are: the ability to overview testing from different perspectives, the ability to filter and zoom to compare subsets of the testing with respect to various attributes, and the ability to manipulate the subset under analysis by selecting and deselecting test cases. Our results may be used to support the introduction of visual test analytics in practice.
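As an illustration of the heat-map idea, the sketch below renders historical failure rates per test case and week with matplotlib; the axes and data are invented and do not reproduce the evaluated prototypes.

```python
# Sketch of a heat map over historical test data: rows are test cases,
# columns are weeks, and color encodes the observed failure rate.
import matplotlib.pyplot as plt
import numpy as np

test_cases = ["TC-1", "TC-2", "TC-3", "TC-4"]
weeks = ["w1", "w2", "w3", "w4", "w5"]
failure_rate = np.array([[0.0, 0.1, 0.0, 0.6, 0.8],
                         [0.2, 0.2, 0.3, 0.2, 0.1],
                         [0.0, 0.0, 0.0, 0.0, 0.9],
                         [0.5, 0.4, 0.5, 0.6, 0.5]])

fig, ax = plt.subplots()
image = ax.imshow(failure_rate, cmap="Reds", aspect="auto")
ax.set_xticks(range(len(weeks)))
ax.set_xticklabels(weeks)
ax.set_yticks(range(len(test_cases)))
ax.set_yticklabels(test_cases)
fig.colorbar(image, label="failure rate")
plt.show()
```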
Proc. of the 17th European Conference on Software Maintenance and Reengineering, Mar 6, 2013
Completely analyzed and closed issue reports in software development projects, particularly in the development of safety-critical systems, often carry important information about issue-related change locations. These locations may be in the source code, as well as traces to test cases affected by the issue, and related design and requirements documents. In order to help developers analyze new issues, knowledge about issue clones and duplicates, as well as other relations between the new issue and existing issue reports, would be useful. This paper analyses, in an exploratory study, issue reports contained in two Issue Management Systems (IMSs) containing approximately 20,000 issue reports. The purpose of the analysis is to gain a better understanding of relationships between issue reports in IMSs. We found that link-mining explicit references can reveal complex networks of issue reports. Furthermore, we found that textual similarity analysis might have the potential to complement the explicitly signaled links by recommending additional relations. In line with work in other fields, links between software artifacts have the potential to improve search and navigation in large software engineering projects.
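The link-mining step can be sketched as extracting explicit issue identifiers from report texts and building a reference graph. The identifier syntax and reports below are assumptions; real IMS data would need its own parsing rules.

```python
# Sketch of link-mining explicit references between issue reports.
import re

issues = {
    "ISSUE-101": "Crash in parser. Probably a duplicate of ISSUE-87.",
    "ISSUE-87": "Parser fails on empty input. See also ISSUE-90.",
    "ISSUE-90": "Input validation missing in parser module.",
}

# Build a directed graph: each report points to the reports it mentions.
graph = {src: re.findall(r"ISSUE-\d+", text) for src, text in issues.items()}
for src, targets in graph.items():
    for dst in targets:
        if dst != src:
            print(f"{src} -> {dst}")
```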
Background: Development of complex, software-intensive systems generates large amounts of information. Several researchers have developed tools implementing information retrieval (IR) approaches to suggest traceability links among artifacts. Aim: We explore the consequences of the fact that a majority of the evaluations of such tools have focused on benchmarking of mere tool output. Method: To illustrate this issue, we have adapted a framework of general IR evaluations to a context taxonomy specifically for IR-based traceability recovery. Furthermore, we evaluate a previously proposed experimental framework by conducting a study using two publicly available tools on two datasets originating from development of embedded software systems. Results: Our study shows that even though both datasets contain software artifacts from embedded development, the characteristics of the two datasets differ considerably, and consequently so do the traceability outcomes. Conclusions: To enable replications and secondary studies, we suggest that datasets should be thoroughly characterized in future studies on traceability recovery, especially when they cannot be disclosed. Also, while we conclude that the experimental framework provides useful support, we argue that our proposed context taxonomy is a useful complement. Finally, we discuss how empirical evidence of the feasibility of IR-based traceability recovery can be strengthened in future research.
Since software development is of a dynamic nature, impact analysis is an inevitable work task. Traceability is known as one factor that supports this task, and several researchers have proposed traceability recovery tools to suggest trace links in an existing system. However, these semi-automatic tools have not yet proven useful in industrial applications. Based on an established automation model, we analyzed the potential value of such a tool. We based our analysis on a pilot case study of an impact analysis process in a safety-critical development context, and argue that traceability recovery should be considered an investment in findability. Moreover, several risks involved in an increased level of impact analysis automation are already plaguing the state-of-practice workflow. Consequently, deploying a traceability recovery tool involves a lower degree of change than has previously been acknowledged.
About a hundred studies on traceability recovery have been published in software engineering fora. In roughly half of them, software artifacts developed by students have been used as input. To what extent student artifacts differ from industrial counterparts has not been fully explored in the literature. We conducted a survey among authors of studies on traceability recovery, including both academics and practitioners, to explore their perspectives on the matter. Our results indicate that a majority of authors consider software artifacts originating from student projects to be only partly representative of industrial artifacts. Moreover, only a few respondents validated student artifacts for industrial representativeness. Furthermore, our respondents made suggestions for improving the description of artifact sets used in studies by adding contextual, domain-specific, and artifact-centric information. Example suggestions include adding descriptions of the processes used for artifact development, the meaning of traceability links, and the structure of artifacts. Our findings call for further research on characterization and validation of software artifacts to support aggregation of results from empirical studies.
Theses by Markus Borg

In this thesis, we address two tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team, and CIA is the subsequent activity of identifying how source code changes affect the existing software artifacts. While IA is fundamental in all large software projects, CIA is particularly important to safety-critical development.
Our solution approach, grounded on surveys of industry practice as well as scientific literature, is to support navigation by combining information retrieval and machine learning into Recommendation Systems for Software Engineering (RSSE). While the sheer number of incoming issue reports might challenge the overview of a human developer, our techniques instead benefit from the availability of ever-growing training data. We leverage the volume of issue reports to develop accurate decision support for software evolution.
We evaluate our proposals both by deploying an RSSE in two development teams, and by simulation scenarios, i.e., we assess the correctness of the RSSEs' output when replaying the historical inflow of issue reports. In total, more than 60,000 historical issue reports are involved in our studies, originating from the evolution of five proprietary systems for two companies. Our results show that RSSEs for both IA and CIA can help developers navigate large software projects, in terms of locating development teams and software artifacts. Finally, we discuss how to support the transfer of our results to industry, focusing on addressing the context dependency of our tool support by systematically tuning parameters to a specific operational setting.
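The simulation-based evaluation can be sketched as a chronological replay: train only on the issues reported so far, predict the next one, and accumulate accuracy. The data and classifier below are illustrative assumptions; the point is the time-ordered protocol, as opposed to plain cross-validation.

```python
# Sketch of simulation-based evaluation: replay the historical inflow of
# issue reports in time order and measure assignment accuracy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical chronologically ordered (text, resolving team) pairs.
stream = [("ui glitch in settings", "team-ui"),
          ("packet loss under load", "team-net"),
          ("settings dialog misaligned", "team-ui"),
          ("timeout on large transfers", "team-net"),
          ("font rendering broken", "team-ui")]

hits, start = 0, 2  # require a minimal training window before predicting
for i in range(start, len(stream)):
    texts, teams = zip(*stream[:i])       # train strictly on the past
    model = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(texts, teams)
    hits += model.predict([stream[i][0]])[0] == stream[i][1]
print(f"replay accuracy: {hits / (len(stream) - start):.2f}")
```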
This thesis is based on empirical software engineering research. In a Systematic Literature Review (SLR) we show that a majority of previous evaluations of IR-based trace recovery have been technology-oriented, conducted in "the cave of IR evaluation", using small datasets as experimental input. Also, software artifacts originating from student projects have frequently been used in evaluations. We conducted a survey among traceability researchers and found that while a majority consider student artifacts to be only partly representative of industrial counterparts, such artifacts were typically not validated for industrial representativeness. Our findings call for additional case studies to evaluate IR-based trace recovery within the full complexity of an industrial setting. Thus, we outline future research on IR-based trace recovery in an industrial study on safety-critical impact analysis.
Also, this thesis contributes to the body of empirical evidence of IR-based trace recovery in two experiments with industrial software artifacts. The technology-oriented experiment highlights the clear dependence between datasets and the accuracy of IR-based trace recovery, in line with findings from the SLR. The human-oriented experiment investigates how different quality levels of tool output affect the tracing accuracy of engineers. While the results are not conclusive, there are indications that the actual value of improving tool support for IR-based trace recovery is worth investigating further. Finally, we present how tools and methods are evaluated in the general field of IR research, and propose a taxonomy of evaluation contexts tailored for IR-based trace recovery in software engineering.