Prof. (full). Ph.D. in computer science (SE, AI, data mining, programming languages); ex-nurse, rocketman, taxi-driver, journalist (it all made sense at the time).
The current generation of software analytics tools is mostly prediction algorithms (e.g. support vector machines, Naive Bayes, logistic regression, etc.). While prediction is useful, after prediction comes planning about what actions to take in order to improve quality. This research seeks methods that generate demonstrably useful guidance on "what to do" within the context of a specific software project. Specifically, we propose XTREE (for within-project planning) and BELLTREE (for cross-project planning) to generate plans that can improve software quality. Each such plan has the property that, if followed, it reduces the probability of future defect reports. When compared to other planning algorithms from the SE literature, we find that this new approach is most effective at learning plans from one project, then applying those plans to another. In 10 open-source Java systems, defects were reduced by several hundred in sections of the code that followed the plans…
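To make the "what to change" idea above concrete, here is a minimal nearest-neighbor planning sketch in Python. It is an illustration only, not the published XTREE or BELLTREE algorithms: the table layout (per-file static code metrics plus a binary "bug" column) and the metric names are assumptions.

    # Illustrative sketch only: suggest metric changes by comparing each
    # defective file to its nearest defect-free neighbor (not XTREE/BELLTREE).
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    def plan(data: pd.DataFrame, label: str = "bug", top_n: int = 3):
        """For each defective row, report the top_n metrics whose values differ
        most from the nearest defect-free row (hypothetical 'bug' label)."""
        X = data.drop(columns=[label])
        scaled = MinMaxScaler().fit_transform(X)          # compare on a common scale
        bad_idx = np.where(data[label].values == 1)[0]
        good_idx = np.where(data[label].values == 0)[0]
        plans = []
        for i in bad_idx:
            dists = np.linalg.norm(scaled[good_idx] - scaled[i], axis=1)
            j = good_idx[dists.argmin()]                   # closest clean example
            delta = (X.iloc[j] - X.iloc[i]).abs().nlargest(top_n)
            plans.append({m: (X.iloc[i][m], X.iloc[j][m]) for m in delta.index})
        return plans  # each entry: metric -> (current value, suggested value)

Each returned plan lists, for a defective file, the few metrics whose values differ most from its nearest defect-free neighbor; i.e. a candidate "what to do" recommendation of the kind the abstract describes.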
This report documents the program and the outcomes of Dagstuhl Seminar 14261, "Software Development Analytics". We briefly summarize the goals and format of the seminar, the results of the breakout groups, and a draft of a manifesto for software analytics. The report also includes the abstracts of the talks presented at the seminar. Seminar June 22–27, 2014, http://www.dagstuhl.de/14261. 1998 ACM Subject Classification: D.2 Software Engineering.
Transfer learning has been the subject of much recent research. In practice, that research means that models are unstable, since they are continually revised whenever new data arrives. This paper offers a very simple “bellwether” transfer learner. Given N datasets, we find which one produces the best predictions on all the others. This “bellwether” dataset is then used for all subsequent predictions (when its predictions start failing, one may seek another bellwether). Bellwethers are interesting since they are very simple to find (wrap a for-loop around standard data miners). They simplify the task of making general policies in software engineering since, as long as one bellwether remains useful, stable conclusions for N datasets can be achieved by reasoning over that bellwether. This paper shows that this bellwether approach works for multiple datasets from various domains in SE. From this, we conclude that (1) the bellwether method is a useful (and simple) transfer learner; (2) Unl…
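The bellwether search itself is just the for-loop mentioned above. Below is a minimal Python sketch under stated assumptions: each dataset is an (X, y) pair of features and binary defect labels, and Naive Bayes with AUC stands in for whatever learner and score a study actually uses.

    # Illustrative bellwether search: train on each dataset, test on all the
    # others, keep the one whose transferred predictions score best.
    from statistics import median
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import roc_auc_score

    def find_bellwether(datasets):
        """datasets: dict of name -> (X, y). Returns the bellwether's name."""
        scores = {}
        for src, (Xs, ys) in datasets.items():
            model = GaussianNB().fit(Xs, ys)
            transfers = [roc_auc_score(yt, model.predict_proba(Xt)[:, 1])
                         for tgt, (Xt, yt) in datasets.items() if tgt != src]
            scores[src] = median(transfers)   # how well src predicts the rest
        return max(scores, key=scores.get)

Once found, the bellwether dataset is used as the training source for all the other projects until its transferred predictions start to fail, at which point the loop is simply run again.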
Before researchers rush to reason across all available data, they should first check if the information is densest within some small region. We say this since, in 240 GitHub projects, we find that the information in that data “clumps” towards the earliest parts of the project. In fact, a defect prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this early life cycle data, we can build models very quickly (using weeks, not months, of CPU time). Also, we can find simple models (with just two features) that generalize to hundreds of software projects. Based on this experience, we warn that prior work on generalizing software engineering defect prediction models may have needlessly complicated an inherently simple process. Further, prior work that focused on later life cycle data now needs to be revisited, since its conclusions were drawn from relatively uninformative regions. Replication note: all our data a…
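A hedged sketch of the early life cycle setup described above, assuming commit-level data with a timestamp, two change-size features (hypothetically named "la" and "ld"), and a binary "bug" label; the two-feature choice mirrors the "simple models" point in the abstract:

    # Illustrative sketch: keep only the first 150 commits (by timestamp) and
    # fit a deliberately small, two-feature defect predictor.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def early_model(commits: pd.DataFrame, n_early: int = 150):
        early = commits.sort_values("timestamp").head(n_early)
        X, y = early[["la", "ld"]], early["bug"]   # just two features (assumed names)
        return LogisticRegression(max_iter=1000).fit(X, y)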
There has been much recent interest in the application of deep learning neural networks in software engineering. Some researchers are worried that deep learning is being applied with insufficient critical assessment. Hence, for one well-studied software analytics task (defect prediction), this paper compares deep learning against prior state-of-the-art results. Deep learning can outperform those prior results, but only after adjusting its hyperparameters using GHOST (Goal-oriented Hyperparameter Optimization for Scalable Training). For defect prediction, GHOST terminates in just a few minutes and scales to larger data sets; i.e. it is practical to tune deep learning for defect prediction. Hence this paper recommends deep learning for defect prediction, but only after adjusting its goal predicates and tuning its hyperparameters (using some hyperparameter optimization tool, like GHOST).
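GHOST's internals are not described here, so the sketch below shows only the general shape of the recommendation: wrap a small neural defect predictor in a goal-directed hyperparameter search. The search space, the learner, and the use of recall as the goal metric are illustrative assumptions, not the GHOST algorithm itself.

    # Illustrative goal-directed random search over a small neural net
    # (stands in for a real tuner such as GHOST).
    import random
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import recall_score
    from sklearn.model_selection import train_test_split

    def tune(X, y, trials: int = 20, seed: int = 1):
        rng = random.Random(seed)
        Xtr, Xval, ytr, yval = train_test_split(X, y, test_size=0.3, random_state=seed)
        best, best_score = None, -1.0
        for _ in range(trials):
            params = dict(hidden_layer_sizes=(rng.choice([8, 16, 32, 64]),),
                          alpha=10 ** rng.uniform(-5, -1),
                          learning_rate_init=10 ** rng.uniform(-4, -1))
            model = MLPClassifier(max_iter=500, **params).fit(Xtr, ytr)
            score = recall_score(yval, model.predict(Xval))   # the "goal predicate"
            if score > best_score:
                best, best_score = model, score
        return best, best_score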
Despite decades of research, SE lacks widely accepted models (that offer precise quantitative predictions) about what factors most influence software quality. This paper provides a “good news” result: such general models can be generated using a new transfer learning framework called “GENERAL”. Given a tree of recursively clustered projects (built from project meta-data), GENERAL promotes a model upwards if it performs best in the lower clusters (stopping when the promoted model performs worse than the models seen at a lower level). The number of models found by GENERAL is minimal: one for defect prediction (756 projects) and less than a dozen for project health (1628 projects). Hence, via GENERAL, it is possible to draw conclusions that hold across hundreds of projects at a time. Further, the models produced in this manner offer predictions that perform as well as, or better than, the prior state of the art. To the best of our knowledge, this is the largest demonstration of the generalizability…
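A minimal sketch of the promotion step described above, assuming a user-supplied cluster tree plus learn(projects) and score(model, projects) callables; this illustrates the idea rather than reproducing the published GENERAL framework.

    # Illustrative "promote the winning model up the cluster tree" sketch.
    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class Node:
        projects: list                         # data grouped at this cluster
        children: List["Node"] = field(default_factory=list)

    def promote(node: Node, learn: Callable, score: Callable):
        """Return the best model at or below this node; a lower-level model is
        promoted only while it still beats what this level already has."""
        best = learn(node.projects)
        best_score = score(best, node.projects)
        for child in node.children:
            candidate = promote(child, learn, score)
            s = score(candidate, node.projects)
            if s > best_score:                 # promote the lower-level winner
                best, best_score = candidate, s
        return best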
BACKGROUND: Given many possible changes to a software project, which ones are recommended? AIM: To comparatively assess different decision procedures for recommending project changes. METHOD: We search for project recommendations within data from eight projects using various AI tools: six model-based methods and one instance-based method called W2. Results were assessed by comparing effort, defect, and development-time values in the raw data against the subset of the data selected by those recommendations. RESULTS: In the majority of cases, significantly large reductions in effort, defects, and development time were achieved. Further, W2 performed as well as, or better than, any other method in this study. W2 does not rely on an underlying model of software process, so it does not demand that domain data be expressed in the terminology of that model. Hence, it can be quickly adapted to a new domain and is easy to maintain (just add more instances). CONCLUSION: We recommend instance-based methods s…
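The assessment method is easy to restate in code. Below is a hedged Python sketch that compares median effort, defect, and development-time values in the raw data against the subset selected by a recommendation; the column names and the example filter are hypothetical, not taken from the study.

    # Illustrative assessment: percent reduction in each goal's median after
    # applying a recommendation (hypothetical column names).
    import pandas as pd

    def improvement(raw: pd.DataFrame, selected: pd.DataFrame,
                    goals=("effort", "defects", "months")):
        return {g: 100 * (raw[g].median() - selected[g].median()) / raw[g].median()
                for g in goals}

    # Example usage with a hypothetical recommendation ("keep small teams"):
    # selected = raw.query("team_size <= 5")
    # print(improvement(raw, selected))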