Most interesting software systems are large and complex, and as a consequence, understanding their structure is difficult. One of the reasons for this complexity is that source code contains many entities (e.g., classes, modules) that depend on each other in intricate ways (e.g., procedure calls, variable references). Additionally, once a software engineer understands a system's structure, it is difficult to preserve this understanding, because the structure tends to change during maintenance.
Research into the software clustering problem has proposed several approaches to these issues, defining techniques that partition the structure of a software system into subsystems (clusters). Subsystems are collections of source code resources that exhibit similar features, properties or behaviors. Because there are far fewer subsystems than modules, studying the subsystem structure is easier than trying to understand the system by analyzing the source code manually.
Our research addresses several aspects of the software clustering problem. Specifically, we created several heuristic search algorithms that automatically cluster the source code into subsystems. We implemented our clustering algorithms in a tool named Bunch, and conducted extensive evaluation via case studies and experiments. Bunch also includes a variety of services to integrate user knowledge into the clustering process, and to help users navigate through complex system structures manually.
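To make the idea of a heuristic search over decompositions concrete, the following is a minimal hill-climbing sketch. It is not Bunch's implementation: the `quality` objective (a sum of per-cluster ratios that rewards intra-cluster edges and penalizes edges crossing cluster boundaries), the fixed `num_clusters`, and all function names are assumptions made for illustration, since the abstract does not specify the objective or the search moves.

```python
import random
from collections import defaultdict


def quality(edges, assignment):
    """Assumed objective: sum over clusters of 2*intra / (2*intra + inter).

    `edges` is a list of (module, module) dependencies and `assignment`
    maps each module to a cluster id. Clusters with no internal edges
    contribute nothing.
    """
    intra = defaultdict(int)
    inter = defaultdict(int)
    for a, b in edges:
        ca, cb = assignment[a], assignment[b]
        if ca == cb:
            intra[ca] += 1
        else:
            # A crossing edge counts against both clusters it touches.
            inter[ca] += 1
            inter[cb] += 1
    return sum(2 * n / (2 * n + inter[c]) for c, n in intra.items())


def hill_climb(modules, edges, num_clusters, seed=0):
    """Start from a random partition and move one module at a time,
    keeping a move only if it strictly improves the objective."""
    rng = random.Random(seed)
    assignment = {m: rng.randrange(num_clusters) for m in modules}
    best = quality(edges, assignment)
    improved = True
    while improved:
        improved = False
        for m in modules:
            best_cluster = assignment[m]
            for c in range(num_clusters):
                if c == best_cluster:
                    continue
                assignment[m] = c
                s = quality(edges, assignment)
                if s > best:
                    best, best_cluster, improved = s, c, True
            # Leave the module in the best cluster found for it.
            assignment[m] = best_cluster
    return assignment, best
```

The search stops at a local optimum, so different seeds can yield different decompositions; restarting from several random partitions and keeping the best result is a common mitigation.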
Since the criteria used to decompose the structure of a software system into subsystems vary across different clustering algorithms, mechanisms that can compare different clustering results objectively are needed. To address this need we first examined two techniques that have been used to measure the similarity between system decompositions, and then created two new similarity measurements to overcome some of the problems that we discovered with the existing measurements.
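As an illustration of what comparing two decompositions can look like, the sketch below computes a simple pair-counting agreement (the Rand index): the fraction of module pairs that both decompositions treat the same way, either placing them together or keeping them apart. This is a stand-in chosen for brevity, not one of the measurements examined or created in this work, which the abstract does not define; the function name and the dictionary representation are assumptions.

```python
from itertools import combinations


def pair_agreement(decomp_a, decomp_b):
    """Fraction of module pairs on which two decompositions agree.

    Each decomposition maps module name -> cluster label. Labels need
    not match across decompositions; only co-membership is compared.
    """
    if set(decomp_a) != set(decomp_b):
        raise ValueError("decompositions must cover the same modules")
    pairs = list(combinations(sorted(decomp_a), 2))
    if not pairs:
        return 1.0  # zero or one module: trivially identical
    agree = sum(
        1
        for x, y in pairs
        if (decomp_a[x] == decomp_a[y]) == (decomp_b[x] == decomp_b[y])
    )
    return agree / len(pairs)
```

Because it looks only at module membership, a measure of this kind ignores the dependencies between modules, so two decompositions that differ on a heavily connected module score the same as two that differ on an isolated one.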
Similarity measurements enable the results of clustering algorithms to be compared to each other, and preferably to an agreed-upon “benchmark” standard. Since benchmark standards are not documented for most systems, we created another tool, called CRAFT, that derives a “reference decomposition” automatically by exploiting similarities in the results produced by several different clustering algorithms.
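One way to picture deriving a reference decomposition from several clustering results is a co-occurrence consensus: group two modules together when a large enough share of the results place them in the same cluster. The sketch below is a hypothetical illustration of that idea only. It is not CRAFT's procedure, and the `threshold` parameter, the function name, and the use of union-find to merge agreeing pairs are all assumptions.

```python
from itertools import combinations


def reference_decomposition(results, threshold=0.5):
    """Merge modules that co-occur in more than `threshold` of the results.

    `results` is a non-empty list of decompositions, each mapping
    module name -> cluster label over the same set of modules.
    Returns a sorted list of sorted module groups.
    """
    modules = sorted(results[0])
    parent = {m: m for m in modules}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for x, y in combinations(modules, 2):
        together = sum(1 for r in results if r[x] == r[y])
        if together / len(results) > threshold:
            parent[find(x)] = find(y)

    groups = {}
    for m in modules:
        groups.setdefault(find(m), set()).add(m)
    return sorted(sorted(g) for g in groups.values())
```

Merging through union-find makes the grouping transitive: if the results agree on (a, b) and on (b, c), all three modules land in one group even when (a, c) rarely co-occur. A stricter variant could require every pair in a group to clear the threshold.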
Index Terms
- A heuristic search approach to solving the software clustering problem