A heuristic search approach to solving the software clustering problem
Publisher:
  • Drexel University
  • Philadelphia, PA
  • United States
ISBN: 978-0-493-52606-5
Order Number: AAI3039424
Pages: 262
Abstract

Most interesting software systems are large and complex, and as a consequence, understanding their structure is difficult. One of the reasons for this complexity is that source code contains many entities (e.g., classes, modules) that depend on each other in intricate ways (e.g., procedure calls, variable references). Additionally, once a software engineer understands a system's structure, it is difficult to preserve this understanding, because the structure tends to change during maintenance.

Research into the software clustering problem has proposed several approaches to this difficulty, each defining techniques that partition the structure of a software system into subsystems (clusters). Subsystems are collections of source code resources that exhibit similar features, properties, or behaviors. Because there are far fewer subsystems than modules, studying the subsystem structure is easier than trying to understand the system by analyzing the source code manually.

Our research addresses several aspects of the software clustering problem. Specifically, we created several heuristic search algorithms that automatically cluster the source code into subsystems. We implemented our clustering algorithms in a tool named Bunch, and conducted extensive evaluation via case studies and experiments. Bunch also includes a variety of services to integrate user knowledge into the clustering process, and to help users navigate through complex system structures manually.
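To make the idea of heuristic search over clusterings concrete, the following is a minimal hill-climbing sketch over a module dependency graph. It is an illustration only and is not taken from the abstract: the objective function `quality`, the function `hill_climb`, and the edge-list representation are all assumptions made for this example, and they should not be read as the algorithms or the objective actually implemented in Bunch.

```python
import random


def quality(edges, assignment):
    """Toy objective (an assumption for this sketch): reward clusters whose
    dependencies stay inside the cluster, penalise dependencies that cross
    cluster boundaries. Each cluster scores intra / (intra + inter / 2)."""
    intra, inter = {}, {}
    for src, dst in edges:
        a, b = assignment[src], assignment[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + 1
        else:
            inter[a] = inter.get(a, 0) + 1
            inter[b] = inter.get(b, 0) + 1
    # Clusters with no internal edges contribute nothing.
    return sum(n / (n + inter.get(c, 0) / 2) for c, n in intra.items())


def hill_climb(modules, edges, seed=0):
    """Start with every module in its own cluster, then repeatedly move one
    module to whichever cluster improves the objective, until no single
    move helps (a local optimum)."""
    rng = random.Random(seed)
    assignment = {m: i for i, m in enumerate(modules)}
    labels = list(range(len(modules)))
    best = quality(edges, assignment)
    improved = True
    while improved:
        improved = False
        order = list(modules)
        rng.shuffle(order)
        for m in order:
            keep = assignment[m]
            for label in labels:
                if label == keep:
                    continue
                assignment[m] = label
                score = quality(edges, assignment)
                if score > best + 1e-12:
                    best, keep, improved = score, label, True
            assignment[m] = keep  # settle on the best label found for m
    return assignment, best
```

Because the search only accepts improving moves, the result is a local optimum that depends on the starting partition and the visiting order; rerunning with different `seed` values and keeping the best score is one simple way to reduce that sensitivity.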

Since the criteria used to decompose the structure of a software system into subsystems vary across different clustering algorithms, mechanisms that can compare different clustering results objectively are needed. To address this need we first examined two techniques that have been used to measure the similarity between system decompositions, and then created two new similarity measurements to overcome some of the problems that we discovered with the existing measurements.
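As a rough illustration of what comparing two decompositions can look like, the sketch below computes a generic pairwise agreement score: the fraction of module pairs that both decompositions treat the same way (together in both, or apart in both). This is a standard textbook-style measure chosen for the example; it is not claimed to be either of the existing techniques examined in this work or either of the two new measurements. The function name `pair_agreement` and the dictionary representation (module name to cluster label) are assumptions.

```python
from itertools import combinations


def pair_agreement(decomp_a, decomp_b):
    """Fraction of module pairs on which two decompositions agree.

    Each decomposition maps a module name to a cluster label. Labels are
    only compared within one decomposition, so the two decompositions may
    use unrelated label sets."""
    if set(decomp_a) != set(decomp_b):
        raise ValueError("both decompositions must cover the same modules")
    pairs = list(combinations(sorted(decomp_a), 2))
    if not pairs:
        return 1.0  # zero or one module: nothing to disagree about
    agree = sum(
        (decomp_a[x] == decomp_a[y]) == (decomp_b[x] == decomp_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)
```

A score of 1.0 means the two decompositions group the modules identically, regardless of how the clusters are named.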

Similarity measurements enable the results of clustering algorithms to be compared to each other and, preferably, to an agreed-upon “benchmark” standard. Since benchmark standards are not documented for most systems, we created another tool, called CRAFT, that derives a “reference decomposition” automatically by exploiting similarities in the results produced by several different clustering algorithms.
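One way to picture deriving a reference decomposition from several clustering results is a co-occurrence vote: modules that most algorithms place in the same cluster end up grouped together. The sketch below is a hypothetical illustration of that general idea, not a description of how CRAFT works; the function `consensus_groups`, the `threshold` parameter, and the union-find merging step are all assumptions introduced for this example.

```python
from itertools import combinations


def consensus_groups(results, threshold=1.0):
    """Group modules that share a cluster in at least `threshold` (a
    fraction between 0 and 1) of the given clustering results.

    Each result maps a module name to a cluster label; all results are
    assumed to cover the same modules."""
    modules = sorted(results[0])
    parent = {m: m for m in modules}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for x, y in combinations(modules, 2):
        together = sum(r[x] == r[y] for r in results)
        if together / len(results) >= threshold:
            parent[find(x)] = find(y)  # merge the two groups

    groups = {}
    for m in modules:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())
```

Note that the merging is transitive: if a and b agree often enough, and b and c agree often enough, all three land in one group even when a and c rarely co-occur. Whether that behaviour is desirable depends on the intended use of the reference decomposition.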

Contributors
  • Drexel University
  • Drexel University College of Engineering
