A heuristic search approach to solving the software clustering problem

January 2002

Author:
Brian Scott Mitchell
Adviser:
Spiros Mancoridis

Publisher:

Drexel University
Philadelphia, PA
United States

ISBN: 978-0-493-52606-5

Order Number: AAI3039424

Pages: 262


Abstract

Most interesting software systems are large and complex, and as a consequence, understanding their structure is difficult. One of the reasons for this complexity is that source code contains many entities (e.g., classes, modules) that depend on each other in intricate ways (e.g., procedure calls, variable references). Additionally, once a software engineer understands a system's structure, it is difficult to preserve this understanding, because the structure tends to change during maintenance.

Research into the software clustering problem has proposed several approaches to deal with the above issue by defining techniques that partition the structure of a software system into subsystems (clusters). Subsystems are collections of source code resources that exhibit similar features, properties or behaviors. Because there are far fewer subsystems than modules, studying the subsystem structure is easier than trying to understand the system by analyzing the source code manually.

Our research addresses several aspects of the software clustering problem. Specifically, we created several heuristic search algorithms that automatically cluster the source code into subsystems. We implemented our clustering algorithms in a tool named Bunch, and conducted extensive evaluation via case studies and experiments. Bunch also includes a variety of services to integrate user knowledge into the clustering process, and to help users navigate through complex system structures manually.
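The idea of clustering by heuristic search can be illustrated with a minimal hill-climbing sketch. It is not Bunch's algorithm or objective function, which the abstract does not specify. The function names, the fixed number of clusters, and the toy objective (count module pairs that are either dependent and grouped together, or independent and kept apart) are all assumptions made for illustration.

```python
import random
from itertools import combinations


def score(modules, edges, assignment):
    """Toy objective (an assumption, not the thesis's measure).

    Counts module pairs whose grouping matches their dependency status:
    dependent pairs in the same cluster, independent pairs in different ones.
    Dependencies are treated as undirected.
    """
    linked = {frozenset(edge) for edge in edges}
    total = 0
    for a, b in combinations(modules, 2):
        same_cluster = assignment[a] == assignment[b]
        dependent = frozenset((a, b)) in linked
        if same_cluster == dependent:
            total += 1
    return total


def hill_climb(modules, edges, num_clusters, seed=0):
    """Start from a random partition and keep single-module moves that improve the score."""
    rng = random.Random(seed)
    assignment = {m: rng.randrange(num_clusters) for m in modules}
    best = score(modules, edges, assignment)
    improved = True
    while improved:
        improved = False
        for m in modules:
            current = assignment[m]
            for cluster in range(num_clusters):
                if cluster == current:
                    continue
                assignment[m] = cluster
                candidate = score(modules, edges, assignment)
                if candidate > best:
                    # Keep the improving move and continue from it.
                    best = candidate
                    current = cluster
                    improved = True
            # Settle on the best cluster found for this module.
            assignment[m] = current
    return assignment, best
```

The search stops at a local optimum, where no single-module move improves the score, so different seeds may yield different decompositions. That is one reason a way to compare clustering results is useful.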

Since the criteria used to decompose the structure of a software system into subsystems vary across different clustering algorithms, mechanisms that can compare different clustering results objectively are needed. To address this need we first examined two techniques that have been used to measure the similarity between system decompositions, and then created two new similarity measurements to overcome some of the problems that we discovered with the existing measurements.
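To make the notion of comparing two decompositions concrete, the sketch below computes a generic pair-counting similarity (the Rand index). This is a standard measure chosen for illustration. It is neither of the two existing techniques examined in the thesis nor one of the two new measurements, none of which the abstract names. The function name and the dictionary representation (module name mapped to cluster label) are assumptions.

```python
from itertools import combinations


def pair_agreement(decomp_a, decomp_b):
    """Fraction of module pairs that both decompositions treat the same way.

    A pair counts as an agreement when both decompositions put the two modules
    in the same cluster, or both put them in different clusters.
    Cluster labels themselves are irrelevant; only the grouping matters.
    """
    if set(decomp_a) != set(decomp_b):
        raise ValueError("both decompositions must cover the same modules")
    pairs = list(combinations(sorted(decomp_a), 2))
    if not pairs:
        return 1.0  # zero or one module: trivially identical
    agreements = sum(
        1
        for x, y in pairs
        if (decomp_a[x] == decomp_a[y]) == (decomp_b[x] == decomp_b[y])
    )
    return agreements / len(pairs)


first = {"m1": "A", "m2": "A", "m3": "B", "m4": "B"}
second = {"m1": "X", "m2": "X", "m3": "X", "m4": "Y"}
print(pair_agreement(first, second))  # 0.5: three of the six pairs agree
```

A measure of this kind considers only module membership. Whether and how the dependencies between modules should also be taken into account is a design question that this sketch leaves open.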

Similarity measurements enable the results of clustering algorithms to be compared to each other, and preferably to be compared to an agreed-upon “benchmark” standard. Since benchmark standards are not documented for most systems, we created another tool, called CRAFT, that derives a “reference decomposition” automatically by exploiting similarities in the results produced by several different clustering algorithms.
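One plausible way to derive a reference decomposition from several clustering results is a co-occurrence consensus: group two modules together when a majority of the results place them in the same cluster. The sketch below illustrates that idea only. It is not CRAFT's actual procedure, which the abstract does not describe, and the function name, the threshold parameter, and the use of a union-find structure are all assumptions.

```python
from itertools import combinations


def reference_decomposition(results, threshold=0.5):
    """Merge modules that are co-clustered in more than `threshold` of the results.

    `results` is a list of dicts mapping module name to cluster label,
    all over the same set of modules. Returns a sorted list of module groups.
    """
    if not results:
        raise ValueError("at least one clustering result is required")
    modules = sorted(results[0])
    parent = {m: m for m in modules}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(modules, 2):
        together = sum(1 for r in results if r[a] == r[b])
        if together / len(results) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for m in modules:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


runs = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 2, "d": 2},
]
print(reference_decomposition(runs))  # [['a', 'b'], ['c', 'd']]
```

Because merging is transitive, a lower threshold can chain loosely related modules into one large group, so the threshold is a choice that would need tuning in practice.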


Contributors

Spiros Mancoridis
Drexel University
Brian Scott Mitchell
Drexel University College of Engineering

Index Terms

A heuristic search approach to solving the software clustering problem
1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Heuristic function construction
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        1. Cluster analysis
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing

