Abstract
Over a long maintenance period, software projects experience architectural erosion and drift, which makes maintenance tasks harder for software engineers unfamiliar with the code base. This paper presents a framework that assists software engineers in recovering a software project’s architecture from its source code. The recovery process is iterative: it combines clustering based on contextual and structural information in the code base with incremental developer feedback. The process converges when the developer is satisfied with the proposed decomposition of the software and, as an additional benefit, the framework becomes tuned to aid future evolution of the project. The paper provides both analytic and empirical evaluations of the obtained results; the experimental results show that our framework performs reasonably better than alternative conventional methods. The framework uses a novel compartmentalization technique, Coordinated Clustering of Heterogeneous Datasets (CCHD), that relies on contextual and structural information in the code base but, unlike most previous approaches, does not require specific weights for each information type, which allows it to adapt to different project types and domains.
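The iterative loop described in the abstract (cluster, show the decomposition to a developer, fold the feedback back in, repeat until the developer is satisfied) can be illustrated with a minimal sketch. This is not the CCHD algorithm: it uses plain average-linkage agglomerative clustering, models feedback as must-link and cannot-link pairs, and combines the contextual and structural views with a fixed equal-weight average, which is exactly the kind of hand-set weighting CCHD is said to avoid. All names (`jaccard`, `cluster`, `recover`, `review`, the toy classes) are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(x, y, tokens, deps):
    # ASSUMPTION: equal-weight average of the contextual (token) view and
    # the structural (dependency) view; a simplification, not CCHD.
    return 0.5 * jaccard(tokens[x], tokens[y]) + 0.5 * jaccard(deps[x], deps[y])

def cluster(names, tokens, deps, k, must_link, cannot_link):
    """Average-linkage agglomerative clustering under pairwise constraints."""
    clusters = [{n} for n in names]

    def find(n):
        return next(c for c in clusters if n in c)

    # Honour must-link feedback first by merging the named classes.
    for a, b in must_link:
        ca, cb = find(a), find(b)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb

    def allowed(c1, c2):
        # A merge is forbidden if it would join a cannot-link pair.
        return not any((a in c1 and b in c2) or (a in c2 and b in c1)
                       for a, b in cannot_link)

    def linkage(c1, c2):
        total = sum(similarity(x, y, tokens, deps) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        candidates = [(linkage(c1, c2), i, j)
                      for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
                      if allowed(c1, c2)]
        if not candidates:
            break  # constraints prevent any further merge
        _, i, j = max(candidates)
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def recover(names, tokens, deps, k, review):
    """Re-cluster until the reviewer supplies no new constraints."""
    must, cannot = set(), set()
    while True:
        clusters = cluster(names, tokens, deps, k, must, cannot)
        new_must, new_cannot = review(clusters)
        if new_must <= must and new_cannot <= cannot:
            return clusters  # developer is satisfied
        must |= new_must
        cannot |= new_cannot

# Toy code base: identifier tokens (contextual) and dependencies (structural).
names = ["OrderDao", "UserDao", "OrderView", "UserView"]
tokens = {
    "OrderDao": {"order", "save", "query"},
    "UserDao": {"user", "save", "query"},
    "OrderView": {"order", "render", "button"},
    "UserView": {"user", "render", "button"},
}
deps = {"OrderDao": {"jdbc"}, "UserDao": {"jdbc"},
        "OrderView": {"swing"}, "UserView": {"swing"}}

def review(clusters):
    """Simulated developer who prefers a feature-based decomposition."""
    if any({"OrderDao", "OrderView"} <= c for c in clusters):
        return set(), set()
    return ({("OrderDao", "OrderView"), ("UserDao", "UserView")},
            {("OrderDao", "UserDao")})

result = recover(names, tokens, deps, 2, review)
```

In this toy run the unconstrained pass groups classes by layer (data access versus views), and a single round of simulated feedback steers the result to a per-feature grouping. A real reviewer would be interactive, and the constraint handling here is deliberately naive.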
Notes
In this paper, we use the word "class" in its programming-language sense, rather than to mean a collection or category.
Cite this article
Naim, S.M., Damevski, K. & Hossain, M.S. Reconstructing and evolving software architectures using a coordinated clustering framework. Autom Softw Eng 24, 543–572 (2017). https://doi.org/10.1007/s10515-017-0211-8
DOI: https://doi.org/10.1007/s10515-017-0211-8