Article, DOI 10.5555/381473.381484

Supporting program comprehension using semantic and structural information

Published: 01 July 2001

Abstract

The paper investigates the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain-specific issues (both problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents. An advanced information retrieval method, latent semantic indexing, is used to define a semantic similarity measure between software components. Components within a software system are then clustered together using this similarity measure. Simple structural information (i.e., file organization) of the software system is then used to assess the semantic cohesion of the clusters and files, with respect to each other. The measures are formally defined for general application. A set of experiments is presented which demonstrates how these measures can assist in the understanding of a nontrivial software system, namely a version of NCSA Mosaic.
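
The pipeline described in the abstract (latent semantic indexing, a similarity measure between components, then clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy is available, treats each component as a whitespace-separated bag of identifiers, uses raw term counts, and swaps in a simple similarity-threshold grouping for whatever clustering algorithm the paper applies. The function names `lsi_similarity`, `cluster_by_threshold`, and `mean_pairwise_similarity` are hypothetical, and the last one is only a stand-in for the cohesion measures that the paper defines formally.

```python
import numpy as np


def lsi_similarity(docs, k=2):
    """Cosine similarity between documents in a rank-k LSI space.

    docs: list of strings, one per software component (assumed input format).
    k:    number of latent dimensions to keep (assumed parameter).
    """
    vocab = sorted({term for doc in docs for term in doc.split()})
    row = {term: i for i, term in enumerate(vocab)}

    # Term-by-document matrix of raw counts.
    a = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc.split():
            a[row[term], j] += 1.0

    # Truncated SVD: keep the k largest singular values.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (np.diag(s[:k]) @ vt[:k]).T  # one row per document

    # Normalise rows so the dot product is the cosine.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = doc_vecs / norms
    return unit @ unit.T


def cluster_by_threshold(sim, threshold=0.7):
    """Group documents linked by similarity >= threshold (illustrative only)."""
    n = sim.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def mean_pairwise_similarity(sim, members):
    """Hypothetical cohesion stand-in: average similarity over member pairs."""
    pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
    if not pairs:
        return 1.0
    return float(np.mean([sim[i, j] for i, j in pairs]))


if __name__ == "__main__":
    # Toy "components": two about networking, two about drawing.
    components = [
        "socket connect send recv",
        "socket send recv close",
        "widget draw paint color",
        "widget paint color resize",
    ]
    similarity = lsi_similarity(components, k=2)
    for cluster in cluster_by_threshold(similarity, threshold=0.7):
        print(cluster, mean_pairwise_similarity(similarity, cluster))
```

In this sketch the structural side is reduced to asking how similar the components grouped in one file (or one cluster) are to each other; comparing that value across files and clusters is one plausible way to read "semantic cohesion", but the paper's own definitions should be consulted for the actual measures.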



Published In

ICSE '01: Proceedings of the 23rd International Conference on Software Engineering
July 2001
844 pages
ISBN:0769510507

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ICSE01: 23rd International Conference on Software Engineering
May 12 - 19, 2001
Toronto, Ontario, Canada

Acceptance Rates

ICSE '01 Paper Acceptance Rate 47 of 268 submissions, 18%;
Overall Acceptance Rate 276 of 1,856 submissions, 15%

Article Metrics

  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 15 Oct 2024

Cited By

  • (2021) On the Naming of Methods. Proceedings of the 43rd International Conference on Software Engineering, pp. 587-599. DOI: 10.1109/ICSE43902.2021.00061. Online publication date: 22-May-2021
  • (2019) A neural model for generating natural language summaries of program subroutines. Proceedings of the 41st International Conference on Software Engineering, pp. 795-806. DOI: 10.1109/ICSE.2019.00087. Online publication date: 25-May-2019
  • (2019) A novel neural source code representation based on abstract syntax tree. Proceedings of the 41st International Conference on Software Engineering, pp. 783-794. DOI: 10.1109/ICSE.2019.00086. Online publication date: 25-May-2019
  • (2019) Towards consistency analysis between formal and informal software architecture artefacts. Proceedings of the 2nd International Workshop on Establishing a Community-Wide Infrastructure for Architecture-Based Software Engineering, pp. 6-12. DOI: 10.1109/ECASE.2019.00010. Online publication date: 27-May-2019
  • (2018) Hierarchical abstraction of execution traces for program comprehension. Proceedings of the 26th Conference on Program Comprehension, pp. 86-96. DOI: 10.1145/3196321.3196343. Online publication date: 28-May-2018
  • (2018) A comparison of code similarity analysers. Empirical Software Engineering 23:4, pp. 2464-2519. DOI: 10.1007/s10664-017-9564-7. Online publication date: 1-Aug-2018
  • (2015) Unsupervised software categorization using bytecode. Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension, pp. 229-239. DOI: 10.5555/2820282.2820315. Online publication date: 16-May-2015
  • (2015) Automated decomposition of build targets. Proceedings of the 37th International Conference on Software Engineering - Volume 1, pp. 123-133. DOI: 10.5555/2818754.2818772. Online publication date: 16-May-2015
  • (2015) Document Retrieval Metrics for Program Understanding. Proceedings of the 7th Annual Meeting of the Forum for Information Retrieval Evaluation, pp. 8-15. DOI: 10.1145/2838706.2838710. Online publication date: 4-Dec-2015
  • (2015) Clustering Student Programming Assignments to Multiply Instructor Leverage. Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 367-372. DOI: 10.1145/2724660.2728695. Online publication date: 14-Mar-2015
