Normalizing source code vocabulary to support program comprehension and software quality

Author:

Latifa Guerrouj

ICSE '13: Proceedings of the 2013 International Conference on Software Engineering

Pages 1385 - 1388

Published: 18 May 2013

Abstract

The literature reports that the source code lexicon plays a paramount role in program comprehension, especially when software documentation is scarce, outdated, or simply unavailable. In source code, a significant proportion of the vocabulary consists of acronyms, abbreviations, or concatenations of terms that cannot be identified using consistent mechanisms such as naming conventions. It is therefore essential to disambiguate the concepts conveyed by identifiers, both to support program comprehension and to reap the full benefit of Information Retrieval-based techniques (e.g., feature location and traceability), which require the linguistic information (i.e., source code identifiers and comments) used across all software artifacts (e.g., requirements, design, change requests, tests, and source code) to be consistent. To this end, we propose source code vocabulary normalization approaches that exploit contextual information to align the vocabulary found in the source code with that found in other software artifacts. Our choice of context levels was inspired by prior work and by our own findings. Normalization consists of two tasks: splitting and expansion of source code identifiers. We also investigate the effect of source code vocabulary normalization approaches on software maintenance tasks. Results of our evaluation show that our context-aware techniques are more accurate and more efficient in terms of computation time than state-of-the-art alternatives. In addition, our findings reveal that feature location techniques can benefit from vocabulary normalization approaches when no dynamic information is available.
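To make the two normalization tasks concrete, the following is a minimal Python sketch of a naive baseline: splitting on underscores and camelCase boundaries, then expanding terms through a lookup table. It is not the context-aware approach proposed in the paper. The function names, the regular expression, and the `ABBREVIATIONS` table are all assumptions introduced here for illustration.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch). A
# context-aware approach would instead derive candidate expansions from
# surrounding artifacts rather than from a fixed dictionary.
ABBREVIATIONS = {
    "ptr": "pointer",
    "cnt": "count",
    "usr": "user",
    "msg": "message",
}

# Alternatives, tried in order: an acronym run that precedes a capitalized
# word, a capitalized or lowercase word, a remaining acronym run, a digit run.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(identifier):
    """Task 1 (splitting): break on underscores and camelCase boundaries."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


def expand_terms(terms, dictionary=ABBREVIATIONS):
    """Task 2 (expansion): replace known abbreviations, keep other terms."""
    return [dictionary.get(term, term) for term in terms]


def normalize(identifier):
    """Split an identifier, then expand each resulting term."""
    return expand_terms(split_identifier(identifier))


if __name__ == "__main__":
    for name in ("usrCnt", "msg_ptr", "HTTPResponse2", "usrcnt"):
        print(name, "->", normalize(name))
```

A convention-based splitter like this one leaves an identifier such as `usrcnt` as a single term, because it contains no case change or separator. That is the kind of concatenation the abstract says cannot be identified through naming conventions, and it is where contextual information becomes necessary.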

Cited By

Severo N, Job R (2019). An approach to detect false design patterns. Proceedings of the XIII Brazilian Symposium on Software Components, Architectures, and Reuse, pp. 63-72. doi:10.1145/3357141.3357146. Online publication date: 23-Sep-2019
https://dl.acm.org/doi/10.1145/3357141.3357146
Daka E, Campos J, Fraser G, Dorn J, Weimer W, Di Nitto E, Harman M, Heymans P (2015). Modeling readability to improve unit tests. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pp. 107-118. doi:10.1145/2786805.2786838. Online publication date: 30-Aug-2015
https://dl.acm.org/doi/10.1145/2786805.2786838

Published In

ICSE '13: Proceedings of the 2013 International Conference on Software Engineering

May 2013

1561 pages

ISBN:9781467330763

General Chair: David Notkin
Program Chairs: Betty H. C. Cheng, Klaus Pohl

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

IEEE Press

Conference

ICSE '13

ICSE '13: 35th International Conference on Software Engineering

May 18 - 26, 2013

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
