DOI: 10.5555/2486788.2487012
research-article

Normalizing source code vocabulary to support program comprehension and software quality

Published: 18 May 2013

Abstract

The literature reports that the source code lexicon plays a paramount role in program comprehension, especially when software documentation is scarce, outdated, or simply not available. In source code, a significant proportion of the vocabulary consists of acronyms and/or abbreviations, or of concatenations of terms that cannot be identified using consistent mechanisms such as naming conventions. It is therefore essential to disambiguate the concepts conveyed by identifiers, both to support program comprehension and to reap the full benefit of Information Retrieval-based techniques (e.g., feature location and traceability), which require the linguistic information (i.e., source code identifiers and comments) used across all software artifacts (e.g., requirements, design, change requests, tests, and source code) to be consistent. To this aim, we propose source code vocabulary normalization approaches that exploit contextual information to align the vocabulary found in the source code with that found in other software artifacts. Our choice of context levels was inspired by prior work and by our own findings. Normalization consists of two tasks: splitting and expansion of source code identifiers. We also investigate the effect of source code vocabulary normalization approaches on software maintenance tasks. Results of our evaluation show that our context-aware techniques are accurate and more efficient in terms of computation time than state-of-the-art alternatives. In addition, our findings reveal that feature location techniques can benefit from vocabulary normalization approaches when no dynamic information is available.
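
To make the two normalization tasks concrete, the following minimal Python sketch splits an identifier on underscores and case changes and then expands known abbreviations. It is not the context-aware approach proposed in the paper: the regular expression, the function names, and the ABBREVIATIONS table are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical abbreviation table; a real approach would derive
# candidate expansions from contextual information and dictionaries.
ABBREVIATIONS = {
    "usr": "user",
    "msg": "message",
    "cnt": "counter",
    "num": "number",
    "ptr": "pointer",
}

# Matches, in order: an acronym run (uppercase letters not followed by
# a lowercase letter), a capitalized or lowercase word, or a digit run.
# Underscores and other separators are simply never matched.
_TERM = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Split an identifier into lowercase terms (naive, convention-based)."""
    return [term.lower() for term in _TERM.findall(identifier)]


def expand_terms(terms, abbreviations=ABBREVIATIONS):
    """Replace each known abbreviation by its expansion; keep other terms."""
    return [abbreviations.get(term, term) for term in terms]


def normalize(identifier):
    """Splitting followed by expansion."""
    return expand_terms(split_identifier(identifier))


if __name__ == "__main__":
    for name in ("usrMsgCnt", "num_ptr", "HTTPServer_ptr"):
        print(name, "->", normalize(name))
```

A splitter like this depends entirely on naming conventions, so it leaves a same-case concatenation such as "usrmsg" as a single term. That is the kind of identifier the abstract says cannot be handled by consistent mechanisms alone.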

Cited By

  • (2019) An approach to detect false design patterns. Proceedings of the XIII Brazilian Symposium on Software Components, Architectures, and Reuse, pp. 63-72. DOI: 10.1145/3357141.3357146. Online publication date: 23-Sep-2019.
  • (2015) Modeling readability to improve unit tests. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pp. 107-118. DOI: 10.1145/2786805.2786838. Online publication date: 30-Aug-2015.

Published In

ICSE '13: Proceedings of the 2013 International Conference on Software Engineering
May 2013
1561 pages
ISBN: 9781467330763

Publisher

IEEE Press

Conference

ICSE '13

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
