DOI: 10.5555/2487085.2487155

Automatically mining software-based, semantically-similar words from comment-code mappings

Published: 18 May 2013

Abstract

Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and the various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms, as well as other word relations that indicate semantic similarity, are particularly useful for overcoming the mismatch in vocabularies. However, experience shows that many words are semantically similar in computer science contexts, but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and programmer conventions in writing them. Our evaluation shows that, in high proportions, the mined comment-code word mappings that do not already occur in WordNet are indeed viewed as semantically similar word pairs in computer science.
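As a rough illustration of the query expansion mentioned in the abstract, the following minimal sketch adds related words to a search query. The table `RELATED_WORDS`, the function `expand_query`, and the word pairs themselves are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
# Hypothetical table of software-specific related words.
# These pairs are illustrative assumptions, not mined results from the paper.
RELATED_WORDS = {
    "remove": {"delete", "erase"},
    "create": {"make", "construct"},
}


def expand_query(query, related=RELATED_WORDS):
    """Return the query terms followed by any related words, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # Sort so the expansion order is deterministic.
        for candidate in sorted(related.get(term, ())):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("remove file"))  # ['remove', 'file', 'delete', 'erase']
```

A code search tool could then match any of the expanded terms against identifiers and comments, which is one way a mined word-pair list might be put to use.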

Published In

MSR '13: Proceedings of the 10th Working Conference on Mining Software Repositories
May 2013
438 pages
ISBN:9781467329361

Publisher

IEEE Press
