DOI: 10.1145/2961111.2962622
short-paper

Semantic Coupling Between Classes: Corpora or Identifiers?

Published: 08 September 2016

Abstract

Context: Conceptual coupling measures how loosely or closely two software artifacts are related, based on the semantic information embedded in their comments and identifiers. This type of coupling is typically evaluated by extracting the semantic information from source code into a word corpus. Extracting word corpora can be time-consuming, especially when systems are large and many classes are involved.
Goal: This study investigates whether the class identifiers alone (e.g., the class names) can be used to evaluate the conceptual coupling between classes, as opposed to the word corpora of the entire classes.
Method: In this study, we analyze two Java systems and extract the conceptual coupling between pairs of classes, using (i) a corpus-based approach; and (ii) two identifier-based tools.
Results: Our results show that measuring the semantic similarity between classes using only their identifiers yields results similar to using the class corpora. Additionally, the identifier-based approach performs better in terms of precision and recall, and takes less computation time.
Conclusions: Measuring semantic similarity from class identifiers alone can save time on program comprehension tasks for large software projects. The findings of this paper support this hypothesis for the systems used in the evaluation, and can guide researchers developing future generations of tools supporting program comprehension.
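
The contrast between the two approaches can be illustrated with a minimal sketch. It is not the tooling used in the study: the abstract does not name the tools, so the code below assumes a plain bag-of-words cosine similarity (in the spirit of the Vector Space Model listed in the author tags) and applies it once to terms taken from class names only and once to terms taken from the whole class text. The helper names `split_identifier`, `corpus_terms`, and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def corpus_terms(source_text):
    """Collect terms from every identifier-like token in the class text.

    Simplification (assumption): language keywords are not filtered out
    and no stemming or stop-word removal is applied.
    """
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source_text):
        terms.extend(split_identifier(token))
    return terms


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two bags of terms (raw term frequencies)."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Identifier-based: only the class names are compared.
name_sim = cosine_similarity(
    split_identifier("OrderService"), split_identifier("OrderRepository")
)

# Corpus-based: all terms found in the (toy) class bodies are compared.
corpus_sim = cosine_similarity(
    corpus_terms("class OrderService { int orderId; }"),
    corpus_terms("class OrderRepository { int orderId; }"),
)
```

The identifier-based variant only tokenizes one name per class, while the corpus-based variant has to scan the full class text, which is where the difference in extraction time described in the abstract would come from.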




Published In

ESEM '16: Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
September 2016
457 pages
ISBN: 9781450344272
DOI: 10.1145/2961111

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Corpora
  2. Corpus
  3. Latent Semantic Indexing (LSI)
  4. Object-oriented software (OO)
  5. Open-source software (OSS)
  6. Semantic coupling
  7. Semantic similarity
  8. Vector Space Model (VSM)

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

ESEM '16
Acceptance Rates

ESEM '16 Paper Acceptance Rate 27 of 122 submissions, 22%;
Overall Acceptance Rate 130 of 594 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 3
Reflects downloads up to 16 Oct 2024

Cited By

  • (2023) Weak Labelling for File-level Source Code Classification. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 698-702. DOI: 10.1109/SANER56733.2023.00074. Online publication date: Mar-2023
  • (2023) Multi-granular software annotation using file-level weak labelling. Empirical Software Engineering, 29:1. DOI: 10.1007/s10664-023-10423-7. Online publication date: 30-Nov-2023
  • (2018) National boundaries and semantics of artefacts in open source development. Proceedings of the 1st International Workshop on Software Health, pages 33-39. DOI: 10.1145/3194124.3194131. Online publication date: 28-May-2018
  • (2018) Coupling and Cohesion Metrics for Object-Oriented Software. Proceedings of the 11th Innovations in Software Engineering Conference, pages 1-11. DOI: 10.1145/3172871.3172878. Online publication date: 9-Feb-2018
  • (2018) An empirical study on the interplay between semantic coupling and co-change of software classes. Empirical Software Engineering, 23:3, pages 1791-1825. DOI: 10.1007/s10664-017-9569-2. Online publication date: 1-Jun-2018
  • (2017) Managing hidden dependencies in OO software. Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pages 141-150. DOI: 10.1109/ESEM.2017.21. Online publication date: 9-Nov-2017
