short-paper

Open access

Analysing Static Source Code Features to Determine a Correlation to Steady State Performance in Java Microbenchmarks

Authors:

Jared Chad Swanzen,

Kyle Thomas Botes,

Omphile Monchwe,

Dustin van der Haar

ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering

Pages 89 - 93

https://doi.org/10.1145/3578245.3584692

Published: 15 April 2023

Abstract

Source code analysis is an important aspect of software development that provides insight into a program's quality, security and performance. There are few methods for consistently predicting or determining when a written piece of code will end its warm-up state and proceed to a steady state. In this study, we use the data gathered by the SEALABQualityGroup at the University of L'Aquila and Charles University and extend their research on steady-state analysis to determine whether certain source code features could give developers a basis for more informed predictions of when a steady state will occur. We explore whether source code features correlate directly with the ability of a Java microbenchmark to reach a steady state, and with the time it takes to do so, in order to build a machine learning-based approach to steady-state prediction. We found that the correlation between source code features and the probability of reaching a steady state goes as high as 10.9% for Pearson's correlation coefficient, whereas the correlation between source code features and the time it takes to reach a steady state goes as high as 21.6% for Spearman's correlation coefficient. Our results also show that a K Nearest Neighbour classifier with features selected using either Spearman's or Kendall's correlation coefficient achieves an accuracy of 78.6%.
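
The workflow the abstract outlines (correlate static features with a steady-state label, keep the most correlated features, then classify with K Nearest Neighbours) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names (`loops`, `calls`, `loc`), the toy values, the labels, and the choice of `k=3` are all hypothetical, and Kendall's coefficient is left out for brevity.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson's r: covariance normalised by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def select_features(rows, target, names, top_k, corr=spearman):
    """Keep the top_k feature names with the largest |correlation| to target."""
    scored = [(abs(corr([r[n] for r in rows], target)), n) for n in names]
    return [n for _, n in sorted(scored, reverse=True)[:top_k]]

def knn_predict(train_rows, train_labels, query, names, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    def dist(row):
        return math.sqrt(sum((row[n] - query[n]) ** 2 for n in names))
    nearest = sorted(range(len(train_rows)), key=lambda i: dist(train_rows[i]))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: one row of static features per microbenchmark.
rows = [
    {"loops": 1, "calls": 9, "loc": 12},
    {"loops": 2, "calls": 3, "loc": 15},
    {"loops": 3, "calls": 7, "loc": 11},
    {"loops": 6, "calls": 4, "loc": 14},
    {"loops": 7, "calls": 8, "loc": 13},
    {"loops": 8, "calls": 2, "loc": 10},
]
reaches_steady_state = [1, 1, 1, 0, 0, 0]  # invented labels, 1 = reaches steady state
names = ["loops", "calls", "loc"]

selected = select_features(rows, reaches_steady_state, names, top_k=1)
prediction = knn_predict(rows, reaches_steady_state,
                         {"loops": 2, "calls": 5, "loc": 12}, selected)
```

Ranking features by the absolute value of the coefficient treats strong negative and strong positive associations alike, which is one common convention for correlation-based feature selection; the abstract does not state which convention the study used.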

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation

Published In

ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering

April 2023

421 pages

ISBN:9798400700729

DOI:10.1145/3578245

General Chairs: Marco Vieira (University of Coimbra, Portugal), Valeria Cardellini (University of Rome Tor Vergata, Italy)

Program Chairs: Antinisca Di Marco (University of L'Aquila, Italy), Petr Tuma (Charles University, Czechia)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICPE '23

ICPE '23: ACM/SPEC International Conference on Performance Engineering

April 15 - 19, 2023

Coimbra, Portugal

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%
