short-paper

A Curated Benchmark Collection of Python Systems for Empirical Studies on Software Engineering

Authors: Michele Marchesi, Roberto Tonelli, Giuseppe Destefanis

PROMISE '15: Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering

Article No.: 2, Pages 1 - 4

https://doi.org/10.1145/2810146.2810148

Published: 21 October 2015

Abstract

This paper presents a dataset of metrics associated with the first release of a curated collection of Python software systems. We describe the dataset, the criteria we adopted, and the issues we faced while building the corpus. This dataset can enhance the reliability of empirical studies, enable their reproducibility, reduce their cost, and foster further research on Python software.

Index Terms

A Curated Benchmark Collection of Python Systems for Empirical Studies on Software Engineering
1. General and reference
  1. Cross-computing tools and techniques
    1. Metrics
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types

Information & Contributors

Information

Published In

PROMISE '15: Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering

October 2015

63 pages

ISBN:9781450337151

DOI:10.1145/2810146

General Chair: Ayse Bener, Ryerson University, Canada
Program Chair: Leandro Minku, University of Birmingham, UK
Publications Chair: Burak Turhan, University of Oulu, Finland

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

PROMISE '15

PROMISE '15: The 11th International Conference on Predictive Models and Data Analytics in Software Engineering

October 21, 2015

Beijing, China

Acceptance Rates

PROMISE '15 Paper Acceptance Rate 8 of 16 submissions, 50%;

Overall Acceptance Rate 98 of 213 submissions, 46%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 13
Total Downloads: 238

Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 0

Reflects downloads up to 11 Feb 2025

Citations

Cited By

Gruner B, Sonnekalb T, Heinze T, Brust C (2023). Cross-Domain Evaluation of a Deep Learning-Based Type Inference System. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pp. 158-169. Online publication date: May 2023.
https://doi.org/10.1109/MSR59073.2023.00034
Meidani M, Lamothe M, McIntosh S (2023). Assessing the exposure of software changes. Empirical Software Engineering 28:2. Online publication date: 8 Feb 2023.
https://dl.acm.org/doi/10.1007/s10664-022-10270-y
Liu D, Feng Y, Yan Y, Xu B (2023). Towards understanding bugs in Python interpreters. Empirical Software Engineering 28:1. Online publication date: 1 Jan 2023.
https://dl.acm.org/doi/10.1007/s10664-022-10239-x
Yousuf M, Rashid M (2022). Analysis of the change in bugginess and adaptiveness of python software systems. Multimedia Tools and Applications 81:30, pp. 43107-43123. Online publication date: 1 Dec 2022.
https://dl.acm.org/doi/10.1007/s11042-022-13246-8
Ruohonen J, Hjerppe K, Rindell K (2021). A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI. 2021 18th International Conference on Privacy, Security and Trust (PST), pp. 1-10. Online publication date: 13 Dec 2021.
https://doi.org/10.1109/PST52912.2021.9647791
Cotroneo D, De Simone L, Liguori P, Natella R, Bidokhti N, Dumas M, Pfahl D, Apel S, Russo A (2019). How bad can a bug get? an empirical analysis of software failures in the OpenStack cloud computing platform. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 200-211. Online publication date: 12 Aug 2019.
https://dl.acm.org/doi/10.1145/3338906.3338916
Biswas S, Islam M, Huang Y, Rajan H, Storey M, Adams B, Haiduc S (2019). Boa meets python. Proceedings of the 16th International Conference on Mining Software Repositories, pp. 577-581. Online publication date: 26 May 2019.
https://dl.acm.org/doi/10.1109/MSR.2019.00086
Ortu M, Orru M, Destefanis G (2019). On Comparing Software Quality Metrics of Traditional vs Blockchain-Oriented Software: An Empirical Study. 2019 IEEE International Workshop on Blockchain Oriented Software Engineering (IWBOSE), pp. 32-37. Online publication date: Feb 2019.
https://doi.org/10.1109/IWBOSE.2019.8666575
Malloy B, Power J (2019). An empirical analysis of the transition from Python 2 to Python 3. Empirical Software Engineering 24:2, pp. 751-778. Online publication date: 1 Apr 2019.
https://dl.acm.org/doi/10.1007/s10664-018-9637-2
Omari S, Martinez G (2019). Enabling Empirical Research: A Corpus of Large-Scale Python Systems. Proceedings of the Future Technologies Conference (FTC) 2019, pp. 661-669. Online publication date: 10 Oct 2019.
https://doi.org/10.1007/978-3-030-32523-7_49
