DOI: 10.1145/2810146.2810148
Short paper

A Curated Benchmark Collection of Python Systems for Empirical Studies on Software Engineering

Published: 21 October 2015

Abstract

The aim of this paper is to present a dataset of metrics associated with the first release of a curated collection of Python software systems. We describe the dataset along with the adopted criteria and the issues we faced while building such a corpus. This dataset can enhance the reliability of empirical studies by enabling their reproducibility and reducing their cost, and it can foster further research on Python software.
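The abstract does not say which metrics the dataset contains or how they were computed. Purely as an illustration of what a per-module metrics record for a Python system might look like, here is a minimal sketch that computes three simple size measures with Python's standard `ast` module. The metric set and the names `ModuleMetrics`, `module_metrics` and `system_metrics` are hypothetical and chosen for this example. The decision to skip files that the running interpreter cannot parse is also an assumption. None of this is the authors' tooling or their metric definitions.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModuleMetrics:
    """Hypothetical per-module record; not the metric set used in the paper."""
    loc: int        # non-blank lines that are not pure comment lines
    classes: int    # class definitions at any nesting level
    functions: int  # def / async def nodes, methods included


def module_metrics(source: str) -> ModuleMetrics:
    """Compute simple size metrics for one Python source string."""
    tree = ast.parse(source)  # raises SyntaxError on code the interpreter cannot parse
    loc = sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )
    nodes = list(ast.walk(tree))
    classes = sum(isinstance(n, ast.ClassDef) for n in nodes)
    functions = sum(
        isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes
    )
    return ModuleMetrics(loc=loc, classes=classes, functions=functions)


def system_metrics(root: str) -> tuple[ModuleMetrics, int]:
    """Aggregate metrics over every *.py file under root.

    Returns the totals and the number of files that could not be decoded
    or parsed by the running interpreter (those files are skipped).
    """
    total = ModuleMetrics(loc=0, classes=0, functions=0)
    skipped = 0
    for path in sorted(Path(root).rglob("*.py")):
        try:
            m = module_metrics(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            skipped += 1
            continue
        total.loc += m.loc
        total.classes += m.classes
        total.functions += m.functions
    return total, skipped
```

Calling `system_metrics` on a checkout directory returns the summed totals together with a count of skipped files. Files written only for Python 2 (for example, ones using the `print` statement) fail to parse under Python 3. This sketch counts them as skipped rather than measuring them silently and incorrectly.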


Published In

ACM Other conferences
PROMISE '15: Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering
October 2015
63 pages
ISBN:9781450337151
DOI:10.1145/2810146
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Curated Code Collection
  2. Empirical Studies
  3. Python

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

PROMISE '15

Acceptance Rates

PROMISE '15 Paper Acceptance Rate 8 of 16 submissions, 50%;
Overall Acceptance Rate 98 of 213 submissions, 46%

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 0
Reflects downloads up to 11 Feb 2025

Cited By

  • (2023) Cross-Domain Evaluation of a Deep Learning-Based Type Inference System. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pp. 158-169. DOI: 10.1109/MSR59073.2023.00034. Online publication date: May 2023.
  • (2023) Assessing the exposure of software changes. Empirical Software Engineering 28(2). DOI: 10.1007/s10664-022-10270-y. Online publication date: 8 Feb 2023.
  • (2023) Towards understanding bugs in Python interpreters. Empirical Software Engineering 28(1). DOI: 10.1007/s10664-022-10239-x. Online publication date: 1 Jan 2023.
  • (2022) Analysis of the change in bugginess and adaptiveness of python software systems. Multimedia Tools and Applications 81(30), pp. 43107-43123. DOI: 10.1007/s11042-022-13246-8. Online publication date: 1 Dec 2022.
  • (2021) A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI. 2021 18th International Conference on Privacy, Security and Trust (PST), pp. 1-10. DOI: 10.1109/PST52912.2021.9647791. Online publication date: 13 Dec 2021.
  • (2019) How bad can a bug get? an empirical analysis of software failures in the OpenStack cloud computing platform. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 200-211. DOI: 10.1145/3338906.3338916. Online publication date: 12 Aug 2019.
  • (2019) Boa meets python. Proceedings of the 16th International Conference on Mining Software Repositories, pp. 577-581. DOI: 10.1109/MSR.2019.00086. Online publication date: 26 May 2019.
  • (2019) On Comparing Software Quality Metrics of Traditional vs Blockchain-Oriented Software: An Empirical Study. 2019 IEEE International Workshop on Blockchain Oriented Software Engineering (IWBOSE), pp. 32-37. DOI: 10.1109/IWBOSE.2019.8666575. Online publication date: Feb 2019.
  • (2019) An empirical analysis of the transition from Python 2 to Python 3. Empirical Software Engineering 24(2), pp. 751-778. DOI: 10.1007/s10664-018-9637-2. Online publication date: 1 Apr 2019.
  • (2019) Enabling Empirical Research: A Corpus of Large-Scale Python Systems. Proceedings of the Future Technologies Conference (FTC) 2019, pp. 661-669. DOI: 10.1007/978-3-030-32523-7_49. Online publication date: 10 Oct 2019.