DOI: 10.1145/3196398.3196450
short-paper

50K-C: a dataset of compilable, and compiled, Java projects

Published: 28 May 2018

Abstract

We provide a repository of 50,000 compilable Java projects. Each project in this dataset comes with references to all the dependencies required to compile it, the resulting bytecode, and the scripts with which the projects were built.
The dependencies and the build scripts provide a mechanism to re-create the compilation of the projects if needed (to instrument the source code or bytecode for analysis, for example). The bytecode is ready for testing, execution, and dynamic analysis tools.
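
As a rough illustration of what re-creating a compilation from sources plus dependency references can look like, the sketch below assembles a `javac` command line for one project. It is not the dataset's own build script: the function name `javac_command` and the directory layout (a source tree, a folder of dependency JARs, an output folder) are assumptions made for this example only.

```python
import os
from pathlib import Path


def javac_command(src_dir, jar_dir, out_dir):
    """Build a javac argument list for one project.

    Assumed (hypothetical) layout: all .java files live under src_dir,
    all dependency JARs sit directly in jar_dir, and class files go to out_dir.
    """
    # Collect every Java source file, sorted for a reproducible command line.
    sources = sorted(str(p) for p in Path(src_dir).rglob("*.java"))
    # Collect the dependency JARs that make up the compile-time classpath.
    jars = sorted(str(p) for p in Path(jar_dir).glob("*.jar"))

    cmd = ["javac", "-d", str(out_dir)]
    if jars:
        # os.pathsep is ':' on Unix-like systems and ';' on Windows.
        cmd += ["-cp", os.pathsep.join(jars)]
    return cmd + sources
```

The returned list can be passed to `subprocess.run` once the output directory exists; keeping command construction separate from execution makes it easy to log or replay the exact build invocation.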

Published In

MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
May 2018
627 pages
ISBN:9781450357166
DOI:10.1145/3196398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. large scale compilation
  2. runnable software repositories
  3. software mining

Qualifiers

  • Short-paper

Conference

ICSE '18

Cited By

  • (2024) BinEq - A Benchmark of Compiled Java Programs to Assess Alternative Builds. Proceedings of the 2024 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses, pp. 15-25. DOI: 10.1145/3689944.3696162. Online publication date: 19-Nov-2024
  • (2024) SourcererJBF: A Java Build Framework For Large-Scale Compilation. ACM Transactions on Software Engineering and Methodology 33:3, pp. 1-35. DOI: 10.1145/3635710. Online publication date: 15-Mar-2024
  • (2024) Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: An Empirical Study. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 465-476. DOI: 10.1109/SANER60148.2024.00055. Online publication date: 12-Mar-2024
  • (2023) JaMaBuild: Mass Building of Java Projects. Companion Proceedings of the 2023 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, pp. 56-57. DOI: 10.1145/3618305.3623613. Online publication date: 22-Oct-2023
  • (2023) INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers. IEEE Transactions on Software Engineering 50:2, pp. 220-238. DOI: 10.1109/TSE.2023.3341624. Online publication date: 12-Dec-2023
  • (2023) Multi-Modal API Recommendation. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 272-283. DOI: 10.1109/SANER56733.2023.00034. Online publication date: Mar-2023
  • (2023) The Devil is in the Tails: How Long-Tailed Code Distributions Impact Large Language Models. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, pp. 40-52. DOI: 10.1109/ASE56229.2023.00157. Online publication date: 11-Nov-2023
  • (2023) JEMMA: An extensible Java dataset for ML4Code applications. Empirical Software Engineering 28:2. DOI: 10.1007/s10664-022-10275-7. Online publication date: 10-Mar-2023
  • (2022) Open Source Software Development Challenges. Research Anthology on Agile Software, Software Development, and Testing, pp. 2134-2164. DOI: 10.4018/978-1-6684-3702-5.ch102. Online publication date: 2022
  • (2022) Find bugs in static bug finders. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp. 516-527. DOI: 10.1145/3524610.3527899. Online publication date: 16-May-2022