research-article

On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications

Authors:

Odd Erik Gundersen,

David W. Aha

AI Magazine, Volume 39, Issue 3

Pages 56 - 68

https://doi.org/10.1609/aimag.v39i3.2816

Published: 01 September 2018

Abstract

Artificial intelligence, like any science, must rely on reproducible experiments to validate results. Our objective is to give practical and pragmatic recommendations for how to document AI research so that results are reproducible. Our analysis of the literature shows that AI publications currently fall short of providing enough documentation to facilitate reproducibility. Our suggested best practices are based on a framework for reproducibility and recommendations for best practices given by scientific organizations, scholars, and publishers. We have made a reproducibility checklist based on our investigation and described how every item in the checklist can be documented by authors and examined by reviewers. We encourage authors and reviewers to use the suggested best practices and author checklist when considering submissions for AAAI publications and conferences.

Published In

© 2018 The Authors. AI Magazine published by John Wiley & Sons Ltd on behalf of Association for the Advancement of Artificial Intelligence.

Publisher

American Association for Artificial Intelligence

United States
