Abstract
Despite growing evidence that scientists reuse open biodiversity data, information about how data is reused and cited is rarely openly accessible from research data repositories. This study explores data citation and reuse practices in biodiversity using openly available metadata for 43,802 datasets indexed in the Global Biodiversity Information Facility (GBIF) and content analyses of articles citing GBIF data. The quantitative and content analyses suggest that, although the number of studies using openly available biodiversity data has increased steadily, best practice for data citation is not yet common. It is encouraging, however, that an increasing number of recent biodiversity articles (16 out of 23 in 2019) cite datasets in a standard way. A content analysis of a random sample of unique citing articles (n = 100) found various types of background (n = 18) and foreground (n = 81) reuse of GBIF data, ranging from combining it with other data sources for species distribution modelling to software testing. This demonstrates some of the unique research opportunities created by open data. Among the citing articles, 27% mentioned the dataset in the references and 13% in data access statements, in addition to the methods section. Citation practice was inconsistent, especially when a large number of subsets (12 to 50) were used. Even though many GBIF dataset records had altmetric scores, most posts mentioned only the articles linked to those datasets. Among the altmetric mentions of datasets, blogs, although rare, can be the most informative, whereas most tweets and Facebook posts were promotional.
Availability of data and materials
All data created during this study are available on Figshare at https://doi.org/10.6084/m9.figshare.8181098.v1 (Khan and Thelwall 2019a) and https://doi.org/10.6084/m9.figshare.11357693 (Khan and Thelwall 2019b).
Code availability
Not applicable.
References
Anagnostou, P., Capocasa, M., Milia, N., & Bisol, G. D. (2013). Research data sharing: Lessons from forensic genetics. Forensic Science International: Genetics, 7(6), e117–e119.
Bishop, L., & Kuula-Luumi, A. (2017). Revisiting qualitative data reuse: A decade on. Sage Open, 7(1), 2158244016685136.
Borgman, C. L. (2012). The conundrum of sharing research data. Journal of the American Society for Information Science and Technology, 63(6), 1059–1078.
Bornmann, L. (2015). Alternative metrics in scientometrics: A meta-analysis of research into three altmetrics. Scientometrics, 103(3), 1123–1144.
Chavan, V., & Penev, L. (2011). The data paper: A mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics, 12(15), S2.
Costello, M. J., & Wieczorek, J. (2014). Best practice for biodiversity data management and publication. Biological Conservation, 173, 68–73.
Costello, M. J., Michener, W. K., Gahegan, M., Zhang, Z. Q., & Bourne, P. E. (2013). Biodiversity data should be published, cited, and peer reviewed. Trends in Ecology & Evolution, 28(8), 454–461.
Edmunds, S. C., Pollard, T. J., Hole, B., & Basford, A. T. (2012). Adventures in data citation: Sorghum genome data exemplifies the new gold standard. BMC Research Notes, 5(1), 223.
Enke, N., Thessen, A., Bach, K., Bendix, J., Seeger, B., & Gemeinholzer, B. (2012). The user’s view on biodiversity data sharing—Investigating facts of acceptance and requirements to realize a sustainable use of research data. Ecological Informatics, 11, 25–33.
Escribano, N., Galicia, D., & Ariño, A. H. (2019). Completeness of Digital Accessible Knowledge (DAK) about terrestrial mammals in the Iberian Peninsula. PLoS ONE, 14(3), e0213542.
Huang, X., Hawkins, B. A., Lei, F., Miller, G. L., Favret, C., Zhang, R., & Qiao, G. (2012). Willing or unwilling to share primary biodiversity data: Results and implications of an international survey. Conservation Letters, 5(5), 399–406.
Ingwersen, P., & Chavan, V. (2011). Indicators for the Data Usage Index (DUI): an incentive for publishing primary biodiversity data through global information infrastructure. BMC Bioinformatics, 12(15), S3.
Khan, N., & Thelwall, M. (2019a). Dataset supporting “Data Citation and Reuse Practice in Biodiversity.” FigShare: Dataset. https://doi.org/10.6084/m9.figshare.8181098.v1.
Khan, N., & Thelwall, M. (2019b). Dataset supporting “Measuring the Impact of Biodiversity Datasets: Data Reuse, Citations and Altmetrics.” FigShare: Dataset. https://doi.org/10.6084/m9.figshare.11357693.
Kim, Y., & Zhang, P. (2015). Understanding data sharing behaviors of STEM researchers: The roles of attitudes, norms, and data repositories. Library & Information Science Research, 37(3), 189–200.
Konkiel, S. (2013). Tracking citations and altmetrics for research data: Challenges and opportunities. Bulletin of the American Society for Information Science and Technology, 39(6), 27–32.
Kratz, J., & Strasser, C. (2014). Data publication consensus and controversies. F1000Research, 3.
Kratz, J. E., & Strasser, C. (2015). Making data count. Scientific Data, 2(1), 1–5.
Kratz, J. E., & Strasser, C. (2015). Researcher perspectives on publication and peer review of data. PLoS ONE, 10(2), e0117619.
Magurran, A. E., Baillie, S. R., Buckland, S. T., Dick, J. M., Elston, D. A., Scott, E. M., et al. (2010). Long-term datasets in biodiversity research and monitoring: Assessing change in ecological communities through time. Trends in Ecology & Evolution, 25(10), 574–582.
Mayo, C., Vision, T. J., & Hull, E. A. (2016). The location of the citation: Changing practices in how publications cite original data in the Dryad Digital Repository. International Journal of Digital Curation, 11(1), 150–155.
Moritz, T., Krishnan, S., Roberts, D., Ingwersen, P., Agosti, D., Penev, L., et al. (2011). Towards mainstreaming of biodiversity data publishing: recommendations of the GBIF Data Publishing Framework Task Group. BMC Bioinformatics, 12(15), S1.
Park, H., & Wolfram, D. (2017). An examination of research data sharing and re-use: implications for data citation practice. Scientometrics, 111(1), 443–461.
Peters, I., Kraker, P., Lex, E., Gumpenberger, C., & Gorraiz, J. (2016). Research data explored: an extended analysis of citations and altmetrics. Scientometrics, 107(2), 723–744.
Piwowar, H. A. (2011). Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE, 6(7), e18657.
Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1, e175.
Sayogo, D. S., & Pardo, T. A. (2013). Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data. Government Information Quarterly, 30, S19–S31.
Shema, H., Bar-Ilan, J., & Thelwall, M. (2014). Do blog citations correlate with a higher number of future citations? Research blogs as a potential source for alternative metrics. Journal of the Association for Information Science and Technology, 65(5), 1018–1027.
Silvello, G. (2018). Theory and practice of data citation. Journal of the Association for Information Science and Technology, 69(1), 6–20.
Starr, J., Castro, E., Crosas, M., Dumontier, M., Downs, R., Duerr, R., et al. (2015). Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Computer Science, 1, e1.
Robinson-García, N., Jiménez-Contreras, E., & Torres-Salinas, D. (2016). Analyzing data citation practices using the data citation index. Journal of the Association for Information Science and Technology, 67(12), 2964–2975.
Troudet, J., Vignes-Lebbe, R., Grandcolas, P., & Legendre, F. (2018). The increasing disconnection of primary biodiversity data from specimens: How does it happen and how to handle it? Systematic Biology, 67(6), 1110–1119.
Wallis, J. C., Rolando, E., & Borgman, C. L. (2013). If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology. PLoS ONE, 8(7), e67332.
Acknowledgements
The study is an extended version of the paper “Data Citation and Reuse Practice in Biodiversity—Challenges of Adopting a Standard Citation Model”, which was presented at the 2019 ISSI Conference in Rome, Italy.
Funding
This study was funded by the University of Wolverhampton.
Ethics declarations
Conflict of interest
Not applicable.
About this article
Cite this article
Khan, N., Thelwall, M. & Kousha, K. Measuring the impact of biodiversity datasets: data reuse, citations and altmetrics. Scientometrics 126, 3621–3639 (2021). https://doi.org/10.1007/s11192-021-03890-6
DOI: https://doi.org/10.1007/s11192-021-03890-6