Abstract
The massive growth in scholarly output over the last few decades has resulted in the creation of several scholarly databases that index these outputs. These databases index publication records and provide metadata fields for uses ranging from retrieval and research evaluation to various scientometric analyses. ‘Author keywords’ is one such important metadata field, provided by many databases and used for different text-based and thematic structure analyses. The Dimensions database, however, does not provide an ‘author keywords’ field; instead, it provides terms generated automatically from article full texts, called ‘concepts’. It is therefore not clear whether the same text-based analyses can be carried out with data from the Dimensions database. This article explores the distributional characteristics of Dimensions concepts. Concept data obtained for a sufficiently large sample of scholarly articles are analysed through rank-frequency distribution plots in log–log space, and the existence of a Zipfian distribution is examined. The results indicate that Dimensions concepts adhere to Zipfian properties, which in turn indicates that they have distributional characteristics similar to those of author keywords and hence may have the same expressive power as author or index keywords for scientometric exercises. The study is novel in being the first to explore the distributional characteristics of Dimensions concepts, particularly with respect to Zipfian properties, which provide a statistical foundation for understanding, modelling and analysing these concepts.
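As a rough illustration of the rank-frequency analysis described in the abstract, the following Python sketch counts how often each concept occurs across a set of records, ranks the counts, and estimates the slope of log(frequency) against log(rank). It is a minimal sketch under stated assumptions, not the authors' procedure: the function names `rank_frequency` and `loglog_slope`, the toy records, and the use of an ordinary least-squares fit are all assumptions made for illustration. A slope near -1 is the classic Zipfian signature, although a least-squares fit in log–log space is only a coarse diagnostic and more rigorous power-law fitting methods exist.

```python
from collections import Counter
import math


def rank_frequency(records):
    """Return concept frequencies sorted in descending order (rank 1 first).

    `records` is an iterable of concept lists, one list per article
    (a hypothetical input layout chosen for this sketch).
    """
    counts = Counter(concept for concepts in records for concept in concepts)
    return sorted(counts.values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two frequencies, all greater than zero.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data, invented purely for illustration (not from the study).
toy_records = [
    ["zipf law", "power law", "keywords"],
    ["zipf law", "power law"],
    ["zipf law", "scientometrics"],
    ["zipf law"],
]

freqs = rank_frequency(toy_records)
slope = loglog_slope(freqs)
print(freqs, round(slope, 3))
```

On real data, the same ranked frequencies would typically also be plotted on log–log axes so that departures from a straight line at the head or tail of the distribution can be inspected visually.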
Funding
This work is partly supported by extramural research Grant No.: MTR/2020/000625 from the Science and Engineering Research Board (SERB), India, and by the HPE Aruba Centre for Research in Information Systems at BHU (Grant No.: M-22-69 of BHU), to the second author.
Ethics declarations
Conflicts of interest
The authors declare that the manuscript complies with the ethical standards of the journal and that there is no conflict of interest whatsoever.
About this article
Cite this article
Gupta, S., Singh, V.K. Distributional characteristics of Dimensions concepts: An Empirical Analysis using Zipf’s law. Scientometrics 129, 1037–1053 (2024). https://doi.org/10.1007/s11192-023-04899-9
DOI: https://doi.org/10.1007/s11192-023-04899-9