Abstract
This is the first detailed study on the coverage of Microsoft Academic (MA). Based on the complete and verified publication list of a university, the coverage of MA was assessed and compared with two benchmark databases, Scopus and Web of Science (WoS), on the level of individual publications. Citation counts were analyzed, and issues related to data retrieval and data quality were examined. A Perl script was written to retrieve metadata from MA based on publication titles. The script is freely available on GitHub. We find that MA covers journal articles, working papers, and conference items to a substantial extent and indexes more document types than the benchmark databases (e.g., working papers, dissertations). MA clearly surpasses Scopus and WoS in covering book-related document types and conference items but falls slightly behind Scopus in journal articles. The coverage of MA is favorable for evaluative bibliometrics in most research fields, including economics/business, computer/information sciences, and mathematics. However, MA shows biases similar to Scopus and WoS with regard to the coverage of the humanities, non-English publications, and open-access publications. Rank correlations of citation counts are high between MA and the benchmark databases. We find that the publication year is correct for 89.5% of all publications and the number of authors is correct for 95.1% of the journal articles. Given the fast and ongoing development of MA, we conclude that MA is on the verge of becoming a bibliometric superpower. However, comprehensive studies on the quality of MA metadata are still lacking.
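The abstract reports high rank correlations of citation counts between MA and the benchmark databases but does not name the coefficient. As a rough illustration only, the sketch below assumes Spearman's rho with average ranks for ties. The function names and the citation counts are made up for this example and are not taken from the study or its Perl script.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical citation counts for the same five publications in two databases.
ma_counts = [12, 0, 45, 7, 7]
scopus_counts = [10, 1, 40, 5, 8]
rho = spearman(ma_counts, scopus_counts)
print(round(rho, 3))
```

A rank-based coefficient is a natural fit here because citation counts are heavily skewed and databases differ in absolute counts; what matters for the comparison is whether they order publications similarly.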
Notes
For an overview of the Academic Knowledge (AK) API, see https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/home.
Acknowledgments
The authors thank the development team of Microsoft Academic for their support, the ZORA editorial team for their advice, Robin Haunschild for comments, Mirjam Aeschbach for proofreading, and the reviewers for their remarks.
Cite this article
Hug, S.E., Brändle, M.P. The coverage of Microsoft Academic: analyzing the publication output of a university. Scientometrics 113, 1551–1571 (2017). https://doi.org/10.1007/s11192-017-2535-3
DOI: https://doi.org/10.1007/s11192-017-2535-3