Abstract
Identifying and fixing homonymous and synonymous author profiles is one of the major tasks in curating personalized bibliographic metadata repositories such as the dblp computer science bibliography. In this paper, we present a machine learning approach to identifying homonymous profiles. We train our model on a novel gold-standard data set derived from the past years of active, manual curation at dblp.
F. Reitz—Research funded by a grant of the Leibniz Competition, grant no. LZI-SAW-2015-2.
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ackermann, M.R., Reitz, F. (2018). Homonym Detection in Curated Bibliographies: Learning from dblp’s Experience. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_5
DOI: https://doi.org/10.1007/978-3-030-00066-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer Science, Computer Science (R0)