Semantic Author Name Disambiguation with Word Embeddings

Müller, Mark-Christoph

doi:10.1007/978-3-319-67008-9_24

Mark-Christoph Müller (ORCID: orcid.org/0000-0001-5639-7682)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

We present a supervised machine learning system for author name disambiguation (AND) that tackles semantic similarity between publication titles by means of word embeddings. Word embeddings are integrated as external components, which keeps the model small and efficient while allowing for easy extensibility and domain adaptation. Initial experiments show that word embeddings can improve the recall and F-score of the binary classification sub-task of AND. Results for the clustering sub-task are less clear but also promising, and overall they show the feasibility of the approach.
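As a rough illustration of how word embeddings can yield a semantic similarity score between two publication titles, the sketch below averages the vectors of the title tokens and compares the averages by cosine similarity. It is a minimal sketch under stated assumptions, not the system described in the paper: the toy three-dimensional vectors, the whitespace tokenization, and the function names `title_vector` and `title_similarity` are all hypothetical, and a real setup would load pretrained embeddings instead.

```python
import math

# Toy 3-dimensional embeddings; the values are invented for illustration only.
TOY_EMBEDDINGS = {
    "author": [0.9, 0.1, 0.0],
    "name": [0.8, 0.2, 0.1],
    "disambiguation": [0.7, 0.3, 0.0],
    "entity": [0.8, 0.1, 0.1],
    "resolution": [0.6, 0.4, 0.0],
    "protein": [0.0, 0.1, 0.9],
    "folding": [0.1, 0.0, 0.8],
}


def title_vector(title, embeddings):
    """Average the embeddings of all in-vocabulary tokens of a title."""
    vectors = [embeddings[tok] for tok in title.lower().split() if tok in embeddings]
    if not vectors:
        return None  # no token of the title is covered by the vocabulary
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def title_similarity(title_a, title_b, embeddings):
    """Similarity of two titles; 0.0 if either title has no known token."""
    u = title_vector(title_a, embeddings)
    v = title_vector(title_b, embeddings)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


related = title_similarity("Author Name Disambiguation", "Entity Resolution", TOY_EMBEDDINGS)
unrelated = title_similarity("Author Name Disambiguation", "Protein Folding", TOY_EMBEDDINGS)
print(related > unrelated)
```

Because the embeddings are only looked up, never trained, in this sketch, swapping in a different or domain-specific embedding table changes nothing else in the code, which is one way to read the abstract's remark about external components.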

Notes

1. We only consider venue identity rather than similarity because our data set only contains abstract, uninterpretable venue identifiers.
2. [7] report similar baseline results with summing instead of averaging over the embeddings of a sequence.
3. https://networkx.github.io/.
4. https://nlp.stanford.edu/projects/glove/.
5. http://dblp.uni-trier.de/xml/.
6. https://radimrehurek.com/gensim/.
7. http://conll.github.io/reference-coreference-scorers/.
8. mpc = 0.5 corresponds to no threshold, as 0.5 is the minimum confidence in a binary classification.
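
The arithmetic behind note 8 can be sketched as follows, assuming the classifier outputs a probability p for the positive class and that the confidence of a prediction is the probability of whichever class is predicted, i.e. max(p, 1 - p). The function names and the way low-confidence predictions are filtered are hypothetical and only meant to show why a threshold of 0.5 removes nothing.

```python
def confidence(p_positive):
    """Confidence of a binary prediction: probability of the predicted class."""
    return max(p_positive, 1.0 - p_positive)


def keep_confident(probabilities, mpc=0.5):
    """Keep only predictions whose confidence reaches the mpc threshold."""
    return [p for p in probabilities if confidence(p) >= mpc]


scores = [0.1, 0.5, 0.55, 0.9]
print(keep_confident(scores, mpc=0.5))  # nothing is filtered at 0.5
print(keep_confident(scores, mpc=0.8))  # only confident predictions remain
```

Since max(p, 1 - p) can never drop below 0.5, every prediction passes a threshold of 0.5, and only values above it have a filtering effect.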

References

1. Bagga, A., Baldwin, B.: Algorithms for scoring coreference chains. In: Proceedings of the 1st International Conference on Language Resources and Evaluation, Granada, Spain, 28–30 May 1998, pp. 563–566 (1998)
2. Chollet, F.: Keras (2015). https://github.com/fchollet/keras
3. Cota, R.G., Ferreira, A.A., Nascimento, C., Gonçalves, M.A., Laender, A.H.F.: An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations. J. Am. Soc. Inf. Sci. Technol. 61(9), 1853–1870 (2010)
4. Ferreira, A.A., Gonçalves, M.A., Laender, A.H.: A brief survey of automatic methods for author name disambiguation. SIGMOD Rec. 41(2), 15–26 (2012)
5. Ghannay, S., Favre, B., Estève, Y., Camelin, N.: Word embedding evaluation and combination. In: Proceedings of LREC 2016, Portorož, Slovenia, 23–28 May 2016 (2016)
6. Gurney, T., Horlings, E., van den Besselaar, P.: Author disambiguation using multi-aspect similarity indicators. Scientometrics 91(2), 435–449 (2012)
7. Hu, B., Lu, Z., Li, H., Chen, Q.: Convolutional neural network architectures for matching natural language sentences. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, Quebec, Canada, 8–13 December 2014, pp. 2042–2050 (2014)
8. Kang, I.-S., Kim, P., Lee, S., Jung, H., You, B.-J.: Construction of a large-scale test set for author disambiguation. Inf. Process. Manage. 47(3), 452–465 (2011)
9. Kenter, T., de Rijke, M.: Short text similarity with word embeddings. In: Proceedings of CIKM 2015, New York, NY, USA, pp. 1411–1420 (2015)
10. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (2013)
11. Monath, N., McCallum, A.: Discriminative hierarchical coreference for inventor disambiguation. Presentation at PatentsView Inventor Disambiguation Technical Workshop, September 2015
12. Müller, M.-C., Reitz, F., Roy, N.: Data sets for author name disambiguation: an empirical analysis and a new resource. Scientometrics 111(3), 1467–1500 (2017)
13. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014, pp. 1532–1543 (2014)
14. Qian, Y., Zheng, Q., Sakai, T., Ye, J., Liu, J.: Dynamic author name disambiguation for growing digital libraries. Inf. Retrieval J. 18(5), 379–412 (2015)
15. Santana, A.F., Gonçalves, M.A., Laender, A.H.F., Ferreira, A.A.: On the combination of domain-specific heuristics for author name disambiguation: the nearest cluster method. Int. J. Digit. Libr. 16(3–4), 229–246 (2015)
16. Schnabel, T., Labutov, I., Mimno, D.M., Joachims, T.: Evaluation methods for unsupervised word embeddings. In: Proceedings of EMNLP 2015, Lisbon, Portugal, 17–21 September 2015, pp. 298–307 (2015)
17. Shin, D., Kim, T., Choi, J., Kim, J.: Author name disambiguation using a graph model with node splitting and merging based on bibliographic information. Scientometrics 100(1), 15–50 (2014)
18. Smalheiser, N.R., Torvik, V.I.: Author name disambiguation. ARIST 43(1), 1–43 (2009)
19. Theano Development Team: Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016
20. Tran, H.N., Huynh, T., Do, T.: Author name disambiguation by using deep neural network. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds.) ACIIDS 2014. LNCS (LNAI), vol. 8397, pp. 123–132. Springer, Cham (2014). doi:10.1007/978-3-319-05476-6_13

Acknowledgments

The research described in this paper was conducted in the project SCAD – Scalable Author Name Disambiguation, funded in part by the Leibniz Association (grant SAW-2015-LZI-2), and in part by the Klaus Tschira Foundation. We thank Florian Reitz (dblp) for data preparation and the anonymous TPDL reviewers for their useful suggestions.

Author information

Authors and Affiliations

Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Mark-Christoph Müller

Corresponding author

Correspondence to Mark-Christoph Müller.

Editor information

Editors and Affiliations

Faculteit der Geesteswetenschappen, Universiteit van Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Library & Information Center, University of Patras, Patras, Greece
Giannis Tsakonas
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
Civil Engineering, University of Thrace, Kimmeria, Greece
Lazaros Iliadis
Informatics, Ionian University, Kerkyra, Greece
Ioannis Karydis

Cite this paper

Müller, MC. (2017). Semantic Author Name Disambiguation with Word Embeddings. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_24

DOI: https://doi.org/10.1007/978-3-319-67008-9_24
Published: 02 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science, Computer Science (R0)
