research-article

Author Name Disambiguation via Paper Association Refinement and Compositional Contrastive Embedding

Authors:

Xinyue Chen

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 2193 - 2203

https://doi.org/10.1145/3589334.3645596

Published: 13 May 2024

Abstract

Author name disambiguation (AND) is an essential task for online academic retrieval systems. Recent models approach AND with representation learning. Despite their remarkable success, these methods have two potential limitations. First, the heuristically constructed paper association graphs used for representation learning contain uncertain links that may provide misleading supervision. Second, the objectives used to train these representation learning models, such as binary cross-entropy loss, may not yield sufficiently high-quality representations for AND. To address these problems, we propose an association refining and compositional contrasting (ARCC) framework for AND tasks. ARCC first applies an iterative graph structure refinement process that dynamically reduces the uncertainty in paper graphs. It then uses a compositional contrastive learning method that encourages more discriminative representations for AND. Empirical studies on two benchmark datasets suggest that ARCC is effective for AND and outperforms state-of-the-art models.
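The abstract names two mechanisms without giving their formulas. As a rough illustration only, the sketch below pairs a toy version of each idea: (1) iterative pruning of heuristic paper-paper edges whose endpoint embeddings disagree, and (2) a standard supervised contrastive loss (the SupCon form of Khosla et al., 2020) as an alternative to binary cross-entropy. This is not ARCC's refinement rule or its compositional contrastive objective, which the abstract does not specify. The helper names (`propagate`, `refine_edges`, `iterative_refinement`, `supcon_loss`), the mean-aggregation step standing in for a trained graph encoder, the cosine threshold of 0.85, and the temperature of 0.5 are all hypothetical choices.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]
Edge = Tuple[int, int]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def propagate(edges: Set[Edge], feats: Dict[int, Vector]) -> Dict[int, Vector]:
    """Hypothetical stand-in for a trained GNN encoder: average each paper's
    features with those of its current neighbours (self-loop included)."""
    neigh: Dict[int, List[int]] = {i: [i] for i in feats}
    for i, j in edges:
        neigh[i].append(j)
        neigh[j].append(i)
    dim = len(next(iter(feats.values())))
    return {
        i: [sum(feats[n][k] for n in ns) / len(ns) for k in range(dim)]
        for i, ns in neigh.items()
    }


def refine_edges(edges: Set[Edge], emb: Dict[int, Vector], threshold: float) -> Set[Edge]:
    """Drop heuristic paper-paper edges whose endpoint embeddings disagree."""
    return {(i, j) for (i, j) in edges if cosine(emb[i], emb[j]) >= threshold}


def iterative_refinement(edges: Set[Edge], feats: Dict[int, Vector],
                         threshold: float = 0.85, rounds: int = 3) -> Set[Edge]:
    """Alternate encoding and pruning for at most `rounds` iterations,
    stopping early once a round prunes no edge."""
    for _ in range(rounds):
        emb = propagate(edges, feats)
        pruned = refine_edges(edges, emb, threshold)
        if pruned == edges:
            break
        edges = pruned
    return edges


def supcon_loss(emb: Dict[int, Vector], labels: Dict[int, int], tau: float = 0.5) -> float:
    """Supervised contrastive loss: for each anchor, average -log softmax over
    its same-author positives, with all other papers in the denominator."""
    ids = list(emb)
    per_anchor = []
    for a in ids:
        positives = [p for p in ids if p != a and labels[p] == labels[a]]
        if not positives:
            continue  # anchors without positives contribute nothing
        denom = sum(math.exp(cosine(emb[a], emb[o]) / tau) for o in ids if o != a)
        terms = [-math.log(math.exp(cosine(emb[a], emb[p]) / tau) / denom)
                 for p in positives]
        per_anchor.append(sum(terms) / len(positives))
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


if __name__ == "__main__":
    # Papers 0-1 belong to one author, 2-3 to another; edge (1, 2) is a spurious link.
    feats = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.0, 1.0]}
    print(sorted(iterative_refinement({(0, 1), (2, 3), (1, 2)}, feats)))  # [(0, 1), (2, 3)]
    print(supcon_loss(feats, {0: 0, 1: 0, 2: 1, 3: 1}))
```

In this toy run, smoothing over the noisy graph pulls papers 1 and 2 toward each other, but their cosine similarity (0.8) still falls below the threshold, so the spurious edge is pruned in the first round and the graph is stable in the second. The contrastive loss is lower when labels match the embedding geometry than when they do not, which is the property a discriminative objective should reward.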

Supplemental Material

MP4 File: Supplemental video (33.89 MB)

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation

Information

Published In

ACM Conferences

WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN:9798400701719

DOI:10.1145/3589334

General Chairs:
Tat-Seng Chua, National University of Singapore
Chong-Wah Ngo, Singapore Management University

Proceedings Chair:
Roy Ka-Wei Lee, Singapore University of Technology and Design

Program Chairs:
Ravi Kumar, Google
Hady W. Lauw, Singapore Management University

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2024

Badges

Artifacts Available / v1.1

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 136
Downloads (Last 12 months): 136
Downloads (Last 6 weeks): 11

Reflects downloads up to 24 Jan 2025