Exploiting Higher Order Multi-dimensional Relationships with Self-attention for Author Name Disambiguation

Published: 09 March 2022

Abstract

Name ambiguity is a prevalent problem in scholarly publications due to the unprecedented growth of digital libraries and of the number of researchers. In the absence of a unique identifier, an author is identified only by name. As a result, documents can be assigned to the wrong author, which may lead to an improper assessment of that author. Various supervised and unsupervised approaches have been proposed in the literature to solve the name disambiguation problem. Unsupervised approaches are preferred because large amounts of unlabeled data are available. Bibliographic data contain heterogeneous features; thus, representation learning-based techniques have recently been used to embed heterogeneous features in a common space. Documents of a scholar are connected by multiple relations, and research has recently shifted from a single homogeneous relation to multi-dimensional (heterogeneous) relations for the latent representation of documents. Connections in graphs are sparse, and higher order links between documents give additional clues. Therefore, we use multiple neighborhoods in different relation types of a heterogeneous graph to represent documents. However, neighborhoods of different orders in each relation type have different importance, which we have also validated empirically. Therefore, to properly utilize the different neighborhoods in each relation type and the importance of each relation type in the heterogeneous graph, we propose an attention-based multi-dimensional multi-hop neighborhood-based graph convolution network for embedding that uses two levels of attention: (i) relation level and (ii) neighborhood level within each relation. The proposed approach obtains a significant improvement over existing state-of-the-art methods in terms of various evaluation metrics.
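
The two attention levels named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no equations, so the mean-style propagation, the dot-product scoring, the fixed (untrained) query vectors `hop_query` and `rel_query`, and the relation names `co_author` and `co_venue` are all assumptions chosen for illustration. The sketch propagates document features over 1..K hops within each relation, mixes the hop-wise views with neighborhood-level attention, and then mixes the per-relation results with relation-level attention.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(adj):
    """Add self-loops, then scale each row to sum to 1 (assumed mean aggregation)."""
    n = len(adj)
    looped = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in looped]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(candidates, query):
    """Weight candidate vectors by softmax(query . candidate) and sum them."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    weights = softmax(scores)
    dim = len(candidates[0])
    mixed = [sum(w * cand[d] for w, cand in zip(weights, candidates)) for d in range(dim)]
    return mixed, weights


def embed_documents(features, relations, hop_query, rel_query, num_hops=2):
    """Return one embedding per document using two levels of attention."""
    n = len(features)
    per_relation = []
    for adj in relations.values():
        a_hat = row_normalize(adj)
        hops, h = [], features
        for _ in range(num_hops):
            h = matmul(a_hat, h)  # reach one hop further in this relation
            hops.append(h)
        # Neighborhood-level attention: mix the hop-wise views of each document.
        per_relation.append(
            [attend([hops[k][i] for k in range(num_hops)], hop_query)[0] for i in range(n)]
        )
    # Relation-level attention: mix the per-relation embeddings of each document.
    return [attend([rel[i] for rel in per_relation], rel_query)[0] for i in range(n)]


# Toy example: three documents with 2-d features and two hypothetical relations.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relations = {
    "co_author": [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    "co_venue": [[0, 0, 1], [0, 0, 1], [1, 1, 0]],
}
embeddings = embed_documents(features, relations, hop_query=[0.5, -0.5], rel_query=[1.0, 1.0])
```

In a trained model the query vectors would be learned parameters and the resulting embeddings would typically be passed to a clustering step; both are outside the scope of this sketch.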

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 5
    October 2022
    532 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3514187

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 March 2022
    Accepted: 01 November 2021
    Revised: 01 November 2021
    Received: 01 December 2020
    Published in TKDD Volume 16, Issue 5

    Author Tags

    1. Multi-dimensional
    2. multi-hop neighborhood
    3. name disambiguation
    4. graph convolution networks
    5. representation learning

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Ministry of Electronics and Information Technology, Government of India

    Article Metrics

    • Downloads (Last 12 months): 87
    • Downloads (Last 6 weeks): 9
    Reflects downloads up to 09 Jan 2025

    Cited By

    • (2024) Author Name Disambiguation via Paper Association Refinement and Compositional Contrastive Embedding. Proceedings of the ACM Web Conference 2024, 2193-2203. DOI: 10.1145/3589334.3645596. Online publication date: 13-May-2024
    • (2024) BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting. Proceedings of the ACM Web Conference 2024, 4216-4226. DOI: 10.1145/3589334.3645580. Online publication date: 13-May-2024
    • (2024) A cross-domain transfer learning model for author name disambiguation on heterogeneous graph with pretrained language model. Knowledge-Based Systems 305, 112624. DOI: 10.1016/j.knosys.2024.112624. Online publication date: Dec-2024
    • (2024) Author name disambiguation literature review with consolidated meta-analytic approach. International Journal on Digital Libraries 25:4, 765-785. DOI: 10.1007/s00799-024-00398-1. Online publication date: 1-Dec-2024
    • (2024) High-degree penalty based global statistical network embedding for name disambiguation in anonymized graph. Concurrency and Computation: Practice and Experience. DOI: 10.1002/cpe.8195. Online publication date: 2-Jun-2024
    • (2023) Graph-based methods for Author Name Disambiguation: a survey. PeerJ Computer Science 9, e1536. DOI: 10.7717/peerj-cs.1536. Online publication date: 11-Sep-2023
    • (2023) CluEval: A Python tool for evaluating clustering performance in named entity disambiguation. Software Impacts 16, 100510. DOI: 10.1016/j.simpa.2023.100510. Online publication date: May-2023
    • (2023) Literature Review. Knowledge Recommendation Systems with Machine Intelligence Algorithms, 9-27. DOI: 10.1007/978-3-031-32696-7_2. Online publication date: 1-Oct-2023
    • (2022) Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives. 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 245-250. DOI: 10.1109/ICTAI56018.2022.00043. Online publication date: Oct-2022
