Exploiting Higher Order Multi-dimensional Relationships with Self-attention for Author Name Disambiguation

Published: 09 March 2022

    Abstract

    Name ambiguity is a prevalent problem in scholarly publications, driven by the rapid growth of digital libraries and of the number of researchers. In the absence of a unique identifier, an author is identified only by name, so documents can be assigned to the wrong author, which may lead to an improper assessment of that author. Various supervised and unsupervised approaches have been proposed in the literature to solve the name disambiguation problem. Unsupervised approaches are preferred because large amounts of unlabeled data are available. Bibliographic data contain heterogeneous features, so representation learning-based techniques have recently been used to embed these features in a common space. The documents of a scholar are connected by multiple relations, and research has recently shifted from a single homogeneous relation to multi-dimensional (heterogeneous) relations for the latent representation of documents. Connections in these graphs are sparse, and higher order links between documents provide additional clues. We therefore use multiple neighborhoods in the different relation types of a heterogeneous graph to represent documents. However, neighborhoods of different orders within each relation type differ in importance, which we also validate empirically. To properly exploit both the different neighborhoods within each relation type and the importance of each relation type in the heterogeneous graph, we propose an attention-based multi-dimensional multi-hop neighborhood-based graph convolution network for embedding that uses two levels of attention: (i) relation level and (ii) neighborhood level within each relation. The proposed approach obtains a significant improvement over existing state-of-the-art methods on various evaluation metrics.
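The two levels of attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no equations, so the tanh-based scoring function, the symmetric adjacency normalization, the shared projection matrix, and every name below (`embed_documents`, `attend`, `hop_query`, `rel_query`, the "co-author" and "venue" relations) are assumptions chosen for illustration, and all parameters are random rather than trained.

```python
import numpy as np


def normalize_adj(adj):
    """Symmetrically normalize an adjacency matrix after adding self-loops."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))  # degrees are >= 1 due to self-loops
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(scores, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attend(stack, query):
    """Attention pooling over axis 0 of `stack` (K, N, D) with a query (D,).

    Returns the pooled representation (N, D) and the weights (K, N).
    The tanh scoring function is an assumption, not taken from the paper.
    """
    scores = np.tanh(stack) @ query       # (K, N)
    weights = softmax(scores, axis=0)     # sums to 1 over K for each document
    pooled = (weights[:, :, None] * stack).sum(axis=0)
    return pooled, weights


def embed_documents(features, relation_adjs, num_hops, weight, hop_query, rel_query):
    """Embed documents with neighborhood-level, then relation-level attention."""
    projected = features @ weight         # (N, D)
    per_relation = []
    for adj in relation_adjs:
        a_hat = normalize_adj(adj)
        hops, h = [], projected
        for _ in range(num_hops):
            h = a_hat @ h                 # one more hop of propagation
            hops.append(h)
        # Neighborhood-level attention: weigh the 1..num_hops neighborhoods.
        pooled, _ = attend(np.stack(hops), hop_query)
        per_relation.append(pooled)
    # Relation-level attention: weigh the relation types against each other.
    return attend(np.stack(per_relation), rel_query)


def random_symmetric_adj(rng, n):
    """Random undirected graph without self-loops (toy data only)."""
    upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    return (upper + upper.T).astype(float)


rng = np.random.default_rng(0)
n_docs, n_feat, dim = 5, 8, 4
features = rng.normal(size=(n_docs, n_feat))
# Two hypothetical relation types between documents, e.g. co-author and venue.
relation_adjs = [random_symmetric_adj(rng, n_docs), random_symmetric_adj(rng, n_docs)]

embedding, rel_weights = embed_documents(
    features,
    relation_adjs,
    num_hops=3,
    weight=rng.normal(size=(n_feat, dim)),
    hop_query=rng.normal(size=dim),
    rel_query=rng.normal(size=dim),
)
print(embedding.shape, rel_weights.shape)
```

In this sketch each document receives its own weights over hops and over relation types, so the embedding can lean on higher order links where direct connections are sparse. In an unsupervised disambiguation pipeline the resulting embeddings would typically be clustered, one cluster per real-world author, but that step is outside this example.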


      Published In

      ACM Transactions on Knowledge Discovery from Data  Volume 16, Issue 5
      October 2022
      532 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3514187

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 March 2022
      Accepted: 01 November 2021
      Revised: 01 November 2021
      Received: 01 December 2020
      Published in TKDD Volume 16, Issue 5

      Author Tags

      1. Multi-dimensional
      2. multi-hop neighborhood
      3. name disambiguation
      4. graph convolution networks
      5. representation learning

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • Ministry of Electronics and Information Technology, Government of India

      Article Metrics

      • Downloads (Last 12 months): 93
      • Downloads (Last 6 weeks): 6
      Reflects downloads up to 09 Aug 2024

      Cited By

      • (2024) Author Name Disambiguation via Paper Association Refinement and Compositional Contrastive Embedding. Proceedings of the ACM on Web Conference 2024 (2193-2203). DOI: 10.1145/3589334.3645596. Online publication date: 13-May-2024
      • (2024) BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting. Proceedings of the ACM on Web Conference 2024 (4216-4226). DOI: 10.1145/3589334.3645580. Online publication date: 13-May-2024
      • (2024) Author name disambiguation literature review with consolidated meta-analytic approach. International Journal on Digital Libraries. DOI: 10.1007/s00799-024-00398-1. Online publication date: 10-Apr-2024
      • (2024) High‐degree penalty based global statistical network embedding for name disambiguation in anonymized graph. Concurrency and Computation: Practice and Experience. DOI: 10.1002/cpe.8195. Online publication date: 2-Jun-2024
      • (2023) Graph-based methods for Author Name Disambiguation: a survey. PeerJ Computer Science 9 (e1536). DOI: 10.7717/peerj-cs.1536. Online publication date: 11-Sep-2023
      • (2023) CluEval: A Python tool for evaluating clustering performance in named entity disambiguation. Software Impacts 16 (100510). DOI: 10.1016/j.simpa.2023.100510. Online publication date: May-2023
      • (2023) Literature Review. Knowledge Recommendation Systems with Machine Intelligence Algorithms (9-27). DOI: 10.1007/978-3-031-32696-7_2. Online publication date: 1-Oct-2023
      • (2022) Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives. 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI) (245-250). DOI: 10.1109/ICTAI56018.2022.00043. Online publication date: Oct-2022
