Person name disambiguation by bootstrapping

Published: 19 July 2010

Abstract

In this paper, we report on a system that disambiguates person names in Web search results. The system uses named entities, compound keywords, and URLs as features for calculating document similarity; these features typically yield clustering results with high precision but low recall. To improve recall, we propose a two-stage clustering algorithm based on bootstrapping, in which the clustering results of the first stage are used to extract the features used in the second stage. Experimental results show that our algorithm achieves a better score than the best systems at the latest WePS workshop.
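
The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the clustering method, or how second-stage features are extracted, so everything below is an assumption made for illustration: Jaccard similarity, single-link merging, the thresholds `t1` and `t2`, the `strong` and `words` field names, and the toy documents are all hypothetical and are not the authors' implementation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(items, threshold):
    """Single-link clustering: merge any two items whose Jaccard
    similarity reaches the threshold. Returns a list of index sets."""
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if jaccard(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def two_stage_clustering(docs, t1=0.5, t2=0.5):
    # Stage 1: cluster on strong features only (assumed here to be
    # named entities, compound keywords, and URLs). This tends to give
    # precise but fragmented clusters.
    stage1 = cluster([d["strong"] for d in docs], t1)
    # Bootstrapping step (assumed form): pool the ordinary words of the
    # documents in each first-stage cluster into one feature set.
    pooled = [set().union(*(docs[i]["words"] for i in c)) for c in stage1]
    # Stage 2: cluster the first-stage clusters on the pooled features,
    # so fragments of the same person can be merged.
    stage2 = cluster(pooled, t2)
    return [sorted(i for c in merged for i in stage1[c]) for merged in stage2]


# Hypothetical search results for one ambiguous person name.
docs = [
    {"strong": {"MIT", "mit.edu"}, "words": {"robotics", "lab", "professor"}},
    {"strong": {"MIT", "mit.edu"}, "words": {"robotics", "sensor", "research"}},
    {"strong": {"IEEE"}, "words": {"robotics", "professor", "research"}},
    {"strong": {"Chicago Bulls"}, "words": {"basketball", "season", "coach"}},
]

result = two_stage_clustering(docs, t1=0.5, t2=0.55)
print(result)  # [[0, 1, 2], [3]]
```

In this toy data, the first stage groups only documents 0 and 1, which share strong features. Document 2 has a word overlap of 0.5 with each of them individually, below the second-stage threshold of 0.55, but its overlap with the pooled words of the first-stage cluster is 0.6, so the second stage merges it. That is the recall gain the abstract describes, shown under the stated assumptions.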

Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN:9781450301534
DOI:10.1145/1835449

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. person name disambiguation
  3. web people search

Qualifiers

  • Research-article

Conference

SIGIR '10

Acceptance Rates

SIGIR '10 Paper Acceptance Rate: 87 of 520 submissions, 17%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
