Abstract
Person name disambiguation aims to distinguish different entities that share the same person name by linking documents to the entities they mention. Traditional approaches use the words in a document as features to distinguish entities. Because they ignore word order and make only limited use of external knowledge, their performance is limited. This paper presents an approach to named entity disambiguation through entity linking, based on a multi-kernel function and Internet verification, to improve Chinese person name disambiguation. The proposed approach extends a linear kernel that uses in-document word features by adding a string kernel, forming a multi-kernel function. The multi-kernel function computes the similarity between an input document and the entity descriptions in a named person knowledge base to produce a ranked list of candidate entities. Furthermore, Internet search results retrieved with keywords extracted from the input document and from the entity descriptions in the knowledge base are used to train classifiers for verification. Evaluations on the CIPS-SIGHAN 2012 person name disambiguation bakeoff dataset show that using word order and Internet knowledge through a multi-kernel function improves both precision and recall, and that the system achieves state-of-the-art performance.
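The idea of combining an order-insensitive kernel with an order-sensitive one to rank knowledge base candidates can be sketched as follows. This is only an illustration under stated assumptions, not the system described in the paper: the string kernel here is a simple word bigram spectrum kernel chosen as a stand-in, the weight `alpha`, the function names, and the toy knowledge base are all hypothetical, and documents are assumed to be already segmented into word tokens.

```python
from collections import Counter
from math import sqrt


def linear_kernel(a, b):
    """Dot product of bag-of-words count vectors (word order is ignored)."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[w] * cb[w] for w in ca if w in cb)


def spectrum_kernel(a, b, n=2):
    """Dot product of contiguous word n-gram counts (word order matters).

    Assumption: a stand-in for the paper's string kernel, whose exact
    definition is not given in the abstract.
    """
    ga = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    gb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(ga[g] * gb[g] for g in ga if g in gb)


def normalized(kernel, a, b):
    """Cosine-style normalization so both kernels share the range [0, 1]."""
    denom = sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0


def multi_kernel(a, b, alpha=0.5):
    """Weighted sum of the two normalized kernels; alpha is a hypothetical weight."""
    return (alpha * normalized(linear_kernel, a, b)
            + (1 - alpha) * normalized(spectrum_kernel, a, b))


def rank_candidates(doc, knowledge_base, alpha=0.5):
    """Return (score, entity_id) pairs sorted from most to least similar."""
    scored = [(multi_kernel(doc, desc, alpha), entity_id)
              for entity_id, desc in knowledge_base.items()]
    return sorted(scored, reverse=True)


# Toy example: placeholder tokens stand in for segmented Chinese words.
doc = ["li", "ming", "teaches", "physics"]
kb = {
    "e1": ["li", "ming", "teaches", "physics", "in", "beijing"],
    "e2": ["physics", "teaches", "ming", "li"],  # same words, reversed order
}
ranked = rank_candidates(doc, kb)
```

In this toy case the second entry contains exactly the same words as the input, so a linear kernel alone would score it as a perfect match; the bigram term lowers its score because none of the word pairs occur in the same order. A weighted sum of two valid kernels is itself a valid kernel, so the combined function could also be passed to a kernel-based classifier.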
Author information
Ruifeng Xu is an associate professor and PhD supervisor at the School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China. His research areas are natural language processing, emotion computing, text mining and bioinformatics. His main research focuses on computational methods for natural language processing and understanding. He received his BS degree in computer science from Harbin Institute of Technology, China, and his MS and PhD degrees in computer science from the Hong Kong Polytechnic University, China.
Lin Gui is a PhD candidate at the School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China. His research areas are natural language processing, sentiment analysis, emotion computing and machine learning. His main research focuses on machine learning for natural language processing. He received his BS degree from Nankai University, China and MS degree in computer science from the Harbin Institute of Technology, China.
Qin Lu is a professor and associate head of the Department of Computing, the Hong Kong Polytechnic University, China. Her research areas are natural language processing, computational linguistics, lexical semantics, text mining and knowledge discovery. Her main research focuses on using computational methods for information extraction, text mining and knowledge discovery. She has conducted extensive work on Chinese collocation extraction, terminology extraction, ontology construction, named entity disambiguation and emotion analysis. She received her BS degree from Beijing Normal University, China, and her MS and PhD degrees in computer science from the University of Illinois at Urbana-Champaign, USA.
Shuai Wang is a master's candidate at the School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China. His research areas are natural language processing, information retrieval and machine learning. His main research focuses on machine learning for natural language processing. He received his BS degree from Heilongjiang Institute of Technology, China.
Jian Xu obtained his PhD degree from the Department of Computing, the Hong Kong Polytechnic University, China, and currently works for Huawei Technologies Company Limited. His research areas are natural language processing, computational linguistics, lexical semantics, text mining and knowledge discovery. His main research focuses on entity disambiguation. He received his BS degree from Beijing Language and Culture University, China, and MS degree from Peking University, China.
Cite this article
Xu, R., Gui, L., Lu, Q. et al. Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation. Front. Comput. Sci. 10, 1026–1038 (2016). https://doi.org/10.1007/s11704-016-4503-0
DOI: https://doi.org/10.1007/s11704-016-4503-0