DOI: 10.1145/2187980.2188165
poster

Combining classification with clustering for web person disambiguation

Published: 16 April 2012

Abstract

Web Person Disambiguation (WPD) is often conducted by clustering web documents to identify the different namesakes that share a given name. This paper presents a new key phrase-based clustering method combined with a second-step re-classification of outliers to improve clustering performance. For document clustering, a hierarchical agglomerative approach is applied to a vector space model that uses key phrases as the main feature. Outliers in the clustering results are then identified through a centroid-based method and reclassified by an SVM classifier into more appropriate clusters, using a key phrase-based string kernel model as its feature space. The re-classification uses the first-step clustering result as its training data, which avoids the separate training data required by most classification algorithms. Experiments conducted on the WePS-2 dataset show that the algorithm based on key phrases is effective in improving WPD performance.
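
The first two stages described in the abstract (agglomerative clustering over key-phrase vectors, then centroid-based outlier detection) can be illustrated with a minimal sketch. This is not the authors' implementation: the average-link criterion, the cosine similarity measure, the two thresholds (`stop_sim`, `min_sim`), the function names, and the toy key phrases are all assumptions made for illustration. The SVM re-classification with a string kernel is not covered here.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse key-phrase count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean of the sparse vectors in one cluster."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds counts
    return {k: v / len(vectors) for k, v in total.items()}


def agglomerate(docs, stop_sim):
    """Average-link agglomerative clustering (linkage choice is an assumption).

    Merging stops once the best pair of clusters is less similar than stop_sim.
    """
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        total = sum(cosine(docs[i], docs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(link(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        best, i, j = max(pairs)
        if best < stop_sim:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def find_outliers(docs, clusters, min_sim):
    """Flag documents whose similarity to their own cluster centroid is low."""
    outliers = []
    for members in clusters:
        center = centroid([docs[i] for i in members])
        outliers += [i for i in members if cosine(docs[i], center) < min_sim]
    return sorted(outliers)


# Hypothetical key-phrase counts for five pages returned for one person name.
docs = [
    Counter({"stanford university": 2, "machine learning": 1}),
    Counter({"stanford university": 1, "machine learning": 2}),
    Counter({"jazz guitar": 2, "blue note": 1}),
    Counter({"jazz guitar": 1, "blue note": 2}),
    Counter({"jazz guitar": 1, "new york": 1, "city hall": 1}),
]

clusters = agglomerate(docs, stop_sim=0.3)
outliers = find_outliers(docs, clusters, min_sim=0.75)
print(clusters, outliers)
```

In the pipeline the abstract describes, the flagged documents would then be handed to a classifier trained on the remaining cluster members, so no separately labeled training set is needed.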


Published In

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web
April 2012
1250 pages
ISBN: 9781450312301
DOI: 10.1145/2187980

Sponsors

  • Univ. de Lyon: Universite de Lyon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. key phrase
  2. string kernel
  3. svm
  4. web person disambiguation

Qualifiers

  • Poster

Conference

WWW 2012
Sponsor:
  • Univ. de Lyon
WWW 2012: 21st World Wide Web Conference 2012
April 16 - 20, 2012
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2021) Deep Name Disambiguation by Combining BERT and Orthogonal Constrained Non-negative Matrix Factorization. 2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS), pp. 498-503. DOI: 10.1109/CCIS53392.2021.9754675. Online publication date: 7-Nov-2021
  • (2020) A Scholar Disambiguation Method Based on Heterogeneous Relation-Fusion and Attribute Enhancement. IEEE Access 8, pp. 28375-28384. DOI: 10.1109/ACCESS.2020.2972372. Online publication date: 2020
  • (2016) Incorporating multi-kernel function and Internet verification for Chinese person name disambiguation. Frontiers of Computer Science: Selected Publications from Chinese Universities 10:6, pp. 1026-1038. DOI: 10.1007/s11704-016-4503-0. Online publication date: 1-Dec-2016
  • (2016) LD2LD: Integrating, Enriching and Republishing Library Data as Linked Data. Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data, pp. 160-171. DOI: 10.1007/978-981-10-3168-7_16. Online publication date: 23-Nov-2016
  • (2015) Web Person Disambiguation Using Hierarchical Co-reference Model. Computational Linguistics and Intelligent Text Processing, pp. 279-291. DOI: 10.1007/978-3-319-18111-0_22. Online publication date: 2015
  • (2013) OnPerDis. Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 01, pp. 185-192. DOI: 10.1109/WI-IAT.2013.28. Online publication date: 17-Nov-2013
