Abstract
This paper studies name ambiguity, the problem of discovering the different underlying meanings behind a name. We have developed a semantic approach in which a graph-based clustering algorithm determines the sets of semantically related sentences that refer to the same name. The approach is evaluated on Bulgarian, Romanian, Spanish and English for various pairs of city, country, person and organization names. The results significantly outperform a majority-based classifier and are compared with a bigram co-occurrence approach.
This research has been funded by the projects QALLME (FP6 IST-033860) and TEX-MESS (TIN2006-15265-C06-01).
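As an illustration only, the general idea of clustering the sentences that mention an ambiguous name can be sketched as follows. This is not the authors' method: the semantic representation used in the paper is replaced here by plain bag-of-words cosine similarity, and the clustering step by connected components of a thresholded similarity graph. The function names, the example sentences and the threshold of 0.5 are all assumptions made for the sketch.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_contexts(sentences, name, threshold=0.5):
    """Group sentences mentioning `name` into connected components
    of a similarity graph (hypothetical stand-in for the paper's clustering)."""
    # Represent each sentence by its words, leaving out the ambiguous name itself.
    target = name.lower()
    vectors = [Counter(w for w in s.lower().split() if w != target)
               for s in sentences]
    n = len(sentences)

    # Link two sentences when their similarity reaches the (assumed) threshold.
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    # Each connected component is treated as one hypothesised meaning of the name.
    clusters, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(component))
    return clusters


# Invented example: "paris" as a city name versus a person name.
sentences = [
    "paris is the capital of france",
    "the capital of france is paris",
    "paris hilton attended the film premiere",
    "the film premiere featured paris hilton",
]
print(cluster_contexts(sentences, "paris"))
```

On these four invented sentences the sketch separates the two city contexts from the two person contexts. Word overlap is a crude proxy for semantic relatedness, and a real system would need a richer representation than this.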
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kozareva, Z., Vàzquez, S., Montoyo, A. (2007). Multilingual Name Disambiguation with Semantic Information. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_6
DOI: https://doi.org/10.1007/978-3-540-74628-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)