Abstract
This paper presents a novel approach to the anaphoric annotation of large corpora that uses semantic information to support the annotation process. The anaphora annotation scheme was designed from a multilingual perspective in order to annotate three corpora: one for Catalan, one for Basque and one for Spanish. An anaphora resolution system based on restrictions and preferences is used to aid manual annotation. Together with morpho-syntactic information, the system exploits the semantic relation between the anaphor and its antecedent.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saiz-Noeda, M., Navarro, B., Izquierdo, R. (2004). Semantic-Aided Anaphora Resolution in Large Corpora Development. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_28
DOI: https://doi.org/10.1007/978-3-540-30228-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive