Abstract
This paper reports on linguistic features and decisions that we find vital in the process of annotation and resolution of coreference for highly inflectional languages. The presented results have been collected during preparation of a corpus of general direct nominal coreference of Polish. Starting from the notion of a mention, its borders and potential vs. actual referentiality, we discuss the problem of complete and near-identity, zero subjects and dominant expressions. We also present interesting linguistic cases influencing the coreference resolution such as the difference between semantic and syntactic heads or the phenomenon of coreference chains made of indefinite pronouns.
The work reported here was carried out within the Computer-based methods for coreference resolution in Polish texts (CORE) project financed by the Polish National Science Centre (contract number 6505/B/T02/2011/40). The paper is also co-founded by the European Union from resources of the European Social Fund, Project PO KL “Information technologies: Research and their interdisciplinary applications”.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Osenova, P., Simov, K.: BTB-TR05: BulTreeBank Stylebook. BulTreeBank Version 1.0. Technical Report BTB-TR05, Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia, Bulgaria (2004)
Nedoluzhko, A., Mírovský, J., Ocelák, R., Pergler, J.: Extended Coreferential Relations and Bridging Anaphora in the Prague Dependency Treebank. In: Proceedings of the 7th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2009), Goa, India, pp. 1–16. AU-KBC Research Centre, Anna University, Chennai (2009)
Recasens, M., Martí, M.A.: AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan. Language Resources and Evaluation 44(4), 315–345 (2010)
Korzen, I., Buch-Kromann, M.: Anaphoric relations in the Copenhagen Dependency Treebanks. In: Proceedings of DGfS Workshop, Göttingen, Germany, pp. 83–98 (2011)
Poesio, M., Artstein, R.: Anaphoric Annotation in the ARRAU Corpus. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, pp. 1170–1174. European Language Resources Association (2008)
Recasens, M.: Coreference: Theory, Annotation, Resolution and Evaluation. PhD thesis, Department of Linguistics, University of Barcelona, Barcelona, Spain (2010)
Linguistic Data Consortium: ACE (Automatic Content Extraction) Spanish Annotation Guidelines for Entities (2006), http://projects.ldc.upenn.edu/ace/docs/Spanish-Entities-Guidelines_v1.6.pdf (accessed on February 18, 2013)
Hinrichs, E.W., Kübler, S., Naumann, K.: A Unified Representation for Morphological, Syntactic, Semantic, and Referential Annotations. In: Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor, Michigan, USA, pp. 13–20 (2005)
Magnini, B., Pianta, E., Girardi, C., Negri, M., Romano, L., Speranza, M., Lenzi, V.B., Sprugnoli, R.: I-CAB: the Italian Content Annotation Bank. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genova, Italy, pp. 963–968. European Language Resources Association (2006)
Iida, R., Komachi, M., Inui, K., Matsumoto, Y.: Annotating a Japanese Text Corpus with Predicate-Argument and Coreference Relations. In: Proceedings of the Linguistic Annotation Workshop (LAW 2007), pp. 132–139. Association for Computational Linguistics, Stroudsburg (2007)
Pradhan, S.S., Ramshaw, L., Weischedel, R., MacBride, J., Micciulla, L.: Unrestricted Coreference: Identifying Entities and Events in OntoNotes. In: Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), pp. 446–453. IEEE Computer Society, Washington, DC (2007)
Weischedel, R., Pradhan, S., Ramshaw, L., Kaufman, J., Franchini, M., El-Bachouti, M.: OntoNotes Release 4.0 (2010), http://www.bbn.com/NLP/OntoNotes (accessed on February 18, 2013)
Hendrickx, I., Bouma, G., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.M., Van Der Vloet, J., Verschelde, J.L.: A Coreference Corpus and Resolution System for Dutch. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, pp. 144–149. European Language Resources Association, ELRA (2008)
Recasens, M., Hovy, E., Martí, M.A.: Identity, non-identity, and near-identity: Addressing the complexity of coreference. Lingua 121(6) (2011)
Recasens, M., Hovy, E., Marti, M.A.: A Typology of Near-Identity Relations for Coreference (NIDENT). In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, pp. 149–156. European Language Resources Association (2010)
Cohen, J.: A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20(1), 37–46 (1960)
Bennet, E.M., Alpert, R., Goldstein, A.C.: Communications through limited response questioning. Public Opinion Quarterly 18, 303–308 (1954)
Krippendorff, K.H.: Content Analysis: An Introduction to Its Methodology, 2nd edn. Sage Publications, Inc. (December 2003)
Passonneau, R.J.: Computing reliability for coreference annotation. In: LREC. European Language Resources Association (2004)
Ogrodniczuk, M., Głowińska, K., Kopeć, M., Savary, A., Zawisławska, M.: Interesting Linguistic Features in Coreference Annotation of an Inflectional Language. In: Sun, M., Liu, T., Sun, L., Zhang, M., Sun, M., Lin, D., Wang, H. (eds.) CCL and NLP-NABD 2013. LNCS (LNAI), vol. 8202, pp. 97–108. Springer, Heidelberg (2013)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ogrodniczuk, M., Głowińska, K., Kopeć, M., Savary, A., Zawisławska, M. (2013). Interesting Linguistic Features in Coreference Annotation of an Inflectional Language. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2013 2013. Lecture Notes in Computer Science(), vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_10
Download citation
DOI: https://doi.org/10.1007/978-3-642-41491-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41490-9
Online ISBN: 978-3-642-41491-6
eBook Packages: Computer ScienceComputer Science (R0)