
Is it worth the effort? Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Published in Language Resources and Evaluation

Abstract

Corpora with high-quality linguistic annotations are an essential component in many NLP applications and a valuable resource for linguistic research. Obtaining these annotations requires a large amount of manual effort, which makes the creation of such resources time-consuming and costly. One attempt to speed up the annotation process is to use supervised machine-learning systems to automatically assign (possibly erroneous) labels to the data and to ask human annotators to correct them where necessary. However, it is not clear to what extent such automatic pre-annotation succeeds in reducing human annotation effort, or what impact it has on the quality of the resulting resource. In this article, we present the results of an experiment in which we assess the usefulness of partial semi-automatic annotation for frame labeling. We investigate the impact of automatic pre-annotation of differing quality on annotation time, consistency, and accuracy. While we found no conclusive evidence that pre-annotation speeds up human annotation, we did find that it increases the overall quality of the annotations.
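To make the setup concrete, here is a minimal sketch of the pre-annotate-then-correct workflow the abstract describes. All names in it (Instance, pre_annotate, correct) are hypothetical, and the sketch does not reproduce any system used in the study; it only illustrates the division of labor between the classifier and the human annotator.

```python
# Minimal sketch of the pre-annotate-then-correct workflow described in
# the abstract. All names here (Instance, pre_annotate, correct) are
# hypothetical; the sketch does not reproduce any system used in the study.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Instance:
    sentence: str
    target: str                  # the frame-evoking word
    frame: Optional[str] = None  # frame label, possibly pre-assigned

def pre_annotate(instances, classifier):
    """Let a supervised system assign (possibly erroneous) frame labels."""
    for inst in instances:
        inst.frame = classifier.predict(inst.sentence, inst.target)
    return instances

def correct(instances, annotator):
    """Human pass: confirm correct pre-annotations, fix wrong ones."""
    for inst in instances:
        inst.frame = annotator.confirm_or_correct(inst, inst.frame)
    return instances
```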





Notes

  1. This problem is also known in the context of resources that are collaboratively constructed via the web (Kruschwitz et al. 2009).

  2. In FrameNet, the participant roles are called frame elements, while in a more general context the term semantic roles is commonly used. Also, in the FrameNet lexicon, the lexical entries are called lexical units. A lexical unit (LU) is a pairing of a lemma and a frame that it evokes. Most of FrameNet’s lemmas consist of a single morphological lexeme, but multi-word expressions consist of several. In this paper, we will sometimes use the term word senses to refer to the frames a lemma evokes because, as noted by Erk (2005), the process of frame assignment can be treated as a word sense disambiguation task. (An illustrative sketch of this terminology follows these notes.)

  3. http://www.natcorp.ox.ac.uk/.

  4. FrameNet analyzes the English lexicon from an encoding point-of-view: given a frame, it finds words that evoke that frame. FrameNet proceeds from frame to frame, rather than analyzing all senses of a given lemma. This means that as long as FrameNet is not complete, polysemous words may not have all their senses covered by FrameNet.

  5. One exception concerns metaphorical usages, which may not be covered well by any of the frames FrameNet provides for the lemma. In those cases, our annotators occasionally left the target word unannotated (see Sect. 4.2).

  6. FrameNet release 1.3 appeared in June 2006. It contained 795 frames and listed 10,195 lexical units. The successor version, 1.5, with 1,019 frames and 11,829 lexical units, became available only as this paper went to press.

  7. We avoided the order NSE because in that order the pre-annotation quality would have improved between all adjacent batches (from ‘no annotation’ to ‘state-of-the-art annotation’ to ‘enhanced annotation’), in which case a possible ongoing training effect could have been confounded with pre-annotation quality. From the remaining five theoretically possible orders, we randomly selected three, subject to the constraint that each annotation condition came first for exactly one group (a sketch of this selection procedure follows these notes).

  8. The small number of instances with enhanced pre-annotation (batch 1: 8 sentences, batch 2: 10 sentences, batch 3: 14 sentences) did not allow for a reliable analysis of the incorrectly annotated sentences under that condition.

  9. Frame and role assignment were annotated as a single combined task; we therefore do not report separate annotation times for semantic role assignment.
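
As promised in Note 2, a small illustrative sketch of the lexical-unit terminology. The class and the example lemma/frame pairings are assumptions for exposition, not taken from the FrameNet data:

```python
# Illustrative sketch of Note 2's terminology; the class and the example
# lemma/frame pairings are assumptions for exposition, not FrameNet data.

from dataclasses import dataclass

@dataclass(frozen=True)
class LexicalUnit:
    lemma: str  # a single lexeme, or several for multi-word expressions
    frame: str  # the frame this lemma evokes

# A polysemous lemma yields one lexical unit per evoked frame, which is
# why frame assignment behaves like word sense disambiguation (Erk 2005):
senses_of_charge = [
    LexicalUnit(lemma="charge.v", frame="Commerce_collect"),  # "charge a fee"
    LexicalUnit(lemma="charge.v", frame="Attack"),            # "the cavalry charged"
]
```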
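And the order-selection sketch referenced in Note 7: a reconstruction under stated assumptions, not the authors' actual procedure. It enumerates the orders of the three conditions, drops NSE, and samples three orders so that each condition comes first exactly once:

```python
# Reconstruction of the order selection in Note 7 (an assumed
# implementation, not the authors' script). Conditions:
# N = no, S = state-of-the-art, E = enhanced pre-annotation.

from itertools import permutations
import random

CONDITIONS = ("N", "S", "E")

def candidate_orders():
    """All six orders except N-S-E, in which pre-annotation quality
    would improve monotonically across batches."""
    return [p for p in permutations(CONDITIONS) if p != ("N", "S", "E")]

def pick_group_orders(seed=None):
    """Pick three of the five remaining orders such that each condition
    comes first for exactly one annotator group."""
    rng = random.Random(seed)
    while True:
        chosen = rng.sample(candidate_orders(), 3)
        if {order[0] for order in chosen} == set(CONDITIONS):
            return chosen

# Each call returns three orders, one per annotator group: one starting
# with N, one with S, and one with E.
```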

References

  • Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th international conference on computational linguistics (pp. 86–90). Morristown, NJ, USA: Association for Computational Linguistics.

  • Baldridge, J., & Osborne, M. (2004). Active learning and the total cost of annotation. In Proceedings of EMNLP.

  • Brants, T., & Plaehn, O. (2000). Interactive corpus annotation. In Proceedings of LREC-2000.

  • Burchardt, A., Erk, K., Frank, A., Kowalski, A., & Padó, S. (2006). SALTO—a versatile multi-level annotation tool. In Proceedings of LREC.

  • Chiou, F. D., Chiang, D., & Palmer, M. (2001). Facilitating treebank annotation using a statistical parser. In Proceedings of HLT-2001.

  • Chou, W. C., Tsai, R. T. H., Su, Y. S., Ku, W., Sung, T. Y., & Hsu, W. L. (2006). A semi-automatic method for annotating a biomedical proposition bank. In Proceedings of FLAC-2006.

  • Dandapat, S., Biswas, P., Choudhury, M., & Bali, K. (2009). Complex linguistic annotation—no easy way out! A case from Bangla and Hindi POS labeling tasks. In Proceedings of the third linguistic annotation workshop (pp. 10–18). Suntec, Singapore: Association for Computational Linguistics.

  • Erk, K. (2005). Frame assignment as word sense disambiguation. In Proceedings of the 6th international workshop on computational semantics (IWCS-6). Tilburg, The Netherlands.

  • Erk, K., & Padó, S. (2006). Shalmaneser—a flexible toolbox for semantic role assignment. In Proceedings of LREC, Genoa, Italy.

  • Fillmore, C. J. (1982). Frame semantics. In The Linguistic Society of Korea (Ed.), Linguistics in the morning calm (pp. 111–137). Seoul: Hanshin.

  • Fillmore, C. J., & Baker, C. (2010). A frame approach to semantic analysis. In B. Heine & H. Narrog (Eds.), Oxford handbook of linguistic analysis. Oxford: Oxford University Press.


  • Fillmore, C. J., Petruck, M. R., Ruppenhofer, J., & Wright, A. (2003). FrameNet in action: The case of attaching. International Journal of Lexicography, 16(3), 297–332.


  • Ganchev, K., Pereira, F., Mandel, M., Carroll, S., & White, P. (2007). Semi-automated named entity annotation. In Proceedings of the linguistic annotation workshop (pp. 53–56). Prague, Czech Republic: Association for Computational Linguistics.

  • Kruschwitz, U., Chamberlain, J., & Poesio, M. (2009). (Linguistic) science through web collaboration in the ANAWIKI project. In Proceedings of WebSci’09.

  • Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.


  • Meurers, W. D. (2005). On the use of electronic corpora for theoretical linguistics. Case studies from the syntax of German. Lingua, 115(11), 1619–1639.


  • Meurers, W. D., & Müller, S. (2007). Corpora and syntax (article 44). In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics. Berlin: Mouton de Gruyter.


  • Mueller, C., Rapp, S., & Strube, M. (2002). Applying co-training to reference resolution. In Proceedings of 40th annual meeting of the association for computational linguistics (pp. 352–359). Philadelphia, Pennsylvania, USA: Association for Computational Linguistics.

  • Ng, V., & Cardie, C. (2003). Bootstrapping coreference classifiers with multiple machine learning algorithms. In Proceedings of the 2003 conference on empirical methods in natural language processing (EMNLP-2003).

  • Rehbein, I., Ruppenhofer, J., & Palmer, A. (2010). Bringing active learning to life. In Proceedings of the 23rd international conference on computational linguistics (COLING 2010), Beijing, China.

  • Xue, N., Chiou, F. D., & Palmer, M. (2002). Building a large-scale annotated Chinese corpus. In Proceedings of the 19th international conference on computational linguistics (COLING 2002).


Acknowledgments

We would like to thank Berry Claus for extensive discussion and comments on our design. We are also grateful to our annotators Markus Dräger, Lisa Fuchs, and Corinna Schorr, and to the anonymous reviewers for their insightful comments and useful feedback. Ines Rehbein and Josef Ruppenhofer are supported by the German Research Foundation (DFG) under grant PI 154/9-3; Caroline Sporleder is supported as part of the Cluster of Excellence Multimodal Computing and Interaction (MMCI).

Author information

Correspondence to Ines Rehbein.


Cite this article

Rehbein, I., Ruppenhofer, J., & Sporleder, C. (2012). Is it worth the effort? Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation. Language Resources and Evaluation, 46, 1–23. https://doi.org/10.1007/s10579-011-9170-z
