DOI: 10.5555/2392444.2392484
Article

Empirical evaluation of semi-automated XML annotation of text documents with the GoldenGATE editor

Published: 16 September 2007

Abstract

Digitized scientific documents should be marked up according to domain-specific XML schemas to make maximum use of their content. Such markup allows for advanced, semantics-based access to the document collection. Many NLP applications have been developed to support automated annotation. However, NLP results are often not accurate enough, so manual corrections are indispensable. We have therefore developed the GoldenGATE editor, a tool that integrates NLP applications and assistance features for manual XML editing. Plain XML editors do not offer such tight integration: users have to create the markup manually or move the documents back and forth between the editor and (mostly command-line) NLP tools. This paper presents the first empirical evaluation of how users benefit from such tight integration when creating semantically rich digital libraries. We conducted experiments in which human participants performed markup tasks on a document collection from a generic domain. The results clearly show that markup editing assistance, tightly combined with NLP functionality, significantly reduces the user effort required to annotate documents.
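The semi-automated workflow summarized above (an automated NLP pass proposes markup, a user corrects it, and the result is serialized as XML) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GoldenGATE implementation: the tiny gazetteer lookup is a hypothetical stand-in for a real NLP component, and the function names `propose_annotations`, `apply_corrections`, `to_xml` and the element names `person`, `location`, `document` are all invented for the example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical stand-in for an NLP component: a tiny gazetteer.
# A real system would use a trained tagger; this only illustrates the workflow.
GAZETTEER = {"Goethe": "person", "Paris": "location", "Berlin": "location"}


def propose_annotations(text):
    """Automated pass: return sorted (start, end, tag) spans for gazetteer hits."""
    spans = []
    for name, tag in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((match.start(), match.end(), tag))
    return sorted(spans)


def apply_corrections(spans, rejected=(), added=()):
    """Manual pass: drop spans the user rejected and add spans the tagger missed."""
    rejected = set(rejected)
    kept = [span for span in spans if span not in rejected]
    return sorted(kept + list(added))


def to_xml(text, spans, root="document"):
    """Serialize sorted, non-overlapping spans as inline XML elements."""
    parts, pos = [], 0
    for start, end, tag in spans:
        parts.append(escape(text[pos:start]))  # untagged text before the span
        parts.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        pos = end
    parts.append(escape(text[pos:]))  # trailing untagged text
    return f"<{root}>{''.join(parts)}</{root}>"


if __name__ == "__main__":
    text = "Goethe left Frankfurt for Paris."
    proposed = propose_annotations(text)  # finds Goethe and Paris, misses Frankfurt
    final = apply_corrections(proposed, added=[(12, 21, "location")])
    print(to_xml(text, final))
```

Running the script prints `<document><person>Goethe</person> left <location>Frankfurt</location> for <location>Paris</location>.</document>`: the automated pass proposes two annotations, and the simulated user adds the one it missed. Keeping the automated proposal and the manual correction as steps over the same span list, rather than separate programs exchanging files, is the kind of integration the abstract contrasts with plain XML editors.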

Information

Published In

Guide Proceedings
ECDL'07: Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
September 2007
585 pages
ISBN: 3540748504
• Editors: László Kovács, Norbert Fuhr, Carlo Meghini

Sponsors

• Hungarian Academy of Sciences

Publisher

Springer-Verlag, Berlin, Heidelberg

Qualifiers

• Article

Bibliometrics & Citations

Article Metrics

• Downloads (last 12 months): 0
• Downloads (last 6 weeks): 0
Reflects downloads up to 19 Feb 2025

Cited By

• (2015) Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study. ACM Transactions on Software Engineering and Methodology 24(4), 1-49. DOI: 10.1145/2699688. Online publication date: 2-Sep-2015.
• (2013) Does automated white-box test generation really help software testers? Proceedings of the 2013 International Symposium on Software Testing and Analysis, 291-301. DOI: 10.1145/2483760.2483774. Online publication date: 15-Jul-2013.
• (2010) ProcessTron. Proceedings of the 10th annual joint conference on Digital libraries, 21-28. DOI: 10.1145/1816123.1816127. Online publication date: 21-Jun-2010.
• (2009) User Evaluation Study of a Tagging Approach to Semantic Mapping. Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications, 623-637. DOI: 10.1007/978-3-642-02121-3_46. Online publication date: 31-May-2009.
• (2008) Biography as events in time and space. Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems, 1-2. DOI: 10.1145/1463434.1463535. Online publication date: 5-Nov-2008.
