Abstract
The present paper describes a large-scale lexical resource for the biology domain, designed for both human and machine use. The lexicon aims at semantic interoperability and extensibility through the adoption of the ISO-LMF standard for lexical representation and through a granular, distributed encoding of the relevant information. The first part of this contribution focuses on three aspects of the model that are of particular interest to the biology community: the treatment of term variants, the representation of bio events, and the alignment with a domain ontology. The second part describes the physical implementation of the model: a relational database equipped with a set of automatic uploading procedures. A distinctive feature of the BioLexicon is that it combines features of both terminologies and lexicons. A set of verbs relevant to the domain is also represented, with full details of their syntactic and semantic argument structure.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Quochi, V., Del Gratta, R., Sassolini, E., Bartolini, R., Monachini, M., Calzolari, N. (2009). A Standard Lexical-Terminological Resource for the Bio Domain. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_28
DOI: https://doi.org/10.1007/978-3-642-04235-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)