GNAT
Gene/protein named entity recognition and normalization software
GNAT is a library and web service for gene named entity recognition (NER) and normalization in biomedical articles. Mentions of genes and proteins in the articles are linked to Entrez Gene identifiers. GNAT is available both as a local download (suitable for large-scale processing) and as a web service (suitable for more limited processing or testing). A combination of local and remote processing is also available, in which CPU-heavy operations are performed locally and memory-intensive operations are performed remotely; this suits large-scale processing when a large amount of memory is not available locally. GNAT uses LINNAEUS (Gerner et al., 2010) for species detection and BANNER (Leaman et al., 2008) in one part of its false positive filtering process.
GNAT is described in the following papers:
- Hakenberg J, Gerner M, Haeussler M, Solt I, Plake C, Schroeder M, Gonzalez G, Nenadic G, Bergman C: The GNAT library for local and remote gene mention normalization. Bioinformatics 27(19):2769-71, 2011 [html] [pdf].
- Solt I, Gerner M, Thomas P, Nenadic G, Bergman CM, Leser U, Hakenberg J: Gene mention normalization in full texts using GNAT and LINNAEUS. In Proceedings of the BioCreative III Workshop, Bethesda, USA, 2010 [pdf]
- Hakenberg J, Plake C, Royer L, Strobelt H, Leser U, Schroeder M: Gene mention normalization and interaction extraction with context models and sentence motifs. Genome Biology 9:S14, 2008 [html] [pdf]
- Hakenberg J, Plake C, Leaman R, Schroeder M, Gonzalez G: Inter-species normalization of gene mentions with GNAT. Bioinformatics 24:i126-i132, 2008 [html] [pdf]
For questions, suggestions or bug reports, please contact Jörg Hakenberg, Martin Gerner or Casey Bergman. The files on this webpage can also be accessed from this project's SourceForge project page.
News
- 07 Sep 2012 - minor update 1.22, GNAT now parses bulk Medline XML files downloaded from NCBI via FTP (e.g., medline12n0123.xml.gz)
- 28 Aug 2012 - GNAT release 1.21, incl. update of SQL backend database: in particular, GO codes, terms, and GeneRIFs; reads PubMed/Medline XML and returns inline annotations
- 31 Jul 2011 - our application note introducing the SourceForge version of the GNAT library was just accepted for publication in Bioinformatics
- 21 Jul 2011 - release of GNAT version 1.1: minor bug fixes; tests to run against BioCreative data; switched dictionaries for S. cer. and E. coli to S. cer. S228c and E. coli str. K-12 substr. MG1655, respectively, since genes in Entrez Gene have been transferred to these origins; dictionary update included an SQL table as well.
Downloads
- gnat-1.22.tar.gz (43 MB): GNAT library for performing local gene NER and normalization. The archive also contains documentation, source code, and javadoc documentation.
- gnat_sql_data-1.2.tar.gz (492 MB): Data required to build the SQL tables that are necessary for local GNAT normalization.
- gnat_dictionaries-1.1.tar.gz (125 MB): Pre-compiled BRICS automaton dictionaries for 10 species (human, mouse, rat, E. coli, fruit fly, baker's yeast, A. thaliana, cow, zebrafish, and chicken) that are necessary for local GNAT entity recognition. Also included is an .mwt file, which can be used to compile BRICS automatons for other species.
- or go to the SourceForge project page to download files and access the SVN repository
Remote web service availability
GNAT can be accessed remotely as a RESTful web service by posting data to http://textmining.ls.manchester.ac.uk:8081/ (example). For documentation describing how to use the service, see this location. If you have questions, don't hesitate to contact us.
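As a rough illustration of what "posting data" to the service could look like from a client, the following Python sketch builds a form-encoded POST request for the service URL given above. The form field name `text` and the helper name `build_gnat_request` are assumptions made for this example only; the parameters the service actually accepts are described in its documentation and may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Service URL as given on this page.
GNAT_SERVICE_URL = "http://textmining.ls.manchester.ac.uk:8081/"


def build_gnat_request(text, url=GNAT_SERVICE_URL):
    """Build (but do not send) a POST request carrying the text to annotate.

    NOTE: the form field name "text" is a hypothetical placeholder, not a
    documented GNAT parameter; check the service documentation for the
    real parameter names before use.
    """
    payload = urlencode({"text": text}).encode("utf-8")
    return Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To actually submit the request, pass the returned object to `urllib.request.urlopen` and read the response body; the response format depends on the service and is not assumed here.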
Other references
- Gerner, M., Nenadic, G. and Bergman, C. M. (2010) LINNAEUS: a species name identification system for biomedical literature. BMC Bioinformatics 11:85
- Leaman, R. and Gonzalez, G. (2008) BANNER: An executable survey of advances in biomedical named entity recognition. Pac Symp Biocomput. 2008:652-663.
Last updated: Dec 18th, 2013.