
SchistoDB: a Schistosoma mansoni genome resource

2008

Nucleic Acids Research, 2008, 1–4 (published on behalf of Oxford University Press). Advance Access published October 8, 2008; doi:10.1093/nar/gkn681

Adhemar Zerlotini1,2, Mark Heiges2, Haiming Wang2, Romulo L. V. Moraes1, Anderson J. Dominitini1, Jerônimo C. Ruiz1, Jessica C. Kissinger2,3 and Guilherme Oliveira1,*

1 Laboratory of Cellular and Molecular Parasitology, Instituto René Rachou – FIOCRUZ, Belo Horizonte, MG, Brazil
2 Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA
3 Department of Genetics, University of Georgia, Athens, GA, USA

*To whom correspondence should be addressed. Tel: +55 31 3349 7785; Fax: +55 31 3295 3115; Email: oliveira@cpqrr.fiocruz.br

Received August 15, 2008; Revised September 19, 2008; Accepted September 23, 2008

© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

SchistoDB (http://schistoDB.net/) is a genomic database for the parasitic organism Schistosoma mansoni, one of the major causative agents of schistosomiasis worldwide. It currently incorporates sequences and annotation for S. mansoni in a single user-friendly database. Several genome-scale analyses are available, as well as ESTs, oligonucleotides, metabolic pathways and drugs. In this article, we describe the data sets and their analyses, how to query the database, and the tools available on the website.

INTRODUCTION

The flatworm Schistosoma mansoni is one of the major etiological agents of human intestinal schistosomiasis (schistosomiasis mansoni). The disease affects over 200 million individuals in 74 developing countries and causes high morbidity in infected populations (1). Current disease-control strategies depend heavily on praziquantel, the sole drug available for mass treatment (1). A single dose is effective, and treatment has decreased morbidity in endemic areas. However, control strategies should ideally include other countermeasures, such as vaccines and new drugs: praziquantel is not efficacious against all life-cycle forms present in the human host, and there is evidence that drug resistance may arise in schistosomes (2). The S. mansoni genome is 270 Mb, contained in eight pairs of chromosomes (3).

The present work focuses on the computational genome analysis of this parasitic species. SchistoDB contains several different S. mansoni data sets and the results of different computational analyses. One highlight of the database is its integration with the metabolic pathway predictions generated using the SRI Pathway Tools software (4). Pathway analysis allowed us to select putative drug target candidates. The database also contains all drugs available in the KEGG DRUG database (5), which enables us to indicate enzymes known to be targeted by drugs in other organisms. Protein topology and cellular location predictions are important tools for the selection of vaccine candidates. We expect that SchistoDB will contribute to efforts towards identifying drug and vaccine candidates, in addition to enabling a more comprehensive analysis of genes.

CONTENT OF THE CURRENT RELEASE

Data

SchistoDB provides access to the latest draft genome sequence and annotation of S. mansoni (Puerto Rico strain) (6,7), obtained from the Wellcome Trust Sanger Institute, as well as to the mitochondrial genome (NMRI strain) (8). The current database version (Release 2.0) also contains the oligonucleotides (9) used in the Agilent 44K-element array widely used by the community, and ESTs mapped to the genome. The database provides the results of computational analyses, including open reading frames (ORFs) longer than 50 amino acids; protein feature predictions such as signal peptides, transmembrane domains, hydrophobicity plots and InterPro domains (10); Gene Ontology (11) function predictions; EC number assignments; and BLAST similarities to the NCBI non-redundant protein database and the Protein Data Bank (12). We also loaded the OrthoMCL (13) groups that cluster S. mansoni genes with orthologous genes from 86 other eukaryotic and prokaryotic genomes. In addition, drugs provided by KEGG (5) were loaded, and their targets were associated with S. mansoni genes that have matching EC numbers. Users can view all data types on record pages and retrieve them through the query interface (see the Data-mining tools section) (Table 1).

Table 1. Data types and sources that have been integrated into SchistoDB and the number of genes that are impacted

Data type                       Data source     Gene number
Protein-coding genes            Sanger          13 339
Orthologs                       OrthoMCL        9516
GO (Gene Ontology) terms        InterProScan    5667
EC (Enzyme Commission) numbers  SchistoCyc      712
ESTs                            GenBank         9534
PDB (Protein Data Bank)         RCSB PDB        2713

Database architecture

SchistoDB uses GUS 3.5 to systematically load data into an underlying Oracle database. The open-source database schema (GUS, the Genomics Unified Schema) uses controlled vocabularies and ontologies to establish relations among the different data types and analyses. Online access to SchistoDB is provided through the GUS WDK (Web Development Kit, www.gusdb.org/wdk), which facilitated the creation of the website. The use of GUS significantly simplifies data loading and analysis, enabling frequent future release cycles. GUS and WDK have also been used to develop other databases, such as PlasmoDB (14).

DATA-MINING TOOLS

SchistoDB currently provides approximately 30 different queries of the data and several tools for analyzing, retrieving or viewing the data, such as BLAST, Pathway Tools and the GMOD Genome Browser (15).
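Queries such as 'Genes by Drug Evidence' rest on the association described under Data: a KEGG drug is linked to an S. mansoni gene when the drug's target enzyme and the gene share an EC number. The sketch below expresses that association as a simple join in Python. It is a minimal illustration under stated assumptions, not SchistoDB's implementation (which runs on GUS and Oracle): the function name, the input dictionaries, and all gene IDs, drug IDs and EC assignments in the usage example are hypothetical, and EC numbers are matched as exact strings, so a partial number such as "2.7.1.-" would not match a fully specified one.

```python
from collections import defaultdict


def drugs_by_gene(gene_ec, drug_targets):
    """Link drugs to genes through shared EC numbers (hypothetical helper).

    gene_ec:      dict mapping gene ID -> set of EC numbers assigned to the gene
    drug_targets: dict mapping drug ID -> set of EC numbers of its known targets
    Returns a dict mapping gene ID -> sorted list of drug IDs; genes with no
    matching drug are omitted.
    """
    # Invert drug -> EC numbers into EC number -> drugs, so each gene needs
    # only one dictionary lookup per assigned EC number.
    drugs_for_ec = defaultdict(set)
    for drug, ecs in drug_targets.items():
        for ec in ecs:
            drugs_for_ec[ec].add(drug)

    result = {}
    for gene, ecs in gene_ec.items():
        matched = set()
        for ec in ecs:
            # .get() does not insert missing keys into the defaultdict.
            matched |= drugs_for_ec.get(ec, set())
        if matched:
            result[gene] = sorted(matched)
    return result


if __name__ == "__main__":
    # Hypothetical toy data; not taken from SchistoDB or KEGG.
    genes = {"gene_1": {"2.7.1.15"}, "gene_2": {"1.1.1.1", "3.4.21.4"}, "gene_3": set()}
    drugs = {"DRUG_A": {"2.7.1.15"}, "DRUG_B": {"3.4.21.4", "2.7.1.15"}}
    print(drugs_by_gene(genes, drugs))
    # {'gene_1': ['DRUG_A', 'DRUG_B'], 'gene_2': ['DRUG_B']}
```

Inverting the drug table first keeps the join linear in the total number of EC assignments rather than comparing every gene against every drug.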
Once the appropriate data types to display have been selected, users can combine different search results on the 'Query History' page. The original query can be refined iteratively until a narrow list of genes of interest is obtained, providing a manageable number of targets for experimental validation, which is a time-consuming and expensive process. The data can also be downloaded in flat-file format for further analysis.

The GBrowse genome browser (www.gmod.org) is used in SchistoDB to display gene models, EST alignments, BLAST results and other features. GBrowse enables visualization of the parasite genome and gene models and ORF identification, and it facilitates downloading data in various formats. Each analysis or data set is displayed as a separate track within the genome browser.

Schistosoma mansoni metabolic pathways are available through the Pathway Tools web interface, where several queries provide access to pathways, reactions, enzymes, compounds and other elements. The graphical overview allows the user to visualize the complete set of pathways, highlight specific reactions, or perform organism comparisons and expression analyses.

Mining for candidate drug and vaccine targets will benefit from many of the available analyses. Because SchistoDB integrates different data sets in a relational database, we were able to apply a technique known as genomic filtering (16). Genomic filtering identifies gene products that might be of interest for drug targeting based on several criteria, e.g. absence of alternative pathways that consume or produce a given compound, presence of, or similarity to, a host molecule (to avoid toxicity), EST evidence, cellular location, known drugs that target the same gene product in other organisms, or 3D models of the protein. The presence of signal peptides and transmembrane domains will be important for identifying vaccine candidates. EST evidence makes it possible to verify whether the putative target is expressed in the relevant life-cycle stages.
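Genomic filtering, and the combination of saved results on the 'Query History' page, can be pictured as set algebra over gene identifiers: each query returns a set of genes, and AND, OR and NOT correspond to intersection, union and difference. The Python sketch below applies this to two filters of the kind discussed in this paper, a secreted-protein filter (signal peptide, no transmembrane domain, EST support in schistosomula) and a PDB/drug/EST combination like the one in Figure 1. It is a minimal sketch under stated assumptions: the GeneRecord fields, the helper names and the toy records are hypothetical and do not reflect the SchistoDB schema or the WDK query definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical per-gene summary; field names are illustrative only."""
    gene_id: str
    has_signal_peptide: bool
    tm_domains: int          # number of predicted transmembrane domains
    est_stages: frozenset    # life-cycle stages with EST evidence
    has_pdb_hit: bool        # similar to a protein with a PDB structure
    has_drug_evidence: bool  # shares an EC number with a known drug target


def query(records, predicate):
    """One 'query': the set of gene IDs whose record satisfies predicate."""
    return {r.gene_id for r in records if predicate(r)}


def secreted_candidates(records):
    signal = query(records, lambda r: r.has_signal_peptide)
    membrane = query(records, lambda r: r.tm_domains > 0)
    schistosomula = query(records, lambda r: "schistosomula" in r.est_stages)
    # signal peptide AND NOT transmembrane AND expressed in schistosomula
    return (signal - membrane) & schistosomula


def drug_target_candidates(records):
    pdb = query(records, lambda r: r.has_pdb_hit)
    drug = query(records, lambda r: r.has_drug_evidence)
    est = query(records, lambda r: len(r.est_stages) > 0)
    # PDB similarity AND drug evidence AND EST evidence
    return pdb & drug & est


SAMPLE = [  # hypothetical toy records
    GeneRecord("g1", True, 0, frozenset({"schistosomula", "adult"}), True, True),
    GeneRecord("g2", True, 2, frozenset({"schistosomula"}), False, False),
    GeneRecord("g3", False, 0, frozenset({"adult"}), True, True),
    GeneRecord("g4", True, 0, frozenset(), True, True),
]

if __name__ == "__main__":
    print(sorted(secreted_candidates(SAMPLE)))     # ['g1']
    print(sorted(drug_target_candidates(SAMPLE)))  # ['g1', 'g3']
```

Keeping each query's output as a plain set lets saved results be recombined without re-running the underlying queries; note that intersection and union are order-independent, whereas NOT (set difference) is not.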
Identifying similar proteins with known structures permits homology modeling of S. mansoni proteins, which will contribute to the design of new chemicals and to the identification of exposed antigenic peptides. Users can perform complex operations on the results, such as using Boolean operators (AND, NOT, OR) to search for proteins that, for example, have a signal peptide, do not have transmembrane domains and are expressed in the schistosomula life-cycle stage according to EST evidence, in order to identify secreted proteins. Figure 1 shows an example in which combining the queries 'Genes by PDB similarity', 'Genes by Drug Evidence' and 'Genes by EST Evidence' yields a narrow list of 56 genes out of a total of 13 339. That is, these 56 genes are similar to proteins with 3D structures in the PDB database, encode gene products that known drugs target in other organisms, and overlap ESTs. Clicking on any gene identifier opens a page with information on that gene. The search results can be downloaded with user-selectable features.

FUTURE DIRECTIONS

The current version contains only S. mansoni data, so expansion of the database will begin with the integration of data sets from other schistosome species. We also expect to load and integrate other data types, such as SNP, microarray and SAGE data. As new data are added, we will include additional queries and tools to view them.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the genome sequencing consortium, TIGR and WTSI for making the genome assembly and annotation of S. mansoni available. Without their generous pre-publication contribution, this integrated database resource would not have been possible. Special thanks go to the GUS developers and to the EuPathDB group, who provided essential support for this work.

Figure 1. Screenshots from SchistoDB displaying the flow of a query.
From the initial page, users select among the various query choices for identifying genes, contigs, ORFs or ESTs. Each query displays a results page. The results may be downloaded or combined, or the query revised. The query history page allows the user to manipulate previous results. Individual genes are listed on the results page, each linked to its gene page. In the example, the gene for ribokinase is displayed. The gene page includes: annotation, links to SchistoCyc and GeneDB, the gene model, BLAST hits, EST clusters, microarray oligonucleotides, ORFs, EC numbers, Gene Ontology terms, KEGG drugs, orthology, protein domains, the predicted protein, and the mRNA and coding sequences.

FUNDING

National Institutes of Health – Fogarty International Center (5D43TW007012-03 to A.Z., R.L.V.M. and A.J.D.). Funding for open access charge: National Institutes of Health – Fogarty International Center (5D43TW007012-03).

Conflict of interest statement. None declared.

REFERENCES

1. Chitsulo,L., Engels,D., Montresor,A. and Savioli,L. (2000) The global status of schistosomiasis and its control. Acta Trop., 77, 41–51.
2. Pica-Mattoccia,L. and Cioli,D. (2004) Sex- and stage-related sensitivity of Schistosoma mansoni to in vivo and in vitro praziquantel treatment. Int. J. Parasitol., 34, 527–533.
3. Simpson,A.J.G., Sher,A. and McCutchan,T.F. (1982) The genome of Schistosoma mansoni: isolation of DNA, its size, bases and repetitive sequences. Mol. Biochem. Parasitol., 6, 125–137.
4. Karp,P.D., Paley,S. and Romero,P. (2002) The Pathway Tools software. Bioinformatics, 18(Suppl 1), S225–S232.
5. Kanehisa,M., Araki,M., Goto,S., Hattori,M., Hirakawa,M., Itoh,M., Katayama,T., Kawashima,S., Okuda,S., Tokimatsu,T. et al. (2008) KEGG for linking genomes to life and the environment. Nucleic Acids Res., 36, D480–D484.
6. El-Sayed,N.M.A., Bartholomeu,D., Ivens,A., Johnston,D.A. and LoVerde,P.T. (2004) Advances in schistosome genomics. Trends Parasitol., 20, 154–157.
7. Haas,B.J., Berriman,M., Hirai,H., Cerqueira,G.G., LoVerde,P.T. and El-Sayed,N.M. (2007) Schistosoma mansoni genome: closing in on a final gene set. Exp. Parasitol., 117, 225–228.
8. Le,T.H., Blair,D. and McManus,D.P. (2000) Mitochondrial DNA sequences of human schistosomes: the current status. Int. J. Parasitol., 30, 283–290.
9. Verjovski-Almeida,S., Venancio,T.M., Oliveira,K.C.P., Almeida,G.T. and DeMarco,R. (2007) Use of a 44k oligoarray to explore the transcriptome of Schistosoma mansoni adult worms. Exp. Parasitol., 117, 236–245.
10. Mulder,N.J., Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., Binns,D., Bork,P., Buillard,V., Cerutti,L., Copley,R. et al. (2007) New developments in the InterPro database. Nucleic Acids Res., 35, D224–D228.
11. Harris,M.A., Clark,J., Ireland,A., Lomax,J., Ashburner,M., Foulger,R., Eilbeck,K., Lewis,S., Marshall,B., Mungall,C. et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res., 32, D258–D261.
12. Kouranov,A., Xie,L., de la Cruz,J., Chen,L., Westbrook,J., Bourne,P.E. and Berman,H.M. (2006) The RCSB PDB information portal for structural genomics. Nucleic Acids Res., 34, D302–D305.
13. Chen,F., Mackey,A.J., Stoeckert,C.J. and Roos,D.S. (2006) OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res., 34, D363–D368.
14. Bahl,A., Brunk,B., Crabtree,J., Fraunholz,M.J., Gajria,B., Grant,G.R., Ginsburg,H., Gupta,D., Kissinger,J.C., Labo,P. et al. (2003) PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res., 31, 212–215.
15. Stein,L.D., Mungall,C., Shu,S., Caudy,M., Mangone,M., Day,A., Nickerson,E., Stajich,J.E., Harris,T.W., Arva,A. et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res., 12, 1599–1610.
16. McCarter,J.P. (2004) Genomic filtering: an approach to discovering novel antiparasitics. Trends Parasitol., 20, 462–468.