
Leishmania Genomics: Where Do We Stand?

2006

NLM Citation: Uliana SRB, Ruiz JC, Cruz AK. Leishmania Genomics: Where Do We Stand?. 2006 Oct 12 [Updated 2007 Aug 24]. In: Gruber A, Durham AM, Huynh C, et al., editors. Bioinformatics in Tropical Disease Research: A Practical and Case-Study Approach [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2008. Chapter B02. Bookshelf URL: https://www.ncbi.nlm.nih.gov/books/

Chapter B02. Leishmania Genomics: Where Do We Stand?

Silvia R.B. Uliana, MD, PhD,1 Jeronimo C. Ruiz, PhD,2 and Angela K. Cruz, PhD3

Author Affiliations: 1 Instituto de Ciências Biomédicas, Universidade de São Paulo; Email: srbulian@icb.usp.br. 2 Fundação Oswaldo Cruz, Centro de Pesquisas René Rachou; Email: jcruiz@cpqrr.fiocruz.br. 3 Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo; Email: akcruz@fmrp.usp.br.

Created: October 12, 2006; Updated: August 24, 2007.

A Short History of the Leishmania Genome Project

More than 20 different species of the genus Leishmania are known to be pathogenic to humans. These protozoan parasites are kinetoplastids of the family Trypanosomatidae and depend on a female phlebotomine sand fly and on a variety of vertebrate hosts to complete their life cycle. Collectively, Leishmania spp. are responsible for one of the world's major communicable diseases, which led the World Health Organization to include leishmaniasis among the six major diseases targeted for intensive research and control efforts. It is estimated that more than 2 million new cases of leishmaniasis occur each year in 88 countries (http://www.who.int/health-topics/leishmaniasis.htm). Leishmaniasis is a spectral disease with multifaceted clinical manifestations, ranging from mild and frequently self-healing cutaneous lesions to severe mucocutaneous ulcers or visceral disease, which can be fatal. Besides its unquestionable medical relevance, this ancient eukaryote is an important biological model. Leishmania are mostly diploid organisms with no apparent sexual cycle that display some uncommon biochemical, genetic, and morphological features, which are either unique to the order Kinetoplastida or used more frequently by these organisms than by any others.

These features include a unique organization of mitochondrial DNA (the kinetoplast), extensive mitochondrial RNA editing (1), glycosomes (2), polycistronic transcription (3), trans-splicing (4, 5), and GPI anchoring of membrane proteins, among others (6-8). Because leishmaniasis is a disease endemic in the Third World, pharmaceutical companies have shown almost no interest in searching for low-toxicity drugs or in investing in the development of safe and effective vaccines against it. Deciphering the genomic sequence of the parasite was therefore seen as a way to optimize research approaches and to accelerate understanding of the biology of this important human parasite and of its interaction with the host. Several successful ongoing genome projects, particularly those of pathogenic organisms, provided the background for proposing a genome project for Leishmania and other relevant trypanosomatids (9). The first Parasite Genome Network Planning Meeting was held in April 1994 in Rio de Janeiro, Brazil, sponsored by FIOCRUZ and the UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR). The Genome Projects of Trypanosoma cruzi, Trypanosoma brucei, and Leishmania major were launched at the meeting, and a reference strain was chosen for each parasite for the sequencing project. A Leishmania Genome Network (LGN) was established, and several laboratories from around the world became involved in the initiative.
The major aims of the LGN were: 1) to develop a physical map of the parasite's genome; 2) to curate, analyze, and disseminate parasite genome data; and 3) to determine the entire genomic sequence of the organism. These core objectives were expected to support progress in areas ranging from the discovery of new drug targets, diagnostic tools, and vaccine approaches, and improved knowledge of parasite biology, to the dissemination of genomics expertise to different countries. Pulsed-field gel electrophoresis (PFGE) was used to generate "molecular karyotypes" of distinct Leishmania species and strains. A combination of PFGE and hybridization analysis revealed that the observed variation in chromosome size was not a consequence of clonal heterogeneity within strains, that the variation was relatively small (10 to 20%), and that it was mainly attributable to expansion and deletion of subtelomeric regions (10). In landmark studies, Wincker and colleagues (10) defined 36 physical linkage groups corresponding to the complete set of chromosomes, suggesting a genome size of 35 Mb. They showed that these linkage groups were not only highly conserved among Old World species but also mostly conserved in New World species (11). The LGN extended the analysis to construct the molecular karyotype of the Leishmania major Friedlin (LmjF) reference strain (12). These data provide evidence that syntenic groups (blocks of genes in conserved order) are conserved among species. Understanding the extent of conservation in genome organization was a very important step at that time, because it confirmed that using a single reference strain for all Leishmania species was an appropriate decision. Another important front of work during the mid- and late-1990s was the construction of a physical map of the LmjF chromosomes. This approach was chosen to allow the selection of a tiling set of clones spanning the entire genome. The map was built from cloned DNA of large-insert genomic libraries (13).
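Choosing a tiling set from a physical map is essentially an interval-cover problem: given clones whose positions on a chromosome are known, pick the fewest that leave no gap. The sketch below is a hypothetical illustration, not the procedure the LGN actually used. It assumes clone coordinates are already known from the map (the `Clone` tuples and the `minimal_tiling_path` name are invented here) and it ignores the overlap between neighboring clones that contig joining requires.

```python
from typing import List, Optional, Tuple

# A mapped clone: (name, start, end) on one chromosome, 0-based, end-exclusive.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], chrom_length: int) -> List[Clone]:
    """Greedily pick the fewest mapped clones that together span [0, chrom_length)."""
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    path: List[Clone] = []
    covered = 0  # every position before `covered` is already spanned
    i = 0
    while covered < chrom_length:
        best: Optional[Clone] = None
        # Among clones that start inside the spanned region, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


if __name__ == "__main__":
    toy_clones = [("A", 0, 40), ("B", 10, 60), ("C", 30, 90), ("D", 55, 100), ("E", 85, 120)]
    print([name for name, _, _ in minimal_tiling_path(toy_clones, 120)])  # ['A', 'C', 'E']
```

The greedy rule (always extend with the clone that reaches furthest) is the standard optimal strategy for covering a segment with intervals; a real pipeline would additionally demand a minimum overlap between consecutive clones.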
The selected clones would then be sequenced, and much of the mapping phase of the genome project involved systematic probing with contig-end sequences to join contigs. At that point, given the hardware and software available and the lack of funds, whole-genome shotgun sequencing of LmjF was not considered a feasible strategy. A parallel activity during the early years of the Leishmania Genome Project was the sequencing of cDNA libraries, seen at that time as a fast and simple means of gene discovery. Single-pass sequencing of clones from cDNA libraries of different life cycle stages generated more than 2,500 expressed sequence tags (ESTs). EST sequencing had originally been identified as an efficient, cost-effective method for gene discovery in relatively large genomes, such as those of protozoa. Nevertheless, other studies demonstrated that, because the T. brucei and Leishmania genomes are gene-dense and their genes lack introns, genomic sequencing was a much better option, even for gene discovery (14, 15). Complete sequencing of chromosome 1 began in 1996 as a pilot project for physical map-based sequencing and was published by Myler and coworkers (16) in 1999. A striking polarity of putative transcription was revealed and subsequently confirmed for the genomes of Leishmania, T. brucei, and T. cruzi. The so-called directional gene clusters (DGCs), a unique feature of gene organization in trypanosomatids discussed further below, may be a consequence of polycistronic transcription and of the unusual link between transcription and RNA processing, in which trans-splicing and polyadenylation are coupled (17, 18).
The rest of the genome followed, and the 33.6-Mb sequence was obtained by combining a hierarchical (clone-by-clone) sequencing strategy with whole chromosome shotgun sequencing, which involves an initial separation of individual or co-migrating chromosomes by PFGE. In silico analysis of the L. major genome predicted 911 RNA genes, 39 pseudogenes, and 8,370 protein-coding genes. The complete Leishmania genome sequence was published along with the Trypanosoma brucei and T. cruzi genomes in the July 2005 issue of Science (19). This marked the culmination of an international effort that, as already noted (20), must be taken not as a final achievement but as the beginning of several possible lines of investigation. The Leishmania spp. genomes are now being exploited through comparative genomics: the genomes of L. infantum and L. braziliensis have been completely sequenced and annotated (21). It is worth mentioning that each of these two genomes was sequenced in less than 1 year. Moreover, recent software and hardware improvements allowed both to be sequenced entirely by whole-genome shotgun sequencing.

Particular Features of Leishmania Genome Organization in Comparison with Other Eukaryotic Genomes

Mapping and sequencing of the L. major genome defined 36 discrete chromosomes in this protozoan parasite. Different Leishmania species have 34 to 36 chromosomes, ranging in size from 268 to 2,680 kb. Chromosomes do not condense during cell division, and, as mentioned above, the genome is largely diploid; it is also largely homozygous. The 33.6-Mb haploid genome of L. major has an overall G+C content of 59.7%. The 8,370 predicted protein-coding genes are densely arranged and occupy about 48% of the genome (19) (http://www.genedb.org/genedb/leish/).
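Genome-wide figures such as the 59.7% G+C content and the roughly 48% of the genome occupied by protein-coding genes come from straightforward computations over an assembly and its annotation. The following is a minimal sketch on toy inputs, assuming gene models are given as 0-based, end-exclusive (start, end) intervals; the function names are hypothetical, and this is not the GeneDB pipeline.

```python
from typing import Iterable, Tuple


def gc_content(seq: str) -> float:
    """G+C fraction among unambiguous bases; N and other ambiguity codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def coding_fraction(genes: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of the genome covered by gene intervals (0-based, end-exclusive), merging overlaps."""
    covered = 0
    run_start = run_end = None  # current run of merged, overlapping intervals
    for start, end in sorted(genes):
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


if __name__ == "__main__":
    print(round(gc_content("GGCCATNN"), 3))  # 0.667
    print(coding_fraction([(0, 10), (5, 20), (30, 40)], 100))  # 0.3
```

Merging overlapping intervals before summing avoids double-counting bases shared by overlapping gene models.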
Many protein-coding genes are grouped in multicopy families arranged in tandem arrays. The L. major genome is relatively poor in repetitive sequences compared with the other trypanosomatids (19, 22). Interspersed repeats are infrequent, and no active retrotransposable elements have been found (23). The major repeated elements of L. major are found exclusively in subtelomeric regions (24). As mentioned above, contraction and expansion of these subtelomeric regions are responsible for the size variation observed between lines and between chromosome homologs (25). Leishmania telomeres contain variable numbers of the hexameric repeat 5'-TAGGGT-3' and of the octameric repeat 5'-TGGTCATG-3', together with a sequence immediately adjacent to the telomere that is common to all Leishmania species, named the Leishmania conserved telomere-associated sequence (LCTAS) (26). The LCTAS can be present as a single copy at the end of the chromosome or as repeated copies separated by the end-most telomeric hexameric repeats. In the subtelomeric region, other, less conserved repeated elements are found, followed by a barren region (27). One of the most remarkable features of Leishmania and Trypanosoma genomes is the organization of genes into long arrays of tandem coding sequences that lie on the same strand and are transcribed into polycistronic primary transcripts. These DGCs vary in size from a few kb to more than 1.2 Mb. The sequences located between two divergent transcription units are called strand-switch regions; they do not contain a consensus sequence for any known RNA polymerase promoter but show a compositional bias, with increased AT content relative to other chromosomal regions (28). Transcription by RNA polymerase II (RNA pol II) in these organisms is unusual. After the completion of chromosome 1 (chr1) of L.
major (16), the study of its transcriptional activity showed that RNA pol II transcription specific to the coding strands initiated in the strand-switch region and proceeded bidirectionally (7). Similarly, on chr3, transcription starts at discrete regions (perhaps not exclusively, but more intensely) and proceeds along the DGC (6). No substantial sequence similarity was found between the regions of transcription initiation identified on LmjF chr1 and chr3. Previous work had shown that RNA pol II can initiate transcription at random in Leishmania, without requiring a specific promoter (29). Nevertheless, in intact chromosomes, RNA pol II transcription is predominantly, but not exclusively, strand specific and unidirectional (30). As a result, all genes within a DGC are transcribed at the same rate. Interestingly, although no characteristic RNA pol II promoter region appears to precede the long DGCs, a typical, discrete pol II promoter has been found and characterized in the intergenic sequences of the tandem array encoding about 120 copies of the spliced leader RNA (SL RNA) gene (31-33). Several conserved genes encoding subunits of basal RNA pol II transcription factors have been identified in the Leishmania genome, but homologs of several others have not been found thus far, so the mechanism of RNA transcription in these organisms remains a subject for further investigation. The tandem organization of the ribosomal RNA (rRNA) genes and the characteristics of RNA polymerase I (RNA pol I) promoters resemble those of other organisms. One large tandem array contains approximately 60 copies of the 24S, 18S, and 5.8S rRNA genes, whereas the 5S rRNA genes are dispersed among 11 different loci.
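The DGC organization described above can be recovered from an annotation by grouping consecutive genes on the same strand and examining the gaps where the strand flips. In this sketch (hypothetical function names, toy 0-based coordinates), a gap between a minus-strand cluster and a following plus-strand cluster is labeled divergent, the configuration in which transcription on chr1 was found to start, and a plus-to-minus gap is labeled convergent. The AT fraction of each gap is reported because strand-switch regions are AT-enriched. A real annotation would also need to handle overlapping genes and RNA genes interrupting a cluster.

```python
from typing import List, Tuple

Gene = Tuple[int, int, str]  # (start, end, strand), 0-based, end-exclusive, strand "+" or "-"
Cluster = Tuple[str, int, int, int]  # (strand, start, end, number of genes)


def directional_gene_clusters(genes: List[Gene]) -> List[Cluster]:
    """Merge runs of consecutive same-strand genes into directional gene clusters."""
    clusters: List[Cluster] = []
    for start, end, strand in sorted(genes):
        if clusters and clusters[-1][0] == strand:
            _, c_start, c_end, n = clusters[-1]
            clusters[-1] = (strand, c_start, max(c_end, end), n + 1)
        else:
            clusters.append((strand, start, end, 1))
    return clusters


def strand_switch_regions(clusters: List[Cluster], chrom_seq: str) -> List[Tuple[str, int, int, float]]:
    """Label each gap between adjacent clusters and report its AT fraction."""
    regions = []
    for left, right in zip(clusters, clusters[1:]):
        # Adjacent clusters always differ in strand: (-, +) points away from the gap, (+, -) toward it.
        kind = "divergent" if (left[0], right[0]) == ("-", "+") else "convergent"
        start, end = left[2], right[1]
        gap = chrom_seq[start:end].upper()
        at = (gap.count("A") + gap.count("T")) / len(gap) if gap else 0.0
        regions.append((kind, start, end, at))
    return regions


if __name__ == "__main__":
    toy_genes = [(0, 10, "-"), (15, 25, "-"), (40, 50, "+"), (55, 65, "+"), (80, 90, "-")]
    toy_chrom = "C" * 25 + "A" * 15 + "C" * 25 + "T" * 10 + "G" * 5 + "C" * 10
    dgcs = directional_gene_clusters(toy_genes)
    print(dgcs)  # [('-', 0, 25, 2), ('+', 40, 65, 2), ('-', 80, 90, 1)]
    for region in strand_switch_regions(dgcs, toy_chrom):
        print(region)
```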
RNA pol I promoters are present and exhibit the expected characteristics: they are strong, insensitive to α-amanitin, poorly conserved in sequence, and regulated by upstream repeated motifs (34-37). RNA pol I transcribes the rRNA units in Leishmania (6) and, again unusually, the genes encoding variant surface antigens in T. brucei, a singular example of transcription of protein-coding genes by RNA pol I. Transcription of tRNAs and small RNAs seems to be driven by RNA pol III promoters (6). Another atypical feature of the Leishmania genome (and of trypanosomatid genomes in general) is the almost complete absence of introns. Very few examples of cis-spliced genes have been found thus far in these organisms (19, 38). More than 40 sequences for which there is strong evidence of horizontal transfer from bacteria were found in the L. major genome. Some of them are exclusive to Leishmania, whereas many others are shared with Trypanosoma species.

Control of Gene Expression in Leishmania

During its life cycle, Leishmania alternates between the extracellular promastigote, which multiplies in the digestive tract of the insect vector, and the intracellular amastigote, a parasite of macrophages in the vertebrate host (http://www.dpd.cdc.gov/DPDx/HTML/Leishmaniasis.htm). After procyclic promastigotes colonize and multiply in the insect gut, and before infective parasites are inoculated at the next blood meal, promastigotes undergo a developmental differentiation process known as metacyclogenesis (39). Procyclic promastigotes, metacyclic promastigotes, and amastigotes differentially express gene products, as exemplified by numerous genes and proteins already characterized (40-48). Although in most organisms a first and decisive level of gene expression control is the rate of transcription of a particular sequence, this is clearly not the case in Leishmania, as discussed above.
To increase transcription of a given sequence, Leishmania parasites rely on duplication or amplification of gene sequences, and these organisms display striking genome plasticity, being able to respond to selective pressure by changing ploidy locally or across the whole genome (49-52). Consequently, apart from such amplification, Leishmania has to rely on posttranscriptional mechanisms to control gene expression. One consequence of polycistronic transcription is that individual gene transcripts are not capped at their 5' ends, which would make them unstable. In these organisms, mRNA stability is achieved by trans-splicing. This type of RNA processing was first described in trypanosomatids and later demonstrated in other organisms as well. The trans-splicing machinery transfers a capped small RNA, known as the spliced leader (SL), to the 5' end of almost all mRNAs (53, 54). The SL RNA acceptor site is generally an AG dinucleotide preceded by a pyrimidine-rich sequence, located 100-250 nt upstream of the open reading frame (ORF). The polypyrimidine tract also directs polyadenylation of the upstream mRNA (18, 53, 54). The rate of splicing and maturation of RNA is therefore one of the levels at which gene expression is controlled in Leishmania. Examples of alternative splicing of transcripts have been described in T. cruzi and T. brucei (55, 56).

In addition, the control of RNA degradation has been shown to play a part in the regulation of gene expression. In several cases, particular motifs in the 3'-untranslated region (3'UTR) have been shown to be essential for regulating mRNA half-life (43, 57-59). There is strong evidence that de novo protein synthesis is required, indicating that degradation or stabilization of an mRNA takes place through protein interaction with the 3'UTR sequence (60); accordingly, several genes encoding proteins with RNA-binding motifs have been identified.
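The acceptor-site rule described above can be turned into a simple sequence scan. This is a hypothetical heuristic, not a published predictor: only the 100-250 nt window comes from the text, while the tract length (10 nt) and the minimum pyrimidine fraction (0.8) are arbitrary illustrative parameters. A realistic predictor would need to be trained and validated on experimentally mapped SL addition sites.

```python
from typing import List


def candidate_sl_acceptor_sites(
    seq: str,
    orf_start: int,
    min_dist: int = 100,
    max_dist: int = 250,
    tract_len: int = 10,  # hypothetical polypyrimidine tract length
    min_pyrimidine: float = 0.8,  # hypothetical minimum C+T fraction in the tract
) -> List[int]:
    """Return 0-based positions of AG dinucleotides lying min_dist..max_dist nt upstream of
    orf_start and preceded by a pyrimidine-rich tract."""
    seq = seq.upper()
    hits = []
    first = max(tract_len, orf_start - max_dist)
    last = orf_start - min_dist
    for pos in range(first, last + 1):
        if seq[pos:pos + 2] != "AG":
            continue
        tract = seq[pos - tract_len:pos]
        pyrimidines = sum(base in "CT" for base in tract)
        if pyrimidines / tract_len >= min_pyrimidine:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    # Toy intergenic region: a good site (C/T tract + AG) 150 nt upstream of the ATG,
    # and a decoy AG 110 nt upstream that is preceded only by purines.
    toy = "G" * 50 + "TTCTTTCCTT" + "AG" + "A" * 38 + "AG" + "A" * 108 + "ATGGCC"
    print(candidate_sl_acceptor_sites(toy, orf_start=210))  # [60]
```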
Regulatory sequences in the 3'UTR can also regulate translation, as shown recently for amastin and heat shock proteins 70 and 83 (61-63). RNA editing represents yet another mechanism of gene expression control. Primary transcripts of several mitochondrial genes do not encode functional ORFs, and the explanation eluded many scientists until Benne et al. (1) demonstrated that primary transcripts are modified by the addition or deletion of uridines, generating functional ORFs for several mitochondrial enzymes. The editing phenomenon was later shown to depend on small complementary RNA molecules encoded by the kinetoplast DNA minicircles (64, 65). These small RNAs were called guide RNAs (gRNAs).

Experimental Approaches for Gene Function Studies: Reverse and Forward Genetics

The genetic features of Leishmania did not allow studies using classical genetic approaches. Therefore, until the late 1980s, gene function was inferred from similarity to genes of other organisms or from expression in heterologous systems. The breakthrough for reverse genetics in trypanosomatids came with the transient expression of reporter genes in Leptomonas spp. (66). Shortly afterward, appropriate conditions for transient and stable transfection were established for Leishmania (67-69). The essential features of vectors for transient and stable transfection were determined (18, 29, 70). In both cases, promoterless circular plasmids gave readily detectable expression in Leishmania. Expression depends on the presence of a polypyrimidine tract and an acceptor site (an AG dinucleotide) for trans-splicing positioned upstream of a reporter gene, in the case of transient expression, or of a selectable marker, in vectors designed for stable transfection. Several vectors, reporter genes, and positive or negative selectable markers are now available (71). Downstream of the gene to be expressed, a 3'UTR from a resident gene is added to ensure proper addition of the poly(A) tail.
Placing any given gene between these 5' and 3' sequences, as described above, is sufficient for its transcription. Transient transfection vectors may be designed to contain an endogenous promoter of RNA pol I genes or a heterologous one, such as the phage T7 promoter, which drives transcription more efficiently (34, 71, 72). Data now available on 3'UTR sequence elements that regulate RNA stability and translation efficiency (72, 73) may help in the future development of vectors designed to achieve defined patterns of developmental expression, which will be another relevant instrument for reverse genetics. Long genomic fragments cloned into cosmid vectors have been used to express several genes at once and have become an invaluable tool for functional complementation studies. Cosmid genomic libraries are very useful for rescuing phenotypes and unraveling the function of novel genes (74, 75). Functional studies carried out with such recombinants have shown that, in episomes, transcription occurs from both strands, and no reduction in expression attributable to the position of a gene within an insert was observed (76). For technical reasons, transient transfection has limited applications: it requires large amounts of plasmid DNA for detection of reporter gene expression, and the attainable expression levels do not allow subcellular localization of the exogenous protein. Its main use is restricted to searching for sequence elements involved in gene regulation. Therefore, permanently transformed cell lines are preferred for most transfection applications. Expression of episomal genes can be enhanced by increasing the concentration of the selective drug, which leads to amplification of the plasmid copy number and to higher levels of transcription.
This is a very convenient approach because it allows the level of drug pressure, and hence the overexpression of the gene of interest in the episome, to be tuned, making it possible to correlate the gene product with a given phenotype. Nevertheless, it is also well established that, in the absence of drug pressure, circular extrachromosomal molecules are lost, and after a few serial passages the original clone becomes a mixed population of cells with and without the episome. Stability of the exogenous gene can be achieved by integrating it into the genome. To promote integration of DNA into the Leishmania genome, a linear fragment with non-cohesive ends must be transfected. The sine qua non for insertion is sequence homology between the transfected fragment and the integration site in the genome (77). The high frequency and efficiency of homologous recombination are exploited to integrate an exogenous gene to be overexpressed into the rDNA locus. This tactic places the gene under the control of the only known strong promoter, which drives constitutive expression in Leishmania. RNA pol I-driven transcription in such an arrangement not only yields higher levels of overexpression than RNA pol II-driven expression of episomal genes but also brings the advantage of stabilizing the overexpressed gene (78). More importantly, the high frequency and efficiency of homologous recombination have made it possible to knock out genes using linear fragments designed to replace a resident gene with a selectable marker (77). Gene replacement results from a double crossover between the homologous sequences flanking the target gene and the selectable marker. Technical features relevant to obtaining genomic insertion with high efficiency have been explored by different groups (77, 79).
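The outcome of the double-crossover replacement just described can be modeled, very schematically, as string surgery: the targeting fragment carries a selectable marker between two sequences homologous to the regions flanking the target gene, and recombination exchanges everything between those flanks. The sketch below is a toy under stated assumptions (exact-match homology, a single copy of each flank, and a hypothetical `flank_len` parameter); it ignores the recombination machinery itself and only shows what the resulting allele looks like.

```python
def replace_by_double_crossover(locus: str, construct: str, flank_len: int) -> str:
    """Return the locus after replacing the region between the construct's flanks.

    The first and last `flank_len` nt of `construct` are assumed to be homologous
    (here: identical) to sequences upstream and downstream of the target gene.
    """
    left_flank = construct[:flank_len]
    right_flank = construct[-flank_len:]
    i = locus.find(left_flank)
    if i == -1:
        raise ValueError("5' flank has no homologous sequence in the locus")
    j = locus.find(right_flank, i + flank_len)
    if j == -1:
        raise ValueError("3' flank has no homologous sequence downstream of the 5' flank")
    # Crossovers within both flanks: keep the outer sequence, swap in the whole construct.
    return locus[:i] + construct + locus[j + flank_len:]


if __name__ == "__main__":
    up, down = "ACGTACGTAC", "TTGCATGCAA"  # toy flanking sequences
    target_gene = "ATGAAACCCGGGTAA"         # toy resident gene
    marker = "ATGCCCCCCTGA"                 # toy selectable-marker ORF
    locus = "GGGG" + up + target_gene + down + "CCCC"
    allele = replace_by_double_crossover(locus, up + marker + down, flank_len=10)
    print(allele == "GGGG" + up + marker + down + "CCCC")  # True
```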
There are no convincing reports of non-homologous integration in the Leishmania genome. The parasite has favorable homologous recombination machinery but, as mentioned above, Leishmania is mostly diploid and has no manipulable sexual cycle, which makes knocking out a given gene more difficult. Two rounds of transfection and two different selectable markers are the minimal requirements for obtaining a null mutant at a given locus (80). Targeting occurs with comparable efficiency at both steps, independently of the order in which the selectable markers are used. Several genes have been knocked out with this two-step strategy since 1990, and it has even been shown that long stretches of genomic sequence (40 kb) can be deleted by homologous replacement (81, 82). This finding shows that gene clusters with several copies can be targeted at once. Interestingly, failure to achieve a double replacement of a given gene is now taken as an indication that the gene is essential for the parasite's survival. Several reports on different targets demonstrate that the parasite resorts to extreme genetic means to avoid losing an essential gene. These reports were also very important in revealing that ploidy in Leishmania is anything but strict: alterations in chromosome number, through either aneuploidy or tetraploidy, are recurrent findings (49, 52). This limitation on the study of essential genes has been overcome by alternative approaches in trypanosomes but not in Leishmania. Generation of conditional knock-outs using a tetracycline-inducible system to fine-tune the expression level of an essential gene has been successfully established in T. brucei (83), but it is not fully operational in Leishmania (84). An efficient inducible system for functional studies of genes that cannot be knocked out would be a remarkably useful tool for reverse genetics in Leishmania.
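The reasoning used to interpret two-round targeting experiments can be summarized as a small decision function. This is an illustrative sketch only: the genotype is assumed to be already known (in practice it is determined experimentally), and allele labels such as "NEO" and "HYG" are hypothetical names standing for two different selectable markers. Retention of a wild-type copy after both alleles have been replaced by markers is the pattern described above, in which the parasite maintains an extra copy of a probably essential gene.

```python
from collections import Counter
from typing import Iterable


def classify_knockout(alleles: Iterable[str]) -> str:
    """Interpret the alleles detected at a locus after targeting.

    "WT" denotes a wild-type copy; any other label denotes a copy replaced by that marker.
    """
    counts = Counter(alleles)
    markers = {allele for allele in counts if allele != "WT"}
    if not markers:
        return "no replacement"
    if counts["WT"] == 0:
        return "null mutant"
    if len(markers) >= 2:
        # Both markers integrated, yet a wild-type copy persists (e.g., via aneuploidy).
        return "wild-type copy retained: gene possibly essential"
    return "single-allele replacement (heterozygote)"


if __name__ == "__main__":
    print(classify_knockout(["NEO", "HYG"]))        # null mutant
    print(classify_knockout(["NEO", "HYG", "WT"]))  # wild-type copy retained: gene possibly essential
    print(classify_knockout(["NEO", "WT"]))         # single-allele replacement (heterozygote)
```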
Gene silencing by RNA interference (RNAi), used to knock down genes, is another extremely important technical advance for functional genetic studies; it is widely used in T. brucei but is missing in Leishmania. In African trypanosomes, a stem-loop or double-stranded RNA can be transfected or directly introduced into the parasite and used in conjunction with the tet repressor system to knock down any given gene transiently. This approach allows several genes to be investigated quickly in parallel; it has been widely used and has proved to be the best choice for functional genomics in several organisms, including T. brucei.

Global Approaches for Functional Analyses of the Parasite Genome: Transcriptome and Proteome

Before the completion of the genome, investigators devoted to functional studies in Leishmania had to use "one gene at a time" strategies, applying the available reverse and forward genetics technology. There is now an urgent need for global approaches to functionally tackle the multitude of available data. The completion of the Tri-Tryp sequencing represents a landmark in the study of these human pathogens. Analysis of the genome sequences and comparative genomics have already generated an enormous amount of information, but within the available data lies an enormous number of unanswered questions. For example, of the 8,370 predicted L. major protein-coding genes, about 10% seem to be unique, having no orthologs in Trypanosoma or other organisms (19, 22) (http://www.genedb.org/genedb/leish/index.jsp). Determining the function of these unknown genes, as well as understanding the physiology of these parasites, will depend on new technical developments, some of which are already in use for the study of the Leishmania genome as well as other genomes.
Possible approaches to large-scale functional studies involve characterizing the complete set of RNA molecules (the transcriptome), the complete set of proteins (the proteome), or the complete set of low-molecular-weight metabolic intermediates (the metabolome), as well as generating genome-wide single-gene replacements. Examples of full-genome studies aimed at identifying and characterizing the whole set of genes are already available. Whole-genome analysis by single-gene replacement has been accomplished in Saccharomyces cerevisiae by the Yeast Deletion Program (85, 86). The phenotype generated by each gene deletion was analyzed by assays of growth fitness under different culture conditions and by visual examination of cell shape and size. Some problems became evident from that initiative: functional redundancy among related genes, which masks the effects of deletions; the presence of essential genes (about 14% in S. cerevisiae); and the lack of a detectable phenotype after deletion of some genes. Refinement of the available tools for phenotype analysis therefore becomes an obvious necessity for any global functional analysis of parasite genes. Tools for a functional study of the Leishmania genome through a similar type of approach have been developed. Transposable elements have been adapted to Leishmania and could be used to generate random insertional mutants (87-89). The drawbacks of this approach are the parasite's diploidy, which restricts the observation of a mutant phenotype to cases in which a change in gene dosage is sufficient to produce one, and the fact that mutants may not display a phenotype "visible" to the methods of observation currently feasible. Any functional approach will have to take several variables into account, since the developmental stage, physiological state, growth conditions, and so on will influence the set of products expressed in a given cell.
Studying the response to a particular stimulus is a valuable way to understand the physiology of the cell. Transcriptome analysis has been applied to several parasites (90-92) and to countless cell types in complex organisms. It is a powerful approach that can generate large amounts of data. Several comparative studies of Leishmania transcripts have been undertaken using microarrays of random genomic sequences, ESTs, and oligonucleotides. Even before the completion of the L. major genome, microarrays were already in use to evaluate changes between developmental stages. The first analyses were based on PCR amplification of inserts from promastigote- or amastigote-derived L. major cDNA libraries. When this array was probed with transcripts purified from L. major promastigotes or lesion-derived amastigotes, about 15% of the uniquely spotted sequences were detected as up- or downregulated in amastigotes (93). Another array, built with random genomic sequences estimated to cover about 80% of the genome (14, 88), was probed for differential expression during metacyclogenesis. About 15% of the sequences on the array showed changes in abundance by a factor of 1.5 or more (94). These early estimates of transcript regulation in Leishmania did not agree with data from microarray analyses of steady-state RNA in T. brucei (95, 96), which estimated that 2% of trypanosome genes showed developmental regulation at the mRNA level. Akopyants et al. (97), using the same sequences as in the random-genomic array mentioned above, showed that a change in transcript abundance occurred in only 1-3% of the spotted sequences. These results were validated by Northern blotting, indicating that microarrays could be used for analysis of the Leishmania transcriptome. The discrepancy with the previous studies may be attributed, first, to the source of the sequences in some arrays.
The EST libraries were probably enriched for highly expressed genes, which may represent a strong bias in the microarray analysis. Second, the more stringent cut-off used to define differential expression might explain the differences obtained with the genomic-derived microarray. Holzer et al. (98) used whole-genome, high-density L. major oligonucleotide microarrays to analyze RNA isolated from L. mexicana promastigotes, lesion-derived amastigotes, and axenic amastigotes. Several new steps were taken in this study: an oligonucleotide array was built from the whole set of predicted ORFs in the L. major genome and assayed against another Leishmania species, showing that sequence conservation in coding sequences is high enough to allow cross-species analyses. Furthermore, axenically cultured amastigotes were compared with lesion-derived amastigotes. The conclusions agreed with most of the previously reported data: 3.5% of all genes in the Leishmania genome showed differential regulation of steady-state mRNA levels. Most of the regulated transcripts were promastigote enriched, and very few amastigote-upregulated transcripts were detected. Interestingly, axenic amastigotes behaved much more like promastigotes than like lesion-derived amastigotes, sharing most of the upregulated sequences with the former. Thus, these authors confirmed that regulatory control in Leishmania is exerted mainly downstream of mRNA levels. These findings were not unexpected given the lack of transcriptional control. They also indicated that post-transcriptional regulation of RNA abundance is not an exceptionally important player in the control of gene expression in this case. It must be stressed at this point that obtaining large numbers of lesion-derived amastigotes for functional studies is a very serious problem in some circumstances.
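Because the fraction of genes called "regulated" depends directly on the cut-off, a small sketch helps show why studies applying different criteria report different percentages. The functions below are hypothetical: they assume one already-normalized expression value per gene per stage and apply a plain fold-change threshold, whereas real microarray analyses also involve normalization, replicates, and statistical testing.

```python
import math
from typing import Dict, Tuple


def differentially_expressed(
    levels: Dict[str, Tuple[float, float]], min_fold: float = 1.5
) -> Dict[str, float]:
    """Return log2(stage2 / stage1) for genes changing at least `min_fold`-fold in either direction."""
    threshold = math.log2(min_fold)
    hits = {}
    for gene, (stage1, stage2) in levels.items():
        log_ratio = math.log2(stage2 / stage1)
        if abs(log_ratio) >= threshold:
            hits[gene] = log_ratio
    return hits


def percent_regulated(levels: Dict[str, Tuple[float, float]], min_fold: float) -> float:
    """Percentage of genes passing the fold-change cut-off."""
    return 100.0 * len(differentially_expressed(levels, min_fold)) / len(levels)


if __name__ == "__main__":
    toy = {"g1": (100.0, 160.0), "g2": (100.0, 120.0), "g3": (200.0, 100.0), "g4": (50.0, 50.0)}
    print(percent_regulated(toy, 1.5))  # 50.0
    print(percent_regulated(toy, 2.0))  # 25.0
```

Working on the log2 scale makes up- and downregulation symmetric: a 2-fold decrease and a 2-fold increase both have an absolute log ratio of 1.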
For some Leishmania species, tissue amastigotes are very scarce indeed, and it is hard to envisage methods for obtaining enough cells to purify large amounts of RNA or protein. Even if differential gene expression based on transcript abundance were more common in Leishmania, there is often a poor correlation between the transcriptome and the levels of translated proteins, as shown in other systems (99, 100). It is therefore understandable that efforts turned to the study of the final products of the cell machinery. The term proteome was first used in 1995 to describe the total protein complement expressed by a genome (101, 102). The study of proteomes depends on two main techniques: two-dimensional electrophoretic separation of proteins and mass spectrometry. Two-dimensional electrophoresis (2-DE) of proteins, first described by O'Farrell (103), has recently gained several technical refinements that make it more reliable and reproducible. However, technical problems remain, mainly owing to the limited sensitivity and reproducibility of the method and to the poor hydrophilicity and extreme pIs of some proteins. Mass spectrometry is used to identify all spots detected in a 2-D gel, or a subset selected on the basis of several criteria (abundance, differences between developmental stages, metabolic labeling, etc.). Two major improvements made the technique more amenable to protein analysis: matrix-assisted laser desorption/ionization (MALDI) (104, 105) and electrospray ionization (ESI) (106, 107). A peptide-mass fingerprint obtained by MALDI-TOF analysis of trypsin-digested proteins can then be compared with the fully translated genome. Furthermore, peptide sequence tags can be obtained from the same peptide mixture by subjecting it to tandem MS. Proteome analyses of Leishmania species have been applied to the study of developmental differentiation. Early reports on the L.
infantum proteome identified over 2,000 spots by 2-DE (compared with the approximately 8,300 genes predicted in the genome). The comparison between promastigote and axenic amastigote patterns revealed that about 3% of the spots were differentially expressed (108). Similar results were observed with L. donovani (5%) and L. mexicana (7%) axenic amastigotes (109, 110). The number of differentially expressed proteins in these studies is very low, and at this point it is not possible to say whether this reflects a very stable expression pattern, with few differentially expressed proteins, or technical difficulties. Alternatively, as shown for the transcription pattern, axenic amastigotes may not be a reliable tool for evaluating the physiology of lesion amastigotes. A high-resolution 2-DE map of the L. major promastigote proteome (about 3,700 spots) was obtained and used to study mutant lines resistant to methotrexate or overexpressing trypanothione reductase (111). This study confirmed drug resistance mechanisms and showed trypanothione to be post-translationally modified. Large-scale identification of protein spots observed by 2-DE has not yet been accomplished for Leishmania, but when applied to the comparative proteomic analysis of T. cruzi differentiation, large-scale protein identification uncovered interesting information on the energy sources used in different life stages (112). One interesting application of proteome analyses will come from the study of isolated cellular fractions or organelles. For example, the T. brucei flagellum proteome, with 331 components, has been characterized (113). Several proteins identified in that study were subjected to functional ablation by RNAi, bringing further insight into flagellar physiology, especially in the bloodstream stage of T. brucei. Methods for cell disruption and subcellular fractionation of T.
brucei have been described (114) and successfully applied to fractionate Leishmania promastigotes (115). Direct identification of proteins in organelles may become a powerful tool for functional studies. As exemplified above, current proteome investigations focus on detecting changes in the abundance of certain proteins in response to a given status or stimulus. Consistent evaluations of quantitative changes in abundance are not easily obtained with classical proteomic techniques. One approach developed for fine quantitative analysis of the protein components of a complex mixture is differential protein labeling with reagents such as isotope-coded affinity tags (ICAT) (100, 116). This technique has already been applied to a subset of T. cruzi proteins (117) and has been used to analyze differential protein expression in other infectious agents (118, 119). Progress in methodologies for large-scale functional proteomics may greatly facilitate our understanding of these parasites. Techniques to study the interaction proteome, transient or stable post-translational modifications, and subcellular trafficking are beginning to appear in large-scale studies of other organisms and will have to be adapted to the particularities of these parasites.

Comparative Genomics of Trypanosomatids: Learning from Inter- and Intrageneric Analyses

The recent availability of the Tri-Tryp genomes is a milestone in our knowledge of the biology of these three pathogens. Whereas analysis of a single genome provides remarkable biological insights into a particular organism, comparative analysis of multiple genomes offers considerably more information. It expands our ability to understand the genetic and evolutionary bases of the shared and distinct parasitic modes and lifestyles, and to better assign putative functions to predicted coding sequences.
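At its simplest, a gene-content comparison of this kind reduces to set operations once every predicted protein has been assigned to an ortholog group. The sketch below assumes that assignment has already been made (for example by a reciprocal-best-hit search, not shown). The group IDs are invented, and it counts groups rather than proteins, so it shows only how "shared" and "species-specific" fractions are derived, not the published figures.

```python
from itertools import combinations

# Hypothetical input: for each species, the set of ortholog-group IDs
# represented among its predicted proteins (groups assigned beforehand).
genomes = {
    "L. major":  {"OG1", "OG2", "OG3", "OG4", "OG5", "OG9"},
    "T. cruzi":  {"OG1", "OG2", "OG3", "OG4", "OG6", "OG7"},
    "T. brucei": {"OG1", "OG2", "OG3", "OG8"},
}

def core_groups(genomes):
    """Ortholog groups present in every genome."""
    return set.intersection(*genomes.values())

def species_specific(genomes, species):
    """Groups found in `species` and in no other genome."""
    others = set().union(*(g for s, g in genomes.items() if s != species))
    return genomes[species] - others

def pairwise_shared(genomes):
    """Number of groups shared by each pair of genomes."""
    return {(a, b): len(genomes[a] & genomes[b])
            for a, b in combinations(genomes, 2)}

print("core:", sorted(core_groups(genomes)))
for sp in genomes:
    specific = species_specific(genomes, sp)
    print(f"{sp}: {len(specific)}/{len(genomes[sp])} groups species-specific")
print(pairwise_shared(genomes))
```

In this toy input the two intracellular parasites share more groups with each other (4) than either shares with T. brucei (3), mirroring the pattern described for the real genomes.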
Comparison of gene content, genome architecture, and the composition and organization of the protein domains encoded by the genomes of L. major, T. cruzi, and T. brucei was very informative. In spite of the marked differences in their life cycles and pathogenesis, these parasites share about 6,200 genes, which are distributed in large syntenic, polycistronic gene clusters. Comparison of the predicted protein sequences of the three genomes revealed that the number of conserved proteins is higher between the two intracellular parasites, L. major and T. cruzi, and lower when either of them is compared with T. brucei. Species-specific members make up 12% of the Leishmania proteome, against 32% and 26% of the T. cruzi and T. brucei proteomes, respectively. Because most of the species-specific proteins are annotated as members of surface antigen families, the differences in the absolute number of unique sequences can be related to the survival and immune evasion strategies of each trypanosomatid (120). Despite their evolutionary distance, the genomes of these three organisms are highly syntenic. The comparison of the T. brucei and L. major genomes shows that 68% and 75% of their genes, respectively, remain in the same genomic context; 110 blocks of synteny span 30.7 Mb of the Leishmania genome. In Leishmania and T. brucei, synteny breakpoints frequently fall in the strand-switch regions separating DGCs. Many of the species-specific genes are found in nonsyntenic internal or subtelomeric regions of each parasite's chromosomes; these regions also harbor retroelements and structural RNAs and are where gene family expansions take place. Despite the reported conservation, comparative analyses indicate gene divergence, sequence acquisition and loss (less frequent than gene insertion), and rearrangement within syntenic regions of chromosomes (22). The L.
major subtelomeric regions are shorter (<20 kb) than those of trypanosomes and contain relatively few repetitive sequences. Nevertheless, accumulating evidence indicates that recombination events take place at these sites and may be involved in genetic divergence and gain of gene function (19) (L. Brito and A. K. Cruz, unpublished results). Data accumulated over the years from comparative studies of particular Leishmania genes already provide some clues about what to expect from large-scale comparative genomic analysis. For example, the structure and genomic organization of the major surface protease (MSP) genes have been characterized in several Leishmania species (reviewed by Yao et al. (121)). Several species have multiple copies of the MSP genes organized in tandem, and all MSP genes encode similar amino acid sequences. Differences among the copies within one species are located mainly in the C-terminal coding regions and their 3' UTRs. Comparison of related copies in different species revealed high conservation of the coding sequence and lower similarity in the untranslated regions. The same was observed at other loci, such as the META cluster (122). Taking those studies into account, we might expect to find single nucleotide changes in the coding regions and differences in the untranslated regions that might be related to the regulation of expression of these sequences. On the other hand, searching for sequence conservation in the non-coding regions of the genomes of three Leishmania species may be a route to finding elements of functional relevance for the regulation of gene expression. Comparative analysis of the L. major, L. infantum, and L. braziliensis genomes revealed conservation of overall gene content and genome architecture. Large-scale synteny analyses indicate that there are no synteny breaks among the compared genomes.
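Synteny statements like these rest on finding runs of orthologous genes that keep the same order in two genomes. The sketch below is a deliberately simplified collinearity scan, not the method used in the cited studies. It assumes each chromosome is given as an ordered list of unique ortholog IDs and reports maximal runs of genes that are adjacent in both chromosomes, in the same or inverted orientation. The gene IDs and the helper names `synteny_blocks` and `fraction_in_blocks` are hypothetical.

```python
def synteny_blocks(chrom_a, chrom_b, min_len=2):
    """Maximal runs of genes that are adjacent and collinear in both lists.

    chrom_a, chrom_b: ordered lists of unique ortholog IDs, one per gene.
    Each returned block lists its gene IDs in chrom_a order.
    """
    pos_b = {gene: i for i, gene in enumerate(chrom_b)}
    blocks = []
    current = []   # block being extended, in chrom_a order
    step = 0       # +1 same orientation, -1 inverted, 0 not yet known

    for gene in chrom_a:
        if current and gene in pos_b:
            delta = pos_b[gene] - pos_b[current[-1]]
            if delta in (1, -1) and step in (0, delta):
                current.append(gene)
                step = delta
                continue
        # The block cannot be extended: keep it if long enough, restart.
        if len(current) >= min_len:
            blocks.append(current)
        current = [gene] if gene in pos_b else []
        step = 0

    if len(current) >= min_len:
        blocks.append(current)
    return blocks

def fraction_in_blocks(chrom, blocks):
    """Share of the chromosome's genes that fall inside syntenic blocks."""
    return sum(len(b) for b in blocks) / len(chrom)

# Invented chromosomes: one collinear block, one lineage-specific gene
# (x1 / y1) and one inverted block.
chrom_x = ["g1", "g2", "g3", "x1", "g4", "g5", "g6", "g7"]
chrom_y = ["g1", "g2", "g3", "g7", "g6", "g5", "g4", "y1"]
blocks = synteny_blocks(chrom_x, chrom_y)
print(blocks, fraction_in_blocks(chrom_x, blocks))
```

Real pipelines tolerate small gaps, duplicated genes, and multiple chromosomes, which is why published block counts and percentages depend on the parameters chosen.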
Despite this conservation of genomic context, there are differences (about 200 species-specific genes) that most likely reflect adaptations to distinct species-specific selection pressures. Furthermore, the species-specific genes identified thus far are not clustered (21).

The Central Role of Bioinformatics to Put Forward Functional Genomics in Leishmania

The recent success of the international scientific community in decoding the entire genomes of the Tri-Tryps (19, 22) has led to the post-genomic era, in which an intellectual fusion of biomedicine and information technology is needed. Bioinformatics plays a crucial role in data manipulation, data curation, and knowledge extraction, thus bridging the gap between disparate information sources for subsequent biological model building, refinement, and validation. The post-genome challenge, particularly for these unicellular human pathogens, is to translate new information about genes, their control pathways, proteins, and their interactions into improved healthcare. In the field of structural annotation, a term that refers to the identification of putative genes in DNA sequence, protein-coding genes have been sought with several bioinformatics approaches and methodologies (for reviews, see Fickett (123), Burge and Karlin (124), Claverie (125), and Worthey and Myler (126)). Each of them trades off sensitivity (the ability to detect true positives) against selectivity (the ability to exclude false positives). As a general rule, algorithms for finding protein-coding genes can be classified into two broad categories: similarity-based (extrinsic) methods and algorithm-based (intrinsic) methods (127).
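The sensitivity/selectivity trade-off is usually quantified by comparing predicted gene coordinates with a trusted annotation. The sketch below works at the nucleotide level and assumes genes on one strand given as half-open (start, end) intervals. Sensitivity is TP/(TP+FN), and selectivity (often called specificity in the gene-finding literature) is TP/(TP+FP). The coordinates and function names are invented for illustration, and expanding intervals into position sets is only practical for small examples.

```python
def covered(intervals):
    """Set of positions covered by half-open (start, end) intervals."""
    return {p for start, end in intervals for p in range(start, end)}

def sensitivity_selectivity(predicted, annotated):
    """Nucleotide-level sensitivity and selectivity of a gene prediction."""
    pred = covered(predicted)
    true = covered(annotated)
    tp = len(pred & true)   # coding positions correctly predicted
    fn = len(true - pred)   # coding positions missed
    fp = len(pred - true)   # non-coding positions predicted as coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp

# Invented example: one exact hit, one truncated gene, one spurious call.
annotated = [(100, 400), (600, 900)]
predicted = [(100, 400), (650, 900), (1000, 1150)]
sn, sp = sensitivity_selectivity(predicted, annotated)
print(f"sensitivity={sn:.3f} selectivity={sp:.3f}")
```

Here the truncated prediction lowers sensitivity and the spurious call lowers selectivity, which is why combining intrinsic and extrinsic evidence tends to improve both at once.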
The similarity-based category relies on local alignments of query sequences against a set of public domain databases, whereas the algorithm-based category, also termed ab initio, relies on evaluating the compositional properties of exons, introns, and other features without explicit reference to other sequences. Because combining multiple lines of evidence increases both sensitivity and selectivity, gene prediction often uses a combination of intrinsic and extrinsic methods. For L. major in particular, a combination of CodonUsage (128), GeneScan (129), TestCode (130), and Glimmer (131), together with manual inspection of the gene calls by curators, achieved a gene identification rate greater than 95%. It is worth mentioning that ~70% of the genes have no significant similarity to existing genes in sequence databases. In contrast, the in silico identification of non-protein-coding genes is a less advanced discipline, mostly because of the diversity of these sequences; efforts have therefore focused on comparative analyses of the completed genomes available to date. The parasitic trypanosomatids have been studied extensively at the molecular and biochemical levels through classical biochemistry and molecular biology. Furthermore, several unusual aspects of biochemical physiology are shared among trypanosomatids but absent from the cells of their mammalian hosts. Examples include thiol metabolism, the mechanisms that maintain a reducing intracellular redox milieu, and folate metabolism. Nevertheless, even though some potentially good drug targets have been identified, safe, effective, and affordable medicines for the treatment of leishmaniasis are still lacking (132).
Therefore, exploiting the differences between the genomes of the parasite and its vertebrate host is important and may open innovative routes to drug target discovery. A current initiative of the UNICEF-UNDP-World Bank-WHO Special Programme for Research and Training in Tropical Diseases (TDR) (http://www.who.int/tdr/grants/workplans/gdr.htm) seeks to bring together the pharmaceutical industry, with its combinatorial libraries of lead compounds, medicinal chemistry, and genome data mining to find new drug targets and the corresponding inhibitory molecules acting selectively against the parasites. Although some are optimistic about the development of a single drug useful against all trypanosomatids, the profound differences in the ways each parasite interacts with its host mean that diverse pharmacological criteria must be met in developing novel drugs; a single anti-trypanosomatid “cure-all” is therefore unlikely. Bioinformatics tools learned and developed in the course of the trypanosomatid genome projects are now being applied to the comparative genomics of Leishmania. Understanding why different Leishmania species cause such diverse diseases has been an elusive yet attractive goal for many parasitologists. Comparative genomics may provide new clues to the virulence mechanisms of the various species and may represent a major step forward in the control and treatment of leishmaniasis.

References

1. Benne R, Van den Burg J, et al. Major transcript of the frameshifted coxII gene from trypanosome mitochondria contains four nucleotides that are not encoded in the DNA. Cell. 1986;46(6):819–826. PubMed PMID: 3019552.
2. Opperdoes FR, Borst P. Localization of nine glycolytic enzymes in a microbody-like organelle in Trypanosoma brucei: the glycosome. FEBS Lett. 1977;80(2):360–364. PubMed PMID: 142663.
3. Johnson PJ, Kooter JM, et al. Inactivation of transcription by UV irradiation of T. brucei provides evidence for a multicistronic transcription unit including a VSG gene. Cell. 1987;51(2):273–281. PubMed PMID: 3664637.
4. Boothroyd JC, Cross GA. Transcripts coding for variant surface glycoproteins of Trypanosoma brucei have a short, identical exon at their 5' end. Gene. 1982;20(2):281–289. PubMed PMID: 7166234.
5. Walder JA, Eder PS, et al. The 35-nucleotide spliced leader sequence is common to all trypanosome messenger RNA's. Science. 1986;233(4763):569–571. PubMed PMID: 3523758.
6. Martinez-Calvillo S, Nguyen D, et al. Transcription initiation and termination on Leishmania major chromosome 3. Eukaryot Cell. 2004;3(2):506–517. PubMed PMID: 15075279.
7. Martinez-Calvillo S, Yan S, et al. Transcription of Leishmania major Friedlin chromosome 1 initiates in both directions within a single region. Mol Cell. 2003;11(5):1291–1299. PubMed PMID: 12769852.
8. Wincker P, Ravel C, et al. The Leishmania genome comprises 36 chromosomes conserved across widely divergent human pathogenic species. Nucleic Acids Res. 1996;24(9):1688–1694. PubMed PMID: 8649987.
9. Koonin EV, Galperin MY. Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Curr Opin Genet Dev. 1997;7(6):757–763. PubMed PMID: 9468784.
10. Ravel C, Wincker P, et al. Medium-range restriction maps of five chromosomes of Leishmania infantum and localization of size-variable regions. Genomics. 1996;35(3):509–516. PubMed PMID: 8812485.
11. Britto C, Ravel C, et al. Conserved linkage groups associated with large-scale chromosomal rearrangements between Old World and New World Leishmania genomes. Gene. 1998;222(1):107–117. PubMed PMID: 9813266.
12. Ivens AC, Lewis SM, et al. A physical map of the Leishmania major Friedlin genome. Genome Res. 1998;8(2):135–145. PubMed PMID: 9477341.
13. Ivens AC, Blackwell JM. The Leishmania genome comes of age. Parasitol Today. 1999;15(6):225–231. PubMed PMID: 10366828.
14. Akopyants NS, Clifton SW, et al. A survey of the Leishmania major Friedlin strain V1 genome by shotgun sequencing: a resource for DNA microarrays and expression profiling. Mol Biochem Parasitol. 2001;113(2):337–340. PubMed PMID: 11295190.
15. El-Sayed NM, Donelson JE. A survey of the Trypanosoma brucei rhodesiense genome using shotgun sequencing. Mol Biochem Parasitol. 1997;84(2):167–178. PubMed PMID: 9084037.
16. Myler PJ, Audleman L, et al. Leishmania major Friedlin chromosome 1 has an unusual distribution of protein-coding genes. Proc Natl Acad Sci U S A. 1999;96(6):2902–2906. PubMed PMID: 10077609.
17. Aly R, Argaman M, et al. A regulatory role for the 5' and 3' untranslated regions in differential expression of hsp83 in Leishmania. Nucleic Acids Res. 1994;22(15):2922–2929. PubMed PMID: 8065903.
18. LeBowitz JH, Smith HQ, et al. Coupling of poly(A) site selection and trans-splicing in Leishmania. Genes Dev. 1993;7(6):996–1007. PubMed PMID: 8504937.
19. Ivens AC, Peacock CS, et al. The genome of the kinetoplastid parasite, Leishmania major. Science. 2005;309(5733):436–442. PubMed PMID: 16020728.
20. C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 1998;282(5396):2012–2018. PubMed PMID: 9851916.
21. Peacock CS, Seeger K, et al. Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat Genet. 2007;39(7):839–847. PubMed PMID: 17572675.
22. El-Sayed NM, Myler PJ, et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science. 2005;309(5733):409–415. PubMed PMID: 16020725.
23. Bringaud F, Ghedin E, et al. Evolution of non-LTR retrotransposons in the trypanosomatid genomes: Leishmania major has lost the active elements. Mol Biochem Parasitol. 2006;145(2):158–170. PubMed PMID: 16257065.
24. Wickstead B, Ersfeld K, et al. Repetitive elements in genomes of parasitic protozoa. Microbiol Mol Biol Rev. 2003;67(3):360–375. PubMed PMID: 12966140.
25. Sunkin SM, Kiser P, et al. The size difference between Leishmania major friedlin chromosome one homologues is localized to sub-telomeric repeats at one chromosomal end. Mol Biochem Parasitol. 2000;109(1):1–15. PubMed PMID: 10924752.
26. Fu G, Barker DC. Characterisation of Leishmania telomeres reveals unusual telomeric repeats and conserved telomere-associated sequence. Nucleic Acids Res. 1998;26(9):2161–2167. PubMed PMID: 9547275.
27. Pedrosa AL, Ruiz JC, et al. Characterisation of three chromosomal ends of Leishmania major reveals transcriptional activity across arrays of reiterated and unique sequences. Mol Biochem Parasitol. 2001;114(1):71–80. PubMed PMID: 11356515.
28. Tosato V, Ciarloni L, et al. Secondary DNA structure analysis of the coding strand switch regions of five Leishmania major Friedlin chromosomes. Curr Genet. 2001;40(3):186–194. PubMed PMID: 11727994.
29. Curotto de Lafaille MA, Laban A, et al. Gene expression in Leishmania: analysis of essential 5' DNA sequences. Proc Natl Acad Sci U S A. 1992;89(7):2703–2707. PubMed PMID: 1557376.
30. Monnerat S, Martinez-Calvillo S, et al. Genomic organization and gene expression in a chromosomal region of Leishmania major. Mol Biochem Parasitol. 2004;134(2):233–243. PubMed PMID: 15003843.
31. Gilinger G, Bellofatto V. Trypanosome spliced leader RNA genes contain the first identified RNA polymerase II gene promoter in these organisms. Nucleic Acids Res. 2001;29(7):1556–1564. PubMed PMID: 11266558.
32. Luo H, Gilinger G, et al. Transcription initiation at the TATA-less spliced leader RNA gene promoter requires at least two DNA-binding proteins and a tripartite architecture that includes an initiator element. J Biol Chem. 1999;274(45):31947–31954. PubMed PMID: 10542223.
33. Thomas S, Westenberger SJ, et al. Intragenomic spliced leader RNA array analysis of kinetoplastids reveals unexpected transcribed region diversity in Trypanosoma cruzi. Gene. 2005;352:100–108. PubMed PMID: 15925459.
34. Gay LS, Wilson ME, et al. The promoter for the ribosomal RNA genes of Leishmania chagasi. Mol Biochem Parasitol. 1996;77(2):193–200. PubMed PMID: 8813665.
35. Martinez-Calvillo S, Sunkin SM, et al. Genomic organization and functional characterization of the Leishmania major Friedlin ribosomal RNA gene locus. Mol Biochem Parasitol. 2001;116(2):147–157. PubMed PMID: 11522348.
36. Uliana SR, Fischer W, et al. Structural and functional characterization of the Leishmania amazonensis ribosomal RNA promoter. Mol Biochem Parasitol. 1996;76(1-2):245–255. PubMed PMID: 8920010.
37. Yan S, Lodes MJ, et al. Characterization of the Leishmania donovani ribosomal RNA promoter. Mol Biochem Parasitol. 1999;103(2):197–210. PubMed PMID: 10551363.
38. Mair G, Shi H, et al. A new twist in trypanosome RNA metabolism: cis-splicing of pre-mRNA. RNA. 2000;6(2):163–169. PubMed PMID: 10688355.
39. Sacks DL, Perkins PV. Identification of an infective stage of Leishmania promastigotes. Science. 1984;223(4643):1417–1419. PubMed PMID: 6701528.
40. Bellatin JA, Murray AS, et al. Leishmania mexicana: identification of genes that are preferentially expressed in amastigotes. Exp Parasitol. 2002;100(1):44–53. PubMed PMID: 11971653.
41. Boucher N, Wu Y, et al. A common mechanism of stage-regulated gene expression in Leishmania mediated by a conserved 3'-untranslated region element. J Biol Chem. 2002;277(22):19511–19520. PubMed PMID: 11912202.
42. Brooks DR, Denise H, et al. The stage-regulated expression of Leishmania mexicana CPB cysteine proteases is mediated by an intercistronic sequence element. J Biol Chem. 2001;276(50):47061–47069. PubMed PMID: 11592967.
43. Charest H, Zhang WW, et al. The developmental expression of Leishmania donovani A2 amastigote-specific genes is post-transcriptionally mediated and involves elements located in the 3'-untranslated region. J Biol Chem. 1996;271(29):17081–17090. PubMed PMID: 8663340.
44. Flinn HM, Smith DF. Genomic organisation and expression of a differentially-regulated gene family from Leishmania major. Nucleic Acids Res. 1992;20(4):755–762. PubMed PMID: 1371863.
45. Kelly BL, Nelson TN, et al. Stage-specific expression in Leishmania conferred by 3' untranslated regions of L. major leishmanolysin genes (GP63). Mol Biochem Parasitol. 2001;116(1):101–104. PubMed PMID: 11463473.
46. Moore LL, Santrich C, et al. Stage-specific expression of the Leishmania mexicana paraflagellar rod protein PFR-2. Mol Biochem Parasitol. 1996;80(2):125–135. PubMed PMID: 8892290.
47. Rochette A, McNicoll F, et al. Characterization and developmental gene regulation of a large gene family encoding amastin surface proteins in Leishmania spp. Mol Biochem Parasitol. 2005;140(2):205–220. PubMed PMID: 15760660.
48. Wang Y, Dimitrov K, et al. Stage-specific activity of the Leishmania major CRK3 kinase and functional rescue of a Schizosaccharomyces pombe cdc2 mutant. Mol Biochem Parasitol. 1998;96(1-2):139–150. PubMed PMID: 9851613.
49. Cruz AK, Titus R, et al. Plasticity in chromosome number and testing of essential genes in Leishmania by targeting. Proc Natl Acad Sci U S A. 1993;90(4):1599–1603. PubMed PMID: 8381972.
50. Martinez-Calvillo S, Stuart K, et al. Ploidy changes associated with disruption of two adjacent genes on Leishmania major chromosome 1. Int J Parasitol. 2005;35(4):419–429. PubMed PMID: 15777918.
51. Mottram JC, Coombs GH. Leishmania mexicana: subcellular distribution of enzymes in amastigotes and promastigotes. Exp Parasitol. 1985;59(3):265–274. PubMed PMID: 3158538.
52. Uliana SR, Goyal N, et al. Leishmania: overexpression and comparative structural analysis of the stage-regulated meta 1 gene. Exp Parasitol. 1999;92(3):183–191. PubMed PMID: 10403759.
53. Matthews KR, Tschudi C, et al. A common pyrimidine-rich motif governs trans-splicing and polyadenylation of tubulin polycistronic pre-mRNA in trypanosomes. Genes Dev. 1994;8(4):491–501. PubMed PMID: 7907303.
54. Ullu E, Matthews KR, et al. Temporal order of RNA-processing reactions in trypanosomes: rapid trans splicing precedes polyadenylation of newly synthesized tubulin transcripts. Mol Cell Biol. 1993;13(1):720–725. PubMed PMID: 8417363.
55. Manning-Cela R, Gonzalez A, et al. Alternative splicing of LYT1 transcripts in Trypanosoma cruzi. Infect Immun. 2002;70(8):4726–4728. PubMed PMID: 12117992.
56. Vassella E, Braun R, et al. Control of polyadenylation and alternative splicing of transcripts from adjacent genes in a procyclin expression site: a dual role for polypyrimidine tracts in trypanosomes? Nucleic Acids Res. 1994;22(8):1359–1364. PubMed PMID: 8190625.
57. Beetham JK, Myung KS, et al. Glycoprotein 46 mRNA abundance is post-transcriptionally regulated during development of Leishmania chagasi promastigotes to an infectious form. J Biol Chem. 1997;272(28):17360–17366. PubMed PMID: 9211875.
58. Brittingham A, Miller MA, et al. Regulation of GP63 mRNA stability in promastigotes of virulent and attenuated Leishmania chagasi. Mol Biochem Parasitol. 2001;112(1):51–59. PubMed PMID: 11166386.
59. Quijada L, Soto M, et al. Identification of a putative regulatory element in the 3'-untranslated region that controls expression of HSP70 in Leishmania infantum. Mol Biochem Parasitol. 2000;110(1):79–91. PubMed PMID: 10989147.
60. Wilson ME, Paetz KE, et al. The effect of ongoing protein synthesis on the steady state levels of Gp63 RNAs in Leishmania chagasi. J Biol Chem. 1993;268(21):15731–15736. PubMed PMID: 8340397.
61. Folgueira C, Quijada L, et al. The translational efficiencies of the two Leishmania infantum HSP70 mRNAs, differing in their 3'-untranslated regions, are affected by shifts in the temperature of growth through different mechanisms. J Biol Chem. 2005;280(42):35172–35183. PubMed PMID: 16105831.
62. Larreta R, Soto M, et al. The expression of HSP83 genes in Leishmania infantum is affected by temperature and by stage-differentiation and is regulated at the levels of mRNA stability and translation. BMC Mol Biol. 2004;5:3. PubMed PMID: 15176985.
63. McNicoll F, Muller M, et al. Distinct 3'-untranslated region elements regulate stage-specific mRNA accumulation and translation in Leishmania. J Biol Chem. 2005;280(42):35238–35246. PubMed PMID: 16115874.
64. Blum B, Bakalara N, et al. A model for RNA editing in kinetoplastid mitochondria: "guide" RNA molecules transcribed from maxicircle DNA provide the edited information. Cell. 1990;60(2):189–198. PubMed PMID: 1688737.
65. Sturm NR, Simpson L. Kinetoplast DNA minicircles encode guide RNAs for editing of cytochrome oxidase subunit III mRNA. Cell. 1990;61(5):879–884. PubMed PMID: 1693097.
66. Bellofatto V, Cross GA. Expression of a bacterial gene in a trypanosomatid protozoan. Science. 1989;244(4909):1167–1169. PubMed PMID: 2499047.
67. Kapler GM, Coburn CM, et al. Stable transfection of the human parasite Leishmania major delineates a 30-kilobase region sufficient for extrachromosomal replication and expression. Mol Cell Biol. 1990;10(3):1084–1094. PubMed PMID: 2304458.
68. Laban A, Tobin JF, et al. Stable expression of the bacterial neor gene in Leishmania enriettii. Nature. 1990;343(6258):572–574. PubMed PMID: 2300209.
69. Laban A, Wirth DF. Transfection of Leishmania enriettii and expression of chloramphenicol acetyltransferase gene. Proc Natl Acad Sci U S A. 1989;86(23):9119–9123. PubMed PMID: 2594753.
70. LeBowitz JH, Coburn CM, et al. Development of a stable Leishmania expression vector and application to the study of parasite surface antigen genes. Proc Natl Acad Sci U S A. 1990;87(24):9736–9740. PubMed PMID: 2124701.
71. Clayton CE. Genetic manipulation of kinetoplastida. Parasitol Today. 1999;15(9):372–378. PubMed PMID: 10461166.
72. Clayton CE. Life without transcriptional control? From fly to man and back again. EMBO J. 2002;21(8):1881–1888. PubMed PMID: 11953307.
73. Ghedin E, Charest H, et al. Inducible expression of suicide genes in Leishmania donovani amastigotes. J Biol Chem. 1998;273(36):22997–23003. PubMed PMID: 9722523.
74. Ryan KA, Dasgupta S, et al. Shuttle cosmid vectors for the trypanosomatid parasite Leishmania. Gene. 1993;131(1):145–150. PubMed PMID: 8370535.
75. Ryan KA, Garraway LA, et al. Isolation of virulence genes directing surface glycosyl-phosphatidylinositol synthesis by functional complementation of Leishmania. Proc Natl Acad Sci U S A. 1993;90(18):8609–8613. PubMed PMID: 8378337.
76. Pedrosa AL, Cruz AK. The effect of location and direction of an episomal gene on the restoration of a phenotype by functional complementation in Leishmania. Mol Biochem Parasitol. 2002;122(2):141–148. PubMed PMID: 12106868.
77. Cruz A, Beverley SM. Gene replacement in parasitic protozoa. Nature. 1990;348(6297):171–173. PubMed PMID: 2234081.
78. Misslitz A, Mottram JC, et al. Targeted integration into a rRNA locus results in uniform and high level expression of transgenes in Leishmania amastigotes. Mol Biochem Parasitol. 2000;107(2):251–261. PubMed PMID: 10779601.
79. Papadopoulou B, Dumas C. Parameters controlling the rate of gene targeting frequency in the protozoan parasite Leishmania. Nucleic Acids Res. 1997;25(21):4278–4286. PubMed PMID: 9336458.
80. Cruz A, Coburn CM, et al. Double targeted gene replacement for creating null mutants. Proc Natl Acad Sci U S A. 1991;88(16):7170–7174. PubMed PMID: 1651496.
81. Curotto de Lafaille MA, Wirth DF. Creation of Null/+ mutants of the alpha-tubulin gene in Leishmania enriettii by gene cluster deletion. J Biol Chem. 1992;267(33):23839–23846. PubMed PMID: 1429722.
82. McKean PG, Denny PW, et al. Phenotypic changes associated with deletion and overexpression of a stage-regulated gene family in Leishmania. Cell Microbiol. 2001;3(8):511–523. PubMed PMID: 11488813.
83. Wirtz E, Clayton C. Inducible gene expression in trypanosomes mediated by a prokaryotic repressor. Science. 1995;268(5214):1179–1183. PubMed PMID: 7761835.
84. Yan S, Martinez-Calvillo S, et al. A low-background inducible promoter system in Leishmania donovani. Mol Biochem Parasitol. 2002;119(2):217–223. PubMed PMID: 11814573.
85. Giaever G, Chu AM, et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002;418(6896):387–391. PubMed PMID: 12140549.
86. Winzeler EA, Shoemaker DD, et al. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999;285(5429):901–906. PubMed PMID: 10436161.
87. Augusto MJ, Squina FM, et al. Specificity of modified Drosophila mariner transposons in the identification of Leishmania genes. Exp Parasitol. 2004;108(3-4):109–113. PubMed PMID: 15582507.
88. Beverley SM, Akopyants NS, et al. Putting the Leishmania genome to work: functional genomics by transposon trapping and expression profiling. Philos Trans R Soc Lond B Biol Sci. 2002;357(1417):47–53. PubMed PMID: 11839181.
89. Robinson KA, Beverley SM. Improvements in transfection efficiency and tests of RNA interference (RNAi) approaches in the protozoan parasite Leishmania. Mol Biochem Parasitol. 2003;128(2):217–228. PubMed PMID: 12742588.
90. Fitzpatrick JM, Johnston DA, et al. An oligonucleotide microarray for transcriptome analysis of Schistosoma mansoni and its application/use to investigate gender-associated gene expression. Mol Biochem Parasitol. 2005;141(1):1–13. PubMed PMID: 15811522.
91. Llinas M, Bozdech Z, et al. Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains. Nucleic Acids Res. 2006;34(4):1166–1173. PubMed PMID: 16493140.
92. Radke JR, Behnke MS, et al. The transcriptome of Toxoplasma gondii. BMC Biol. 2005;3:26. PubMed PMID: 16324218.
93. Almeida R, Gilmartin BJ, et al. Expression profiling of the Leishmania life cycle: cDNA arrays identify developmentally regulated genes present but not annotated in the genome. Mol Biochem Parasitol. 2004;136(1):87–100. PubMed PMID: 15138070.
94. Saxena A, Worthey EA, et al. Evaluation of differential gene expression in Leishmania major Friedlin procyclics and metacyclics using DNA microarray analysis. Mol Biochem Parasitol. 2003;129(1):103–114. PubMed PMID: 12798511.
95. Brems S, Guilbride DL, et al. The transcriptomes of Trypanosoma brucei Lister 427 and TREU927 bloodstream and procyclic trypomastigotes. Mol Biochem Parasitol. 2005;139(2):163–172. PubMed PMID: 15664651.
96. Diehl S, Diehl F, et al. Analysis of stage-specific gene expression in the bloodstream and the procyclic form of Trypanosoma brucei using a genomic DNA-microarray. Mol Biochem Parasitol. 2002;123(2):115–123. PubMed PMID: 12270627.
97. Akopyants NS, Matlib RS, et al. Expression profiling using random genomic DNA microarrays identifies differentially expressed genes associated with three major developmental stages of the protozoan parasite Leishmania major. Mol Biochem Parasitol. 2004;136(1):71–86. PubMed PMID: 15138069.
98. Holzer TR, McMaster WR, et al. Expression profiling by whole-genome interspecies microarray hybridization reveals differential gene expression in procyclic promastigotes, lesion-derived amastigotes, and axenic amastigotes in Leishmania mexicana. Mol Biochem Parasitol. 2006;146(2):198–218. PubMed PMID: 16430978.
99. Anderson L, Seilhamer J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis. 1997;18(3-4):533–537. PubMed PMID: 9150937.
100. Gygi SP, Aebersold R. Absolute quantitation of 2-D protein spots. Methods Mol Biol. 1999;112:417–421. PubMed PMID: 10027266.
101. Wasinger VC, Cordwell SJ, et al. Progress with gene-product mapping of the Mollicutes: Mycoplasma genitalium. Electrophoresis. 1995;16(7):1090–1094. PubMed PMID: 7498152.
102. Wilkins MR, Sanchez JC, et al. Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol Genet Eng Rev. 1996;13:19–50. PubMed PMID: 8948108.
103. O'Farrell PH. High resolution two-dimensional electrophoresis of proteins. J Biol Chem. 1975;250(10):4007–4021. PubMed PMID: 236308.
104. Hillenkamp F, Karas M, et al. Matrix-assisted laser desorption/ionization mass spectrometry of biopolymers. Anal Chem. 1991;63(24):1193A–1203A. PubMed PMID: 1897719.
105. Karas M, Hillenkamp F. Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem. 1988;60(20):2299–2301. PubMed PMID: 3239801.
106. Fenn JB, Mann M, et al. Electrospray ionization for mass spectrometry of large biomolecules. Science. 1989;246(4926):64–71. PubMed PMID: 2675315.
107. Smith RD, Loo JA, et al. New developments in biochemical mass spectrometry: electrospray ionization. Anal Chem. 1990;62(9):882–899. PubMed PMID: 2194402.
108. El Fakhry Y, Ouellette M, et al. A proteomic approach to identify developmentally regulated proteins in Leishmania infantum. Proteomics. 2002;2(8):1007–1017. PubMed PMID: 12203896.
109. Bente M, Harder S, et al. Developmentally induced changes of the proteome in the protozoan parasite Leishmania donovani. Proteomics. 2003;3(9):1811–1829. PubMed PMID: 12973740.
110. Nugent PG, Karsani SA, et al. Proteomic analysis of Leishmania mexicana differentiation. Mol Biochem Parasitol. 2004;136(1):51–62. PubMed PMID: 15138067.
111. Drummelsmith J, Brochu V, et al. Proteome mapping of the protozoan parasite Leishmania and application to the study of drug targets and resistance mechanisms. Mol Cell Proteomics. 2003;2(3):146–155. PubMed PMID: 12644573.
112. Atwood JA, Weatherly DB, et al. The Trypanosoma cruzi proteome. Science. 2005;309(5733):473–476. PubMed PMID: 16020736.
113. Broadhead R, Dawe HR, et al. Flagellar motility is required for the viability of the bloodstream trypanosome. Nature. 2006;440(7081):224–227. PubMed PMID: 16525475.
114. Fairlamb AH, Bowman IB. Cell disruption and subcellular fractionation of Trypanosoma brucei. Trans R Soc Trop Med Hyg. 1974;68(4):275. PubMed PMID: 4417067.
115. Rodrigues CO, Scott DA, et al. Presence of a vacuolar H+-pyrophosphatase in promastigotes of Leishmania donovani and its localization to a different compartment from the vacuolar H+-ATPase. Biochem J. 1999;340(Pt 3):759–766. PubMed PMID: 10359662.
116. Gygi SP, Rist B, et al. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol. 1999;17(10):994–999. PubMed PMID: 10504701.
117. Paba J, Ricart CA, et al. Proteomic analysis of Trypanosoma cruzi developmental stages using isotope-coded affinity tag reagents. J Proteome Res. 2004;3(3):517–524. PubMed PMID: 15253433.
118. Guina T, Wu M, et al. Proteomic analysis of Pseudomonas aeruginosa grown under magnesium limitation. J Am Soc Mass Spectrom. 2003;14(7):742–751. PubMed PMID: 12837596.
119. Schmidt F, Donahoe S, et al. Complementary analysis of the Mycobacterium tuberculosis proteome by two-dimensional electrophoresis and isotope-coded affinity tag technology. Mol Cell Proteomics. 2004;3(1):24–42. PubMed PMID: 14557599.
120. El-Sayed NM, Myler PJ, et al. Comparative genomics of trypanosomatid parasitic protozoa. Science. 2005;309(5733):404–409. PubMed PMID: 16020724.
121. Yao C, Donelson JE, et al. The major surface protease (MSP or GP63) of Leishmania sp. Biosynthesis, regulation of expression, and function. Mol Biochem Parasitol. 2003;132(1):1–16. PubMed PMID: 14563532.
122. Ramos CS, Franco FA, et al. Characterisation of a new Leishmania META gene and genomic analysis of the META cluster. FEMS Microbiol Lett. 2004;238(1):213–219. PubMed PMID: 15336424.
123. Fickett JW. Finding genes by computer: the state of the art. Trends Genet. 1996;12(8):316–320. PubMed PMID: 8783942.
124. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268(1):78–94. PubMed PMID: 9149143.
125. Claverie JM. Computational methods for the identification of genes in vertebrate genomic sequences. Hum Mol Genet. 1997;6(10):1735–1744. PubMed PMID: 9300666.
126. Worthey EA, Myler PJ. Protozoan genomes: gene identification and annotation. Int J Parasitol. 2005;35(5):495–512. PubMed PMID: 15826642.
127. Stein LD. Using Perl to facilitate biological analysis. Methods Biochem Anal. 2001;43:413–449. PubMed PMID: 11449734.
128. Bibb MJ, Findlay PR, et al. The relationship between base composition and codon usage in bacterial genes and its use for the simple and reliable identification of protein-coding sequences. Gene. 1984;30(1-3):157–166. PubMed PMID: 6096212.
129. Tiwari S, Ramachandran S, et al. Prediction of probable genes by Fourier analysis of genomic sequences. Comput Appl Biosci. 1997;13(3):263–270. PubMed PMID: 9183531.
130. Fickett JW. Recognition of protein coding regions in DNA sequences. Nucleic Acids Res. 1982;10(17):5303–5318. PubMed PMID: 7145702.
131. Delcher AL, Harmon D, et al. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999;27(23):4636–4641. PubMed PMID: 10556321.
132. Trouiller P, Olliaro P, et al. Drug development for neglected diseases: a deficient market and a public-health policy failure. Lancet. 2002;359(9324):2188–2194. PubMed PMID: 12090998.