NLM Citation: Uliana SRB, Ruiz JC, Cruz AK. Leishmania Genomics:
Where Do We Stand?. 2006 Oct 12 [Updated 2007 Aug 24]. In: Gruber
A, Durham AM, Huynh C, et al., editors. Bioinformatics in Tropical
Disease Research: A Practical and Case-Study Approach [Internet].
Bethesda (MD): National Center for Biotechnology Information (US);
2008. Chapter B02.
Bookshelf URL: https://www.ncbi.nlm.nih.gov/books/
Chapter B02. Leishmania Genomics: Where Do We Stand?
Silvia R.B Uliana, MD, PhD,1 Jeronimo C. Ruiz, PhD,2 and Angela K. Cruz, PhD3
Created: October 12, 2006; Updated: August 24, 2007.
A Short History of the Leishmania Genome Project
More than 20 different species of the genus Leishmania are known to be pathogenic to humans. These protozoan parasites are kinetoplastids of the Trypanosomatidae family and depend on a female Phlebotominae insect and on a variety of vertebrate hosts to complete their life cycle. Collectively, Leishmania spp. are responsible for one of the world's major communicable diseases, which led the World Health Organization to include leishmaniasis among the six major diseases targeted for intensive research and control efforts. It is estimated that more than 2 million new cases of leishmaniasis occur each year in 88 countries (http://www.who.int/health-topics/leishmaniasis.htm). Leishmaniasis presents a broad spectrum of clinical manifestations, ranging from mild and frequently self-healing cutaneous lesions to severe mucocutaneous ulcers or visceral disease, which can be fatal.
Besides its unquestionable medical relevance, this ancient eukaryote is an important biological model. Leishmania species are mostly diploid organisms with no apparent sexual cycle, and they display several uncommon biochemical, genetic, and morphological features that are either unique to the order Kinetoplastida or are used more frequently by these organisms than by any others. These features include a unique organization of mitochondrial DNA, the kinetoplast; extensive mitochondrial RNA editing (1); glycosomes (2); polycistronic transcription (3); trans-splicing (4, 5); and GPI anchoring of membrane proteins, among others (6-8).
Because leishmaniasis is a disease endemic to the Third World, pharmaceutical companies have shown almost no interest in searching for low-toxicity drugs or in investing in the development of safe and effective vaccines against it. Therefore, deciphering the genomic sequence of the parasite was chosen as a way to optimize approaches and accelerate the understanding of the biology of this important human parasite and of its interaction with the host. The context of several successful ongoing genome projects, particularly those of pathogenic organisms, provided the background for proposing a genome project for Leishmania and other relevant trypanosomatids (9).
The first Parasite Genome Network Planning Meeting was held in April 1994 in Rio de Janeiro, Brazil, and was sponsored by FIOCRUZ and the UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR). The Genome Projects of Trypanosoma cruzi, Trypanosoma brucei, and Leishmania major were launched at the Meeting, and a reference strain for each of the parasites was chosen for the sequencing project. A Leishmania Genome Network (LGN) was established, and several laboratories from all over the world became involved in the initiative.
Author Affiliations: 1 Instituto de Ciências Biomédicas, Universidade de São Paulo; Email: srbulian@icb.usp.br. 2 Fundação Oswaldo Cruz, Centro de Pesquisas René Rachou; Email: jcruiz@cpqrr.fiocruz.br. 3 Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo; Email: akcruz@fmrp.usp.br.
The major aims of the LGN were: 1) to develop a physical map of the parasite's genome; 2) to curate, analyze, and disseminate parasite genome data; and 3) to determine the entire genomic sequence of the organism. These core objectives would enable progress in different areas, ranging from the discovery of new drug targets, diagnostic and/or vaccine approaches, and improved knowledge of parasite biology to the dissemination of genomics expertise to different countries.
Pulsed-field gel electrophoresis (PFGE) was used to generate "molecular karyotypes" for distinct Leishmania species and strains. A combination of PFGE and hybridization analysis revealed that the observed variation in chromosome size was not a consequence of clonal heterogeneity of strains, that the variation was relatively small (10 to 20%), and that it was mainly attributable to expansion and deletion of subtelomeric regions (10). In landmark studies, Wincker and colleagues (10) defined 36 physical linkage groups corresponding to the complete set of chromosomes, suggesting a genome size of 35 Mb. They showed that these physical linkage groups were not only highly conserved among Old World species but also largely conserved in New World species (11). The LGN group extended the analysis to construct the molecular karyotype of the Leishmania major Friedlin (LmjF) reference strain (12). These data provide evidence of conserved synteny (conservation of gene order) among species. Understanding the extent of conservation in genome organization was a very important step at that time, confirming that the use of a single reference strain for all Leishmania species was an appropriate decision.
Another important front of work during the mid- and late 1990s was the construction of a physical map of the LmjF chromosomes. This approach was chosen to allow the selection of a tiling set of clones spanning the entire genome. The map was built from cloned DNA of large-insert genomic libraries (13). The selected clones would then be sequenced, and much of the mapping phase of the genome project involved systematic probing with contig ends in order to join contigs. At that point, given the hardware and software limitations and the lack of funds, shotgun sequencing of the full LmjF genome was not considered a feasible strategy.
A parallel activity during the early years of the Leishmania Genome Project was the sequencing of expressed sequences, seen at that time as a fast and simple means of gene discovery. Single-pass sequencing of clones from cDNA libraries of different life cycle stages was conducted, generating more than 2,500 expressed sequence tags (ESTs). EST sequencing was originally identified as an efficient, cost-effective method for gene discovery in relatively large genomes, such as those of protozoa. Nevertheless, other studies demonstrated that the gene-dense genomes of T. brucei and Leishmania, whose genes lack introns, made genomic sequencing a much better option, even for gene discovery purposes (14, 15).
Complete sequencing of chromosome 1 was initiated in 1996 as a pilot project of physical map-based sequencing and was published by Myler and coworkers (16) in 1999. A striking polarity of putative transcription was shown and has since been confirmed for the genomes of Leishmania, T. brucei, and T. cruzi. The so-called directional gene clusters (DGCs), a unique characteristic of gene organization in trypanosomatids discussed further below, may be a consequence of the polycistronic mode of transcription and of the unusual linkage between transcription and RNA processing, which involves a coupled mechanism of trans-splicing and polyadenylation (17, 18).
The rest of the genome followed, and the 33.6-Mb sequence was obtained by a combination of approaches, including the hierarchical (clone-by-clone) sequencing strategy and whole chromosome shotgun sequencing, which involves an initial separation of individual or co-migrating chromosomes by PFGE. In silico analysis of the L. major genome predicted 911 RNA genes, 39 pseudogenes, and 8,370 protein-coding genes. The entire Leishmania genome sequence was published along with the Trypanosoma brucei and T. cruzi genomes in the July 2005 issue of Science (19). This marked the culmination of an international effort that, as already noted (20), must be taken not as a final achievement but as the beginning of several possible routes of investigation.
The Leishmania spp. genomes are now being exploited through comparative genomics: the genomes of L. infantum and L. braziliensis have been completely sequenced and annotated (21). It is worth mentioning that each of these two genomes was sequenced in less than 1 year. Moreover, recent software and hardware improvements allowed both genomes to be sequenced entirely by the whole-genome shotgun approach.
Particular Features of Leishmania Genome Organization in Comparison with Other Eukaryotic Genomes
Mapping and sequencing of the L. major genome established that the genome of this protozoan parasite is composed of 36 discrete chromosomes. Different Leishmania species have 34 to 36 chromosomes, ranging in size from 268 to 2,680 kb. Chromosomes do not condense during cell division and, as mentioned above, the genome is largely diploid; it is also largely homozygous.
The 33.6-Mb haploid genome of L. major has an overall G+C content of 59.7%. The 8,370 predicted protein-coding genes are densely arranged and occupy about 48% of the genome (19) (http://www.genedb.org/genedb/leish/). Many protein-coding genes are grouped in multicopy families represented in tandem arrays.
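To give a sense of how densely packed these genes are, the figures quoted in the preceding paragraph (33.6 Mb, 8,370 protein-coding genes, about 48% of the sequence coding) can be turned into rough averages. This is only back-of-the-envelope arithmetic on those three numbers, not a value reported in (19):

```latex
\begin{align*}
\text{mean spacing between gene starts} &\approx \frac{33.6 \times 10^{6}\ \text{bp}}{8370\ \text{genes}} \approx 4.0\ \text{kb} \\
\text{total coding sequence} &\approx 0.48 \times 33.6\ \text{Mb} \approx 16.1\ \text{Mb} \\
\text{mean coding span per gene} &\approx \frac{16.1 \times 10^{6}\ \text{bp}}{8370} \approx 1.9\ \text{kb}
\end{align*}
```

In other words, on average roughly half of each 4-kb stretch is coding sequence, which is consistent with the description of a compact, gene-dense genome.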
The L. major genome is relatively poor in repetitive sequences when compared with those of the other trypanosomatids (19, 22). Interspersed elements are infrequent, and no active retrotransposable elements have been found (23). Major repeated elements in L. major are found exclusively in subtelomeric regions (24). As mentioned above, contraction and expansion of these subtelomeric regions are responsible for the size variation observed between lines and between chromosome homologs (25).
Leishmania telomeres contain variable numbers of the hexameric repeat 5'-TAGGGT-3', variable numbers of the
octameric repeat 5'-TGGTCATG-3', and a sequence immediately adjacent to the telomere, common to all
Leishmania species and named Leishmania conserved telomere-associated sequence (LCTAS) (26). The LCTAS
can be present as a single copy at the end of the chromosome or in repeated copies intercalated by the telomere
end-most hexameric repeat. In the subtelomeric region, some other, less conserved repeated elements are found,
followed by a barren region (27).
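A minimal Python sketch of how the two telomeric repeat units named above could be counted in a chromosome-end sequence. It assumes the end is available as a plain DNA string in the orientation given in the text; reverse-complement handling, sequencing errors, and LCTAS detection are deliberately left out, and the toy sequence is invented for illustration.

```python
import re

# Repeat units quoted in the text above.
HEXAMER = "TAGGGT"
OCTAMER = "TGGTCATG"


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    # (?:unit)+ greedily matches one or more consecutive copies of the unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def count_copies(seq: str, unit: str) -> int:
    """Count non-overlapping occurrences of `unit` anywhere in `seq`."""
    return seq.upper().count(unit)


if __name__ == "__main__":
    # Hypothetical chromosome end: filler, 3 octamers, filler, 5 hexamers.
    toy_end = "ACGT" + OCTAMER * 3 + "GATC" + HEXAMER * 5
    print(longest_tandem_run(toy_end, HEXAMER))  # 5
    print(longest_tandem_run(toy_end, OCTAMER))  # 3
```

Reporting the longest uninterrupted run, rather than only the total count, reflects the point that the number of repeat copies is variable between chromosome ends.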
One of the most remarkable characteristics of Leishmania and Trypanosoma genomes is the organization of genes into long arrays of tandem coding sequences, all positioned in the same orientation and transcribed into polycistronic primary transcripts. These DGCs vary in size from a few kb to more than 1.2 Mb. The sequences located between two divergent transcription units are called strand-switch regions; they do not contain a consensus sequence for any known RNA polymerase promoter but show a compositional bias, with increased AT content relative to other chromosomal regions (28).
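The gene organization just described can be sketched in a few lines of Python: consecutive genes on the same strand are grouped into DGCs, and the gap between two adjacent clusters is reported as a strand-switch region together with its AT fraction. The `Gene` record, its coordinate convention, and the toy chromosome are hypothetical simplifications (real coordinates would come from a genome annotation, whose parsing is not shown), and genes are assumed not to overlap. The text defines the divergent case; the sketch also labels convergent gaps for completeness.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Gene:
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def directional_clusters(genes):
    """Group consecutive genes on the same strand into directional gene clusters."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [list(run) for _, run in groupby(ordered, key=lambda g: g.strand)]


def strand_switch_regions(genes, chrom_seq):
    """Yield (kind, start, end, at_fraction) for each gap between adjacent clusters."""
    clusters = directional_clusters(genes)
    for left, right in zip(clusters, clusters[1:]):
        last, first = left[-1], right[0]
        # Adjacent clusters always differ in strand: "-" then "+" point away from
        # each other (divergent); "+" then "-" point toward each other (convergent).
        kind = "divergent" if last.strand == "-" else "convergent"
        start, end = last.end + 1, first.start - 1
        region = chrom_seq[start - 1:end].upper()
        at = (region.count("A") + region.count("T")) / len(region) if region else 0.0
        yield kind, start, end, at


if __name__ == "__main__":
    genes = [Gene("g1", 1, 10, "-"), Gene("g2", 21, 30, "-"),
             Gene("g3", 41, 50, "+"), Gene("g4", 61, 70, "+")]
    chrom = "G" * 30 + "ATATATATAT" + "C" * 30
    for region in strand_switch_regions(genes, chrom):
        print(region)  # ('divergent', 31, 40, 1.0)
```

Comparing the AT fraction of each strand-switch region with that of the whole chromosome is one simple way to look for the compositional bias mentioned above.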
Transcription by RNA polymerase II (RNA pol II) in these organisms is unusual. After the completion of chromosome 1 (chr1) of L. major (16), the study of its transcriptional activity showed that RNA pol II transcription, specific to the coding strand, initiated in the strand-switch region and proceeded bidirectionally (7). Similarly, on chr3, transcription starts at discrete regions (perhaps not exclusively, but more intensely) and proceeds along the DGC (6). No substantial sequence similarity was found between the regions of transcription initiation identified on LmjF chr1 and chr3. Previous work had shown that RNA pol II initiates transcription at random in Leishmania, without requiring a specific promoter (29). Nevertheless, in intact chromosomes, RNA pol II transcription is predominantly, but not exclusively, strand specific and unidirectional (30). As a result, all genes within a DGC are transcribed at the same rate.
Interestingly, although no characteristic RNA pol II promoter region seems to precede the long DGCs, a typical, discrete pol II promoter has been found and characterized in the intergenic sequences of the tandem array encoding about 120 copies of the spliced leader RNA (SL RNA) gene (31-33). Several conserved genes encoding subunits of basal transcription factors for RNA pol II have been identified in the Leishmania genome, but homologs of several others have not been found thus far, which leaves the mechanism of transcription in these organisms a subject for further investigation.
The tandem organization of ribosomal RNA (rRNA) genes and the characteristics of RNA polymerase I (RNA pol I) promoters resemble those found in other organisms. One large tandem array contains approximately 60 copies of the 24S, 18S, and 5.8S rRNA genes, whereas 5S rRNA genes are dispersed in 11 different loci. RNA pol I promoters exhibit the expected characteristics: they are strong promoters, insensitive to α-amanitin, poorly conserved in sequence, and regulated by upstream repeated motifs (34-37). In Leishmania, RNA pol I is responsible for the transcription of the rRNA units (6); in T. brucei, unusually, it also transcribes the genes coding for variant surface antigens, a singular example of transcription of protein-coding genes by RNA pol I. Transcription of tRNAs and small RNAs seems to be driven by RNA pol III promoters (6).
Another atypical feature of the Leishmania genome (and of trypanosomatids in general) is the almost complete absence of introns. Very few examples of cis-spliced genes have been found thus far in these organisms (19, 38). More than 40 sequences for which there is strong evidence of horizontal transfer from bacteria were found in the L. major genome. Some of them are exclusive to Leishmania, whereas many others are shared with Trypanosoma species.
Control of Gene Expression in Leishmania
During its life cycle, Leishmania alternates between the extracellular promastigote that multiplies in the digestive
tract of the insect vector and the intracellular amastigote, a parasite of macrophages in the vertebrate host
(http://www.dpd.cdc.gov/DPDx/HTML/Leishmaniasis.htm). After colonization and multiplication of procyclic
promastigotes in the insect gut and before the inoculation of infective parasites with the next blood meal,
promastigotes undergo a developmental differentiation process known as metacyclogenesis (39). Procyclic promastigotes, metacyclic promastigotes, and amastigotes differentially express gene products, as exemplified by numerous genes and proteins already characterized (40-48).
Although in most organisms a first and decisive level of gene expression control is the transcription rate of a particular sequence, this is clearly not the case in Leishmania, as discussed above. To increase transcription of a given gene, Leishmania parasites rely on duplication or amplification of gene sequences, and these organisms display striking genome plasticity, being able to respond to selective pressure by changing ploidy either locally or across the whole genome (49-52). Consequently, apart from altering gene copy number, Leishmania must rely on post-transcriptional mechanisms to control gene expression.
One of the consequences of polycistronic transcription is that individual gene transcripts lack a 5' cap, which would make them unstable. In these organisms, mRNA stability is achieved by trans-splicing. This type of RNA processing was first described in trypanosomatids and later demonstrated in other organisms as well. The trans-splicing machinery transfers a capped small RNA, known as the spliced leader (SL), to the 5' end of almost all mRNAs (53, 54). The acceptor site for the SL RNA is generally an AG dinucleotide preceded by a pyrimidine-rich sequence and located 100-250 nt upstream of the open reading frame (ORF). The polypyrimidine tract is also responsible for directing polyadenylation of the upstream mRNA (18, 53, 54).
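The acceptor-site rule described in this paragraph can be turned into a simple sequence scan. The Python sketch below is a heuristic under assumed parameters, not a published prediction method: it reports AG dinucleotides lying 100-250 nt upstream of a given ORF start whose immediately preceding stretch contains a run of at least 8 pyrimidines. The tract length (`min_tract`), search window (`max_gap`), and the toy sequence are all hypothetical choices for illustration.

```python
import re


def candidate_acceptor_sites(seq, orf_start, min_up=100, max_up=250,
                             min_tract=8, max_gap=30):
    """Return 0-based positions of the A in candidate AG acceptor sites.

    Heuristic sketch with assumed parameters: the AG must lie between
    `min_up` and `max_up` nt upstream of `orf_start`, and the
    `max_gap + min_tract` nt immediately before it must contain a run of at
    least `min_tract` pyrimidines (C or T).
    """
    seq = seq.upper()
    tract = re.compile(f"[CT]{{{min_tract},}}")  # e.g. "[CT]{8,}"
    lo = max(0, orf_start - max_up)
    hi = orf_start - min_up
    hits = []
    for i in range(lo, hi + 1):
        if seq[i:i + 2] != "AG":
            continue
        upstream = seq[max(0, i - max_gap - min_tract):i]
        if tract.search(upstream):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical 5' region: a 12-nt pyrimidine run, then AG, then 148 nt, then ATG.
    utr = "G" * 50 + "T" * 12 + "GGG" + "AG" + "G" * 148
    seq = utr + "ATG" + "GCC" * 10
    print(candidate_acceptor_sites(seq, orf_start=len(utr)))  # [65]
```

Because the same polypyrimidine tract also directs polyadenylation of the upstream mRNA, a scan of this kind marks approximate processing boundaries between neighboring genes of a polycistronic transcript.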
The rate of splicing and maturation of RNA is, therefore, one of the levels for the control of gene expression in
Leishmania. Examples of alternative splicing of transcripts have been described in T. cruzi and T. brucei (55, 56).
The control of RNA degradation has also been shown to play a part in the regulation of gene expression. In several cases, particular motifs in the 3'-untranslated region (3'UTR) have been shown to be essential for regulating mRNA half-life (43, 57-59). Strong evidence that de novo protein synthesis is required indicates that degradation or stabilization of an mRNA takes place through the interaction of proteins with the 3'UTR sequence (60); accordingly, several genes encoding proteins with RNA-binding motifs have been identified. Regulatory sequences in the 3'UTR can also regulate translation, as has been shown recently for amastin and heat shock proteins 70 and 83 (61-63).
RNA editing represents yet another mechanism of gene expression control. Primary transcripts of several mitochondrial genes do not encode functional ORFs, and the explanation for this eluded many scientists until Benne et al. (1) demonstrated that primary transcripts are modified by the addition or deletion of uridines, generating functional ORFs for several mitochondrial enzymes. The editing phenomenon was later shown to be based on small complementary RNA molecules encoded by the kinetoplast DNA minicircles (64, 65). These small RNAs were called guide RNAs (gRNAs).
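The net effect of uridine insertion/deletion editing on a transcript can be illustrated with a toy Python model. This is not a model of the gRNA-directed enzymatic machinery: here the edits are simply supplied as a hypothetical list of `(position, delta)` instructions, whereas in the parasite the sites are specified by base pairing between the pre-mRNA and guide RNAs. The sequences are invented.

```python
def apply_u_edits(pre_mrna, edits):
    """Apply uridine insertions/deletions to a pre-edited transcript.

    `edits` is a list of (position, delta) pairs: position is a 0-based index
    in the *unedited* sequence; delta > 0 inserts that many U's before the
    position, and delta < 0 deletes that many U's starting at the position.
    """
    seq = list(pre_mrna.upper())
    # Work from the 3' end so positions of earlier edits stay valid.
    for pos, delta in sorted(edits, reverse=True):
        if delta > 0:
            seq[pos:pos] = ["U"] * delta
        elif delta < 0:
            removed = seq[pos:pos - delta]
            if any(base != "U" for base in removed):
                raise ValueError(f"only uridines can be deleted (position {pos})")
            del seq[pos:pos - delta]
    return "".join(seq)


if __name__ == "__main__":
    # Hypothetical transcript: insert two U's at position 3, delete two at 5.
    print(apply_u_edits("AUGGCUUCA", [(3, 2), (5, -2)]))  # AUGUUGCCA
```

Because every insertion or deletion of a number of uridines that is not a multiple of three shifts the reading frame, even a few such edits can convert a transcript without a functional ORF into one with a functional ORF, which is the effect described above.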
Experimental Approaches for Gene Function Studies: Reverse and Forward Genetics
The genetic features of Leishmania did not allow studies using classical genetic approaches. Therefore, until the late 1980s, gene function was inferred from similarity to genes from other organisms or from expression in heterologous systems. The breakthrough in trypanosomatid reverse genetics came with the transient expression of reporter genes in Leptomonas spp. (66). Shortly afterward, appropriate conditions for transient and stable transfection were established for Leishmania (67-69).
The essential features of vectors for transient and stable transfection were determined (18, 29, 70). In both cases, promoterless circular plasmids gave readily detectable expression in Leishmania. Expression depends on the presence of a polypyrimidine tract and an acceptor site (an AG dinucleotide) for trans-splicing positioned upstream of a reporter gene, in the case of transient expression, or of a selectable marker, in vectors designed for stable transfection. Several vectors, reporter genes, and positive or negative selectable markers are now available (71). Downstream of the gene to be expressed, a 3'UTR from a resident gene is added to ensure proper polyA tail addition. Cloning any given gene between such 5' and 3' flanking sequences is sufficient to obtain its transcription. Transient transfection vectors may be designed to contain an endogenous promoter for RNA pol I genes or a heterologous one, such as the phage T7 promoter, which drives transcription more efficiently (34, 71, 72).
Data now available on 3'UTR sequence elements that are able to regulate the stability and translation efficiency
of RNA (72, 73) may be of help for the future development of vectors designed to achieve defined patterns of
developmental expression, which will represent another relevant instrument for reverse genetics.
Long genomic fragments cloned into cosmid vectors have been used to express several genes at once and have become an invaluable tool for functional complementation studies. Cosmid genomic libraries are very useful for rescuing phenotypes and unraveling the function of novel genes (74, 75). Functional studies carried out with such recombinants have provided evidence that, in episomes, transcription occurs from both strands, and no decrease in expression attributable to the position of the gene within an insert was observed (76).
For technical reasons, transient transfection has limited applications: it requires large amounts of plasmid DNA for detection of reporter gene expression, and the attainable expression levels do not allow subcellular localization of the exogenous protein. Its main use is restricted to searching for sequence elements involved in gene regulation. Therefore, permanently transformed cell lines are the preferred approach for most transfection applications. Expression of genes carried on episomes can be enhanced by increasing the concentration of the selective drug, which leads to amplification of the plasmid copy number and to higher levels of transcription. This is a very convenient approach because it allows fine-tuning of the drug pressure and hence of the overexpression of the gene of interest in the episome, making it possible to correlate the gene product with a given phenotype.
Nevertheless, it is also well established that in the absence of drug pressure, circular extrachromosomal molecules are lost, and after a few serial passages the original clone turns into a mixed population of cells with and without the episome. Stability of the exogenous gene can be achieved by its integration into the genome. To promote integration of DNA into the Leishmania genome, a linear fragment with non-cohesive ends must be transfected. The sine qua non condition for insertion is sequence homology between the transfected fragment and the integration site in the genome (77). The high frequency and efficiency of homologous recombination are exploited to integrate an exogenous gene to be overexpressed into the rDNA locus. This tactic places the gene under the control of the only known strong promoter, which drives constitutive expression in Leishmania. Transcription driven by RNA pol I in such an arrangement not only leads to higher levels of overexpression than those obtained by RNA pol II-driven expression of episomal genes but also brings the advantage of stabilizing the overexpressed gene (78).
More importantly, the high frequency and efficiency of homologous recombination made it possible to knock out genes using linear fragments designed to replace a resident gene with a selectable marker (77). Gene replacement occurs through a double crossover between homologous sequences flanking the target gene and the selectable marker. Technical features relevant to obtaining genome insertion with high efficiency have been explored by different groups (77, 79). There are no convincing reports of non-homologous integration in the Leishmania genome. The parasite cell offers favorable homologous recombination machinery but, as mentioned above, Leishmania is mostly diploid and has no sexual cycle that can be manipulated, which makes knocking out a given gene more difficult. Two rounds of transfection and two different selectable markers are the minimal requirements to obtain a null mutant for a given locus (80). Targeting occurs with comparable efficiencies at both steps, independently of the order in which the selectable markers are used. Several genes have been knocked out using this two-step strategy since 1990, and it has even been shown that long stretches of genomic sequence (40 kb) can be deleted by homologous replacement (81, 82). This finding shows that gene clusters with several copies can be targeted at once.
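The reasoning behind the two-round requirement can be made explicit with a toy Python model that tracks the alleles at one locus. It is a deliberate simplification under stated assumptions: each round is modeled as a successful double crossover that replaces one remaining wild-type allele, which in practice is what selection with a second, different marker is meant to ensure; recombination efficiency and selection itself are not modeled, and `markerA`/`markerB` are hypothetical labels rather than specific drug-resistance genes.

```python
WILD_TYPE = "WT"


def replace_one_allele(alleles, marker):
    """Model one gene-replacement round: a double crossover swaps one
    wild-type allele for `marker`. Returns a new list of alleles."""
    if WILD_TYPE not in alleles:
        raise ValueError("no wild-type allele left to target")
    i = alleles.index(WILD_TYPE)
    return alleles[:i] + [marker] + alleles[i + 1:]


def is_null(alleles):
    """A null mutant retains no wild-type copy of the locus."""
    return WILD_TYPE not in alleles


if __name__ == "__main__":
    diploid = [WILD_TYPE, WILD_TYPE]
    heterozygote = replace_one_allele(diploid, "markerA")
    print(heterozygote, is_null(heterozygote))  # ['markerA', 'WT'] False
    double = replace_one_allele(heterozygote, "markerB")
    print(double, is_null(double))              # ['markerA', 'markerB'] True
    # If the cell carries a third copy of the locus, two rounds are not enough.
    trisomic = replace_one_allele(replace_one_allele([WILD_TYPE] * 3, "markerA"), "markerB")
    print(trisomic, is_null(trisomic))          # ['markerA', 'markerB', 'WT'] False
```

The last case shows why a change in the copy number of the targeted chromosome can leave a wild-type allele behind even after two successful rounds of replacement.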
Interestingly, failure to obtain a double replacement of a given gene is now taken as an indication that the gene is essential for the parasite's survival. Several reports on different targets demonstrate that the parasite resorts to extreme genetic means to avoid the loss of an essential gene. These reports were also very important in revealing that ploidy in Leishmania is anything but strict: alterations of chromosome number, either by aneuploidy or by tetraploidy, are recurrent findings (49, 52). This limitation for the study of essential genes has been overcome by alternative approaches in trypanosomes but not in Leishmania. Generation of conditional knock-outs using a tetracycline-inducible system to fine-tune the expression level of an essential gene has been successfully established in T. brucei (83), but it is not fully operational in Leishmania (84). An efficient inducible system for functional studies of genes that cannot be knocked out would be a remarkably useful tool for reverse genetics in Leishmania.
Gene silencing by RNA interference (RNAi) as a means to knock down genes is another extremely important technical advance for functional genetic studies; it is booming in T. brucei but missing in Leishmania. In African trypanosomes, a stem-loop or double-stranded RNA can be transfected or directly introduced into the parasite and can be used in conjunction with the tet repressor system to knock down any given gene transiently. This approach allows the rapid investigation of several genes in parallel and has been widely used and shown to be the best choice for functional genomics in several organisms, including T. brucei.
Global Approaches for Functional Analyses of the Parasite Genome: Transcriptome and Proteome
Before the completion of the genome, investigators devoted to functional studies in Leishmania had to use "one gene at a time" strategies, applying the available reverse and forward genetics technology. There is now an urgent need for global approaches to functionally tackle the multitude of data available. The completion of the Tri-Tryp sequencing represents a landmark in the study of these human pathogens. Analyses of the genome sequences and comparative genomics have already generated an enormous amount of information, but the available data also raise an enormous number of unanswered questions. For example, of the 8,370 predicted protein-coding genes of L. major, about 10% seem to be unique, having no orthologs in Trypanosoma or other organisms (19, 22) (http://www.genedb.org/genedb/leish/index.jsp). Elucidating the function of these unknown genes, as well as the physiology of these parasites, will depend on new technical developments, some of which are already in use for the study of the Leishmania genome as well as other genomes.
Possible approaches to large-scale functional studies involve the characterization of the complete set of RNA
molecules (known as the transcriptome), of the complete set of proteins (the proteome), or of the complete set of
low molecular weight metabolic intermediates (the metabolome), as well as the generation of genome-wide,
single-gene replacements.
Examples of full-genome studies aimed at identifying and characterizing the whole set of genes are already available. Whole-genome analysis by single-gene replacement has been accomplished in Saccharomyces cerevisiae by the Yeast Deletion Program (85, 86). The phenotype generated by each gene deletion was analyzed by assays of growth fitness under different culture conditions and by visual examination of cell shape and size. Some problems became evident after that initiative: functional redundancy among related genes masking the effects of deletions, the presence of essential genes (about 14% in S. cerevisiae), and the lack of a detectable phenotype after deletion of some genes. The refinement of the available tools for phenotype analysis then becomes an obvious necessity when we consider global functional analysis of parasite genes.
Tools for a functional study of the Leishmania genome through a similar type of approach have been developed. Transposable elements have been adapted to Leishmania and could be used to generate random disruption mutants (87-89). The drawbacks of this approach are diploidy, which restricts the observation of a mutant phenotype to cases where a change in gene dosage is sufficient to produce one, and the fact that mutants may not display a phenotype that is "visible" to the methods of observation currently feasible.
Any functional approach will have to take several variables into account: the developmental stage, physiological state, growth conditions, and so on will all influence the set of products expressed in a given cell. In fact, studying the response to a particular stimulus is a valuable tool for understanding the physiology of the cell.
The study of transcriptomes has been applied to several parasites (90-92) and to innumerable cell types in complex organisms. It is a powerful approach and can generate large amounts of data.
Several comparative studies of Leishmania transcripts have been undertaken using microarrays of random
genomic sequences, ESTs, and oligonucleotides. Even before the completion of the L. major genome, microarrays
were already in use to evaluate changes among the different developmental stages. The first analyses were based
on PCR-amplified inserts from promastigote- or amastigote-derived L. major cDNA libraries. When transcripts purified from L. major promastigotes or lesion-derived amastigotes were hybridized to that array, about 15% of the uniquely spotted sequences were detected as up- or downregulated in amastigotes (93).
Another array built with random genomic sequences estimated to cover about 80% of the genome (14, 88) was
probed for differential expression during metacyclogenesis. About 15% of the sequences in the array showed
changes in abundance by a factor of 1.5 or more (94).
These early estimates of transcript regulation in Leishmania were not in agreement with data obtained on
microarray analyses of steady-state RNA in T. brucei (95, 96), which estimated that 2% of trypanosome genes
showed developmental regulation at the mRNA level.
Akopyants et al. (97), using the same sequences as the random-genomic array mentioned above, showed that transcript abundance changed for only 1-3% of the spotted sequences. These results were validated by Northern blotting, indicating that microarrays could be used for analysis of the Leishmania transcriptome. The discrepancy with the previous studies may be attributed, first, to the source of sequences in some arrays: the EST libraries were probably enriched for highly expressed genes, which may represent a strong bias in the microarray analysis. Second, the more stringent criteria used to define the cut-off for differential expression might explain the differences from the results obtained with the same genomic-derived microarray.
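How strongly the choice of cut-off affects the fraction of sequences called "regulated" can be shown with a short Python sketch. It assumes each spot has already been reduced to a single positive expression ratio between two conditions; normalization, replicates, and statistical testing, which real microarray analyses require, are left out. The ratios below are invented for illustration and are not data from the studies cited above.

```python
import math


def fraction_regulated(ratios, min_fold=1.5):
    """Fraction of spots whose expression ratio (condition B / condition A)
    changes by at least `min_fold` in either direction. Ratios must be > 0."""
    if not ratios:
        return 0.0
    cutoff = math.log2(min_fold)
    # Working in log2 space treats a 2-fold increase and a 2-fold decrease
    # symmetrically (+1 and -1).
    changed = sum(1 for r in ratios if abs(math.log2(r)) >= cutoff)
    return changed / len(ratios)


if __name__ == "__main__":
    # Hypothetical ratios for ten spotted sequences.
    ratios = [1.0, 1.1, 0.9, 1.6, 0.4, 2.5, 1.2, 0.95, 1.05, 3.0]
    print(fraction_regulated(ratios, 1.5))  # 0.4
    print(fraction_regulated(ratios, 2.0))  # 0.3
```

With the same data, raising the threshold from 1.5-fold to 2-fold lowers the fraction of regulated sequences, which is the kind of effect proposed above to explain part of the discrepancy between studies.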
Holzer et al. (98) used whole-genome high-density L. major oligonucleotide microarrays to analyze RNA isolated from L. mexicana promastigotes, lesion-derived amastigotes, and axenic amastigotes. In this study, several new steps were taken: an oligonucleotide array was built based on the whole set of predicted ORFs in the L. major genome and assayed against another Leishmania species, showing that sequence conservation in the coding regions is high enough to allow cross-species analyses. Furthermore, axenically cultured amastigotes were compared with lesion-derived amastigotes. The conclusions agreed with most of the previously reported data: 3.5% of all genes in the Leishmania genome showed differentially regulated steady-state mRNA levels. Most of the regulated transcripts were promastigote enriched, and very few amastigote-upregulated molecules were detected. Interestingly, axenic amastigotes behaved much more like promastigotes than like lesion-derived amastigotes, sharing most of the upregulated sequences with promastigotes. Thus, these authors confirmed that regulatory control in Leishmania is exerted mainly downstream of mRNA abundance. These findings were not unexpected given the lack of transcriptional control. They also indicated that post-transcriptional regulation at the level of RNA abundance is not an exceptionally important player in the control of gene expression in this case.
It must be stressed that obtaining large numbers of lesion-derived amastigotes for functional studies is a
serious problem in some circumstances. For some Leishmania species, tissue amastigotes are very scarce, and it is
hard to envisage methods for obtaining enough cells to purify large amounts of RNA or protein.
Even if differential gene expression based on transcript abundance were more common in Leishmania, the
correlation between the transcriptome and the levels of translated proteins is often poor, as shown in other
systems (99, 100). It is therefore understandable that efforts turned to the study of the final products of the cell
machinery.
The term proteome was first used in 1995 to describe the total protein complement expressed by a genome
(101, 102). The study of proteomes relies on two main techniques: two-dimensional electrophoretic separation of
proteins and mass spectrometry. Two-dimensional electrophoresis (2-DE) of proteins, first described by
O’Farrell (103), has recently gained several technical refinements that make it more reliable and reproducible.
However, technical problems remain, mainly the limited sensitivity and reproducibility of the method and the
poor hydrophilicity and extreme pIs of some proteins.
Mass spectrometry is used to identify either all spots detected in a 2-DE gel or a subset selected on the basis of
several criteria (abundance, differences between developmental stages, metabolic labeling, etc.). Two major
technical improvements made mass spectrometry more amenable to protein analysis: matrix-assisted laser
desorption/ionization (MALDI) (104, 105) and electrospray ionization (106, 107).
A peptide-mass fingerprint obtained by MALDI-TOF analysis of trypsin-digested proteins can then be
compared with the translated sequence of the complete genome. Furthermore, peptide sequence tags can be
obtained from the same peptide mixture by subjecting it to tandem mass spectrometry (MS/MS).
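The matching step of peptide-mass fingerprinting can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the scoring used by any particular search engine: trypsin is assumed to cleave after K or R except before P, with no missed cleavages or modifications; masses are monoisotopic [M+H]+ values; the 0.2-Da tolerance, the count-based score, and the two protein sequences are hypothetical.

```python
# Monoisotopic residue masses (Da); cysteine is unmodified in this sketch.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (termini)
PROTON = 1.00728   # charge carrier for singly protonated [M+H]+ ions


def tryptic_peptides(protein):
    """In silico trypsin digest: cleave after K or R unless the next residue
    is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed, protein, tol=0.2):
    """Number of observed peaks within tol Da of a predicted tryptic peptide."""
    predicted = [mh_plus(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in observed)


def rank_proteins(observed, proteome, tol=0.2):
    """Rank candidate proteins (name -> sequence) by number of matched peaks."""
    scores = {name: pmf_score(observed, seq, tol) for name, seq in proteome.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical "translated genome" and a spectrum simulated from protA.
proteome = {"protA": "AGKLLRPEFSKYDR", "protB": "WWHHMMNNQQ"}
observed = [mh_plus(p) for p in tryptic_peptides(proteome["protA"])]
print(rank_proteins(observed, proteome))  # [('protA', 3), ('protB', 0)]
```

Production tools typically use probability-based scores and ppm tolerances; the simple count used here only shows why a fully translated genome is needed as the search space.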
Proteome analyses have been applied to study developmental differentiation in Leishmania species. Early reports
on the L. infantum proteome detected over 2,000 spots by 2-DE (for a genome predicted to encode approximately
8,300 genes). Comparison of promastigote and axenic amastigote patterns revealed that about 3% of the spots
were differentially expressed (108). Similar results were observed for L. donovani (5%) and L. mexicana (7%)
axenic amastigotes (109, 110). The proportion of differentially expressed proteins in these studies is very low,
and at this point it is not possible to say whether this reflects a very stable expression pattern with few
differentially expressed proteins or technical limitations. Alternatively, as shown for the transcriptional profile,
axenic amastigotes may not be a reliable model of the physiology of lesion-derived amastigotes.
A high-resolution 2-DE map of the L. major promastigote proteome (about 3,700 spots) was obtained and used to
study mutant lines resistant to methotrexate or overexpressing trypanothione reductase (111). This study
confirmed drug resistance mechanisms and showed that trypanothione reductase is post-translationally modified.
Large-scale identification of protein spots observed by 2-DE has not yet been accomplished for Leishmania, but
when applied to the comparative proteomic analysis of T. cruzi differentiation it uncovered interesting
information on the energy sources used in different life stages (112).
One promising application of proteome analysis is the study of isolated cellular fractions or organelles. For
example, the T. brucei flagellar proteome, comprising 331 components, has been characterized (113). Several
proteins identified in this study were functionally ablated by RNAi, providing further insight into flagellar
physiology, especially in the bloodstream stage of T. brucei.
Methods for cell disruption and subcellular fractionation of T. brucei have been described (114) and successfully
applied to fractionate Leishmania promastigotes (115). Direct identification of the proteins in each organelle may
be a powerful tool for functional studies.
As the examples above show, current proteome investigations focus on detecting changes in the abundance of
specific proteins in response to a given condition or stimulus. Consistent estimates of quantitative changes in
abundance are not easily obtained with classical proteomic techniques. One approach developed for fine
quantitative analysis of the components of a complex protein mixture is differential labeling with reagents such
as isotope-coded affinity tags (ICAT) (100, 116). This technique has already been applied to a subset of T. cruzi
proteins (117) and has been used to analyze differential protein expression in other infectious agents (118, 119).
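The quantitative read-out of an isotope-coded labeling experiment can be sketched as a search for light/heavy peak pairs. In the ICAT approach, cysteine residues from two samples are tagged with a light or a heavy reagent, the samples are combined, and each labeled peptide appears as a pair of peaks whose intensity ratio estimates relative abundance. The Python sketch below assumes a hypothetical peak list from a single scan, a known charge state and cysteine count, and a mass shift of 8 Da per tag (roughly that of the original deuterated reagent; treat it as a parameter).

```python
import math


def icat_pairs(peaks, charge, n_cys, shift_per_tag=8.0, tol=0.02):
    """Pair light/heavy ICAT peaks and return heavy/light intensity ratios.

    peaks: list of (m/z, intensity) tuples from one MS scan (hypothetical).
    charge: assumed charge state of the peptide ions.
    n_cys: number of labeled cysteines (one tag per cysteine).
    shift_per_tag: heavy-minus-light reagent mass in Da (assumed value).
    tol: m/z tolerance for accepting a pair.
    """
    expected_gap = n_cys * shift_per_tag / charge
    pairs = []
    for mz_light, int_light in peaks:
        for mz_heavy, int_heavy in peaks:
            if abs((mz_heavy - mz_light) - expected_gap) <= tol:
                ratio = int_heavy / int_light
                pairs.append((mz_light, mz_heavy, ratio, math.log2(ratio)))
    return pairs


# Hypothetical scan: one doubly charged peptide with one cysteine,
# twice as abundant in the heavy-labeled sample.
peaks = [(800.40, 1.0e5), (804.40, 2.0e5), (950.10, 5.0e4), (1000.00, 3.0e4)]
for light, heavy, ratio, log2_ratio in icat_pairs(peaks, charge=2, n_cys=1):
    print(f"{light:.2f}/{heavy:.2f}: heavy/light = {ratio:.2f} (log2 {log2_ratio:.2f})")
```

Real workflows must also handle co-eluting peptides, retention-time shifts of deuterated tags, and unknown charge states, all of which this sketch ignores.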
Progress in methods for large-scale functional proteomics may greatly facilitate our understanding of these
parasites. Techniques to study the interaction proteome, transient or stable post-translational modifications, and
subcellular trafficking are beginning to appear in large-scale applications in other organisms and will have to be
adapted to the particularities of these parasites.
Comparative Genomics of Trypanosomatids: Learning from Inter- and Intrageneric Analyses
The recent availability of the Tri-Tryp genomes is a milestone in our knowledge of the biology of these three
pathogens. Whereas analysis of a single genome provides remarkable biological insights into a particular
organism, comparative analysis of multiple genomes offers considerably more information. It expands our ability
to understand the genetic and evolutionary bases of the shared and distinct parasitic modes and lifestyles, and to
better assign putative functions to predicted coding sequences.
Comparison of the gene content and genome architecture of L. major, T. cruzi, and T. brucei, and of the
composition and organization of the protein domains encoded by each genome, was very informative. In spite of
marked differences in their life cycles and pathogenesis, these parasites share about 6,200 genes, which are
distributed in large syntenic, polycistronic gene clusters. Comparison of the predicted protein sequences of the
three genomes revealed that the number of conserved proteins is higher between the two intracellular parasites,
L. major and T. cruzi, and lower when either of them is compared with T. brucei. Species-specific proteins make
up 12% of the Leishmania proteome, compared with 32% and 26% of the T. cruzi and T. brucei proteomes,
respectively. Because most of the species-specific proteins are annotated as members of surface antigen families,
the differences in the absolute number of unique sequences can be related to the survival and immune evasion
strategies of each trypanosomatid (120).
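Once orthologous groups have been defined, shared and species-specific fractions like those quoted above reduce to simple set arithmetic. The Python sketch below assumes the clustering has already been done (the clustering step itself is not shown) and uses six hypothetical groups with made-up gene IDs; the numbers it produces are illustrative only and unrelated to the published Tri-Tryp values.

```python
SPECIES = ("Lmaj", "Tcru", "Tbru")

# Hypothetical orthologous groups: species label -> member gene IDs.
clusters = [
    {"Lmaj": ["LmjA"], "Tcru": ["TcA"], "Tbru": ["TbA"]},
    {"Lmaj": ["LmjB"], "Tcru": ["TcB1", "TcB2"], "Tbru": ["TbB"]},
    {"Lmaj": ["LmjC"], "Tcru": ["TcC"]},
    {"Tcru": ["TcD1", "TcD2", "TcD3"]},  # expanded T. cruzi-only family
    {"Tbru": ["TbE"]},
    {"Lmaj": ["LmjF"]},
]


def summarize(clusters, species=SPECIES):
    """Count core groups (present in all species) and, per species, the
    fraction of its genes that fall in species-specific groups."""
    n_core = sum(1 for c in clusters if all(s in c for s in species))
    summary = {}
    for s in species:
        total = sum(len(c.get(s, [])) for c in clusters)
        specific = sum(len(c[s]) for c in clusters if set(c) == {s})
        summary[s] = {"genes": total,
                      "specific_fraction": specific / total if total else 0.0}
    return n_core, summary


n_core, summary = summarize(clusters)
print(n_core)  # 2
for s, stats in summary.items():
    print(s, stats["genes"], round(stats["specific_fraction"], 2))
```

Counting genes rather than groups matters here: the expanded three-member T. cruzi-only family raises that species' specific fraction well above the others.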
Despite their evolutionary distance, the genomes of these three organisms are highly syntenic. Comparison of the
T. brucei and L. major genomes shows that 68% and 75% of their genes, respectively, remain in the same genomic
context; 110 synteny blocks span 30.7 Mb of the Leishmania genome. In both Leishmania and T. brucei, synteny
breakpoints frequently fall in the strand-switch regions separating DGCs. Many of the species-specific genes are
found in nonsyntenic internal or subtelomeric regions of each parasite's chromosomes. These regions also harbor
retroelements and structural RNAs and are the sites of gene family expansions. Despite this overall conservation,
comparative analyses reveal gene divergence, sequence acquisition and loss (loss being less frequent than
insertion), and rearrangements within syntenic regions of chromosomes (22).
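The idea of a synteny block can be made concrete with a deliberately strict Python sketch: it walks along each chromosome of one genome and extends a block while the orthologs of consecutive genes are also adjacent on a single chromosome of the other genome, in the same or reverse orientation. The genomes, gene IDs, and one-to-one ortholog map are hypothetical, and unlike published synteny methods this version breaks a block at any intervening gene.

```python
def synteny_blocks(genome_a, genome_b, orthologs, min_len=3):
    """Return maximal runs of >= min_len genes in genome_a whose orthologs are
    adjacent, in constant direction, on one chromosome of genome_b.

    genome_a, genome_b: dict chromosome -> ordered list of gene IDs.
    orthologs: dict gene ID in genome_a -> gene ID in genome_b (one-to-one).
    """
    # Position of every genome_b gene as (chromosome, index).
    pos_b = {g: (chrom, i) for chrom, genes in genome_b.items()
             for i, g in enumerate(genes)}
    blocks = []

    def flush(run):
        if len(run) >= min_len:
            blocks.append(list(run))

    for genes in genome_a.values():
        run, step = [], 0
        for gene in genes:
            hit = pos_b.get(orthologs.get(gene))
            if run and hit:
                last_chrom, last_i = pos_b[orthologs[run[-1]]]
                d = hit[1] - last_i
                # Extend if adjacent on the same chromosome and the direction
                # (+1 collinear, -1 inverted) matches the block so far.
                if hit[0] == last_chrom and abs(d) == 1 and step in (0, d):
                    run.append(gene)
                    step = d
                    continue
            flush(run)
            run, step = ([gene] if hit else []), 0
        flush(run)
    return blocks


genome_a = {"chrA1": ["a1", "a2", "a3", "a4", "a5", "a6"]}
genome_b = {"chrB1": ["b1", "b2", "b3", "x1", "b6", "b5", "b4"]}
orthologs = {f"a{i}": f"b{i}" for i in range(1, 7)}
print(synteny_blocks(genome_a, genome_b, orthologs))
# [['a1', 'a2', 'a3'], ['a4', 'a5', 'a6']]
```

In this toy example a breakpoint falls between a3 and a4, and the second block is inverted relative to the first.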
The L. major subtelomeric regions are shorter (<20 kb) than those of trypanosomes and contain relatively few
repetitive sequences. Nevertheless, accumulating evidence indicates that recombination events take place at these
sites and may contribute to genetic divergence and gain of gene function (19) (L. Brito and A. K. Cruz,
unpublished results).
Data accumulated over the years from comparative studies of individual Leishmania genes already provide some
clues about what to expect from large-scale comparative genomic analysis. For example, the structure and
genomic organization of the major surface protease (MSP) genes have been characterized in several Leishmania
species (reviewed by Yao et al. (121)). Several species have multiple copies of the MSP genes organized in
tandem, all encoding similar amino acid sequences. Differences among copies within a species are located mainly
in the C-terminal coding regions and in the 3'UTRs. Comparison of related copies from different species revealed
high conservation of the coding sequences and lower similarity in the untranslated regions. The same was
observed at other loci, such as the META cluster (122). Taking these studies into account, we might expect to find
single nucleotide changes in coding regions, and differences in untranslated regions that might be related to the
regulation of expression of these sequences.
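The contrast between conserved coding regions and divergent untranslated regions is usually quantified as percent identity over each annotated segment of an alignment. Below is a minimal Python sketch, assuming two hypothetical gene copies that are already aligned (gaps shown as '-') and made-up region boundaries; producing the alignment itself is not shown.

```python
def percent_identity(seq1, seq2):
    """Percent identity of two aligned, equal-length sequences ('-' = gap).
    Columns where both sequences have a gap are ignored."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq1, seq2) if a != "-" or b != "-"]
    same = sum(a == b for a, b in columns)
    return 100.0 * same / len(columns)


def region_identity(aln1, aln2, regions):
    """Identity per annotated region; regions maps name -> (start, end) columns."""
    return {name: percent_identity(aln1[s:e], aln2[s:e])
            for name, (s, e) in regions.items()}


# Hypothetical aligned copies: 15-column coding region + 10-column 3'UTR.
aln1 = "ATGGCTGCAAAGTAA" + "TTACGGATCC"
aln2 = "ATGGCCGCAAAGTAA" + "TTTC--ATGC"
regions = {"CDS": (0, 15), "3'UTR": (15, 25)}
for name, pid in region_identity(aln1, aln2, regions).items():
    print(f"{name}: {pid:.1f}% identity")
# CDS: 93.3% identity
# 3'UTR: 60.0% identity
```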
On the other hand, searching for conserved sequences in the non-coding regions of the genomes of three
Leishmania species may be a route to identifying elements of functional relevance for the regulation of gene
expression.
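In its simplest form, such a search can be sketched as finding short words shared by the orthologous non-coding sequences of all species compared, a naive, exact-match take on phylogenetic footprinting. The Python sketch assumes three hypothetical 3'UTR fragments built around a made-up element and a word length of 8; real analyses would align the regions, allow mismatches, and assess how often a shared word is expected by chance.

```python
def kmers(seq, k):
    """All distinct words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(noncoding_by_species, k=8):
    """Words of length k present in the non-coding sequence of every species."""
    word_sets = [kmers(s.upper(), k) for s in noncoding_by_species.values()]
    return sorted(set.intersection(*word_sets))


motif = "TTGTACGC"  # hypothetical conserved element
utrs = {
    "species1": "A" * 8 + motif + "C" * 6,
    "species2": "G" * 8 + motif + "A" * 6,
    "species3": "C" * 8 + motif + "G" * 6,
}
print(shared_kmers(utrs))  # ['TTGTACGC']
```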
Comparative analysis of the L. major, L. infantum, and L. braziliensis genomes revealed conservation of overall
gene content and genome architecture. Large-scale synteny analysis indicates that there are no synteny breaks
among the compared genomes. Despite this conservation of genomic context, there are differences (about 200
species-specific genes) that most likely reflect adaptations to distinct species-specific selection pressures.
Furthermore, the species-specific genes identified thus far are not clustered (21).
The Central Role of Bioinformatics to Put Forward Functional Genomics in Leishmania
The recent success of the international scientific community in decoding the complete genomes of the Tri-Tryps
(19, 22) has ushered in the post-genomic era, in which an intellectual fusion of biomedicine and information
technology is needed.
Bioinformatics plays a crucial role in data manipulation, data curation, and knowledge extraction, thus bridging
the gap between disparate information sources for subsequent biological model building, refinement, and
validation. The post-genome challenge, particularly in the case of these unicellular human pathogens, is to
translate new information about genes, their control pathways, proteins, and their interactions into improved
healthcare.
In the field of structural annotation (the identification of putative genes in DNA sequence), the discovery of
protein-coding genes has relied on several bioinformatics approaches and methodologies (for reviews, see
Fickett (123), Burge and Karlin (124), Claverie (125), and Worthey and Myler (126)). Each exhibits a trade-off
between sensitivity (the ability to detect true positives) and selectivity (the ability to exclude false positives). As
a general rule, algorithms for finding protein-coding genes can be classified into two broad categories:
similarity-based (extrinsic) methods and algorithm-based (intrinsic) methods (127). Similarity-based methods
rely on local alignments of query sequences against public-domain databases, whereas algorithm-based
methods, also termed ab initio approaches, evaluate the compositional properties of exons, introns, and other
features without explicit reference to other sequences.
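The trade-off between sensitivity and selectivity is usually measured by comparing predictions with a trusted annotation. A common nucleotide-level convention, assumed here, is sensitivity = TP/(TP + FN) and selectivity = TP/(TP + FP) (the latter is often called specificity in the gene-prediction literature). The Python sketch below applies it to hypothetical half-open coding intervals on one strand.

```python
def coding_positions(intervals):
    """Expand half-open [start, end) coding intervals into a set of positions."""
    return {p for start, end in intervals for p in range(start, end)}


def sensitivity_selectivity(predicted, reference):
    """Nucleotide-level sensitivity and selectivity of predicted coding regions."""
    pred, ref = coding_positions(predicted), coding_positions(reference)
    tp = len(pred & ref)   # coding in both
    fn = len(ref - pred)   # annotated but missed
    fp = len(pred - ref)   # predicted but not annotated
    sensitivity = tp / (tp + fn) if ref else 0.0
    selectivity = tp / (tp + fp) if pred else 0.0
    return sensitivity, selectivity


# Hypothetical annotation vs. prediction (one truncated gene, one false gene).
reference = [(100, 400), (600, 900)]
predicted = [(100, 400), (650, 900), (1000, 1100)]
sn, sp = sensitivity_selectivity(predicted, reference)
print(f"sensitivity={sn:.3f}, selectivity={sp:.3f}")
# sensitivity=0.917, selectivity=0.846
```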
Because combining multiple lines of evidence increases both sensitivity and selectivity, gene prediction often
combines intrinsic and extrinsic methods. For L. major in particular, a combination of CodonUsage (128),
GeneScan (129), TestCode (130), and Glimmer (131), together with manual inspection of the gene calls by curators,
led to a gene identification rate greater than 95%. It is worth mentioning that ~70% of the genes have no
significant similarity to existing genes in sequence databases.
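One simple way to combine several predictors is to keep only the gene calls supported by a minimum number of them. The Python sketch below is a hypothetical illustration, not the procedure used for L. major: tool names and coordinates are made up, and calls are grouped by strand and stop coordinate on the assumption that predictors often disagree on the start codon but agree on the stop.

```python
from collections import defaultdict


def consensus_calls(calls_by_tool, min_support=2):
    """Keep gene calls reported by at least min_support tools.

    calls_by_tool: dict tool name -> list of (strand, start, stop) tuples.
    Calls are grouped by (strand, stop), so different start choices for
    the same ORF still count as agreement.
    """
    supporters = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for strand, _start, stop in calls:
            supporters[(strand, stop)].add(tool)
    return sorted(key for key, tools in supporters.items()
                  if len(tools) >= min_support)


# Hypothetical calls from three placeholder predictors.
calls = {
    "toolA": [("+", 100, 1300), ("-", 5000, 4100)],
    "toolB": [("+", 160, 1300), ("+", 2000, 2600)],
    "toolC": [("-", 5000, 4100), ("+", 100, 1300)],
}
print(consensus_calls(calls))                 # [('+', 1300), ('-', 4100)]
print(consensus_calls(calls, min_support=3))  # [('+', 1300)]
```

Raising min_support trades sensitivity for selectivity; calls supported by only one tool are natural candidates for manual curation.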
On the other hand, in silico identification of non-protein-coding genes is a less advanced discipline, mostly
because of the diversity of these sequences. Efforts have therefore focused on comparative analyses of the
complete genomes available to date.
The parasitic trypanosomatids have been studied extensively at the molecular and biochemical levels through
classical biochemistry and molecular biology. Several unusual aspects of their biochemical physiology are shared
among trypanosomatids but absent from the cells of their mammalian hosts; examples include thiol metabolism,
the mechanisms that maintain a reducing intracellular redox milieu, and folate metabolism. Nevertheless,
although some potentially good drug targets have been identified, safe, effective, and affordable medicines for
the treatment of leishmaniasis are still lacking (132). Exploiting the differences between the genomes of the
parasite and its vertebrate host is therefore important and may open innovative routes to drug target discovery.
A current initiative of the UNICEF/UNDP/World Bank/WHO Special Programme for Research and Training in
Tropical Diseases (http://www.who.int/tdr/grants/workplans/gdr.htm) aims to bring together the pharmaceutical
industry, with its combinatorial libraries of lead compounds and its medicinal chemistry, and genome data mining
to find new drug targets and the corresponding inhibitors acting selectively against the parasites. Although some
are optimistic about developing a single drug effective against all trypanosomatids, the profound differences in
the ways each parasite interacts with its host mean that diverse pharmacological criteria must be met in
developing novel drugs. A single anti-trypanosomatid “cure-all” is therefore unlikely.
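A first computational filter for exploiting parasite-host differences is to list parasite proteins with no detectable homolog in the host. The Python sketch below assumes the parasite proteins were already searched against host proteins with BLAST+ using tabular output (-outfmt 6, where column 1 is the query ID and column 11 the E-value); the protein IDs, hits, and the 1e-5 cut-off are hypothetical, and the absence of a host homolog alone says nothing about whether a protein is essential or druggable.

```python
def parasite_specific(query_ids, blast_tab_lines, max_evalue=1e-5):
    """Return parasite proteins with no host hit at or below max_evalue.

    blast_tab_lines: lines of BLAST+ tabular output (-outfmt 6); column 1
    is the query ID and column 11 is the E-value.
    """
    has_host_homolog = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            has_host_homolog.add(qseqid)
    return sorted(q for q in query_ids if q not in has_host_homolog)


# Hypothetical search of three parasite proteins against host proteins.
hits = [
    "prot1\thostA\t45.2\t300\t150\t3\t1\t300\t5\t305\t2e-40\t250",
    "prot2\thostB\t28.0\t80\t55\t2\t10\t90\t40\t120\t0.5\t30.1",
]
print(parasite_specific(["prot1", "prot2", "prot3"], hits))  # ['prot2', 'prot3']
```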
Bioinformatics tools learned and developed during the trypanosomatid genome projects are now being applied to
the comparative genomics of Leishmania. Understanding why different Leishmania species cause such diverse
diseases has been an elusive and attractive goal for many parasitologists. Comparative genomics may provide
new clues to the virulence mechanisms of the various species and may represent a major step forward in the
control and treatment of leishmaniasis.
References
1. Benne R, Van den Burg J, et al. Major transcript of the frameshifted coxII gene from trypanosome mitochondria contains four nucleotides that are not encoded in the DNA. Cell. 1986;46(6):819–826. PubMed PMID: 3019552.
2. Opperdoes FR, Borst P. Localization of nine glycolytic enzymes in a microbody-like organelle in Trypanosoma brucei: the glycosome. FEBS Lett. 1977;80(2):360–364. PubMed PMID: 142663.
3. Johnson PJ, Kooter JM, et al. Inactivation of transcription by UV irradiation of T. brucei provides evidence for a multicistronic transcription unit including a VSG gene. Cell. 1987;51(2):273–281. PubMed PMID: 3664637.
4. Boothroyd JC, Cross GA. Transcripts coding for variant surface glycoproteins of Trypanosoma brucei have a short, identical exon at their 5' end. Gene. 1982;20(2):281–289. PubMed PMID: 7166234.
5. Walder JA, Eder PS, et al. The 35-nucleotide spliced leader sequence is common to all trypanosome messenger RNA's. Science. 1986;233(4763):569–571. PubMed PMID: 3523758.
6. Martinez-Calvillo S, Nguyen D, et al. Transcription initiation and termination on Leishmania major chromosome 3. Eukaryot Cell. 2004;3(2):506–517. PubMed PMID: 15075279.
7. Martinez-Calvillo S, Yan S, et al. Transcription of Leishmania major Friedlin chromosome 1 initiates in both directions within a single region. Mol Cell. 2003;11(5):1291–1299. PubMed PMID: 12769852.
8. Wincker P, Ravel C, et al. The Leishmania genome comprises 36 chromosomes conserved across widely divergent human pathogenic species. Nucleic Acids Res. 1996;24(9):1688–1694. PubMed PMID: 8649987.
9. Koonin EV, Galperin MY. Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Curr Opin Genet Dev. 1997;7(6):757–763. PubMed PMID: 9468784.
10. Ravel C, Wincker P, et al. Medium-range restriction maps of five chromosomes of Leishmania infantum and localization of size-variable regions. Genomics. 1996;35(3):509–516. PubMed PMID: 8812485.
11. Britto C, Ravel C, et al. Conserved linkage groups associated with large-scale chromosomal rearrangements between Old World and New World Leishmania genomes. Gene. 1998;222(1):107–117. PubMed PMID: 9813266.
12. Ivens AC, Lewis SM, et al. A physical map of the Leishmania major Friedlin genome. Genome Res. 1998;8(2):135–145. PubMed PMID: 9477341.
13. Ivens AC, Blackwell JM. The Leishmania genome comes of Age. Parasitol Today. 1999;15(6):225–231. PubMed PMID: 10366828.
14. Akopyants NS, Clifton SW, et al. A survey of the Leishmania major Friedlin strain V1 genome by shotgun sequencing: a resource for DNA microarrays and expression profiling. Mol Biochem Parasitol. 2001;113(2):337–340. PubMed PMID: 11295190.
15. El-Sayed NM, Donelson JE. A survey of the Trypanosoma brucei rhodesiense genome using shotgun sequencing. Mol Biochem Parasitol. 1997;84(2):167–178. PubMed PMID: 9084037.
16. Myler PJ, Audleman L, et al. Leishmania major Friedlin chromosome 1 has an unusual distribution of protein-coding genes. Proc Natl Acad Sci U S A. 1999;96(6):2902–2906. PubMed PMID: 10077609.
17. Aly R, Argaman M, et al. A regulatory role for the 5' and 3' untranslated regions in differential expression of hsp83 in Leishmania. Nucleic Acids Res. 1994;22(15):2922–2929. PubMed PMID: 8065903.
18. LeBowitz JH, Smith HQ, et al. Coupling of poly(A) site selection and trans-splicing in Leishmania. Genes Dev. 1993;7(6):996–1007. PubMed PMID: 8504937.
19. Ivens AC, Peacock CS, et al. The genome of the kinetoplastid parasite, Leishmania major. Science. 2005;309(5733):436–442. PubMed PMID: 16020728.
20. C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 1998;282(5396):2012–2018. PubMed PMID: 9851916.
21. Peacock CS, Seeger K, et al. Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat Genet. 2007;39(7):839–847. PubMed PMID: 17572675.
22. El-Sayed NM, Myler PJ, et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science. 2005;309(5733):409–415. PubMed PMID: 16020725.
23. Bringaud F, Ghedin E, et al. Evolution of non-LTR retrotransposons in the trypanosomatid genomes: Leishmania major has lost the active elements. Mol Biochem Parasitol. 2006;145(2):158–170. PubMed PMID: 16257065.
24. Wickstead B, Ersfeld K, et al. Repetitive elements in genomes of parasitic protozoa. Microbiol Mol Biol Rev. 2003;67(3):360–375. PubMed PMID: 12966140.
25. Sunkin SM, Kiser P, et al. The size difference between Leishmania major friedlin chromosome one homologues is localized to sub-telomeric repeats at one chromosomal end. Mol Biochem Parasitol. 2000;109(1):1–15. PubMed PMID: 10924752.
26. Fu G, Barker DC. Characterisation of Leishmania telomeres reveals unusual telomeric repeats and conserved telomere-associated sequence. Nucleic Acids Res. 1998;26(9):2161–2167. PubMed PMID: 9547275.
27. Pedrosa AL, Ruiz JC, et al. Characterisation of three chromosomal ends of Leishmania major reveals transcriptional activity across arrays of reiterated and unique sequences. Mol Biochem Parasitol. 2001;114(1):71–80. PubMed PMID: 11356515.
28. Tosato V, Ciarloni L, et al. Secondary DNA structure analysis of the coding strand switch regions of five Leishmania major Friedlin chromosomes. Curr Genet. 2001;40(3):186–194. PubMed PMID: 11727994.
29. Curotto de Lafaille MA, Laban A, et al. Gene expression in Leishmania: analysis of essential 5' DNA sequences. Proc Natl Acad Sci U S A. 1992;89(7):2703–2707. PubMed PMID: 1557376.
30. Monnerat S, Martinez-Calvillo S, et al. Genomic organization and gene expression in a chromosomal region of Leishmania major. Mol Biochem Parasitol. 2004;134(2):233–243. PubMed PMID: 15003843.
31. Gilinger G, Bellofatto V. Trypanosome spliced leader RNA genes contain the first identified RNA polymerase II gene promoter in these organisms. Nucleic Acids Res. 2001;29(7):1556–1564. PubMed PMID: 11266558.
32. Luo H, Gilinger G, et al. Transcription initiation at the TATA-less spliced leader RNA gene promoter requires at least two DNA-binding proteins and a tripartite architecture that includes an initiator element. J Biol Chem. 1999;274(45):31947–31954. PubMed PMID: 10542223.
33. Thomas S, Westenberger SJ, et al. Intragenomic spliced leader RNA array analysis of kinetoplastids reveals unexpected transcribed region diversity in Trypanosoma cruzi. Gene. 2005;352:100–108. PubMed PMID: 15925459.
34. Gay LS, Wilson ME, et al. The promoter for the ribosomal RNA genes of Leishmania chagasi. Mol Biochem Parasitol. 1996;77(2):193–200. PubMed PMID: 8813665.
35. Martinez-Calvillo S, Sunkin SM, et al. Genomic organization and functional characterization of the Leishmania major Friedlin ribosomal RNA gene locus. Mol Biochem Parasitol. 2001;116(2):147–157. PubMed PMID: 11522348.
36. Uliana SR, Fischer W, et al. Structural and functional characterization of the Leishmania amazonensis ribosomal RNA promoter. Mol Biochem Parasitol. 1996;76(1-2):245–255. PubMed PMID: 8920010.
37. Yan S, Lodes MJ, et al. Characterization of the Leishmania donovani ribosomal RNA promoter. Mol Biochem Parasitol. 1999;103(2):197–210. PubMed PMID: 10551363.
38. Mair G, Shi H, et al. A new twist in trypanosome RNA metabolism: cis-splicing of pre-mRNA. RNA. 2000;6(2):163–169. PubMed PMID: 10688355.
39. Sacks DL, Perkins PV. Identification of an infective stage of Leishmania promastigotes. Science. 1984;223(4643):1417–1419. PubMed PMID: 6701528.
40. Bellatin JA, Murray AS, et al. Leishmania mexicana: identification of genes that are preferentially expressed in amastigotes. Exp Parasitol. 2002;100(1):44–53. PubMed PMID: 11971653.
41. Boucher N, Wu Y, et al. A common mechanism of stage-regulated gene expression in Leishmania mediated by a conserved 3'-untranslated region element. J Biol Chem. 2002;277(22):19511–19520. PubMed PMID: 11912202.
42. Brooks DR, Denise H, et al. The stage-regulated expression of Leishmania mexicana CPB cysteine proteases is mediated by an intercistronic sequence element. J Biol Chem. 2001;276(50):47061–47069. PubMed PMID: 11592967.
43. Charest H, Zhang WW, et al. The developmental expression of Leishmania donovani A2 amastigote-specific genes is post-transcriptionally mediated and involves elements located in the 3'-untranslated region. J Biol Chem. 1996;271(29):17081–17090. PubMed PMID: 8663340.
44. Flinn HM, Smith DF. Genomic organisation and expression of a differentially-regulated gene family from Leishmania major. Nucleic Acids Res. 1992;20(4):755–762. PubMed PMID: 1371863.
45. Kelly BL, Nelson TN, et al. Stage-specific expression in Leishmania conferred by 3' untranslated regions of L. major leishmanolysin genes (GP63). Mol Biochem Parasitol. 2001;116(1):101–104. PubMed PMID: 11463473.
46. Moore LL, Santrich C, et al. Stage-specific expression of the Leishmania mexicana paraflagellar rod protein PFR-2. Mol Biochem Parasitol. 1996;80(2):125–135. PubMed PMID: 8892290.
47. Rochette A, McNicoll F, et al. Characterization and developmental gene regulation of a large gene family encoding amastin surface proteins in Leishmania spp. Mol Biochem Parasitol. 2005;140(2):205–220. PubMed PMID: 15760660.
48. Wang Y, Dimitrov K, et al. Stage-specific activity of the Leishmania major CRK3 kinase and functional rescue of a Schizosaccharomyces pombe cdc2 mutant. Mol Biochem Parasitol. 1998;96(1-2):139–150. PubMed PMID: 9851613.
49. Cruz AK, Titus R, et al. Plasticity in chromosome number and testing of essential genes in Leishmania by targeting. Proc Natl Acad Sci U S A. 1993;90(4):1599–1603. PubMed PMID: 8381972.
50. Martinez-Calvillo S, Stuart K, et al. Ploidy changes associated with disruption of two adjacent genes on Leishmania major chromosome 1. Int J Parasitol. 2005;35(4):419–429. PubMed PMID: 15777918.
51. Mottram JC, Coombs GH. Leishmania mexicana: subcellular distribution of enzymes in amastigotes and promastigotes. Exp Parasitol. 1985;59(3):265–274. PubMed PMID: 3158538.
52. Uliana SR, Goyal N, et al. Leishmania: overexpression and comparative structural analysis of the stage-regulated meta 1 gene. Exp Parasitol. 1999;92(3):183–191. PubMed PMID: 10403759.
53. Matthews KR, Tschudi C, et al. A common pyrimidine-rich motif governs trans-splicing and polyadenylation of tubulin polycistronic pre-mRNA in trypanosomes. Genes Dev. 1994;8(4):491–501. PubMed PMID: 7907303.
54. Ullu E, Matthews KR, et al. Temporal order of RNA-processing reactions in trypanosomes: rapid trans splicing precedes polyadenylation of newly synthesized tubulin transcripts. Mol Cell Biol. 1993;13(1):720–725. PubMed PMID: 8417363.
55. Manning-Cela R, Gonzalez A, et al. Alternative splicing of LYT1 transcripts in Trypanosoma cruzi. Infect Immun. 2002;70(8):4726–4728. PubMed PMID: 12117992.
56. Vassella E, Braun R, et al. Control of polyadenylation and alternative splicing of transcripts from adjacent genes in a procyclin expression site: a dual role for polypyrimidine tracts in trypanosomes? Nucleic Acids Res. 1994;22(8):1359–1364. PubMed PMID: 8190625.
57. Beetham JK, Myung KS, et al. Glycoprotein 46 mRNA abundance is post-transcriptionally regulated during development of Leishmania chagasi promastigotes to an infectious form. J Biol Chem. 1997;272(28):17360–17366. PubMed PMID: 9211875.
58. Brittingham A, Miller MA, et al. Regulation of GP63 mRNA stability in promastigotes of virulent and attenuated Leishmania chagasi. Mol Biochem Parasitol. 2001;112(1):51–59. PubMed PMID: 11166386.
59. Quijada L, Soto M, et al. Identification of a putative regulatory element in the 3'-untranslated region that controls expression of HSP70 in Leishmania infantum. Mol Biochem Parasitol. 2000;110(1):79–91. PubMed PMID: 10989147.
60. Wilson ME, Paetz KE, et al. The effect of ongoing protein synthesis on the steady state levels of Gp63 RNAs in Leishmania chagasi. J Biol Chem. 1993;268(21):15731–15736. PubMed PMID: 8340397.
61. Folgueira C, Quijada L, et al. The translational efficiencies of the two Leishmania infantum HSP70 mRNAs, differing in their 3'-untranslated regions, are affected by shifts in the temperature of growth through different mechanisms. J Biol Chem. 2005;280(42):35172–35183. PubMed PMID: 16105831.
62. Larreta R, Soto M, et al. The expression of HSP83 genes in Leishmania infantum is affected by temperature and by stage-differentiation and is regulated at the levels of mRNA stability and translation. BMC Mol Biol. 2004;5:3. PubMed PMID: 15176985.
63. McNicoll F, Muller M, et al. Distinct 3'-untranslated region elements regulate stage-specific mRNA accumulation and translation in Leishmania. J Biol Chem. 2005;280(42):35238–35246. PubMed PMID: 16115874.
64. Blum B, Bakalara N, et al. A model for RNA editing in kinetoplastid mitochondria: "guide" RNA molecules transcribed from maxicircle DNA provide the edited information. Cell. 1990;60(2):189–198. PubMed PMID: 1688737.
65. Sturm NR, Simpson L. Kinetoplast DNA minicircles encode guide RNAs for editing of cytochrome oxidase subunit III mRNA. Cell. 1990;61(5):879–884. PubMed PMID: 1693097.
66. Bellofatto V, Cross GA. Expression of a bacterial gene in a trypanosomatid protozoan. Science. 1989;244(4909):1167–1169. PubMed PMID: 2499047.
67. Kapler GM, Coburn CM, et al. Stable transfection of the human parasite Leishmania major delineates a 30-kilobase region sufficient for extrachromosomal replication and expression. Mol Cell Biol. 1990;10(3):1084–1094. PubMed PMID: 2304458.
68. Laban A, Tobin JF, et al. Stable expression of the bacterial neor gene in Leishmania enriettii. Nature. 1990;343(6258):572–574. PubMed PMID: 2300209.
69. Laban A, Wirth DF. Transfection of Leishmania enriettii and expression of chloramphenicol acetyltransferase gene. Proc Natl Acad Sci U S A. 1989;86(23):9119–9123. PubMed PMID: 2594753.
70. LeBowitz JH, Coburn CM, et al. Development of a stable Leishmania expression vector and application to the study of parasite surface antigen genes. Proc Natl Acad Sci U S A. 1990;87(24):9736–9740. PubMed PMID: 2124701.
71. Clayton CE. Genetic manipulation of kinetoplastida. Parasitol Today. 1999;15(9):372–378. PubMed PMID: 10461166.
72. Clayton CE. Life without transcriptional control? From fly to man and back again. EMBO J. 2002;21(8):1881–1888. PubMed PMID: 11953307.
73. Ghedin E, Charest H, et al. Inducible expression of suicide genes in Leishmania donovani amastigotes. J Biol Chem. 1998;273(36):22997–23003. PubMed PMID: 9722523.
74. Ryan KA, Dasgupta S, et al. Shuttle cosmid vectors for the trypanosomatid parasite Leishmania. Gene. 1993;131(1):145–150. PubMed PMID: 8370535.
75. Ryan KA, Garraway LA, et al. Isolation of virulence genes directing surface glycosyl-phosphatidylinositol synthesis by functional complementation of Leishmania. Proc Natl Acad Sci U S A. 1993;90(18):8609–8613. PubMed PMID: 8378337.
76. Pedrosa AL, Cruz AK. The effect of location and direction of an episomal gene on the restoration of a phenotype by functional complementation in Leishmania. Mol Biochem Parasitol. 2002;122(2):141–148. PubMed PMID: 12106868.
77. Cruz A, Beverley SM. Gene replacement in parasitic protozoa. Nature. 1990;348(6297):171–173. PubMed PMID: 2234081.
78. Misslitz A, Mottram JC, et al. Targeted integration into a rRNA locus results in uniform and high level expression of transgenes in Leishmania amastigotes. Mol Biochem Parasitol. 2000;107(2):251–261. PubMed PMID: 10779601.
79. Papadopoulou B, Dumas C. Parameters controlling the rate of gene targeting frequency in the protozoan parasite Leishmania. Nucleic Acids Res. 1997;25(21):4278–4286. PubMed PMID: 9336458.
80. Cruz A, Coburn CM, et al. Double targeted gene replacement for creating null mutants. Proc Natl Acad Sci U S A. 1991;88(16):7170–7174. PubMed PMID: 1651496.
81. Curotto de Lafaille MA, Wirth DF. Creation of Null/+ mutants of the alpha-tubulin gene in Leishmania
enriettii by gene cluster deletion. J Biol Chem. 1992;267(33):23839–23846. PubMed PMID: 1429722.
82. McKean PG, Denny PW.et al. Phenotypic changes associated with deletion and overexpression of a stageregulated gene family in Leishmania. Cell Microbiol. 2001;3(8):511–523. PubMed PMID: 11488813.
83. Wirtz E, Clayton C. Inducible gene expression in trypanosomes mediated by a prokaryotic repressor.
Science. 1995;268(5214):1179–1183. PubMed PMID: 7761835.
84. Yan S, Martinez-Calvillo S.et al. A low-background inducible promoter system in Leishmania donovani.
Mol Biochem Parasitol. 2002;119(2):217–223. PubMed PMID: 11814573.
85. Giaever G, Chu AM.et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature.
2002;418(6896):387–391. PubMed PMID: 12140549.
86. Winzeler EA, Shoemaker DD.et al. Functional characterization of the S. cerevisiae genome by gene deletion
and parallel analysis. Science. 1999;285(5429):901–906. PubMed PMID: 10436161.
87. Augusto MJ, Squina FM.et al. Specificity of modified Drosophila mariner transposons in the identification
of Leishmania genes. Exp Parasitol. 2004;108(3-4):109–113. PubMed PMID: 15582507.
88. Beverley SM, Akopyants NS, et al. Putting the Leishmania genome to work: functional genomics by
transposon trapping and expression profiling. Philos Trans R Soc Lond B Biol Sci. 2002;357(1417):47–53.
PubMed PMID: 11839181.
89. Robinson KA, Beverley SM. Improvements in transfection efficiency and tests of RNA interference (RNAi)
approaches in the protozoan parasite Leishmania. Mol Biochem Parasitol. 2003;128(2):217–228. PubMed
PMID: 12742588.
90. Fitzpatrick JM, Johnston DA, et al. An oligonucleotide microarray for transcriptome analysis of
Schistosoma mansoni and its application/use to investigate gender-associated gene expression. Mol
Biochem Parasitol. 2005;141(1):1–13. PubMed PMID: 15811522.
91. Llinas M, Bozdech Z, et al. Comparative whole genome transcriptome analysis of three Plasmodium
falciparum strains. Nucleic Acids Res. 2006;34(4):1166–1173. PubMed PMID: 16493140.
92. Radke JR, Behnke MS, et al. The transcriptome of Toxoplasma gondii. BMC Biol. 2005;3:26. PubMed
PMID: 16324218.
93. Almeida R, Gilmartin BJ, et al. Expression profiling of the Leishmania life cycle: cDNA arrays identify
developmentally regulated genes present but not annotated in the genome. Mol Biochem Parasitol.
2004;136(1):87–100. PubMed PMID: 15138070.
94. Saxena A, Worthey EA, et al. Evaluation of differential gene expression in Leishmania major Friedlin
procyclics and metacyclics using DNA microarray analysis. Mol Biochem Parasitol. 2003;129(1):103–114.
PubMed PMID: 12798511.
95. Brems S, Guilbride DL, et al. The transcriptomes of Trypanosoma brucei Lister 427 and TREU927
bloodstream and procyclic trypomastigotes. Mol Biochem Parasitol. 2005;139(2):163–172. PubMed PMID:
15664651.
96. Diehl S, Diehl F, et al. Analysis of stage-specific gene expression in the bloodstream and the procyclic form
of Trypanosoma brucei using a genomic DNA-microarray. Mol Biochem Parasitol. 2002;123(2):115–123.
PubMed PMID: 12270627.
97. Akopyants NS, Matlib RS, et al. Expression profiling using random genomic DNA microarrays identifies
differentially expressed genes associated with three major developmental stages of the protozoan parasite
Leishmania major. Mol Biochem Parasitol. 2004;136(1):71–86. PubMed PMID: 15138069.
98. Holzer TR, McMaster WR, et al. Expression profiling by whole-genome interspecies microarray
hybridization reveals differential gene expression in procyclic promastigotes, lesion-derived amastigotes,
and axenic amastigotes in Leishmania mexicana. Mol Biochem Parasitol. 2006;146(2):198–218. PubMed
PMID: 16430978.
99. Anderson L, Seilhamer J. A comparison of selected mRNA and protein abundances in human liver.
Electrophoresis. 1997;18(3-4):533–537. PubMed PMID: 9150937.
100. Gygi SP, Aebersold R. Absolute quantitation of 2-D protein spots. Methods Mol Biol. 1999;112:417–421.
PubMed PMID: 10027266.
101. Wasinger VC, Cordwell SJ, et al. Progress with gene-product mapping of the Mollicutes: Mycoplasma
genitalium. Electrophoresis. 1995;16(7):1090–1094. PubMed PMID: 7498152.
102. Wilkins MR, Sanchez JC, et al. Progress with proteome projects: why all proteins expressed by a genome
should be identified and how to do it. Biotechnol Genet Eng Rev. 1996;13:19–50. PubMed PMID: 8948108.
103. O'Farrell PH. High resolution two-dimensional electrophoresis of proteins. J Biol Chem. 1975;250(10):
4007–4021. PubMed PMID: 236308.
104. Hillenkamp F, Karas M, et al. Matrix-assisted laser desorption/ionization mass spectrometry of
biopolymers. Anal Chem. 1991;63(24):1193A–1203A. PubMed PMID: 1897719.
105. Karas M, Hillenkamp F. Laser desorption ionization of proteins with molecular masses exceeding 10,000
daltons. Anal Chem. 1988;60(20):2299–2301. PubMed PMID: 3239801.
106. Fenn JB, Mann M, et al. Electrospray ionization for mass spectrometry of large biomolecules. Science.
1989;246(4926):64–71. PubMed PMID: 2675315.
107. Smith RD, Loo JA, et al. New developments in biochemical mass spectrometry: electrospray ionization.
Anal Chem. 1990;62(9):882–899. PubMed PMID: 2194402.
108. El Fakhry Y, Ouellette M, et al. A proteomic approach to identify developmentally regulated proteins in
Leishmania infantum. Proteomics. 2002;2(8):1007–1017. PubMed PMID: 12203896.
109. Bente M, Harder S, et al. Developmentally induced changes of the proteome in the protozoan parasite
Leishmania donovani. Proteomics. 2003;3(9):1811–1829. PubMed PMID: 12973740.
110. Nugent PG, Karsani SA, et al. Proteomic analysis of Leishmania mexicana differentiation. Mol Biochem
Parasitol. 2004;136(1):51–62. PubMed PMID: 15138067.
111. Drummelsmith J, Brochu V, et al. Proteome mapping of the protozoan parasite Leishmania and application
to the study of drug targets and resistance mechanisms. Mol Cell Proteomics. 2003;2(3):146–155. PubMed
PMID: 12644573.
112. Atwood JA, Weatherly DB, et al. The Trypanosoma cruzi proteome. Science. 2005;309(5733):473–476.
PubMed PMID: 16020736.
113. Broadhead R, Dawe HR, et al. Flagellar motility is required for the viability of the bloodstream
trypanosome. Nature. 2006;440(7081):224–227. PubMed PMID: 16525475.
114. Fairlamb AH, Bowman IB. Cell disruption and subcellular fractionation of Trypanosoma brucei. Trans R
Soc Trop Med Hyg. 1974;68(4):275. PubMed PMID: 4417067.
115. Rodrigues CO, Scott DA, et al. Presence of a vacuolar H+-pyrophosphatase in promastigotes of Leishmania
donovani and its localization to a different compartment from the vacuolar H+-ATPase. Biochem J.
1999;340(Pt 3):759–766. PubMed PMID: 10359662.
116. Gygi SP, Rist B, et al. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags.
Nat Biotechnol. 1999;17(10):994–999. PubMed PMID: 10504701.
117. Paba J, Ricart CA, et al. Proteomic analysis of Trypanosoma cruzi developmental stages using isotope-coded
affinity tag reagents. J Proteome Res. 2004;3(3):517–524. PubMed PMID: 15253433.
118. Guina T, Wu M, et al. Proteomic analysis of Pseudomonas aeruginosa grown under magnesium limitation. J
Am Soc Mass Spectrom. 2003;14(7):742–751. PubMed PMID: 12837596.
119. Schmidt F, Donahoe S, et al. Complementary analysis of the Mycobacterium tuberculosis proteome by two-dimensional electrophoresis and isotope-coded affinity tag technology. Mol Cell Proteomics. 2004;3(1):24–
42. PubMed PMID: 14557599.
120. El-Sayed NM, Myler PJ, et al. Comparative genomics of trypanosomatid parasitic protozoa. Science.
2005;309(5733):404–409. PubMed PMID: 16020724.
121. Yao C, Donelson JE, et al. The major surface protease (MSP or GP63) of Leishmania sp. Biosynthesis,
regulation of expression, and function. Mol Biochem Parasitol. 2003;132(1):1–16. PubMed PMID:
14563532.
122. Ramos CS, Franco FA, et al. Characterisation of a new Leishmania META gene and genomic analysis of the
META cluster. FEMS Microbiol Lett. 2004;238(1):213–219. PubMed PMID: 15336424.
123. Fickett JW. Finding genes by computer: the state of the art. Trends Genet. 1996;12(8):316–320. PubMed
PMID: 8783942.
124. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268(1):
78–94. PubMed PMID: 9149143.
125. Claverie JM. Computational methods for the identification of genes in vertebrate genomic sequences. Hum
Mol Genet. 1997;6(10):1735–1744. PubMed PMID: 9300666.
126. Worthey EA, Myler PJ. Protozoan genomes: gene identification and annotation. Int J Parasitol. 2005;35(5):
495–512. PubMed PMID: 15826642.
127. Stein LD. Using Perl to facilitate biological analysis. Methods Biochem Anal. 2001;43:413–449. PubMed
PMID: 11449734.
128. Bibb MJ, Findlay PR, et al. The relationship between base composition and codon usage in bacterial genes
and its use for the simple and reliable identification of protein-coding sequences. Gene. 1984;30(1-3):157–
166. PubMed PMID: 6096212.
129. Tiwari S, Ramachandran S, et al. Prediction of probable genes by Fourier analysis of genomic sequences.
Comput Appl Biosci. 1997;13(3):263–270. PubMed PMID: 9183531.
130. Fickett JW. Recognition of protein coding regions in DNA sequences. Nucleic Acids Res. 1982;10(17):
5303–5318. PubMed PMID: 7145702.
131. Delcher AL, Harmon D, et al. Improved microbial gene identification with GLIMMER. Nucleic Acids Res.
1999;27(23):4636–4641. PubMed PMID: 10556321.
132. Trouiller P, Olliaro P, et al. Drug development for neglected diseases: a deficient market and a public-health
policy failure. Lancet. 2002;359(9324):2188–2194. PubMed PMID: 12090998.