
Bioinformation: Phylogenetic Analysis of Chloroplast Matk Gene From Zingiberaceae For Plant Dna Barcoding


Bioinformation by Biomedical Informatics Publishing Group open access

www.bioinformation.net Hypothesis
______________________________________________________________________________

Phylogenetic analysis of chloroplast matK gene from Zingiberaceae for plant DNA barcoding
Dhivya Selvaraj1, Rajeev Kumar Sarma1 and Ramalingam Sathishkumar1,*
1School of Biotechnology and Genetic Engineering, Plant Genetic Engineering Laboratory, Bharathiar University, Coimbatore, India;
Ramalingam Sathishkumar* - Email: sathishkumar_ram@hotmail.com; * Corresponding author

received May 08, 2008; revised June 18, 2008; accepted July 20, 2008; published September 08, 2008

Abstract:
The maturase K gene (matK) of the chloroplast is highly conserved in plant systematics and is involved in Group II intron splicing. The
gene is 1500 bp in length and is located within the intron of trnK. In the present study, the matK gene from Zingiberaceae was analysed
for variants, parsimony sites, patterns, transition/transversion rates and phylogeny. The family Zingiberaceae comprises 47 genera
with medicinal value. The matK gene sequences were obtained from GenBank and used for the analysis. The sequence alignments were
performed with Clustal X, transition/transversion rates were predicted with MEGA and phylogenetic analyses were carried out with the
PHYLIP package. The results indicate that the Zingiberaceae genera Aframomum, Alpinia, Globba, Curcuma and Zingiber show
polyphyly. The overall variation between the species is 24% and the transition/transversion rate is 1.54. A phylogenetic tree was
constructed to identify the ideal regions that could be used for defining the inter- and intra-generic relationships. From this study
it can be concluded that the matK gene is a good candidate for DNA barcoding of the plant family Zingiberaceae.

Keywords: maturaseK; Zingiberaceae; transition/transversion; phylogenetic tree; consistency index; retention index; MEGA;
PHYLIP

Background:
In DNA barcoding, a short DNA sequence is used as a molecular marker for identifying the diversity that exists among plant and
animal species. The internal transcribed spacer (ITS) region of the nuclear ribosomal cistron is the most commonly used sequence
locus for plant molecular systematic investigations [1]. Many chloroplast, mitochondrial and nuclear genes have been utilized for
studying sequence variation at the genus level. Among these genes, the rbcL gene sequence has been analysed by various workers to
address plant systematics [2]. The matK gene of the chloroplast is 1500 bp long, is located within the intron of trnK and codes for a
maturase-like protein, which is involved in Group II intron splicing. The two exons of the trnK gene that flank matK were lost,
leaving the gene intact in the event of splicing. The gene shows high substitution rates within species and is emerging as a
potential candidate for studying plant systematics and evolution [3]. A homology search for this gene indicates that the 102 aa at
the carboxyl terminus are structurally related to some regions of maturase-like polypeptides, and this region might be involved in
the splicing of group II introns. It is another emerging gene with a potential contribution to plant molecular systematics and
evolution [4]. The matK-trnK gene complex is commonly used for plant evolution studies and resolves relationships at various
taxonomic levels [5]. The matK gene has an ideal size, a high rate of substitution, a large proportion of variation at the nucleic
acid level at the first and second codon positions, a low transition/transversion ratio and mutationally conserved sectors. These
features of the matK gene are exploited to resolve family and species level relationships. Polymorphism of chloroplast DNA,
especially in the trnK, matK and intergenic trnL-trnF regions, has been used to study the phylogeny of various plants [6]. The
position of matK in the trnK gene was determined by comparison with a matK sequence of Trillium [7]. These data were used to
identify molecular markers for identifying species of these taxa and also to provide valuable information for both conventional
and molecular plant breeding studies [8].

The objective of this study is to evaluate generic and species variation and the phylogenetic relationships of the Zingiberaceae
family by using the chloroplast matK gene sequences available from GenBank. Zingiberaceae is a family of flowering plants
comprising 47 genera and about 1000 species. This family contains many traditional medicinal plants used to cure various
disorders. The study also addresses the optimal number of nucleotides essential for exploring phylogenesis and the consequences
of utilizing different segments of the gene as barcodes.

Methodology:
Data collection
The entire coding regions of the matK sequences of 101 different species belonging to eight taxa of Zingiberaceae were obtained
from GenBank. Generic and species information was obtained from the taxonomy database of the National Center for Biotechnology
Information (NCBI) [9].

Sequence analysis
The data analysis was done for two grouped datasets. One set includes all the plant species of Zingiberaceae for which sequences
are available in GenBank, to find the interspecies
ISSN 0973-2063
Bioinformation 3(1): 24-27 (2008)
Bioinformation, an open access forum
© 2008 Biomedical Informatics Publishing Group
variation. Another dataset includes various genera of Zingiberaceae, such as Aframomum, Alpinia, Curcuma, Globba, Hedychium and
Zingiber, to find intergeneric variation. Multiple sequence alignment was performed with Clustal X, an offline program that
performs optimal alignment of sequences. The alignments were not complicated by the occurrence of indels, which were not
included in the data analysis [10]. Aligned sequences were edited with the software BioEdit (Biological sequence alignment
editor) [11].

Phylogenetic analysis
The basic sequence statistics, including nucleotide frequencies, transition/transversion (ns/nv) ratio and variability in
different regions of the sequences, were computed with Molecular Evolutionary Genetics Analysis (MEGA) [12]. The sequence data
were analyzed by the neighbor-joining method using the NEIGHBOR program of the Phylogeny Inference Package (PHYLIP) [13] and by
the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) using MEGA. Distances were calculated using the DNADIST program of
PHYLIP. Bootstrapping and decay analysis were performed with NJplot. Parsimony analysis and the various clades were determined
with MEGA.

Results and discussion:
Advances in molecular biology and DNA sequencing techniques have made it possible to characterize the genomes of various
organisms rapidly. Analyses of the DNA sequences of various species are providing valuable information about their taxonomy,
gene makeup and utilization. In this study, DNA sequence polymorphism of the chloroplast gene matK of the Zingiberaceae family
was assessed to determine the inter-specific and intra-specific differences.

Combined analysis
Multiple sequence alignment shows that there are variable numbers of indels in the matK gene. The alignment of the combined
matK nucleotide sequences shows 497 variable sites and 251 parsimony sites; the overall mean distance is 0.027. The percentage
of variants ranges from 0.38 to 8.85. The combined tree shows three groups, as follows. Group I has three clusters, A, B and C
(Figure 2). Cluster A has two clades: clade (a), comprising the taxa Alpinia (10 species) and Plagiostachys, shows 100% identity,
and clade (b), consisting of Etlingera and Vanoverberghia, shows 99% identity. The taxon Hornstedtia exists as a monotaxon but is
closer to clade (b). Cluster B has two clades consisting of Elettariopsis, Paramomum and Aframomum. Aframomum is the largest
African genus of the Zingiberaceae family and contains about 70 species, which are found in tropical forests and savannahs.
Renealmia is grouped into the same clade in the taxon. Elettariopsis and Paramomum have similar features and are highly
conserved, showing 100% similarity with no variants. Previous studies have shown the unique taxonomic position of the disputed
genera Paramomum and Elettariopsis on the basis of the morphological characters of their flowers. Both genera have evolved from
the core clade of Amomum through inflorescence and flower diversification [14]. The taxa Aframomum and Renealmia show 98%
similarity with 2% of variants. Cluster C has a single clade, Burbidgea and Riedelia, which shows 100% identity. The genera
Pleuranthodium and Siliquamomum show 35% identity, with a higher range of variants. In Group II, Caulokaempferia and Globba
show 100% identity, indicating the absence of any variants. The taxon Rhynchanthus exists as a monoclade. Group III has two
branches, each with many sub-branches. Branch I has three clades and three monoclades. In the first clade, the taxa Hedychium
and Hitchenia form a single clade with 98 percent identity and 2 percent variants. The second clade has the taxa Zingiber and
Scaphochlamys, which are highly similar and do not show any variants, and the third clade has Curcuma and Stahlianthus, which
show one percent variants and are highly conserved. Branch II has three clades, A, B and C. Clade A has Boesenbergia and
Curcumorpha, which fall in the same clade but show sixty percent variants. Clade B contains the genera Cautleya and
Cornukaempferia, which show 72% variants with minimal identity, and clade C has two sub-clades, in which the taxa Haniffia and
Kaempferia are highly similar with no variance. Similarly, the genera Hemiorchis and Gagnepainia show the same.

Analysis of individual taxon
The genus Aframomum consists of 9 species and is one of the smaller genera in the Zingiberaceae family. The matK gene sequence
is available for only 4 species in the database. The species variation is 0.96% and the transition/transversion ratio is 1.32.
The species A. daniellii and A. sceptrum show a branch length of 100. Alpinia is the largest, most widespread and most
taxonomically complex genus in Zingiberaceae, with 230 species, of which 84 are available in the taxonomy database. The coding
region for matK is available for 10 species in GenBank and the rest of the sequences have not yet been characterized. The
interspecies relationship of this genus shows three percent variation and a transition/transversion ratio of 1.52. The
phylogenetic tree consists of 14 informative sites, the overall mean distance is 0.027 and the transition/transversion ratio is
1.054. The genus Curcuma has 37 species, most of which have potential medicinal value. This genus shows 0.77% variance and a
transition/transversion rate of 0.445.

Like Alpinia, Globba is another large genus of the ginger family, containing 84 species, among which the coding region for
matK is at present available in GenBank for 26 species; they are distributed from the eastern Himalayas to South China and
from Indochina to Malaysia. The phylogeny of the 26 species was studied using MEGA. The analysis shows 81 most parsimonious
trees (length = 110), as shown in Figure 2. The consistency index is 0.789474, the retention index is 0.913669, and the
composite index is 0.813996 for all sites and parsimony-informative sites. There are a total of 1291 positions in the final
dataset, out of which 40 were parsimony informative. Among all genera, Globba shows the largest variance, 8.85%, and a
decreased transition/transversion rate of 0.92.
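The per-genus statistics discussed above (variable sites, parsimony-informative sites and the transition/transversion ratio) can be illustrated with a minimal Python sketch. It is not the procedure implemented in MEGA, which estimates the ratio under a substitution model; it simply counts observed pairwise differences in an alignment. The function names and the toy alignment are assumptions made for illustration and do not come from the study data.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(a, b):
    """Return 'ts' (transition), 'tv' (transversion) or None for two aligned bases."""
    if a == b or a not in BASES or b not in BASES:
        return None  # identical bases, gaps and ambiguity codes are ignored
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "ts" if same_class else "tv"


def ts_tv_ratio(seqs):
    """Raw ratio of transitions to transversions summed over all sequence pairs."""
    ts = tv = 0
    for s1, s2 in combinations(seqs, 2):
        for a, b in zip(s1.upper(), s2.upper()):
            kind = classify(a, b)
            if kind == "ts":
                ts += 1
            elif kind == "tv":
                tv += 1
    return ts / tv if tv else float("inf")


def site_counts(seqs):
    """Return (variable sites, parsimony-informative sites), skipping gapped columns."""
    variable = informative = 0
    for column in zip(*(s.upper() for s in seqs)):
        if any(base not in BASES for base in column):
            continue  # mirrors removing positions with gaps or missing data
        counts = {}
        for base in column:
            counts[base] = counts.get(base, 0) + 1
        if len(counts) > 1:
            variable += 1
            # informative: at least two states, each present in two or more sequences
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical toy alignment, not matK data
toy = ["ACGTAC", "ACGTGC", "ATGTGC", "ATGTAA"]
print(site_counts(toy), round(ts_tv_ratio(toy), 3))
```

A site is counted as parsimony informative only when at least two character states each occur in two or more sequences, which is why a column with a single deviating base is variable but not informative.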


Figure 1: Comparative sequence variation among taxa representing different taxonomic hierarchy using the GenBank sequences
of the matK coding region. The x-axis represents the taxon of Zingiberaceae; the y-axis represents the variance and
transition/transversion ratio for the respective taxon.

Figure 2: Combined phylogenetic tree of the family Zingiberaceae. (Evolutionary relationships of 107 taxa were inferred using the
Maximum Parsimony method. Tree 1 out of 1054 most parsimonious trees (length = 219) is shown. The consistency index is
0.794521 (0.691781), the retention index is 0.916201 (0.916201), and the composite index is 0.727941 (0.633810) for all sites and
parsimony-informative sites. The MP tree was obtained using the Close-Neighbor-Interchange algorithm with search level in which
the initial trees were obtained with the random addition of sequences (10 replicates). All positions containing gaps and missing data
were eliminated from the dataset. There were a total of 434 positions in the final dataset, out of which 78 were parsimony
informative.)
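The three indices quoted in the Figure 2 caption are related, and the relation can be checked with a short Python sketch. It uses the standard definitions of the consistency index (CI = m/s) and the retention index (RI = (g - s)/(g - m)), where m, s and g are the minimum possible, observed and maximum possible numbers of steps. The Figure 2 values are consistent with the composite index being the product CI x RI, also known as the rescaled consistency index. The function names are assumptions made for illustration.

```python
def consistency_index(min_steps, observed_steps):
    """CI = m / s: minimum possible steps divided by observed tree length."""
    return min_steps / observed_steps


def retention_index(min_steps, max_steps, observed_steps):
    """RI = (g - s) / (g - m), with g the maximum possible number of steps."""
    return (max_steps - observed_steps) / (max_steps - min_steps)


def composite_index(ci, ri):
    """Composite (rescaled consistency) index, taken here as CI * RI."""
    return ci * ri


# CI and RI as reported in the Figure 2 caption
all_sites = composite_index(0.794521, 0.916201)
informative_sites = composite_index(0.691781, 0.916201)
print(f"{all_sites:.6f} {informative_sites:.6f}")
```

Both products agree, to the reported precision, with the composite index values 0.727941 and 0.633810 given in the caption.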
Hedychium consists of 37 species, and phylogenetic analysis was carried out for the 7 species whose sequences are available. The
bootstrap consensus tree inferred from 500 replicates is taken to represent the evolutionary history of the taxa. Branches
corresponding to partitions reproduced in less than 50% of the bootstrap replicates are collapsed. There are a total of 1180
positions in the final dataset, out of which 4 were parsimony informative. For the evolutionary relationship of the eight taxa,
the most parsimonious trees have a branch length of 30. The consistency index is 0.727273, the retention index is 0.750000, and
the composite index is 0.675000 for all sites and parsimony-informative sites. There are a total of 1547 positions in the final
dataset, out of which 8 were parsimony informative.

The genus Zingiber contains 26 species; the coding region of matK is available for 8 species, and it shows 7.44 percent
intra-generic variation with a transition/transversion rate of 1.76. Z. officinale is commonly known as ginger and is closely
related to Zingiber gramineum. The consistency index is 0.590361, the retention index is 0.750000 and the composite index is
0.624724 for all sites and parsimony-informative sites. There are a total of 1478 positions in the final dataset, out of which
118 were parsimony informative, as shown in Table 1 (see supplementary material) and Figure 2.

Conclusion:
Phylogenetic analysis complements and often outperforms similarity searches in identifying variants, patterns and
transition/transversion rates in nucleotide sequences when addressing sequence identity, especially when the reference database
holds high matches for the matK gene. A framework based on the portable software Molecular Evolutionary Genetics Analysis (MEGA)
is provided for the qualified identification of nucleotide sequences of the Zingiberaceae family, together with inter- and
intra-species relationships. The combined tree analysis shows that group I has higher bootstrap values, which makes evolutionary
sense among the genera of the Zingiberaceae family. Thus, from this study it can be suggested that the matK gene is a good
candidate for DNA barcoding of Zingiberaceae family members. It can also be concluded that barcodes for distinguishing the
Zingiberaceae family members could be selected from the nucleotide positions 115 to 130, 680 to 690 and 1455 to 1465 of the
matK gene.

References:
[01] W. John-kress and J. Kenneth, Proceedings of National Academy of Sciences, 8369: 837 (2005) [PMID: 15928076]
[02] M. W. Chase et al., Annals of the Missouri Botanical Garden, 80: 528 (1993)
[03] C. Notredame, Journal of Molecular Biology, 205: 217 (2000) [PMID: 10964570]
[04] W. Khidir and L. Hongping, American Journal of Botany, 830 (1997)
[05] M. Ito and A. Kawamoto, Journal of Plant Research, 207: 216 (1999)
[06] K. Wolfe, Proceedings of the National Academy of Sciences, 9054: 9058 (1987) [PMID: 3480529]
[07] K. Osaloo and F. Utech, Journal of Plant Research, 35: 49 (1999)
[08] L. Pedersen, Plant Systematics and Evolution, 239: 258 (2004)
[09] www.ncbi.nlm.nih.gov/GenBank
[10] J. Thompson, Nucleic Acids Research, 22: 4673 (1994) [PMID: 7984417]
[11] www.mbio.ncsu.edu/BioEdit
[12] S. Kumar and K. Tamura, Briefings in Bioinformatics, 150: 163 (2004) [PMID: 15260895]
[13] J. Felsenstein, Evolution, 39: 783 (1985)
[14] B. Efron and E. Halloran, National Academy of Sciences, 13429: 34 (1996) [PMID: 8917608]

Edited by P. Kangueane
Citation: Selvaraj et al., Bioinformation 3(1): 24-27 (2008)
License statement: This is an open-access article, which permits unrestricted use, distribution, and reproduction in
any medium, for non-commercial purposes, provided the original author and source are credited.

Supplementary material
Genus           % of variance   No. of parsimony sites   Overall mean distance   Transition/transversion ratio
Aframomum       0.96            3                        0.04                    1.32
Alpinia         3.00            26                       0.09                    1.733
Curcuma         0.77            18                       0.02                    0.445
Elettariopsis   0.59            0                        0.03                    1.54
Globba          8.85            46                       0.012                   0.929
Hedychium       0.38            4                        0.02                    0.941
Kaempferia      1.67            4                        0.09                    0.591
Zingiber        7.44            8                        0.06                    1.476
Table 1: Transition/transversion ratios of the 8 taxa from GenBank
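As an illustration of what an overall mean distance measures, the following Python sketch averages the uncorrected pairwise distance (p-distance, the proportion of differing sites) over all sequence pairs in an alignment. The choice of p-distance is an assumption: the text does not state which distance model was used in MEGA or DNADIST, so this sketch is not expected to reproduce the values in Table 1. The function names are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")


def p_distance(seq1, seq2):
    """Proportion of differing sites among positions where both bases are A, C, G or T."""
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in VALID and b in VALID:  # gaps and ambiguity codes are skipped
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable positions between the two sequences")
    return differing / compared


def overall_mean_distance(seqs):
    """Mean p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not matK data
print(overall_mean_distance(["ACGT", "ACGA", "ACTA"]))
```

A model-based distance, such as those offered by DNADIST, corrects for multiple substitutions at the same site and therefore gives larger values than the p-distance for divergent sequences.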
