Research Article

Genome-wide identification and comparative analysis of MYB transcription factor family in rice and Arabidopsis

Amit Katiyar1,2, Shuchi Smita1,2, Sangram Keshari Lenka1,3, Ravi Rajwanshi1,4, Viswanathan Chinnusamy5, and Kailash Chander Bansal1,2*

1 National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi-110012, India
2 National Bureau of Plant Genetic Resources, Indian Agricultural Research Institute Campus, New Delhi-110012, India
3 Department of Biology, University of Massachusetts, Amherst, MA 01003
4 Department of Biotechnology, Assam University, Silchar, Assam-788011, India
5 Division of Plant Physiology, Indian Agricultural Research Institute, New Delhi-110012, India
* Corresponding author

Email: Amit Katiyar: dr.amitkatiyar@gmail.com; Shuchi Smita: shuchi2803@gmail.com; Sangram Keshari Lenka: keshari2u@gmail.com; Ravi Rajwanshi: rrajwanshi@gmail.com; Viswanathan Chinnusamy: viswa_iari@hotmail.com; K.C. Bansal: kailashbansal@hotmail.com
Phone: 91-11-25843554; Tele Fax: 91-11-25843554

Abstract

Background: The MYB gene family comprises one of the richest groups of transcription factors in plants. Plant MYB proteins are classified into three groups, namely R2R3, R1R2R3 and MYB-related proteins, on the basis of the number and position of the MYB repeats. MYBs are involved in plant development, secondary metabolism, hormone signal transduction, disease resistance and abiotic stress tolerance. A comparative analysis of MYB family genes in rice and Arabidopsis will help illuminate the evolution and function of MYB genes in plants.

Results: A genome-wide analysis identified at least 155 and 197 MYB genes in rice and Arabidopsis, respectively. Gene structure analysis revealed that MYB family genes possess a relatively large number of introns, which are concentrated in the middle of the predicted genes. Intronless MYB genes are highly conserved in both rice and Arabidopsis, indicating their structural similarity.
MYB genes encoding R2R3 repeat MYB proteins retain a conserved gene structure of three exons and two introns, whereas genes encoding R1R2R3 repeat-containing proteins consist of six exons and five introns. The splicing pattern of R1R2R3 MYB genes is similar among Arabidopsis MYBs; in rice, however, only half (50%) of the R1R2R3 MYB members have adopted a similar splicing pattern. Consensus motif analysis of the 1 kb region upstream of MYB gene ORFs identified conserved and over-represented cis-motifs in both the monocot and the dicot model species. Expression analysis by real-time RT-PCR showed that several MYB genes are up-regulated by various abiotic stresses in both rice and Arabidopsis.

Conclusion: A comprehensive genomic analysis of the chromosomal distribution, tandem repeats and phylogenetic relationships of MYB family genes in rice and Arabidopsis suggests that the family evolved via duplication. Genome-wide comparative analysis of MYB genes and their expression revealed their potential functions in plant development and stress response.

Background

Transcription factors are essential regulators of gene transcription, an important step in the regulation of gene expression. Transcription factors are usually modular and consist of at least two domains, a DNA-binding domain and an activation/repression domain, which function together to regulate transcription of target genes [1]. Based on their DNA-binding domains, transcription factors have been classified into different families. The MYB (myeloblastosis) transcription factor family is present in all eukaryotes. The oncogene v-MYB, identified in avian myeloblastosis virus, was the first MYB gene described [2]. Three v-MYB-related genes, c-MYB, A-MYB and B-MYB, were subsequently identified in many vertebrates and implicated in the regulation of cell proliferation, differentiation and apoptosis [3]. Homologous genes were also identified in insects, fungi and slime molds [4].
The Zea mays C1 gene, involved in anthocyanin biosynthesis, was the first plant regulatory gene characterized that shows similarity to the mammalian transcription factor c-MYB [5]. Interestingly, plants encode a much larger number of MYB genes than fungi or animals [6]. In Arabidopsis, the MYB family is one of the largest transcription factor families, with 198 members [7, 8, 9, 10]. MYB proteins generally contain a MYB DNA-binding domain. The MYB domain is approximately 52 amino acid residues long and forms a helix-turn-helix fold with three regularly spaced tryptophan residues [11]. The three-dimensional structure of the MYB domain shows that the DNA recognition α-helix interacts with the DNA major groove [11]. In contrast, amino acid sequences outside the MYB domain are highly divergent. MYB transcription factors can be classified into three groups based on the number of adjacent repeats, referred to as R1, R2 and R3. In animals, the MYB DNA-binding domain is characterized by an R1R2R3-type MYB domain, while in plants the R2R3-type MYB domain is more prevalent [4, 7, 12]. The plant R2R3-MYB genes probably evolved from an R1R2R3-MYB progenitor through loss of the R1 repeat sequence. Alternatively, they might have evolved through duplication of the repeat of an R1-MYB gene [13, 14]. In plants, MYB transcription factors play key roles in development, secondary metabolism, hormone signal transduction, disease resistance and abiotic stress tolerance [15, 16]. Several R2R3 MYB proteins are involved in regulating responses to environmental stresses such as drought, salt and cold [17, 9]. Transgenic rice overexpressing OsMYB3R-2 exhibited enhanced cold tolerance as well as an increased cell mitotic index [18]. Enhanced freezing tolerance was observed in Arabidopsis overexpressing OsMYB4 [19, 10]. Arabidopsis MYB96, an R2R3-type MYB transcription factor, regulates the drought stress response by integrating ABA and auxin signals [20].
Transgenic Arabidopsis overexpressing MYB15 exhibited hypersensitivity to exogenous ABA and improved tolerance to drought and salt stresses [21], but reduced tolerance to freezing stress [17]. MYB15, an R2R3-type MYB transcription factor, is also involved in the cold regulation of CBF genes and in acquired freezing tolerance [17]. Other functions of MYBs include control of cellular morphogenesis, regulation of secondary metabolism, meristem formation and cell cycle regulation [22, 23, 24, 25, 12]. Recent studies have shown that MYB genes are post-transcriptionally regulated by microRNAs; for instance, AtMYB33, AtMYB35, AtMYB65 and AtMYB101, which are involved in anther or pollen development, are targeted by the miR159 family [26, 27].

MYB family genes have been identified in a number of monocot and dicot families [9], and the evolutionary relationship between rice and Arabidopsis MYB proteins has been reported [28]. We report here the identification of 155 and 197 MYB transcription factor family genes in rice and Arabidopsis, respectively, and their classification based on the number of MYB domain repeats. Extending the previous study, we also provide information on the response of MYB genes to abiotic stress and on their expression in plant tissues. To map the evolutionary relationships among MYB family members, phylogenetic trees were constructed for both rice and Arabidopsis MYB proteins. Several over-represented cis-regulatory motifs were identified in the promoter regions of MYB genes; these cis-elements might play a role in the transcriptional regulation of MYB genes.

Results and Discussion

Identification, classification and structural analysis of MYB family members

To identify MYB transcription factor family genes in rice (Oryza sativa L.) and Arabidopsis thaliana, the genome sequences of rice and Arabidopsis from the MSU Rice Genome Annotation Project (formerly hosted by The Institute for Genomic Research) and The Arabidopsis Information Resource (TAIR), respectively, were analyzed.
We retrieved genes annotated as MYB in MSU (release 5) and TAIR (release 8) using an in-house Perl script followed by careful manual inspection. The primary search found 161 and 199 members annotated as “MYB” or “MYB-related” in the MSU and TAIR databases, respectively. Some of these proteins lack a MYB DNA-binding domain but are still annotated as members of the MYB protein family in MSU and TAIR; we designated them as MYB-associated, because they may interact with MYB proteins in transcriptional complexes, and excluded them from the catalogue. Specifically, we discarded proteins with a BTB domain (LOC_Os02g16000 and LOC_Os06g31100), a Response_reg domain (LOC_Os04g28130, LOC_Os05g32890 and LOC_Os06g43910) or a WD40 domain (LOC_Os03g26870) in rice, and proteins with an ELM2 domain (AT2G03470 and AT4G11400) in Arabidopsis; these six rice and two Arabidopsis proteins account for the difference between the primary search and the final counts of 155 and 197.

Previously accepted names were assigned to each MYB gene, and the Arabidopsis genes were numbered from 1 to 197. The name AtMYB0 was accepted for the first R2R3 MYB gene identified in Arabidopsis, and subsequently identified R2R3-type MYB genes were numbered from AtMYB1 onwards [29, 30, 31, 28]. Some Arabidopsis MYB genes are uncharacterized, and they are denoted herein by their TAIR locus IDs. In rice, most MYB genes have not been characterized; hence, we adopted MYB names from their Arabidopsis homologues and numbered the rice genes from 1 to 155. Gene identifiers were assigned to each OsMYB and AtMYB gene to avoid confusion when multiple names are used for the same gene. The pseudomolecule position, alternative protein name and best homologue hit for each OsMYB and AtMYB gene are summarized in Additional file 1, Table S1. Members of the MYB transcription factor family share a structurally conserved MYB DNA-binding domain, usually 52 amino acids long, that adopts a helix-turn-helix conformation to interact with the major groove of the target DNA.
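The keyword search followed by domain-based filtering described above can be sketched as follows. This is a minimal illustration, not the authors' in-house Perl script: it assumes that annotation text and Pfam domain hits are already available as Python dictionaries (a hypothetical input format), and it uses Pfam PF00249 (Myb_DNA-binding) as the marker of a MYB DNA-binding domain; a real pipeline would likely accept related Myb domain families as well.

```python
# Assumed marker for a MYB DNA-binding domain: Pfam PF00249 (Myb_DNA-binding).
MYB_PFAM = {"PF00249"}


def split_candidates(annotations, domains):
    """Split keyword hits into MYB genes and MYB-associated genes.

    annotations: {locus ID: annotation text}           (hypothetical input)
    domains:     {locus ID: set of Pfam accessions found in the protein}
    Returns two sorted lists: (genes with a MYB domain, genes without one).
    """
    kept, associated = [], []
    for locus, text in annotations.items():
        # Case-insensitive keyword match, comparable to a simple grep for "MYB".
        if "MYB" not in text.upper():
            continue  # not annotated as MYB at all
        if domains.get(locus, set()) & MYB_PFAM:
            kept.append(locus)
        else:
            # Annotated as MYB but lacking the DNA-binding domain, like the
            # BTB, Response_reg, WD40 and ELM2 proteins discussed in the text.
            associated.append(locus)
    return sorted(kept), sorted(associated)
```

For example, an entry annotated as “MYB-related protein” whose only domain hit is a WD40 repeat would be returned in the second (MYB-associated) list rather than counted as a MYB gene.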
To characterize the genes encoding MYB transcription factors, Pfam domain and NCBI Conserved Domain searches were performed for both rice and Arabidopsis [32]. Based on the number of adjacent repeats in the MYB proteins, four distinct groups were obtained: “MYB-related”, “MYB-R2R3”, “MYB-R1R2R3” and “atypical MYB”. The final catalogue includes 62, 88, 4 and 1 members from rice and 52, 138, 5 and 2 members from Arabidopsis in the MYB-related, R2R3, R1R2R3 and atypical MYB groups, respectively. Proteins with two and three MYB repeats cluster into the “MYB-R2R3” and “MYB-R1R2R3” groups, respectively. The MYB-R2R3 subfamily accounts for 55.48% and 69.54% of MYB genes in rice and Arabidopsis, respectively (Figure 1a, b). R2R3-type MYB transcription factors contain a conserved MYB DNA-binding domain towards the N-terminus, while the activation or repression domain is generally located near the C-terminus. Five members in Arabidopsis and four in rice contain three MYB repeats and were therefore clustered into the “MYB-R1R2R3” group. A previous study revealed that the R2R3 repeats of R2R3-MYB proteins are homologous to the R2 and R3 repeats of R1R2R3 family members in both rice and Arabidopsis. Genes encoding 3R-MYB proteins have been found to be functionally redundant in higher plants, and their regulatory role in cell cycle control has been reported [33, 25]. The “MYB-related” category usually, but not always, comprises proteins with a single MYB domain, known as MYB-1R [14, 34, 28]. MYB-related genes represent 33.54% (52 genes) and 31.47% (62 genes) of MYB genes in rice and Arabidopsis, respectively (Figure 1a, b); they thus constitute the second largest group of MYB proteins in both species. MYB-related genes can be further classified into several subclasses, such as CCA1 (CIRCADIAN CLOCK ASSOCIATED1) and LHY (LATE ELONGATED HYPOCOTYL), TRY (TRIPTYCHON) and CPC (CAPRICE) [14, 35, 12]. These subclasses evolved from R2R3-type MYB genes that regulate cellular morphogenesis [36, 37, 38, 39].
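The repeat-count grouping used above can be expressed as a small function. This is a simplification under stated assumptions: it classifies purely on the number of MYB repeats detected per protein (a hypothetical input, for instance derived from the domain search), whereas the text notes that MYB-related proteins do not always carry exactly one repeat, so borderline cases still need manual inspection.

```python
from collections import Counter


def myb_group(n_repeats):
    """Map the number of adjacent MYB repeats to the groups used in the text."""
    if n_repeats < 1:
        raise ValueError("not a MYB protein: no MYB repeat found")
    if n_repeats == 1:
        return "MYB-related"
    if n_repeats == 2:
        return "R2R3"
    if n_repeats == 3:
        return "R1R2R3"
    return "4R/atypical"  # more than three repeats, e.g. CDC5-type proteins


def summarize(repeat_counts):
    """repeat_counts: {locus ID: number of MYB repeats}.

    Returns {group: (number of genes, percentage of all genes)}.
    """
    tally = Counter(myb_group(n) for n in repeat_counts.values())
    total = sum(tally.values())
    return {group: (n, round(100 * n / total, 2)) for group, n in tally.items()}
```

Percentages are computed over the genes passed in, so the same function can be applied separately to the rice and Arabidopsis catalogues.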
We also identified one MYB protein in rice and two in Arabidopsis that contain more than three MYB repeats and belong to the 4R-MYB group. The 4R-MYBs AT1G09770 and LOC_Os07g04700 are CDC5-type proteins, whereas AT3G18100 of Arabidopsis is annotated as a 4R-type MYB (Table 1; Additional file 2, Table S2). The 4R-MYB proteins form the smallest class; they contain R1/R2-like repeats and are found in several plant species. To further understand the nature of MYB proteins, their physicochemical properties were also analyzed. The MYB proteins were similar in terms of grand average of hydropathy (GRAVY). According to Kyte and Doolittle [40], a high (positive) average hydropathy score is characteristic of integral membrane proteins, while a negative score indicates a soluble protein. All MYB proteins in rice and Arabidopsis except ATG35516 had a negative GRAVY score, suggesting that MYBs are soluble proteins, a trait expected of transcription factors. GRAVY scores ranged from -1.287 (LOC_Os02g47744) to -0.178 (LOC_Os08g37970) in rice and from -1.359 (AT5G41020) to 0.612 (ATG35516) in Arabidopsis. The isoelectric point (pI) is another important property of proteins. The average pI values of the MYB-1R, R2R3 and R1R2R3 protein families were 7.55, 6.90 and 7.25 in rice and 7.55, 6.89 and 6.80 in Arabidopsis, respectively. The average molecular weights of the MYB-1R, R2R3 and R1R2R3 protein families are 31.128, 34.561 and 72.52 kDa in rice and 34.186, 35.875 and 86.217 kDa in Arabidopsis, respectively (Additional file 2, Table S2).

Functional classification of MYB transcription factors

MYB proteins perform a wide diversity of functions in plants. A complete list of the functional assignments of MYB genes is given in Additional file 3, Table S3.
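The GRAVY and pI values reported in the physicochemical analysis above can be computed along the lines of the sketch below. GRAVY is the mean Kyte-Doolittle hydropathy over all residues; the pI is found by bisection on the pH at which the Henderson-Hasselbalch net charge is zero. The pKa table is an illustrative assumption: different tools use different pKa sets, so absolute pI values from this sketch will not necessarily match those in Table S2.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative pKa values (assumption); published tools differ slightly.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = [aa for aa in seq.upper() if aa in KD]  # skips X, *, gaps
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KD[aa] for aa in residues) / len(residues)


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino and carboxyl terminus
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisection for the pH of zero net charge (charge decreases as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

A negative `gravy` value corresponds to the soluble, hydrophilic character reported for nearly all MYB proteins; bisection is used because the net charge falls monotonically with pH, so a single root is guaranteed in the 0-14 interval.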
The R2R3-type MYB proteins are involved in plant-specific processes such as control of secondary metabolism and cellular morphogenesis [41-47]. MYB-like proteins with a MYB-1R domain (e.g. MYBST1 or StMYB1R1) have also expanded in plants and can act as transcriptional activators [48]. CCA1 and LHY belong to the single MYB domain (MYB-1R) proteins [49, 50]. In Arabidopsis, the CCA1 protein binds to a region of the Lhcb1*3 promoter and mediates phytochrome-responsive transcription. In transgenic plants, antisense suppression of CCA1 reduced the phytochrome induction of the Lhcb1*3 gene, suggesting that CCA1 acts as a transcriptional activator [49]. CCA1 and LHY proteins also bind to the consensus sequence of plant telomeric DNA (TTTAGGG) and modulate transcription [51, 49, 7, 50, 52]. TTAGGG repeat-binding factors (TBFs) function as transcriptional activators in yeast, plants and humans [53].

Gene ontology (GO) analysis showed that all but a few MYB members encode transcription factors, which regulate the expression of target genes either independently or together with cofactors. Some members of the R2R3-type MYB subfamily function as transcriptional repressors, while others act as transcriptional activators. For instance, AtMYB86 (AtMYB4/AT5G26660) acts as a transcriptional repressor. AtMYB86 expression is down-regulated by exposure to UV-B light, indicating that derepression of its target genes is an important mechanism for UV-B acclimation in Arabidopsis [54, 55]. AtMYB34 (AT5G60890), an R2R3-type MYB protein, has catalytic kinase as well as transcriptional activator activities [56, 57]. AtMYB34 is involved in the defense response against insects [58]. AtMYB23 (AT5G40330) is involved in protein binding (i.e. interaction with GL3) as well as in DNA binding to regulate transcription [59]. In rice, the OsMYB1i (LOC_Os01g62660) protein has signal transducer activity (GO:0004871) as well as transcriptional activator activity.
The GO annotation of MYB proteins showed that 98.70% of OsMYB and 98.47% of AtMYB proteins are involved in transcriptional activation, while the remaining MYB proteins are also involved in other activities such as kinase activity, protein binding and transcriptional repressor activity. The subcellular localization of MYB proteins was also predicted using various bioinformatics tools. Comparison of the outputs of different prediction tools revealed that 98.70% of OsMYB and 96.95% of AtMYB proteins contain a nuclear localization signal (NLS) in their N-terminal region. The remaining MYB proteins are predicted to localize to mitochondria, the plasma membrane and other cytoplasmic compartments. The predicted locations were also checked against the GO category “cellular component” and the published literature (Additional file 4, Table S4). The similar functions of MYB genes in rice and Arabidopsis indicate that these pathways have been conserved during evolution.

Gene structure and intron distribution

To understand the structural components of MYB genes, their exon-intron organization was analyzed. We observed that 17 (10.96%) OsMYB and 9 (4.56%) AtMYB genes were intronless (Figure 2), which is consistent with a previous analysis [60]. To identify conserved intronless MYB genes between rice and Arabidopsis, local BLASTP searches were performed with the protein sequences of all predicted intronless rice genes against those of Arabidopsis, and vice versa. An E-value cut-off of 1e-6 or less was used to identify conserved intronless genes. We found that 13 (76.47%) and 7 (77.77%) intronless OsMYB and AtMYB genes, respectively, were orthologs among the intronless MYB genes. Other intronless MYB genes that met the matching criterion of an E-value of 1e-10 or less were referred to as paralogs.
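The E-value filtering used to call conserved intronless genes can be sketched from BLAST+ tabular output (`-outfmt 6`, whose twelve default columns place the E-value eleventh and the bit score twelfth). The source states only the cut-offs; treating orthologs as reciprocal best hits is an assumption made for this sketch, and the gene names in the usage below are hypothetical.

```python
def parse_blast_tab(lines):
    """Yield (query, subject, evalue, bitscore) from BLAST+ -outfmt 6 lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[10]), float(fields[11])


def best_hits(hits, max_evalue):
    """Highest-scoring subject per query among hits with E-value <= max_evalue."""
    best = {}
    for query, subject, evalue, bits in hits:
        if query == subject or evalue > max_evalue:
            continue  # ignore self-hits and hits above the cut-off
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_to_b, b_to_a):
    """Pairs (a, b) where each gene is the other's best cross-species hit."""
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


def paralog_pairs(hits, max_evalue=1e-10):
    """Unordered within-species pairs passing the stricter E-value cut-off."""
    pairs = {tuple(sorted((q, s))) for q, s, e, _ in hits if q != s and e <= max_evalue}
    return sorted(pairs)
```

In use, the rice-versus-Arabidopsis and Arabidopsis-versus-rice searches are each reduced with `best_hits(..., 1e-6)` and then intersected with `reciprocal_best_hits`, while a within-species search is passed to `paralog_pairs`.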
We observed that 4 (23.52%) and 2 (22.22%) intronless OsMYB and AtMYB genes, respectively, were paralogs (Additional file 5, Table S5). This analysis showed that the intronless MYB genes of rice and Arabidopsis are highly conserved and may be involved in similar functions in these plants [34, 60]. Assessment of molecular function revealed that intronless MYB genes are involved in regulatory functions. The MYB-1R protein LOC_Os01g62660 is predicted to have signal transducer activity (GO:0004871) as well as a role in transcriptional regulation. To explore intron density in intron-containing MYB genes, we divided the genomic region into three zones: N-terminal, middle and C-terminal. The middle zone had the highest intron density, containing 43.99% and 50.63% of introns in rice and Arabidopsis, respectively. The number of introns in the ORFs varied in rice and Arabidopsis, with a maximum of 12 and 15 introns in OsMYB4R1 (LOC_Os07g04700) and AtMYB2j (AT2G47210), respectively. OsMYB1f (LOC_Os01g43180) in rice and AtMYB3e (AT3G10585) in Arabidopsis contain the shortest introns, of 37 and 43 nt, respectively. Among all MYB genes, OsMYB8b (LOC_Os08g25799) of rice and AtMYB8 (AT1G35515) of Arabidopsis contain the longest introns, of 5116 and 1621 nt, respectively (Additional file 6, Table S6).

To gain insight into the exon-intron architecture, intron positions within the MYB domains were investigated. In R2R3-MYB proteins, the MYB domains are located at the N-terminus, while the C-terminal regions are highly variable and required for telomeric DNA binding in vitro. A previous study reported that the majority of R2R3 domain-containing MYB genes in rice and Arabidopsis have a conserved splicing pattern of three exons and two introns [13]. In this study, we likewise found that many rice (26.45%) and Arabidopsis (38.57%) R2R3-type MYB genes have this conserved three-exon, two-intron splicing pattern.
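Intron lengths and the N-terminal/middle/C-terminal zone counts above can be derived from exon coordinates as in the following sketch. The zone boundaries are an assumption: the gene span (first exon start to last exon end) is split into three equal thirds, and each intron is assigned by its midpoint, measured from the 5' end of the gene so that minus-strand genes are oriented correctly. Coordinates are taken as 1-based and inclusive.

```python
from collections import Counter


def introns_from_exons(exons):
    """Gaps between consecutive exons, as (start, end) intron coordinates."""
    exons = sorted(exons)
    return [(e1 + 1, s2 - 1) for (_, e1), (s2, _) in zip(exons, exons[1:]) if s2 - e1 > 1]


def intron_zones(exons, strand="+"):
    """Return (intron length, zone) for each intron of one gene."""
    exons = sorted(exons)
    gene_start, gene_end = exons[0][0], exons[-1][1]
    span = gene_end - gene_start + 1
    result = []
    for start, end in introns_from_exons(exons):
        midpoint = (start + end) / 2
        # Distance of the intron midpoint from the gene's 5' end, as a fraction.
        offset = midpoint - gene_start if strand == "+" else gene_end - midpoint
        fraction = offset / span
        if fraction < 1 / 3:
            zone = "N-terminal"
        elif fraction < 2 / 3:
            zone = "middle"
        else:
            zone = "C-terminal"
        result.append((end - start + 1, zone))
    return result


def zone_percentages(genes):
    """genes: list of (exons, strand). Percentage of all introns in each zone."""
    tally = Counter(zone for exons, strand in genes for _, zone in intron_zones(exons, strand))
    total = sum(tally.values())
    return {zone: round(100 * n / total, 2) for zone, n in tally.items()}
```

The same intron lengths can be scanned with `min` and `max` to find the shortest and longest introns reported in Table S6; intronless genes simply contribute no introns to the tally.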
However, some R2R3-type MYB genes lack one intron in either the R2 or the R3 repeat in rice (23.22%) and Arabidopsis (25.88%) (Figure 3). It has been proposed that duplication of R2 in an early form of two-repeat MYB proteins gave rise to the R1R2R3 MYB domains [14]. Hence, we also investigated the exon-intron structure of R1R2R3-type MYB genes. In Arabidopsis, 3R-MYB genes show a conserved pattern of three exons and two introns in R1 and R2 and one conserved intron in the R3 repeat. Similarly, in rice, three out of five 3R-MYB genes have a similar structure (Figure 4; Additional file 6, Table S6). These results indicate a uniform distribution of introns within the MYB domain in both rice and Arabidopsis.

Chromosomal distribution, tandem repeats and duplication

The positions of all 155 OsMYB genes and 197 AtMYB genes were determined on the chromosome pseudomolecules available at MSU (release 5) and TAIR (release 8) for rice and Arabidopsis, respectively (Figures 5 and 6). The distribution and density of MYB genes on the chromosomes were not uniform: some chromosomes and chromosomal regions have a higher density of MYB genes than others. Rice chromosome 1 and Arabidopsis chromosome 5 carried the largest shares of MYB genes, 21.93% and 28.93%, respectively. Conversely, rice chromosome 11 and Arabidopsis chromosome 2 carried the smallest shares, 2.58% and 12.69%, respectively. The lower chromosome arms are rich in MYB genes, carrying 65.16% of MYB genes in rice and 52.79% in Arabidopsis. Chromosome 5 in rice and chromosomes 2 and 5 in Arabidopsis each carried the largest share of intronless MYB genes, 29.41% and 33.33%, respectively. Intronless MYB genes are absent from chromosomes 4, 9, 10, 11 and 12 in rice and from chromosome 1 in Arabidopsis (Figure 2).
Analysis of chromosomal loci revealed that 11 (7.09%) rice and 20 (10.15%) Arabidopsis MYB genes occur in tandem repeats, suggesting local duplication (Table 2). Tandem duplication is one of the most common mechanisms for the expansion of gene families in plants. Chromosome 6 in rice and chromosome 1 in Arabidopsis contained the largest numbers of tandemly repeated MYB genes (7 genes each), showing an over-representation of MYB genes. Three-gene direct tandem repeats were found on rice chromosome 6 (LOC_Os06g07640, LOC_Os06g07650, LOC_Os06g07660) and on Arabidopsis chromosomes 1 (AT1G66370, AT1G66380, AT1G66390) and 5 (AT5G40330, AT5G40350, AT5G40360). A four-gene direct tandem repeat was also observed on Arabidopsis chromosome 3 (AT3G10580, AT3G10585, AT3G10590 and AT3G10595). Manual inspection revealed 44 (28.38%) and 69 (35.02%) homologous pairs of MYB genes in rice and Arabidopsis, respectively, that evolved through segmental duplication. We also noticed that two pairs in Arabidopsis consisted of one MYB gene and one gene not classified as MYB in the TAIR (release 10) database (Table 3). About 44 (28.39%) OsMYB and 69 (35.02%) AtMYB genes showed homology with multiple genes, including MYB genes, at various locations on different chromosomes. It is widely accepted that redundant duplicated genes tend to be lost from the genome through random mutation and loss of function, except when neo- or subfunctionalization occurs [61, 62]. Rabinowicz et al. (1999) suggested that duplications in the R2R3-type MYB family occurred early in land plant evolution [63]. More recently, a range of duplicated pairs of R2R3-type MYB genes have been identified in maize [64]. A detailed study of the members of one of these groups may illustrate the mechanisms of evolutionary divergence in R2R3 MYB genes.
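The per-chromosome percentages and the tandem-repeat search in this section can be approximated from locus identifiers alone. The sketch below is a simplification, not the authors' procedure: it assumes that MSU (LOC_OsXXgYYYYY) and AGI (ATxGyyyyy) locus numbers follow chromosomal order, typically in steps of 10, and it groups genes on the same chromosome whose locus numbers differ by at most a hypothetical `max_gap` of 20. With that threshold the AT5G40330/AT5G40350/AT5G40360 cluster listed above is recovered; a real analysis would use pseudomolecule coordinates and count intervening genes.

```python
import re
from collections import Counter
from itertools import groupby

LOCUS_RE = re.compile(r"^(?:LOC_Os(\d{2})g(\d{5})|AT(\d)G(\d{5}))$", re.IGNORECASE)


def parse_locus(locus):
    """Return (chromosome label, numeric locus index) from an MSU or AGI ID."""
    m = LOCUS_RE.match(locus)
    if m is None:
        raise ValueError(f"unrecognised locus ID: {locus}")
    if m.group(1):
        return "Os" + m.group(1), int(m.group(2))
    return "At" + m.group(3), int(m.group(4))


def chromosome_percentages(loci):
    """Share (%) of the given genes located on each chromosome."""
    counts = Counter(parse_locus(locus)[0] for locus in loci)
    total = sum(counts.values())
    return {chrom: round(100 * n / total, 2) for chrom, n in sorted(counts.items())}


def tandem_clusters(loci, max_gap=20):
    """Group same-chromosome genes whose locus numbers differ by <= max_gap."""
    ordered = sorted(parse_locus(locus) + (locus,) for locus in loci)
    clusters = []
    for _, group in groupby(ordered, key=lambda t: t[0]):  # one chromosome at a time
        current, prev = [], None
        for _, number, locus in group:
            if prev is not None and number - prev <= max_gap:
                current.append(locus)  # close neighbour: extend the cluster
            else:
                if len(current) > 1:
                    clusters.append(current)
                current = [locus]
            prev = number
        if len(current) > 1:
            clusters.append(current)
    return clusters
```

For instance, `tandem_clusters(["AT5G40330", "AT5G40350", "AT5G40360", "AT2G26950", "AT2G26960", "AT1G09770"])` returns `[["AT2G26950", "AT2G26960"], ["AT5G40330", "AT5G40350", "AT5G40360"]]`, leaving the isolated AT1G09770 out of any cluster.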
Within the tandem repeat pair AT2G26950 and AT2G26960 in Arabidopsis, AtMYB104 (AT2G26950) is down-regulated by ABA, anoxia and cold stress but up-regulated under drought, high temperature and salt, whereas AtMYB81 (AT2G26960) shows the opposite pattern: it is up-regulated in response to ABA, anoxia and cold stress but down-regulated under drought, high temperature and salt stresses. Similar diversification was also observed in the duplicated pair LOC_Os10g33810 and LOC_Os02g41510 in rice: OsMYB15 (LOC_Os10g33810) is expressed in leaf, while OsMYB13-1 (LOC_Os02g41510) is expressed in shoot and panicle tissue. These spatial and temporal differences between duplicated genes indicate their functional diversification.

Cis-motifs in the MYB gene promoter

Detection of regulatory cis-elements in promoter regions is essential to understand the spatial and temporal expression patterns of MYB genes. We searched for over-represented cis-motif consensus patterns in the 1 kb sequence upstream of the translation initiation codon of MYB genes in both rice and Arabidopsis using the MEME (Multiple EM for Motif Elicitation, version 4.1.0) tool [65]. The program was run to find the 10 best cis-motif consensus patterns of 8-12 bases in width, with E-values ≤ 1, on the forward strand of the input sequences only. The identified motifs were then related to the functions of known promoter motifs in the PLACE database [66]. Significant motifs and their positions are shown in Figure 7a-j. Of the ten detected motifs, four were previously known: CCA1 (TBWYTTYTTTTT) and CGCG (GSCGCGCGMGCG) in rice, and ABRE (CCACGYGS) and MYB (GCSAGGTAGGGG) in Arabidopsis [67, 68]. We did not find any motif common to rice and Arabidopsis MYB promoter regions, indicating divergence of the regulatory regions of MYB genes between monocot and dicot species.
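The promoter analysis in this section starts from the 1 kb region upstream of each start codon and then looks for IUPAC consensus motifs such as CCA1 (TBWYTTYTTTTT) and ABRE (CCACGYGS). The sketch below illustrates both steps under stated assumptions: gene coordinates are 1-based and inclusive, only the forward strand of each upstream sequence is scanned (matching the MEME setting described), and a mononucleotide shuffle stands in for the randomly generated background sequences used in the rarity test for the CCA1 motif. It does not reimplement MEME's motif discovery; it only scans for already known consensus patterns.

```python
import random
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=1000):
    """Sequence `length` bp upstream of the start codon, in gene orientation.

    start/end are 1-based inclusive gene coordinates on the chromosome; the
    start codon lies at `start` on the + strand and at `end` on the - strand.
    """
    if strand == "+":
        return chrom_seq[max(0, start - 1 - length):start - 1]
    if strand == "-":
        return reverse_complement(chrom_seq[end:end + length])
    raise ValueError("strand must be '+' or '-'")


def consensus_to_regex(consensus):
    """Turn an IUPAC consensus (e.g. CCACGYGS) into a regular expression."""
    return "".join(c if len(IUPAC[c]) == 1 else f"[{IUPAC[c]}]" for c in consensus.upper())


def count_matches(consensus, seq):
    """Count perfect, possibly overlapping, forward-strand matches."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")  # lookahead = overlaps
    return sum(1 for _ in pattern.finditer(seq.upper()))


def shuffled_counts(consensus, seq, n=100, seed=0):
    """Match counts in n mononucleotide shuffles of seq (a simple null model)."""
    rng = random.Random(seed)
    letters = list(seq.upper())
    counts = []
    for _ in range(n):
        rng.shuffle(letters)
        counts.append(count_matches(consensus, "".join(letters)))
    return counts
```

If a motif is found about as often in the shuffled sequences as in the real upstream regions, it is more likely a general compositional feature of the genome than a MYB-specific promoter element, which is the reasoning applied to the CCA1 motif in rice.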
The CCA1 motif (TBWYTTYTTTTT) in rice was present in 82.58% of the 1 kb upstream sequences and was therefore considered over-represented in MYB promoter sequences. CCA1 is the binding site of a MYB-related transcription factor involved in the phytochrome regulation of an Arabidopsis Lhcb gene. To investigate the distribution and rarity of the CCA1 motif in the complete rice genome, sets of randomly generated sequences from 1 kb upstream regions, introns, coding sequences and intergenic regions were searched for perfect matches to the consensus motif using a Perl script. The CCA1 motif identified in the original upstream sequences was also found in the shuffled sequences, indicating that it is a common feature of the rice genome. The CCA1 motif was also identified as a common motif of the rice genome in our previous study [69]. The CGCG motif (GSCGCGCGMGCG) in rice and the ABRE (CCACGYGS) [67] and MYB (GCSAGGTAGGGG) [68] motifs in Arabidopsis were found in only a few MYB genes. The multilevel consensus sequences, PLACE representations and motif widths are given in Table 4 and Additional file 7, Table S7.

Expression of MYB genes under abiotic stresses

To identify MYB genes with a potential role in the abiotic stress response of plants, we analyzed the expression patterns of MYB genes in response to abiotic stresses. Expression of MYB genes was first assessed from the full-length cDNAs (FL-cDNA) and expressed sequence tags (ESTs) available at the MSU and dbEST databases for rice and Arabidopsis, respectively [70]. We found that 109 OsMYB genes in rice and 157 AtMYB genes in Arabidopsis had one or more representative ESTs. OsMYBS3-2 (LOC_Os10g41200) in rice and AtMYB5m (AT5G47390) in Arabidopsis had the largest numbers of ESTs, 219 and 44, respectively.
About 70% of rice MYB genes and 80% of Arabidopsis MYB genes therefore appear to be expressed, as evidenced by the availability of ESTs for these genes (Additional file 8, Table S8). EST-based expression profiles from various organ and tissue libraries were used to identify organ- or tissue-specific expression of MYB genes in rice and Arabidopsis. Further, we assessed the expression of MYB genes under various abiotic stresses using publicly available microarray data from the PlantQTL-GE [71] and GENEVESTIGATOR [72, 73] databases. As of June 2006, PlantQTL-GE contained 1558 known genes, 3633 microarray data entries, 883598 ESTs, 21523 genetic markers and 58687 annotated genes for rice. Exploration of PlantQTL-GE for rice MYBs showed that 14 (9.03%) OsMYB genes were up-regulated under cold, drought and salt stress, of which 10 were up-regulated under drought (Additional file 9, Table S9). We also analyzed the publicly available microarray experiment E-MEXP-2401 [74] at ArrayExpress to identify stress-regulated MYB genes in rice cultivars Nagina 22 (N22) and IR64 under normal and drought conditions. We found that 142 (92.26%) MYB genes were differentially expressed under drought compared with normal conditions in the drought-tolerant variety N22 and the drought-susceptible variety IR64 (Additional file 13, Figure S1). This suggests that the majority of MYB genes may have a role in tolerance to drought and other abiotic stresses. For instance, overexpression of the R2R3 MYB gene LOC_Os03g20090 (probe set Os.14823.1.S1_s_at) resulted in enhanced drought and salt tolerance [75]. Additionally, the R2R3-type MYB proteins encoded by LOC_Os02g41510 (Os.10172.1.S1_at) and LOC_Os03g04900 (OsAffx.3135.1.S1_at) have been implicated in drought stress tolerance [24]. GENEVESTIGATOR provides expression data from public repositories such as ArrayExpress [76] and GEO [77].
We observed that 44.67%, 41.12% and 56.85% of AtMYB genes were down-regulated and 47.21%, 50.76% and 35.02% were up-regulated under cold, drought and salt stress, respectively (Additional file 14, Figure S2a, b and c). A heat map of MYB genes expressed under abiotic stress was created with an expression profiler (Additional file 15, Figure S3). No expression record was found for 8.12% of AtMYB genes under cold, drought and salt stress in the GENEVESTIGATOR database (Additional file 9, Table S9). To validate the expression data of OsMYB and AtMYB genes obtained from publicly available microarray data (PlantQTL-GE and GENEVESTIGATOR), we analyzed the expression patterns of 60 OsMYB and 21 AtMYB genes by qRT-PCR (Additional files 10 and 11, Tables S10 and S11). The genes were selected by performing a phylogenetic analysis of all MYB genes in rice and Arabidopsis and choosing one gene from each cluster. Of the 60 genes examined by qRT-PCR, four (6.66%) OsMYB genes were up-regulated (≥ 1.5-fold change) and 37 (61.66%) were down-regulated (≤ 0.5-fold change) under drought stress in rice cv. N22 (Additional file 16, Figure S4a, b). OsMYB60-1 (LOC_Os11g03440) was highly up-regulated (2.03-fold change), indicating a potential role in drought stress (Additional file 11, Table S11). qRT-PCR analysis of 21 MYB genes in Arabidopsis revealed that 10 (47.61%) AtMYB genes (AT1G09770, AT5G35550, AT5G62320, AT4G17785, AT3G10760, AT5G18240, AT5G11050, AT5G10280, AT5G47390 and AT3G24310) were up-regulated (≥ 1.5-fold change) and 9 AtMYB genes (AT1G18570, AT1G74080, AT3G28910, AT3G29020, AT1G56650, AT1G22640, AT5G49330, AT4G09450 and AT4G38620) were down-regulated (≤ 0.5-fold change) under drought stress (Additional file 17, Figure S5).

Tissue-specific expression

The cDNA libraries used to generate expressed sequence tags (ESTs) [78, 79] provide a framework for a preliminary analysis of plant gene expression [80].
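The ≥ 1.5-fold and ≤ 0.5-fold thresholds used for the qRT-PCR data above can be applied as in the sketch below. The source does not say how fold changes were computed; this sketch assumes the widely used 2^-ΔΔCt method with a single reference gene and near-100% amplification efficiency, and takes replicate Ct values as input.

```python
from statistics import mean


def fold_change_ddct(target_treated, ref_treated, target_control, ref_control):
    """Relative expression (stressed vs control) by the 2^-ΔΔCt method.

    Each argument is a list of replicate Ct values; replicates are averaged.
    """
    delta_treated = mean(target_treated) - mean(ref_treated)
    delta_control = mean(target_control) - mean(ref_control)
    return 2 ** -(delta_treated - delta_control)


def classify(fold, up=1.5, down=0.5):
    """Apply the up (>= 1.5-fold) and down (<= 0.5-fold) thresholds from the text."""
    if fold >= up:
        return "up-regulated"
    if fold <= down:
        return "down-regulated"
    return "unchanged"
```

A target gene that reaches threshold one cycle earlier under drought, relative to the reference gene, gets ΔΔCt = -1 and hence a 2-fold change, which `classify` reports as up-regulated; one cycle later gives 0.5-fold and is reported as down-regulated.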
The availability of large collections of ESTs from Arabidopsis thaliana and rice (Oryza sativa) makes it possible to analyze expression profiles across plant tissues and genes. In rice, a tissue breakdown of EST evidence for the gene models is available through the Rice Gene Expression Anatomy Viewer at the MSU database [81, 82]. For Arabidopsis, tissue-specific expression of MYB genes was obtained from the GENEVESTIGATOR tool [73]. These resources contain expression data from a wide diversity of experiments covering tissues such as root tip, suspension cells, sheath, phloem, anther, seedling, endosperm, immature seed, pistil, flower, whole plant, shoot, leaf and panicle in rice and Arabidopsis (Additional file 12, Table S12). The frequency of MYB ESTs for a given gene model can be queried on a tissue basis. Previous studies reported that the abundance of EST tags for many genes varies according to the tissue of origin of the cDNA library [83, 84, 85, 86, 87]. In this study, MYB genes highly expressed in particular tissues were identified by exploring EST libraries from various developmental stages available at the MSU and TAIR databases. A large proportion of OsMYB genes (32.90%) were highly expressed in the panicle, leaf and shoot of rice (Additional file 18, Figure S6). To identify genes highly expressed in a given tissue, the frequency of each candidate gene's ESTs in the library was calculated. For instance, OsMYB2c, OsMYB8a, OsMYB77-1, OsMYB44-2, OsMYB4-3, OsMYB133, OsMYBS3-2 and OsMYB1d showed high EST frequencies in flower, anther, endosperm, pistil, shoot, panicle, immature seed and whole plant, respectively (0.015, 0.095, 0.046, 0.01, 0.017, 0.011, 0.051 and 0.02), suggesting functional roles in the respective organs. In leaves, three MYB genes, OsMYB48, OsMYB6g and OsMYBS3-2, showed the highest and equal expression levels.
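The per-library EST frequencies quoted above can be computed as in the sketch below. The exact normalization is not specified in the text; this sketch assumes that a gene's frequency in a library is its EST count divided by the total number of ESTs in that library, and the counts in the usage example are hypothetical.

```python
def est_frequencies(gene_ests, library_sizes):
    """Share of each tissue library's ESTs that map to one gene.

    gene_ests:     {tissue: number of the gene's ESTs in that library}
    library_sizes: {tissue: total number of ESTs in that library}
    """
    return {
        tissue: gene_ests.get(tissue, 0) / total
        for tissue, total in library_sizes.items()
        if total > 0  # skip empty libraries instead of dividing by zero
    }


def top_tissue(gene_ests, library_sizes):
    """Tissue with the highest EST frequency for the gene, and that frequency."""
    freqs = est_frequencies(gene_ests, library_sizes)
    tissue = max(freqs, key=freqs.get)
    return tissue, freqs[tissue]
```

With hypothetical counts of 19 ESTs in a 200-EST anther library and 2 ESTs in a 1000-EST leaf library, `top_tissue` returns `("anther", 0.095)`. Normalizing by library size matters because raw counts would otherwise favour whichever tissue happened to be sequenced most deeply.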
A similar analysis was performed in Arabidopsis to identify genes expressed in various tissues, with AtMYB expression levels measured on a log2 scale. The following MYB genes were expressed at very high levels: AtMYBCDC5 in callus (12.84) and seed (11.79); AtMYB1d in seedling (11.85) and stem (12.31); AtMYB1o in root (12.8) and root tip (12.44); AtMYB1g in flower (10.88); AtMYB91 in shoot (12.4); and AtMYB44 in pedicel (11.64) and leaves (12.33). MYB genes play an important role in cell cycle progression in the root tip and in root growth. Chen et al. (2005) reported the expression of wheat MYB genes in various tissues [88]. TaMYB1 showed high expression in root, sheath and leaf; TaMYB2, TaMYB3, TaMYB4 and TaMYB6 were expressed at high levels in root and leaf but at low levels in sheath, whereas TaMYB5 expression was higher in root than in sheath and leaf. Our study revealed that AtMYB44 and OsMYB48 share high sequence similarity with TaMYB1 and TaMYB2, respectively, and, as in wheat, are highly expressed in leaf. This kind of analysis will be useful for selecting candidate genes for functional validation of their roles in specific tissues.
Evolutionary relationship
To understand the evolutionary relationships within the MYB family, phylogenetic trees were constructed from multiple sequence alignments [89, 90] of MYB proteins, omitting the hypervariable C-terminal regions. We used the COBALT multiple sequence alignment tool [91], which automatically uses conserved domain information (here, the MYB domains) as constraints when building the alignment, from which the phylogenetic tree was constructed. The tree showed that the identified tandem repeat and homologous pairs grouped together in single clades with very strong bootstrap support (Additional file 19, Figure S7). These results further support gene duplication in rice and Arabidopsis during evolution. It was also noticed that several members of the “homologous pairs” (e.g.
AT5G16600-AT3G02940 in Arabidopsis; LOC_Os12g07610-LOC_Os12g07640 in rice) and “tandem repeat pairs” (e.g. AT3G12720-AT3G12730 in Arabidopsis; LOC_Os06g14700-LOC_Os06g14710 in rice) fell into distinct clades, indicating that only a few members share a common ancestral origin predating the divergence of monocots and dicots. MYB proteins from rice and Arabidopsis with similar domain architecture (e.g. R2R3) grouped into a single clade. This suggests that the significant expansion of R2R3-type MYB genes in plants occurred before the divergence of monocots and dicots, in agreement with previous studies [4, 63].
Conclusions
The MYB family is one of the largest families of transcription factors and plays versatile regulatory roles in plants. Previous studies have given insight into the key roles played by MYB genes in regulating different plant traits. Our study provides a genome-wide comparative analysis of the MYB TF family between a dicot (Arabidopsis) and a monocot (rice). We describe the gene organization, sequence diversity and expression patterns of 155 OsMYB genes of rice and 197 AtMYB genes of Arabidopsis. Structural analysis revealed that introns are concentrated in the central region of the genes and that R2R3-type MYB genes usually have two introns at conserved positions. The distribution of introns within the domains, together with the multiple sequence alignment of the domains, suggests that MYB domains were originally compact, that introns were inserted later, and that the splice sites have been conserved during evolution. In silico analysis revealed that most MYB genes are present as duplicate genes across the genome in both rice and Arabidopsis. Phylogenetic analysis of rice and Arabidopsis MYB proteins provided useful information on their conserved features. Expression analysis identified MYB genes that are expressed in different tissues at various developmental stages and under a range of abiotic stresses.
Methods
Identification of MYB gene family in rice and Arabidopsis
We obtained the rice and Arabidopsis MYB gene lists from the MYB transcription factor family genes annotated in the MSU (Michigan State University; formerly TIGR, The Institute for Genomic Research) (http://rice.plantbiology.msu.edu/) and TAIR (The Arabidopsis Information Resource) (http://www.arabidopsis.org/) genome releases, respectively. The MSU release 5 rice pseudomolecules and TAIR release 8 Arabidopsis pseudomolecules contained 155 and 197 MYB genes, respectively.
MYB annotation
To identify the number of domains present in each MYB protein, we ran domain searches against the Conserved Domains Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) and the Pfam database (http://pfam.sanger.ac.uk/) using both local and global search strategies, with an expectation value (E-value) cut-off of 1.0 as the significance threshold. Only significant domains found in rice and Arabidopsis MYB protein sequences were considered valid. To further characterize the MYB proteins, the grand average of hydropathy (GRAVY), isoelectric point (pI) and molecular weight were predicted with the ProtParam tool on the Expert Protein Analysis System (ExPASy) proteomics server (http://www.expasy.ch/tools/protparam.html). The subcellular localization of MYB proteins was predicted with the Protein Localization Server (PLOC) (http://www.genome.jp/SIT/plocdir/), Subcellular Localization Prediction of Eukaryotic Proteins (SubLoc V 1.0) (http://www.bioinfo.tsinghua.edu.cn/SubLoc/eu_predict.htm), the SVM-based server ESLpred (http://www.imtech.res.in/raghava/eslpred/submit.html) and the ProtComp 9.0 server (http://linux1.softberry.com/berry.phtml?topic=protcomppl&group=programs&subgroup=proloc). MYB protein function in terms of Gene Ontology (GO) was predicted with the GO annotation search pages available at MSU (http://rice.plantbiology.msu.edu/downloads_gad.shtml) and TAIR (http://www.arabidopsis.org/tools/bulk/go/index.jsp) for rice and Arabidopsis, respectively.
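ProtParam's GRAVY value is the sum of the Kyte-Doolittle hydropathy values [40] of all residues divided by the sequence length, so the negative averages reported for each MYB group in Table 1 indicate proteins that are, on average, hydrophilic. The sketch below reproduces that arithmetic for illustration only; the `gravy` helper is hypothetical, and silently skipping non-standard residue codes (e.g. X or a stop symbol) is an assumption of this sketch, not documented ProtParam behaviour.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Assumption of this sketch: characters outside the 20 standard amino
    acids (whitespace, X, '*', ...) are ignored rather than rejected.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical fragment rich in charged residues, so the result is negative.
    print(round(gravy("MKRDEKRSEE"), 3))
```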
Identification of over-represented motifs
To identify additional conserved motifs, the Multiple Em for Motif Elicitation (MEME) tool (version 4.1.0) was run on a Linux platform (http://meme.sdsc.edu/meme/meme-intro.html) with the following parameters: number of repetitions, any; maximum number of motifs, 10; motif width, ≥8 and ≤10. Motif logos were plotted according to their positions within the region using the WebLogo tool (http://weblogo.berkeley.edu/logo.cgi). Discovered motifs were analyzed using PLACE (http://www.dna.affrc.go.jp/PLACE/). Shuffled sequences were generated from randomly chosen 1 kb upstream sequences using the "Sequence Manipulation Suite" (http://www.bioinformatics.org/sms2/shuffle_dna.html).
Phylogenetic analysis
To generate phylogenetic trees of MYB transcription factor family genes, multiple sequence alignment of MYB protein sequences was performed using the COBALT program (http://www.ncbi.nlm.nih.gov/tools/cobalt/). Dendrograms were constructed with the following parameters: method, fast minimum evolution; maximum sequence difference, 0.85; distance, Grishin (protein).
MYB localization, tandem repeats and duplication
To map gene loci on rice and Arabidopsis chromosomes, the pseudomolecules were used in MapChart (version 2.2) for rice and in the Chromosome Map Tool of The Arabidopsis Information Resource (TAIR) database [92] (http://www.arabidopsis.org/jsp/ChromosomeMap/tool.jsp) for Arabidopsis. Tandem repeats were identified by manual inspection of the rice and Arabidopsis physical maps. Duplicated (homologous pair) genes were obtained from the MSU segmental genome duplication database (http://rice.plantbiology.msu.edu/segmental_dup/) for rice (distance = 500 kb) and from the Arabidopsis Syntenic Pairs / Annotation Viewer (http://synteny.cnr.berkeley.edu/AtCNS/) for Arabidopsis.
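Tandem repeats in this study were identified by manual inspection of physical maps. As a rough, automatable stand-in for that step (not the authors' procedure), the sketch below relies on the fact that MSU (LOC_OsXXgXXXXX) and TAIR (ATXGXXXXX) locus numbers increase along each chromosome: MYB loci on the same chromosome whose numbers differ by at most a hypothetical `max_gap` (30 here, an assumption) are chained into one candidate tandem cluster. Locus-number proximity only approximates physical adjacency, so a real analysis should use genomic coordinates.

```python
import re
from typing import List, Tuple

# MSU rice IDs, e.g. LOC_Os06g07640; TAIR IDs, e.g. AT1G66370.
MSU_RE = re.compile(r"^LOC_Os(\d{2})g(\d{5})$", re.IGNORECASE)
TAIR_RE = re.compile(r"^AT([1-5CM])G(\d{5})$", re.IGNORECASE)


def parse_locus(locus_id: str) -> Tuple[str, int]:
    """Return (chromosome label, locus number) for an MSU or TAIR ID."""
    for prefix, pattern in (("Os", MSU_RE), ("At", TAIR_RE)):
        m = pattern.match(locus_id)
        if m:
            return prefix + m.group(1).upper(), int(m.group(2))
    raise ValueError(f"unrecognised locus ID: {locus_id}")


def tandem_clusters(locus_ids: List[str], max_gap: int = 30) -> List[List[str]]:
    """Chain loci on the same chromosome whose locus numbers differ by
    <= max_gap; return only clusters with two or more members."""
    parsed = sorted((parse_locus(x), x) for x in locus_ids)
    clusters: List[List[str]] = []
    current: List[str] = []
    prev = None
    for (chrom, num), lid in parsed:
        if prev is not None and chrom == prev[0] and num - prev[1] <= max_gap:
            current.append(lid)  # neighbour of the previous locus
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [lid]  # start a new candidate cluster
        prev = (chrom, num)
    if len(current) > 1:
        clusters.append(current)
    return clusters
```

With the default gap, the loci LOC_Os06g07640/07650/07660 and AT1G66370/66380/66390 listed in Table 2 each collapse into a single cluster, mirroring OsTR1 and AtTR2.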
The tandem repeat and homologous pairs were aligned with the BLAST 2 SEQUENCES tool available at the National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/Blast.cgi/).
Gene structure analysis
To determine intron/exon structure, MYB coding sequences (CDS) were aligned with their corresponding genomic sequences using the Spidey tool available at NCBI (http://www.ncbi.nlm.nih.gov/spidey/). To identify conserved intronless genes between rice and Arabidopsis, a local protein BLAST (BLASTP) (http://www.molbiol.ox.ac.uk/analysis_tools/BLAST/BLAST_blastall.shtml) was performed with the protein sequences of all predicted intronless genes in rice against all predicted intronless genes in Arabidopsis, and vice versa. Hits with E ≤ 6 were treated as conserved intronless genes and hits with E ≥ 10 were treated as paralogs.
Expression analysis
Expression support for each gene model was explored through the gene expression evidence search page (http://rice.plantbiology.msu.edu/locus_expression_evidence.shtml) available at MSU for rice and the GENEVESTIGATOR tool (https://www.genevestigator.com/) for Arabidopsis. For MYB genes with no EST evidence, BLAST searches (BLASTP and TBLASTN) (http://blast.ncbi.nlm.nih.gov/Blast.cgi) were performed against NCBI databases to find significant similarity to MYB genes of other plant species. To measure MYB expression levels under abiotic stress, the PlantQTL-GE database (http://www.scbit.org/qtl2gene/new/) was used for rice and the GENEVESTIGATOR tool (https://www.genevestigator.com/) for Arabidopsis. To identify tissue-specific expression levels of OsMYB genes in rice, the highly expressed gene search (http://Rice.plantbiology.msu.edu/tissue.expression.shtml) available at MSU was used. For Arabidopsis, the GENEVESTIGATOR tool (https://www.genevestigator.com/gv/user/gvLogin.jsp) was used.
Plant materials and growth conditions
The plant materials used were drought-tolerant rice (Oryza sativa L. subsp. indica) cv.
Nagina 22 and Arabidopsis thaliana ecotype Columbia. Seeds were surface sterilized. Rice seeds were placed on absorbent cotton that had been soaked overnight in water, in medium-sized plastic trays. Arabidopsis seeds were germinated on MS-agar medium containing 1% sucrose, and seven-day-old seedlings were transferred to Soilrite for further growth. Rice and Arabidopsis seedlings were grown in a greenhouse under a 16/8 h light/dark photoperiod at 28 ± 1 °C and 23 ± 1 °C, respectively.
Drought stress treatment
Drought was imposed on 3-week-old rice seedlings [93] and 5-week-old Arabidopsis plants by withholding water until visible leaf rolling was observed. Control plants were irrigated with sufficient water. Plant water status was quantified by measuring leaf relative water content (RWC). Control plants showed 96.89% and 97.49% RWC, while stressed plants showed 64.86% and 65.2% RWC, in rice and Arabidopsis, respectively.
Real-Time RT-PCR
Total RNA from rice and Arabidopsis was isolated with TRIzol Reagent (Ambion) and treated with DNase (QIAGEN GmbH). First-strand cDNA was synthesized from 1 µg of total RNA using the SuperScript III kit (Invitrogen) according to the manufacturer's protocol. Reverse transcription was carried out at 44°C for 60 min followed by 92°C for 10 min. Five ng of cDNA was used as template in a 20 µL RT reaction mixture. In total, 63 primer pairs for rice and 51 for Arabidopsis were used to study the expression of MYB transcription factors. Gene-specific primers were designed using IDT PrimerQuest (http://www.idtdna.com/scitools/applications/primerquest/default.aspx). Ubiquitin and actin primers were used as internal controls in rice and Arabidopsis, respectively. Each primer combination used for real-time RT-PCR amplified only a single band of the expected size, and dissociation curve analysis of each primer pair showed a single melting temperature.
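The real-time RT-PCR results are expressed as fold changes between stressed and control plants after normalization to a reference gene, which corresponds to the comparative CT (2^(-ΔΔCT)) method. A minimal Python sketch of that arithmetic follows; the function names and example CT values are hypothetical, and the method assumes that target and reference genes amplify with close to 100% efficiency.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], reference_cts: Sequence[float]) -> float:
    """ΔCT: mean CT of the target gene minus mean CT of the reference gene
    (technical replicates are averaged first)."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(
    target_treated: Sequence[float],
    reference_treated: Sequence[float],
    target_control: Sequence[float],
    reference_control: Sequence[float],
) -> float:
    """Relative expression 2^-ΔΔCT with ΔΔCT = ΔCT(treated) - ΔCT(control).

    Assumes ~100% amplification efficiency for target and reference genes.
    """
    ddct = delta_ct(target_treated, reference_treated) - delta_ct(
        target_control, reference_control
    )
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical triplicate CTs for one MYB gene and the reference gene.
    print(fold_change([20, 20, 20], [18, 18, 18], [22, 22, 22], [18, 18, 18]))
```

For example, triplicate CTs of 20 (target) and 18 (reference) under drought versus 22 and 18 in control plants give ΔΔCT = -2 and a fold change of 4, i.e. about fourfold up-regulation. Because a lower CT means more template, a negative ΔΔCT corresponds to induction and a positive one to repression.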
The RT-PCR reactions were carried out at 95°C for 5 min followed by 40 cycles of 95°C for 15 s and 60°C for 30 s, as described previously by Dai et al., 2007 [24]. QuantiFast SYBR Green PCR master mix (QIAGEN GmbH) was used for qRT-PCR according to the manufacturer's instructions. The threshold cycles (CT) of each target were averaged over triplicate reactions and normalized to the CT of the internal control products (actin or ubiquitin). MYB TF expression data were normalized by subtracting the mean reference gene CT value from the individual CT values of the corresponding target genes (ΔCT). The fold change was calculated as $2^{-\Delta\Delta C_T}$, where $\Delta\Delta C_T = \Delta C_{T,\text{condition of interest}} - \Delta C_{T,\text{control}}$. The primer sets used to study the MYB TF expression profiles are given in Additional file 10, Table S10.
Abbreviations
MSU, Michigan State University; TAIR, The Arabidopsis Information Resource; PERL, Practical Extraction and Report Language; GO, gene ontology; BLAST, basic local alignment search tool; MEME, multiple expectation maximization for motif elicitation; EST, expressed sequence tag; NCBI, National Center for Biotechnology Information; GEO, gene expression omnibus; qRT-PCR, quantitative reverse transcription PCR.
Authors' contributions
AK performed all the bioinformatics experiments, analysed the data and drafted the manuscript; SS helped with the bioinformatics experiments, data mining and management; SKL conceived the idea of identifying the MYB TFs and designed the study; RR carried out all the wet-lab experiments; VC and KCB guided the design of the study and the drafting of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank the Indian Council of Agricultural Research (ICAR) for supporting this work through the ICAR-sponsored Network Project on Transgenics in Crops (NPTC) and the National Initiative on Climate Resilient Agriculture (NICRA). SKL gratefully acknowledges the University Grants Commission (UGC) and the Council of Scientific and Industrial Research (CSIR) for the CSIR-UGC Junior and Senior Research Fellowship grant. We thank Cathie Martin, John Innes Centre, Norwich Research Park, Colney, Norwich, UK, for her valuable suggestions on the data analysis and manuscript.
References
1. Ptashne M: How eukaryotic transcriptional activators work. Nature 1988, 335:683-689. 2. Klempnauer KH, Gonda TJ, Bishop JM: Nucleotide sequence of the retroviral leukemia gene v-myb and its cellular progenitor c-MYB: the architecture of a transduced oncogene. Cell 1982, 31:453-463. 3. Weston K: Myb proteins in life, death and differentiation. Curr. Opin. Genet. Dev. 1998, 8:76-81. 4. Lipsick JS: One billion years of Myb. Oncogene 1996, 13:223-235. 5. Paz-Ares J, Ghosal D, Wienand U, Peterson P, Saedler H: The regulatory c1 locus of Zea mays encodes a protein with homology to MYB oncogene products and with structural similarities to transcriptional activators. EMBO J. 1987, 6:3553-3558. 6. Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, Creelman R, Pilgrim M, Broun P, Zhang JZ, Ghandehari D, Sherman BK, Yu G: Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 2000, 290:2105-2110. 7. Martin C, Paz-Ares J: MYB transcription factors in plants. Trends Genet. 1997, 13:67-73. 8. Kranz H, Scholz K, Weisshaar B: c-MYB oncogene-like genes encoding three MYB repeats occur in all major plant lineages. Plant J. 2000, 21:231-235. 9.
Yanhui C, Xiaoyuan Y, Kun H, Meihua L, Jigang L, Zhaofeng G, Zhiqiang L, Yunfei Z, Xiaoxiao W, Xiaoming Q, Yunping S, Li Z, Xiaohui D, Jingchu L, Xing-Wang D, Zhangliang C, Hongya G, Li-Jia Q: The MYB transcription factor superfamily of Arabidopsis: expression analysis and phylogenetic comparison with the rice MYB family. Plant Mol. Biol. 2006, 60:107-124. 10. Pasquali G, Biricolti S, Locatelli F, Baldoni E, Mattana M: OsMYB4 expression improves adaptive responses to drought and cold stress in transgenic apples. Plant Cell Rep. 2008, 27:1677-1686. 11. Ogata K, Morikawa S, Nakamura H, Sekikawa A, Inoue T, Kanai H, Sarai A, Ishii S, Nishimura Y: Solution structure of a specific DNA complex of the MYB DNA-binding domain with cooperative recognition helices. Cell 1994, 79:639-648. 12. Dubos C, Stracke R, Grotewold E, Weisshaar B, Martin C, Lepiniec L: MYB transcription factors in Arabidopsis. Trends Plant Sci. 2010, 15:573-581. 13. Jiang C, Gu J, Chopra S, Gu X, Peterson T: Ordered origin of the typical two- and three-repeat Myb genes. Gene 2004, 326:13-22. 14. Rosinski JA, Atchley WR: Molecular evolution of the Myb family of transcription factors: evidence for polyphyletic origin. J. Mol. Evol. 1998, 46:74-83. 15. Allan AC, Hellens RP, Laing WA: MYB transcription factors that colour our fruit. Trends Plant Sci. 2008, 13:99-102. 16. Cominelli E, Tonelli C: A new role for plant R2R3-MYB transcription factors in cell cycle regulation. Cell Res. 2009, 19:1231-1232. 17. Agarwal M, Hao Y, Kapoor A, Dong CH, Fujii H, Zheng X, Zhu JK: A R2R3 type MYB transcription factor is involved in the cold regulation of CBF genes and in acquired freezing tolerance. J. Biol. Chem. 2006, 281:37636-37645. 18. Ma Q, Dai X, Xu Y, Guo J, Liu Y, Chen N, Xiao J, Zhang D, Xu Z, Zhang X, Chong K: Enhanced tolerance to chilling stress in OsMYB3R-2 transgenic rice is mediated by alteration in cell cycle and ectopic expression of stress genes. Plant Physiol. 2009, 150:244-256. 19.
Vannini C, Locatelli F, Bracale M, Magnani E, Marsoni M, Osnato M, Mattana M, Baldoni E, Coraggio I: Overexpression of the rice OsMYB4 gene increases chilling and freezing tolerance of Arabidopsis thaliana plants. Plant J. 2004, 37:115-127. 20. Seo PJ, Xiang F, Qiao M, Park JY, Lee YN, Kim SG, Lee YH, Park WJ, Park CM: The MYB96 transcription factor mediates abscisic acid signaling during drought stress response in Arabidopsis. Plant Physiol. 2009, 151:275-289. 21. Ding Z, Li S, An X, Liu X, Qin H, Wang D: Transgenic expression of MYB15 confers enhanced sensitivity to abscisic acid and improved drought tolerance in Arabidopsis thaliana. Cell Res. 2008, 18:1047-1060. 22. Ito M, Araki S, Matsunaga S, Itoh T, Nishihama R, Machida Y, Doonan JH, Watanabe A: G2/M-phase-specific transcription during the plant cell cycle is mediated by c-MYB-like transcription factors. Plant Cell 2001, 13:1891-1905. 23. Araki S, Ito M, Soyano T, Nishihama R, Machida Y: Mitotic cyclins stimulate the activity of c-MYB-like factors for transactivation of G2/M phase-specific genes in tobacco. J. Biol. Chem. 2004, 279:32979-32988. 24. Dai X, Xu Y, Ma Q, Xu W, Wang T, Xue Y, Chong K: Overexpression of an R1R2R3 MYB gene, OsMYB3R-2, increases tolerance to freezing, drought, and salt stress in transgenic Arabidopsis. Plant Physiol. 2007, 143:1739-1751. 25. Haga N, Kato K, Murase M, Araki S, Kubo M, Demura T, Suzuki K, Muller I, Voss U, Jurgens G, Ito M: R1R2R3-MYB proteins positively regulate cytokinesis through activation of KNOLLE transcription in Arabidopsis thaliana. Development 2007, 134:1101-1110. 26. Allen RS, Li J, Stahle MI, Dubroué A, Gubler F, Millar AA: Genetic analysis reveals functional redundancy and the major target genes of the Arabidopsis miR159 family. Proc. Natl. Acad. Sci. USA 2007, 104:16371-16376. 27. Addo-Quaye C, Eshoo TW, Bartel DP, Axtell MJ: Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr. Biol. 2008, 18:758-762. 28.
Stracke R, Werber M, Weisshaar B: The R2R3-MYB gene family in Arabidopsis thaliana. Curr. Opin. Plant Biol. 2001, 4:447-456. 29. Shinozaki K, Yamaguchi-Shinozaki K, Urao T, Koizumi M: Nucleotide sequence of a gene from Arabidopsis thaliana encoding a MYB homologue. Plant Mol. Biol. 1992, 19:493-499. 30. Romero I, Fuertes A, Benito MJ, Malpica JM, Leyva A, Paz-Ares J: More than 80 R2R3-MYB regulatory genes in the genome of Arabidopsis thaliana. Plant J. 1998, 14:273-284. 31. Kranz HD, Denekamp M, Greco R, Jin H, Leyva A, Meissner RC, Petroni K, Urzainqui A, Bevan M, Martin C: Towards functional characterization of the members of the R2R3-MYB gene family from Arabidopsis thaliana. Plant J. 1998, 16:263-276. 32. Finn RD, Tate J, Mistry J, Coggill PC, Sammut JS, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2008, 36:D281-D288. 33. Ito M: Conservation and diversification of three-repeat MYB transcription factor in plants. J. Plant Res. 2005, 118:61-69. 34. Jin H, Martin C: Multifunctionality and diversity within the plant MYB-gene family. Plant Mol. Biol. 1999, 41:577-585. 35. Lu SX, Knowles SM, Andronis C, Ong MS, Tobin EM: CIRCADIAN CLOCK ASSOCIATED1 and LATE ELONGATED HYPOCOTYL function synergistically in the circadian clock of Arabidopsis. Plant Physiol. 2009, 150:834-843. 36. Simon M: Distinct and overlapping roles of single-repeat MYB genes in root epidermal patterning. Dev. Biol. 2007, 311:566-578. 37. Dubos C: MYBL2 is a new regulator of flavonoid biosynthesis in Arabidopsis thaliana. Plant J. 2008, 55:940-953. 38. Matsui K: AtMYBL2, a protein with a single MYB domain, acts as a negative regulator of anthocyanin biosynthesis in Arabidopsis. Plant J. 2008, 55:954-967. 39. Pesch M, Hulskamp M: One, two, three. Models for trichome patterning in Arabidopsis. Curr. Opin. Plant Biol. 2009, 12:587-592. 40.
Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982, 157:105-132. 41. Borevitz JO, Xia Y, Blount J, Dixon RA, Lamb C: Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. Plant Cell 2000, 12:2383-2394. 42. Jin H, Cominelli E, Bailey P, Parr A, Mehrtens F, Jones J, Tonelli C, Weisshaar B, Martin C: Transcriptional repression by AtMYB4 controls production of UV-protecting sunscreens in Arabidopsis. EMBO J. 2000, 19:6150-6161. 43. Nesi N, Jond C, Debeaujon I, Caboche M, Lepiniec L: The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed. Plant Cell 2001, 13:2099-2114. 44. Baudry A, Heim MA, Dubreucq B, Caboche M, Weisshaar B, Lepiniec L: TT2, TT8, and TTG1 synergistically specify the expression of BANYULS and proanthocyanidin biosynthesis in Arabidopsis thaliana. Plant J. 2004, 39:366-380. 45. Lee MM, Schiefelbein J: WEREWOLF, a MYB-related protein in Arabidopsis, is a position-dependent regulator of epidermal cell patterning. Cell 1999, 99:473-483. 46. Lee MM, Schiefelbein J: Developmentally distinct MYB genes encode functionally equivalent proteins in Arabidopsis. Development 2001, 128:1539-1546. 47. Higginson T, Li SF, Parish RW: AtMYB103 regulates tapetum and trichome development in Arabidopsis thaliana. Plant J. 2003, 35:177-192. 48. Baranowskij N, Frohberg C, Prat S, Willmitzer L: A novel DNA binding protein with homology to MYB oncoproteins containing only one repeat can function as a transcriptional activator. EMBO J. 1994, 13:5383-5392. 49. Wang ZY, Kenigsbuch D, Sun L, Harel E, Ong MS, Tobin EM: A MYB-related transcription factor is involved in the phytochrome regulation of an Arabidopsis Lhcb gene. Plant Cell 1997, 9:491-507. 50.
Schaffer R, Ramsay N, Samach A, Corden S, Putterill J, Carre IA, Coupland G: The late elongated hypocotyl mutation of Arabidopsis disrupts circadian rhythms and the photoperiodic control of flowering. Cell 1998, 93:1219-1229. 51. Green RM, Tobin EM: The role of CCA1 and LHY in the plant circadian clock. Dev. Cell 2002, 2:516-518. 52. Jin H, Martin C: Multifunctionality and diversity within the plant MYB-gene family. Plant Mol. Biol. 1999, 41:577-585. 53. Bilaud T, Koering CE, Binet-Brasselet E, Ancelin K, Pollice A, Gasser SM, Gilson E: The telobox, a MYB related telomeric DNA binding motif found in proteins from yeast, plants and human. Nucleic Acids Res. 1996, 24:1294-1303. 54. Jin H, Cominelli E, Bailey P, Parr A, Mehrtens F, Jones J, Tonelli C, Weisshaar B, Martin C: Transcriptional repression by AtMYB4 controls production of UV-protecting sunscreens in Arabidopsis. EMBO J. 2000, 19:6150-6161. 55. Hemm MR, Herrmann KM, Chapple C: AtMYB4: a transcription factor general in the battle against UV. Trends Plant Sci. 2001, 6:135-136. 56. Walker JC: Receptor-like protein kinase genes of Arabidopsis thaliana. Plant J. 1993, 3:451-456. 57. Bender J, Fink GR: A MYB homologue, ATR1, activates tryptophan gene expression in Arabidopsis. Proc. Natl. Acad. Sci. USA 1998, 95:5655-5660. 58. Kim JH, Lee BW, Schroeder FC, Jander G: Identification of indole glucosinolate breakdown products with antifeedant effects on Myzus persicae (green peach aphid). Plant J. 2008, 54:1015-1026. 59. Kirik V, Lee MM, Wester K, Herrmann U, Zheng Z, Oppenheimer D, Schiefelbein J, Hulskamp M: Functional diversification of MYB23 and GL1 genes in trichome morphogenesis and initiation. Development 2005, 132:1477-1485. 60. Jain M, Khurana P, Tyagi AK, Khurana JP: Genome-wide analysis of intronless genes in rice and Arabidopsis. Funct. Integr. Genomics 2008, 8:69-78. 61. Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution.
Plant Cell 2004, 16:1679-1691. 62. Thomas BC, Pedersen B, Freeling M: Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 2006, 16:934-946. 63. Rabinowicz PD, Braun EL, Wolfe AD, Bowen B, Grotewold E: Maize R2R3 MYB genes: sequence analysis reveals amplification in higher plants. Genetics 1999, 153:427-444. 64. Braun EL, Grotewold E: Diversification of the R2R3 MYB gene family and the segmental allotetraploid origin of the maize genome. Maize Genet. Coop. Newsl. 1999, 73:26-27. 65. Bailey TL, Williams N, Misleh C, Li WW: MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006, 34:W369-W373. 66. Higo K, Ugawa Y, Iwamoto M, Korenaga T: Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res. 1999, 27:297-300. 67. Kaplan B, Davydov O, Knight H, Galon Y, Knight MR, Fluhr R, Fromm H: Rapid transcriptome changes induced by cytosolic Ca2+ transients reveal ABRE-related sequences as Ca2+-responsive cis elements in Arabidopsis. Plant Cell 2006, 18:2733-2748. 68. Grotewold E, Drummond BJ, Bowen B, Peterson T: The myb-homologous P gene controls phlobaphene pigmentation in maize floral organs by directly activating a flavonoid biosynthetic gene subset. Cell 1994, 76:543-553. 69. Lenka S, Lohia B, Kumar A, Chinnusamy V, Bansal K: Genome-wide targeted prediction of ABA responsive genes in rice based on over-represented cis-motif in co-expressed genes. Plant Mol. Biol. 2009, 69:261-271. 70. Boguski MS, Lowe TM, Tolstoshev CM: dbEST--database for "expressed sequence tags". Nat. Genet. 1993, 4:332-333. 71. Zeng H, Luo L, Zhang W, Zhou J, Li Z, Liu H, Zhu T, Feng X, Zhong Y: PlantQTL-GE: a database system for identifying candidate genes in rice and Arabidopsis by gene expression and QTL information. Nucleic Acids Res. 2007, 35:D879-D882. 72. Zimmermann P, Hoffmann MH, Hennig L, Gruissem W: GENEVESTIGATOR.
Arabidopsis microarray database and analysis toolbox. Plant Physiol. 2004, 136:2621-2632. 73. Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P: Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv. Bioinformatics 2008, 2008:420747. doi:10.1155/2008/420747. 74. Lenka SK, Katiyar A, Chinnusamy V, Bansal KC: Comparative analysis of drought-responsive transcriptome in Indica rice genotypes with contrasting drought tolerance. Plant Biotechnol. J. 2011, 9:315-327. 75. Ding Z, Li S, An X, Liu X, Qin H, Wang D: Transgenic expression of MYB15 confers enhanced sensitivity to abscisic acid and improved drought tolerance in Arabidopsis thaliana. J. Genet. Genomics 2009, 36:17-29. 76. Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG: ArrayExpress: a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 2003, 31:68-71. 77. Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30:207-210. 78. Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF: Complementary DNA sequencing: expressed sequence tags and the human genome project. Science 1991, 252:1651-1656. 79. Matsubara K, Okubo K: Identification of new genes by systematic analysis of cDNAs and database construction. Curr. Opin. Biotechnol. 1997, 4:672-677. 80. Adams MD, Kerlavage AR, Fleischmann RD, Fuldner RA, Bult CJ, Lee NH, Kirkness EF, Weinstock KG, Gocayne JD, White O: Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 1995, 377:3-174. 81. Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, Wortman J, Buell CR: The Institute for Genomic Research Osa1 rice genome annotation database.
Plant Physiol. 2005, 138:18-26. 82. Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L, Orvis J, Haas B, Wortman J, Buell CR: The TIGR rice genome annotation resource: improvements and new features. Nucleic Acids Res. 2007, 35:D883-D887. 83. Uchimiya H, Kidou S-i, Shimazaki T, Aotsuka S, Takamatsu S, Nishi R, Hashimoto H, Matsubayashi Y, Kidou N, Umeda M, Kato A: Random sequencing of cDNA libraries reveals a variety of expressed genes in cultured cells of rice (Oryza sativa). Plant J. 1992, 2:1005-1009. 84. Hofte H, Desprez T, Amselem J, Chiapello H, Caboche M, Moisan A, Jourjon M, Charpenteau J, Berthomieu P, Guerrier D: An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana. Plant J. 1993, 4:1051-1061. 85. Umeda M, Hara C, Matsubayashi Y, Li H, Liu Q, Tadokoro F, Aotsuka S, Uchimiya H: Expressed sequence tags from cultured cells of rice (Oryza sativa) under stressed conditions: analysis of transcripts of genes engaged in ATP-generating pathways. Plant Mol. Biol. 1994, 25:469-478. 86. Cooke R, Raynal M, Laudie M, Grellet F, Delseny M, Morris P, Guerrier D, Giraudat J, Quigley F, Clabault G: Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs. Plant J. 1996, 9:101-124. 87. Yamamoto K, Sasaki T: Large scale EST sequencing in rice. Plant Mol. Biol. 1997, 35:135-144. 88. Chen R, Ni Z, Nie X, Qin Y, Dong G, Sun Q: Isolation and characterization of genes encoding MYB transcription factor in wheat (Triticum aestivum L.). Plant Sci. 2005, 169:1146-1154. 89. Perrière G, Gouy M: WWW-Query: an on-line retrieval system for biological sequence banks. Biochimie 1996, 78:364-369. 90. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0.
Bioinformatics 2007, 23:2947-2948. 91. Papadopoulos JS, Agarwala R: COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 2007, 23:1073-1079. 92. Poole RL: The TAIR database. Methods Mol. Biol. 2007, 406:179-212. 93. Salekdeh GH, Siopongco J, Wade LJ, Ghareyazie B, Bennett J: Proteomic analysis of rice leaves during drought stress and recovery. Proteomics 2002, 2:1131-1145.

Figure legends
Figure 1. Chromosome-wise distribution of different types of MYB transcription factor genes in (a) rice and (b) Arabidopsis.
Figure 2. Chromosome-wise distribution of intronless MYB genes in rice and Arabidopsis.
Figure 3. Intron distribution in the MYB domain regions of MYB genes in rice and Arabidopsis. The graphs show two dominant intron positions within the domains of MYB-related genes (a, c) and MYB-R2R3 genes (b, d) in rice and Arabidopsis, respectively.
Figure 4. Conserved intron positions in R1R2R3-type MYB domain-containing proteins in rice and Arabidopsis. Vertical bars and arrows indicate conserved intron positions. MSU gene IDs in red represent genes with non-conserved intron positions.
Figure 5. Distribution of OsMYB genes in the rice genome. Arrows and stars represent tandem repeats and intronless genes, respectively.
Figure 6. Distribution of AtMYB genes in the Arabidopsis genome. Arrows and stars represent tandem repeats and intronless genes, respectively.
Figure 7. Conserved cis-motifs found in the upstream promoter regions of MYB genes in rice and Arabidopsis (a-j).

Tables
Table 1. Group-specific characterization and comparison of MYB transcription factor family genes based on Pfam domain, GRAVY, isoelectric point (pI), molecular weight and cellular localization

RICE
Group | No. of MYB genes | % | GRAVY min | GRAVY max | GRAVY avg | pI min | pI max | pI avg | MW min (Da) | MW max (Da) | MW avg (Da) | Localization
MYB-related genes | 62 | 40 | -1.287 | -0.201 | -1.3875 | 3.99 | 12.26 | 8.125 | 7613.7 | 170921.8 | 89267.75 | Nuclear
MYB-R2R3 | 88 | 56.77 | -0.906 | -0.178 | -0.995 | 4.67 | 10.4 | 7.535 | 21605.3 | 75878.9 | 48742.1 | Nuclear
MYB-R1R2R3 | 4 | 2.58 | -0.691 | -0.593 | -0.9875 | 5.05 | 8.53 | 13.605 | 64100.1 | 109413.5 | 86756.8 | Nuclear
Atypical MYB genes | 1 | 0.64 | -0.748 | -0.748 | -0.748 | 9.56 | 9.56 | 9.56 | 92424.6 | 92424.6 | 92424.6 | Nuclear

ARABIDOPSIS
Group | No. of MYB genes | % | GRAVY min | GRAVY max | GRAVY avg | pI min | pI max | pI avg | MW min (Da) | MW max (Da) | MW avg (Da) | Localization
MYB-related genes | 52 | 26.39 | -1.359 | 0.612 | -0.3735 | 4.75 | 6.62 | 2.375 | 7570.9 | 50112 | 3785.45 | Nuclear
MYB-R2R3 | 138 | 70.05 | -1.102 | -0.471 | -0.7865 | 4.16 | 10.24 | 7.2 | 27951.2 | 33239 | 13975.6 | Nuclear
MYB-R1R2R3 | 5 | 2.54 | -0.941 | -0.774 | -0.8575 | 5.43 | 9.22 | 7.325 | 50032.2 | 158268.4 | 79134.2 | Nuclear
Atypical MYB genes | 2 | 0.51 | -0.941 | -0.94 | -0.9405 | 5.67 | 6.37 | 3.185 | 95766.5 | 96084.3 | 95925.4 | Nuclear

Table 2. Comparison of tandem repeat MYB genes in rice and Arabidopsis based on cellular localization. MYB coding sequences were aligned using BLAST 2 SEQUENCES to quantify the sequence difference between the gene pairs.

Tandem repeats in rice
TR no. | Locus G1 | Locus G2 | OsMYB G1 | OsMYB G2 | Localization G1 | Localization G2 | Bit score | % identity | E-value
OsTR1 | LOC_Os06g07640 | LOC_Os06g07650 | OsMYB1-3 | OsMYB6a | Nuclear | Nuclear | 75.5 | 55% | 2.00E-18
OsTR1 | LOC_Os06g07650 | LOC_Os06g07660 | OsMYB6a | OsMYB6b | Nuclear | Nuclear | 488 | 84% | 2.00E-142
OsTR2 | LOC_Os06g14700 | LOC_Os06g14710 | OsMYB44-6 | OsMYB44-7 | Nuclear | Nuclear | 146 | 64% | 2.00E-40
OsTR3 | LOC_Os08g05510 | LOC_Os08g05520 | OsMYB8a | OsMYB103 | Nuclear | Nuclear | 19.2 | 25% | 1.60E-01
OsTR4 | LOC_Os09g12750 | LOC_Os09g12770 | OsMYB9a | OsMYB9b | Nuclear | Nuclear | 55.8 | 40% | 6.00E-13
OsTR5 | LOC_Os12g07610 | LOC_Os12g07640 | OsMYB98-6 | OsMYB4-5 | Nuclear | Nuclear | 105 | 45% | 2.00E-27

Tandem repeats in Arabidopsis
TR no. | Locus G1 | Locus G2 | AtMYB G1 | AtMYB G2 | Localization G1 | Localization G2 | Bit score | % identity | E-value
AtTR1 | AT1G35515 | AT1G35516 | AtMYB8 | AtMYB1h | Nuclear | Nuclear | No significant similarity found | | 
AtTR2 | AT1G66370 | AT1G66380 | AtMYB113 | AtMYB114 | Nuclear | Nuclear | 212 | 80% | 3.00E-60
AtTR2 | AT1G66380 | AT1G66390 | AtMYB114 | AtMYB90 | Nuclear | Nuclear | 220 | 87% | 1.00E-62
AtTR3 | AT1G69560 | AT1G69580 | AtMYB105 | AtMYB1n | Nuclear | Nuclear | 14.2 | 31% | 5.3
AtTR4 | AT2G26950 | AT2G26960 | AtMYB104 | AtMYB81 | Nuclear | Nuclear | 358 | 50% | 2.00E-103
AtTR5 | AT3G10580 | AT3G10585 | AtMYB3d | AtMYB3e | Nuclear | Cytoplasmic | 172 | 64% | 4.00E-48
AtTR5 | AT3G10590 | AT3G10595 | AtMYB3f | AtMYB3g | Nuclear | Un-predictable | 56.6 | 27% | 3.00E-13
AtTR6 | AT3G12720 | AT3G12730 | AtMYB67 | AtMYB3j | Nuclear | Nuclear | 16.9 | 31% | 4.40E-01
AtTR7 | AT4G09450 | AT4G09460 | AtMYB4c | AtMYB6 | Cytoplasmic | Nuclear | 21.2 | 25% | 1.40E-02
AtTR8 | AT5G40330 | AT5G40350 | AtMYB23 | AtMYB24 | Nuclear | Nuclear | 142 | 55% | 5.00E-39
AtTR8 | AT5G40350 | AT5G40360 | AtMYB24 | AtMYB115 | Nuclear | Nuclear | 89.4 | 42% | 8.00E-23

Table 3. Comparison of homologous pairs of MYB genes in rice and Arabidopsis based on cellular localization. The coding sequences were aligned using BLAST 2 SEQUENCES to quantify the sequence differences between the gene pairs.
Duplications in rice Blast 2 sequences alignment Cellular Localization G2 Nuclear Bit Score % Identity E-value OsMYB5b Cellular Localization G1 Nuclear 160 81% 1.00E-38 OsMYB18-4 Nuclear Nuclear 1230 79% 0.00E+00 Nuclear Nuclear 234 82% 8.00E-59 Nuclear Nuclear 188 77% 3.00E-47 Nuclear Nuclear 234 94% 2.00E-58 OsMYB86-3 Nuclear Nuclear 696 77% 0.00E+00 GAMYB OsMYB5-3 Nuclear Nuclear 298 78% 1.00E-75 LOC_Os05g38460 OsMYB3R-3 OsMYB3R-5 Nuclear Nuclear 476 74% 8.00E-124 LOC_Os01g63460 LOC_Os05g37730 OsMYB1j OsMYB5e Nuclear Nuclear 22 100% 6.80E-01 OsHP10 LOC_Os01g65370 LOC_Os05g35500 OsMYB3 OsMYB1-2 Nuclear Nuclear 636 88% 6.00E-168 OsHP11 LOC_Os02g09480 LOC_Os05g37730 OsMYB44-2 OsMYB5e Nuclear Nuclear 32 87% 7.00E-04 OsHP12 LOC_Os02g14490 LOC_Os06g35140 OsMYB2a OsMYB6f Nuclear Nuclear 548 73% 2.00E-143 OsHP13 LOC_Os02g40530 LOC_Os04g42950 OsMYB305-1 Nuclear Nuclear 284 94% 8.00E-72 OsHP14 LOC_Os02g41510 LOC_Os04g43680 OsMYB13-1 OsMYB13-3 Nuclear Nuclear 
460 86% 3.00E-120 OsHP15 LOC_Os02g42870 LOC_Os04g45060 OsMYB17-2 OsMYB17-1 Nuclear Nuclear 744 77% 0.00E+00 OsHP16 LOC_Os02g45080 LOC_Os04g47890 OsMYB2d OsMYB4b Nuclear Nuclear 312 73% 6.00E-80 OsHP17 LOC_Os02g46780 LOC_Os04g50770 OsMYB58-1 OsMYB58-2 Nuclear Nuclear 620 70% 2.00E-163 HP_NO OsMYB_HP_G1 OsMYB_HP_G2 OsMYB_G1 OsMYB_G2 OsHP1 LOC_Os01g06320 LOC_Os05g07010 OsMYB1b OsHP2 LOC_Os01g18240 LOC_Os05g04820 OsMYB18-2 OsHP3 LOC_Os01g44370 LOC_Os05g50350 OsMYB1g OsHP4 LOC_Os01g47370 LOC_Os05g49240 OsMYB1h OsHP5 LOC_Os01g49160 LOC_Os05g48010 OsMYB36-2 OsHP6 LOC_Os01g50720 LOC_Os05g46610 OsMYB4-2 OsHP7 LOC_Os01g59660 LOC_Os05g41166 OsHP8 LOC_Os01g62410 OsHP9 MYB OsMYB5g MYB MYB 42 OsHP18 LOC_Os02g51799 LOC_Os06g11780 OsMYB9-1 OsMYB93 Nuclear Nuclear 442 80% 5.00E-115 OsHP19 LOC_Os02g54520 LOC_Os07g48870 OsMYB36-4 OsMYB2 Nuclear Nuclear 54 78% 1.00E-09 OsHP20 LOC_Os03g03760 LOC_Os10g39550 OsMYB3a MYB Nuclear Nuclear 136 83% 3.00E-31 OsHP21 LOC_Os03g20090 LOC_Os07g48870 OsMYB112 OsMYB2 Nuclear Nuclear 554 84% 2.00E-145 OsHP22 LOC_Os03g25550 LOC_Os07g44090 OsMYB18-3 OsMYB86-1 Nuclear Nuclear 374 88% 1.00E-96 OsHP23 LOC_Os03g26130 LOC_Os07g43580 OsMYB94-1 OsMYB30 Nuclear Nuclear 384 82% 2.00E-99 OsHP24 LOC_Os05g04820 LOC_Os07g44090 OsMYB18-4 OsMYB86-1 Nuclear Nuclear 422 83% 2.00E-109 OsHP25 LOC_Os05g10690 LOC_Os01g09640 OsMYBS3-1 MYB Nuclear Nuclear 232 83% 9.00E-58 OsHP26 LOC_Os05g49240 LOC_Os05g50340 OsMYB5g MYB Nuclear Nuclear 104 72% 4.00E-24 OsHP27 LOC_Os06g43090 LOC_Os02g09480 OsMYB77-3 OsMYB44-2 Nuclear Nuclear 616 71% 2.00E-162 OsHP28 LOC_Os06g45410 LOC_Os02g07770 OsMYB1-4 MYB Nuclear Nuclear 180 90% 1.00E-43 OsHP29 LOC_Os06g45890 LOC_Os02g07170 OsMYB6h MYB Nuclear Nuclear 98 81% 1.00E-21 OsHP30 LOC_Os07g02800 LOC_Os03g55590 OsMYB7a MYB Nuclear Nuclear 162 91% 1.00E-38 OsHP31 LOC_Os08g25799 LOC_Os09g12750 OsMYB8b OsMYB9a Nuclear Nuclear 682 80% 2.00E-180 OsHP32 LOC_Os08g25820 LOC_Os09g12770 OsMYB8c OsMYB9b Nuclear Nuclear 616 73% 2.00E-162 OsHP33 LOC_Os08g33660 
LOC_Os02g36890 OsMYB16 MYB Nuclear Nuclear 134 69% 4.00E-31 OsHP34 LOC_Os08g33660 LOC_Os04g38740 OsMYB16 MYB Nuclear Nuclear 136 80% 1.00E-31 OsHP35 LOC_Os08g33940 LOC_Os09g24800 OsMYB94-2 OsMYB96-3 Nuclear Nuclear 838 76% 0.00E+00 OsHP36 LOC_Os08g43450 LOC_Os09g36250 OsMYB13-2 MYB Nuclear Nuclear 76 71% 2.00E-15 OsHP37 LOC_Os08g43550 LOC_Os09g36730 OsMYB7 OsMYB4-3 Nuclear Nuclear 502 84% 1.00E-131 OsHP38 LOC_Os09g23200 LOC_Os08g33050 OsMYB9c MYB Nuclear Nuclear 222 66% 2.00E-54 OsHP39 LOC_Os10g33810 LOC_Os02g41510 OsMYB15 OsMYB13-1 Nuclear Nuclear 374 81% 8.00E-97 OsHP40 LOC_Os10g33810 LOC_Os04g43680 OsMYB15 OsMYB13-3 Nuclear Nuclear 384 82% 2.00E-99 OsHP41 LOC_Os10g39550 LOC_Os03g03760 OsMYB10c OsMYB3a Nuclear Nuclear 384 81% 3.00E-99 OsHP42 LOC_Os11g03440 LOC_Os12g03150 OsMYB60-1 OsMYB60-3 Nuclear Nuclear 1702 96% 0.00E+00 OsHP43 LOC_Os11g47460 LOC_Os12g37970 OsMYB111-1 OsMYB111-2 Nuclear Nuclear 634 83% 2.00E-167 OsHP44 LOC_Os12g37690 LOC_Os11g45740 OsMYB78 MYB Nuclear Nuclear 226 88% 5.00E-56 Duplications in Arabidopsis Blast 2 sequences alignment Cellular Cellular % Bit HP_NO AtMYB_HP_G1 AtMYB_HP_G2 AtMYB_G1 ATMYB_G2 Localization Localization Identi E-value Score G1 G2 ty AtHP1 AT2G31180 AT1G06180 AtMYB14 AtMYB13 Nuclear Nuclear 350 84% 2.00E-100 AtHP2 AT1G57560 AT1G09540 AtMYB50 AtMYB61 Nuclear Nuclear 392 88% 7.00E-113 43 AtHP3 AT1G58220 AT1G09710 AtMYB1l Nuclear Nuclear 827 75% 0 AtMYB1g Similar to MYB TF No MYB AtHP4 AT1G26580 AT1G13880 Nuclear Nuclear 45.4 76% 4.00E-08 AtHP5 AT2G02820 AT1G14350 AtMYB88 AtMYB124 Nuclear Nuclear 728 80% 0 AtHP6 AT3G12820 AT1G16490 AtMYB10 AtMYB58 Nuclear Nuclear 293 79% 3.00E-83 AtHP7 AT1G17950 AT1G73410 AtMYB52 AtMYB54 Nuclear Nuclear 381 88% 7.00E-110 AtHP8 AT1G79180 AT1G16490 AtMYB63 AtMYB58 Nuclear Nuclear 346 84% 4.00E-99 AtHP9 AT5G61420 AT1G18570 AtMYB28 AtMYB51 Nuclear Nuclear 99 86% 1.00E-24 AtHP10 AT1G74080 AT1G18570 AtMYB122 AtMYB51 Nuclear Nuclear 305 81% 9.00E-87 AtHP11 AT5G07700 AT1G18570 AtMYB76 AtMYB51 
Nuclear Nuclear 185 71% 2.00E-50 AtHP12 AT5G60890 AT1G18570 AtMYB34 AtMYB51 Nuclear Nuclear 206 77% 8.00E-57 AtHP13 AT1G74430 AT1G18710 AtMYB95 AtMYB47 Nuclear Nuclear 351 82% 7.00E-101 AtHP14 AT1G74840 AT1G19000 AtMYB1o AtMYB1d Nuclear Nuclear 233 85% 3.00E-65 AtHP15 AT1G35516 AT1G22640 AtMYB1h AtMYB3 Nuclear Nuclear No significant similarity found AtHP16 AT4G09460 AT1G22640 AtMYB6 AtMYB3 Nuclear Nuclear 394 84% 1.00E-113 AtHP17 AT1G68320 AT1G25340 AtMYB62 AtMYB116 Nuclear Nuclear 366 86% 3.00E-105 AtHP18 AT3G27810 AT1G25340 AtMYB21 AtMYB116 Nuclear Nuclear 149 70% 7.00E-40 AtHP19 AT1G68670 AT1G25550 AtMYB1m AtMYB1f Nuclear Nuclear 176 84% 8.00E-48 AtHP20 AT3G29020 AT1G26780 AtMYB110 AtMYB117 Nuclear Nuclear 232 77% 8.00E-65 AtHP21 AT1G26780 AT1G69560 AtMYB117 AtMYB105 Nuclear Nuclear 416 88% 3.00E-120 AtHP22 AT5G39700 AT1G69560 AtMYB89 AtMYB105 Nuclear Nuclear No significant similarity found AtHP23 AT5G07690 AT1G74080 AtMYB29 AtMYB122 Nuclear Nuclear 161 76% 2.00E-43 AtHP24 AT1G19510 AT1G75250 AtMYB1e AtMYB1p Nuclear Nuclear 154 80% 4.00E-42 AtHP25 AT4G36570 AT1G75250 AtMYB4d AtMYB1p Nuclear Nuclear No significant similarity found AtHP26 AT4G34990 AT2G16720 AtMYB32 AtMYB7 Nuclear Nuclear 411 85% 1.00E-118 AtHP27 AT4G37260 AT2G23290 AtMYB73 AtMYB70 Nuclear Nuclear 364 84% 1.00E-104 AtHP28 AT5G67300 AT2G23290 AtMYB44 AtMYB70 Nuclear Nuclear 171 77% 3.00E-46 AtHP29 AT5G11050 AT2G25230 AtMYB64 AtMYB100 Nuclear Nuclear 63.9 78% 1.00E-13 AtHP30 AT5G01200 AT2G38090 AtMYB5a AtMYB2f Nuclear 195 82% 1.00E-53 Mitochondri al AtHP31 AT3G55730 AT2G39880 AtMYB109 AtMYB25 Nuclear Nuclear 281 81% 2.00E-79 AtHP32 AT3G10760 AT2G40970 AtMYB3h AtMYB2h Nuclear Nuclear 235 69% 8.00E-66 AtHP33 AT5G05090 AT2G40970 AtMYB5c AtMYB2h Nuclear Nuclear 156 81% 5.00E-42 AtHP34 AT3G62610 AT2G47460 AtMYB11 AtMYB12 Nuclear Nuclear 388 86% 9.00E-112 44 AtHP35 AT5G15310 AT3G01140 AtMYB16 AtMYB106 Nuclear Nuclear 593 83% 2.00E-173 AtHP36 AT5G40350 AT3G01530 AtMYB24 AtMYB57 Nuclear Nuclear 254 81% 
1.00E-71 AtHP37 AT5G16600 AT3G02940 AtMYB43 AtMYB107 Nuclear Nuclear 110 73% 7.00E-28 AtHP38 AT5G16770 AT3G02940 AtMYB9 AtMYB107 Nuclear Nuclear 586 86% 3.00E-171 AtHP39 AT3G24120 AT3G04030 AtMYB3l AtMYB3a Nuclear Nuclear 73% 86 1.00E-20 AtHP40 AT5G18240 AT3G04030 AtMYB5h AtMYB3a Nuclear Nuclear 887 80% 0 AtHP41 AT5G49620 AT3G06490 AtMYB78 AtMYB108 Nuclear Nuclear 396 83% 4.00E-114 AtHP42 AT5G02320 AT3G09370 AtMYB3R5 AtMYB3R3 Nuclear Nuclear 610 85% 4.00E-178 AtHP43 AT5G04760 AT3G10580 AtMYB5b AtMYB3d Nuclear Nuclear 105 71% 7.00E-27 AtHP44 AT5G05790 AT3G11280 AtMYB5d AtMYB3i Nuclear Nuclear 455 80% 5.00E-132 AtHP45 AT5G06100 AT3G11440 AtMYB33 AtMYB65 Nuclear Nuclear 710 78% 0 AtHP46 AT1G56160 AT3G12820 AtMYB72 AtMYB10 Nuclear Nuclear 270 81% 2.00E-76 AtHP47 AT4G13480 AT3G24310 AtMYB79 AtMYB71 Nuclear Nuclear 436 83% 2.00E-126 AtHP48 AT1G13300 AT3G25790 AtMYB1b AtMYB3m Nuclear Nuclear 250 84% 4.00E-70 AtHP49 AT5G40360 AT3G27785 AtMYB115 AtMYB118 Nuclear Nuclear 161 76% 3.00E-43 AtHP50 AT3G01530 At1g68320 AtMYB57 AtMYB62 Nuclear Nuclear 239 81% 4.00E-67 AtHP51 AT5G14750 AT3G27920 AtMYB66 AtMYB0 Nuclear Nuclear 320 80% 1.00E-91 AtHP52 AT5G40330 AT3G27920 AtMYB23 AtMYB0 Nuclear Nuclear 379 85% 2.00E-109 AtHP53 AT5G59780 AT3G46130 AtMYB59 AtMYB48 Nuclear Nuclear 237 86% 1.00E-66 AtHP54 AT5G59570 AT3G46640 AtMYB5p MYB Nuclear Nuclear 313 85% 4.00E-89 AtHP55 AT5G62470 AT3G47600 AtMYB96 AtMYB94 Nuclear Nuclear 527 88% 2.00E-153 AtHP56 AT5G65790 AT3G49690 AtMYB68 AtMYB84 Nuclear Nuclear 494 87% 2.00E-143 AtHP57 AT4G37780 AT3G49690 AtMYB87 AtMYB84 Nuclear Nuclear 246 79% 4.00E-69 AtHP58 AT4G22680 AT3G61250 AtMYB85 AtMYB17 Nuclear Nuclear 147 70% 3.00E-39 AtHP59 AT1G01520 AT4G01280 AtMYB1a AtMYB4a Nuclear Nuclear 272 83% 7.00E-77 AtHP60 AT4G21440 AT4G05100 AtMYB102 AtMYB74 Nuclear Nuclear 385 89% 1.00E-110 AtHP61 AT5G52260 AT4G25560 AtMYB19 AtMYB18 Nuclear Nuclear 407 79% 2.00E-117 AtHP62 AT5G55020 AT4G26930 AtMYB120 AtMYB97 Nuclear Nuclear 283 82% 7.00E-80 AtHP63 AT2G20400 
AT4G28610 AtMYB2d No MYB Nuclear Nuclear 419 73% 7.00E-121 AtHP64 AT5G11510 AT4G32730 AtMYB3R4 AtMYB3R1 Nuclear Nuclear 329 78% 3.00E-93 AtHP65 AT3G09600 AT5G02840 AtMYB3b MYB (LCL1) Nuclear Nuclear 682 80% 0 AtHP66 AT3G10590 AT5G04760 AtMYB3f AtMYB5b Nuclear Nuclear 51.8 76% 1.00E-10 AtHP67 AT5G23650 AT5G08520 AtMYB5i AtMYB5f Nuclear Nuclear 139 72% 8.00E-37 45 AtHP68 AT5G65230 AT5G10280 AtMYB53 AtMYB92 Nuclear Nuclear 534 84% 9.00E-156 AtHP69 AT3G50060 AT5G67300 AtMYB77 AtMYB44 Nuclear Nuclear 265 82% 1.00E-74 Table 4. The multilevel consensus sequence, PLACE representation, motif width and description of non-coding regulatory regions in rice and Arabidopsis log Motif Multilevel Consensus Motif Sequence Symbols in PLACE Database No. Width Site likelihood E-Value Description (rice) ratio (Ilr) 1 [CT]C[TC]CTC[TC][TC][CT]C[TC]C YCYCTCYYYCYC 12 123 1195 4.10E-76 RRRRRGAGRRRG 12 127 1199 5.90E-68 [AG][GA][AG][AG][AG]GAG[AG][A 2 G][AG]G 3 CG[GC]CG[GC][CT]G[GC]CGG CGSCGSYGSCGG 12 51 615 3.50E-45 4 A[AG]AAAA[AT][AC][AT]AA ARAAAAWMWAA 11 127 1120 3.10E-37 AC S000149, CCA1 binding site, CCA1 protein (myb-related transcription factor) interact with two 5 T[TGC][TA][TC]TT[TC]TTTTT TBWYTTYTTTTT 12 128 1123 8.80E-33 imperfect repeats of AAMAATCT in Lhcb1*3 gene of Arabidopsis thaliana, Related to regulation by phytochrome 6 A[GA]C[AT]GC[AT]GC[AT]GC ARCWGCWGCWGC 12 51 578 4.30E-28 7 CC[GT]CC[GT]CC[TG]C[CG][CTG] CCKCCKCCKCSB 12 70 725 1.30E-28 8 [TG]AGCTAGCTAG[CG] KAGCTAGCTAGS 12 29 386 7.00E-25 9 [CG]ATC[GC]ATC[GC]ATC SATCSATCSATC 12 38 452 1.70E-19 46 AC S000501, “CGCG box" recognized by AtSR1-6 (Arabidopsis thaliana signal responsive genes), 10 G[CG]CGCGCG[CA]GCG GSCGCGCGMGCG 12 26 340 1.90E-16 Multiple CGCG elements are found in promoters of many genes log Motif Multilevel Consensus Motif Sequence Symbols in PLACE Database No. 
Width Site likelihood E-Value Description (Arabidopsis) ratio (Ilr) T[CT]T[TC][TC][TC]T[CT]T[TC][TC] 1 TYTYYYTYTYYY 12 166 1471 1.80E-62 [CT] 2 A[AG]A[GA]AA[AG]AAAA[AG] ARARAARAAAAR 12 188 1579 2.00E-60 3 [AG]GAGA[GA]AGAG[AG]G RGAGARAGAGRG 12 45 532 2.80E-23 TTTKKKTKKNT 11 143 1189 3.00E-01 TTT[TG][TG][TG]T[TG][TG][CGTA 4 A]T 5 GGG[CG]CGGCT GGGSCGGCT 9 8 113 1.50E+00 6 T[CG]CGA[CT]GG[TC]CC TSCGAYGGYCC 11 12 165 1.20E+01 7 CTC[TA]C[TA]CTCT[CG] CTCWCWCTCTS 11 23 274 8.10E+00 8 CAC[AC]CAC[AT]CA[CT]A CACMCACWCAYA 12 34 384 3.10E-01 S000507;"ABRE-related sequence" or “motifs" 9 CCACG[CT]G[GC] CCACGYGS 8 14 171 2.00E+02 identified in the upstream regions of 162 Ca (2+)responsive unregulated genes; Arabidopsis thaliana S000179; Core of consensus maize P (myb homolog) binding site; W=A/T; 6 bp core; Maize P gene specifies red pigmentation of kernel pericarp, cob, 10 GC[CG]AGGTAGGGG GCSAGGTAGGGG 12 3 59 3.90E+02 and other floral organs; P binds to A1 gene, but not Bz1 gene; Maize C1 (myb homolog) activates both A1 and Bz1 genes [94]; Zea mays. 
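Table 4 gives each motif in two notations: a MEME-style multilevel consensus (for example [CT]C[TC]CTC...) and the equivalent IUPAC degenerate symbols (YCYCTC...). The sketch below shows how the two notations map onto each other and how an IUPAC motif can be scanned on both DNA strands. It is a minimal illustration, not the pipeline used in this study. The function names and the short test sequence are hypothetical.

```python
import re

# IUPAC nucleotide code for each set of bases it stands for
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# IUPAC code -> regular-expression character class, e.g. "Y" -> "[CT]"
CODE_TO_CLASS = {code: "[" + "".join(sorted(bases)) + "]" for bases, code in IUPAC.items()}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W and N map to themselves)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def multilevel_to_iupac(consensus: str) -> str:
    """Collapse a multilevel consensus such as '[CT]C[TC]' into IUPAC symbols ('YCY')."""
    consensus = consensus.upper()
    tokens = re.findall(r"\[[ACGT]+\]|[ACGT]", consensus)
    if "".join(tokens) != consensus:
        raise ValueError(f"unparseable consensus: {consensus!r}")
    return "".join(IUPAC[frozenset(tok.strip("[]"))] for tok in tokens)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence that may contain IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(motif: str, seq: str):
    """Return sorted (start, strand) hits of an IUPAC motif on both strands of seq."""
    hits = []
    for strand, m in (("+", motif.upper()), ("-", reverse_complement(motif))):
        pattern = "".join(CODE_TO_CLASS[c] for c in m)
        # A zero-width lookahead lets overlapping occurrences be reported
        hits.extend((hit.start(), strand) for hit in re.finditer(f"(?={pattern})", seq.upper()))
    return sorted(hits)
```

For example, `multilevel_to_iupac("[CT]C[TC]CTC[TC][TC][CT]C[TC]C")` reproduces the symbol string of rice motif 1. `find_sites("CCACGYGS", "AACCACGTGGTT")` reports a hit at position 2 on both strands, because the matched site CCACGTGG is its own reverse complement.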
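Tables 2 and 3 report BLAST 2 Sequences bit scores alongside E-values. Under the standard Karlin-Altschul statistics used by BLAST, the two columns are related as shown below. The effective sequence lengths are not listed in the tables, so this is a reading aid rather than a way to recompute the values.

```latex
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}
```

Here S is the raw alignment score, λ and K are the statistical parameters of the scoring system, S' is the bit score, and m and n are the effective lengths of the two sequences. For sequences of similar length, every additional bit halves E. This is why low-scoring pairs such as OsTR3 (19.2 bits, E = 1.60E-01) and AtTR3 (14.2 bits, E = 5.3) fall short of any conventional significance threshold. Pairs scoring several hundred bits, by contrast, reach E-values at or near zero.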
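Table 1 summarizes GRAVY (grand average of hydropathy) values for each MYB group. This section does not state which tool produced them. The sketch below assumes the usual definition: the mean Kyte-Doolittle hydropathy over all residues, as implemented in tools such as ExPASy ProtParam. The function name `gravy` is illustrative only. Negative values indicate an overall hydrophilic protein.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    if not seq:
        raise ValueError("empty protein sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```

Applied to every predicted protein in a group, the smallest and largest of these per-protein values correspond to the kind of minimum and maximum GRAVY figures reported in Table 1.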
Additional files

Additional file 1, Table S1
Title: Nomenclature of MYB TFs
Description: Nomenclature and genomic positions of MYB family genes in rice and Arabidopsis

Additional file 2, Table S2
Title: MYB classification
Description: Classification of MYB family genes and their detailed annotation, such as GRAVY, pI, molecular weight and predicted subcellular localization

Additional file 3, Table S3
Title: MYB molecular function
Description: Annotation of MYB proteins using Gene Ontology in terms of molecular function

Additional file 4, Table S4
Title: MYB targeting
Description: Subcellular localization of MYB proteins in rice and Arabidopsis

Additional file 5, Table S5
Title: Sequence alignment of intronless MYB genes
Description: Sequence comparison between rice and Arabidopsis intronless genes to assess their conservation

Additional file 6, Table S6
Title: Intron density
Description: Intron distribution across the genomic regions and MYB domains of rice and Arabidopsis MYB genes

Additional file 7, Table S7
Title: Motif finding
Description: Additional conserved motifs in rice and Arabidopsis proteins predicted by MEME

Additional file 8, Table S8
Title: Expression of MYB genes
Description: Availability of full-length complementary DNAs (FL-cDNAs) and expressed sequence tags (ESTs) corresponding to MYB genes

Additional file 9, Table S9
Title: MYB expression under abiotic stress
Description: Expression analysis of MYB genes under abiotic stress conditions

Additional file 10, Table S10
Title: Primers for MYB genes
Description: List of gene-specific primers for rice and Arabidopsis

Additional file 11, Table S11
Title: Expression of MYB genes under drought stress
Description: qRT-PCR expression analysis of MYB transcription factor family genes in rice and Arabidopsis under drought stress

Additional file 12, Table S12
Title: Expression levels of MYB genes in different tissues
Description: Analysis of tissue specificity in the expression of MYB genes in rice and Arabidopsis

Additional file 13, Figure S1
Title: MYB gene expression under drought stress in rice
Description: MYB gene expression under drought stress in rice, obtained from publicly available microarray data.

Additional file 14, Figure S2
Title: MYB gene expression under abiotic stresses in Arabidopsis
Description: MYB gene expression under cold (a), drought (b) and salt (c) stress in Arabidopsis. The GENEVESTIGATOR database was used to analyze MYB gene expression levels.

Additional file 15, Figure S3
Title: Heat map of MYB genes expressed under abiotic stress in Arabidopsis
Description: MYB gene expression under cold, drought and salt stress in Arabidopsis. The GENEVESTIGATOR database was used to analyze MYB gene expression levels. The heat map was created with the Expression Profiler available at the EBI.

Additional file 16, Figure S4
Title: Expression profile of MYB genes under drought in rice by qRT-PCR
Description: qRT-PCR expression analysis of MYB genes in rice under drought stress (a-b).

Additional file 17, Figure S5
Title: qRT-PCR expression analysis of MYB genes under drought in Arabidopsis
Description: qRT-PCR expression analysis of AtMYB genes under drought stress.

Additional file 18, Figure S6
Title: MYB expression in different tissues of rice
Description: Tissue-specific expression profiles of MYB genes in rice, examined using the MSU database.

Additional file 19, Figure S7
Title: Phylogenetic analysis of MYB proteins
Description: Phylogenetic analysis of MYB proteins in both rice and Arabidopsis.
The tree was constructed using the multiple sequence alignment of bona fide MYB proteins.

Additional files provided with this submission:
Additional file 1: Table S1.xls, 88K http://www.biomedcentral.com/imedia/1237906381674080/supp1.xls
Additional file 2: Table S2.xls, 73K http://www.biomedcentral.com/imedia/1827841453674080/supp2.xls
Additional file 3: Table S3.xls, 49K http://www.biomedcentral.com/imedia/3750304876740802/supp3.xls
Additional file 4: Table S4.xls, 88K http://www.biomedcentral.com/imedia/1616826807674080/supp4.xls
Additional file 5: Table S5.xls, 23K http://www.biomedcentral.com/imedia/3098841196740802/supp5.xls
Additional file 6: Table S6.xls, 157K http://www.biomedcentral.com/imedia/1175121110674080/supp6.xls
Additional file 7: Table S7.xls, 35K http://www.biomedcentral.com/imedia/8252255546740802/supp7.xls
Additional file 8: Table S8.xls, 58K http://www.biomedcentral.com/imedia/1531354074674080/supp8.xls
Additional file 9: Table S9.xls, 98K http://www.biomedcentral.com/imedia/1033086239674080/supp9.xls
Additional file 10: Table S10.xls, 32K http://www.biomedcentral.com/imedia/8645340716740802/supp10.xls
Additional file 11: Table S11.xls, 28K http://www.biomedcentral.com/imedia/1277441141674080/supp11.xls
Additional file 12: Table S12.xls, 79K http://www.biomedcentral.com/imedia/1620164983674080/supp12.xls
Additional file 13: Figure S1.tif, 14225K http://www.biomedcentral.com/imedia/1186193758674080/supp13.tiff
Additional file 14: Figure S2.tif, 10106K http://www.biomedcentral.com/imedia/1440160680674080/supp14.tiff
Additional file 15: Figure S3.tif, 9645K http://www.biomedcentral.com/imedia/8751481796740802/supp15.tiff
Additional file 16: Figure S4.tif, 20193K http://www.biomedcentral.com/imedia/1750166082674080/supp16.tiff
Additional file 17: Figure S5.tif, 4680K http://www.biomedcentral.com/imedia/1453972442674080/supp17.tiff
Additional file 18: Figure S6.tif, 2071K http://www.biomedcentral.com/imedia/1147910699674080/supp18.tiff
Additional file 19: Figure S7.tif, 16050K http://www.biomedcentral.com/imedia/1220352762674080/supp19.tiff
Additional file 20: PLANTPHYSIOL Recommendation.docx, 14K http://www.biomedcentral.com/imedia/4524176306740802/supp20.docx