Background: It is essential in modern biology to understand how transcriptional regulatory regions are composed of cis-elements, yet we have limited knowledge of, for example, the combinational uses of these elements and their positional distribution. Results: We predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions within -2000 and +1000 bp of transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. In the 8,454 significantly correlated motif pairs, two major classes were observed: 248 pairs in Class 1 were mainly found around TSSs, whereas 4,020 Class 2 pairs appear at rather arbitrary distances from TSSs. These classes are distinct in a number of aspects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor the Class 1 pairs are more li...
In 1999, we released the DBTBS database of Bacillus subtilis promoters and transcription factors [1]. DBTBS is a reference database containing experimentally characterized transcription factors with their regulated genes as well as their recognition sequences, as reported in the literature. This database is useful to confirm predictions about transcription relatives using B. subtilis data. The latest version of this database provides position-specific scoring matrices (PSSMs) to support a function for finding putative transcription factor binding sites [2]. One problem with using weight matrices or consensus patterns to identify novel recognition positions of known transcription factors is that doing so often produces a number of false positives. To overcome this problem, sequence conservation information between closely related species is widely used, an approach called phylogenetic footprinting. For example, we predicted B. subtilis regulons based on the conservation of upstream sequence...
Although the number of bacteria with their entire genomic sequence known is increasing, it is essential to study well-known genomes to understand the 'blueprint' of bacteria more precisely. For this purpose, E. coli and B. subtilis are especially suitable because of their long history of research. Among the various kinds of information coded in bacterial genomes, we are interested in the analysis of transcriptional regulation networks. One of our ultimate goals is to predict the expression condition of given ORFs from their upstream sequences. For example, we have developed a system for predicting the sigma-dependency of ORFs found in B. subtilis [6] and in E. coli (unpublished). However, to understand the detailed mechanism of transcription regulation, knowledge of other transcription factors is also crucial. For E. coli, there is such a database, called RegulonDB [3], but there is no database containing comprehensive information on transcription in B. subtilis. Thus, we constructed a datab...
In recent years, the number of bacteria whose entire genomic sequence has been determined is growing rapidly. However, the information that can be derived from computer analyses of them is still limited, although the predictive identification of coding regions is performed with relatively high accuracy (see [4], for example). Therefore, we have studied ways to interpret the regulatory information coded in genomic sequences [5, 6]. In this work, we report our first effort to integrate the models for detecting various signals (e.g., promoters and terminators) with our previous model of coding regions, aiming at the recognition of transcriptional units in bacterial genomes.
Cloning and sequencing of segment 9 of Bombyx mori cytoplasmic polyhedrosis virus (BmCPV) strains H and I were performed. The segment consisted of 1,186 bp harboring 5′ and 3′ noncoding regions and an open reading frame from positions 75 to 1037, encoding a protein with 320 amino acids, termed NS5. Comparison of the nucleotide sequences of NS5 for the two strains indicated 37 point differences resulting in only six amino acid replacements. Homology search showed that NS5 has localized similarities to human poliovirus RNA-dependent RNA polymerase and human rotavirus NS26. By Western blot analysis, NS5 was found in BmCPV-infected midgut cells, but not in polyhedra or virus virions, and was mainly detectable in the nucleus in BmCPV-infected BmN4 cells. Immunoblot analysis with anti-NS5 and antipolyhedrin antibodies displayed marked differences in the period of expression of NS5 and polyhedrin: the polyhedrin molecule was first detected 2 or 3 days after infection with BmCPV, whereas th...
The horizontal transfer of genes between distantly related organisms is undoubtedly a major factor in the evolution of novel traits. Because genes are functionless without expression, horizontally transferred genes must acquire appropriate transcriptional regulation in their recipient organisms, although the evolutionary mechanism is not well understood. The defining characteristic of tunicates is the presence of a cellulose-containing tunic covering the adult and larval body surface. Cellulose synthase was acquired by horizontal gene transfer from Actinobacteria. We found that acquisition of the binding site of the AP-2 transcription factor was essential for tunicate cellulose synthase to gain epidermal-specific expression. Actinobacteria have very GC-rich genomes, regions of which are capable of inducing specific expression in the tunicate epidermis because AP-2 binds to a GC-rich region. Therefore, the actinobacterial cellulose synthase could have been potentiated to evolve its new functi...
Human multipotent mesenchymal stromal cells (hMSCs) possess the ability to differentiate into osteoblasts, and they can be utilized as a source for bone regenerative therapy. Osteoinductive pretreatment, which induces the osteoblastic differentiation of hMSCs in vitro, has been widely used for bone tissue engineering prior to cell transplantation. However, the molecular basis of osteoblastic differentiation induced by osteoinductive medium (OIM) is still unknown. Therefore, we used a next-generation sequencer to investigate the changes in gene expression during the osteoblastic differentiation of hMSCs. The hMSCs used in this study possessed both multipotency and self-renewal ability. Whole-transcriptome analysis revealed that the expression of zinc finger and BTB domain containing 16 (ZBTB16) was significantly increased during the osteoblastogenesis of hMSCs. ZBTB16 mRNA and protein expression was enhanced by culturing the hMSCs with OIM. Small interfering RNA (siRNA)-mediated gene...
The use of pathways and gene interaction networks for the analysis of differential expression experiments has allowed us to highlight the differences in gene expression profiles between samples from a systems biology perspective. The usefulness and accuracy of pathway analysis critically depend on our understanding of how genes interact with one another. That knowledge is continuously improving due to advances in next-generation sequencing technologies and in computational methods. While most approaches treat pathways as independent entities, pathways actually coordinate to perform essential functions in a cell. In this work, we propose a methodology based on a sparse regression approach to find genes that act as intermediaries between, and interact with, two pathways. We model each gene in a pathway using a set of predictor genes, and a connection is formed between the pathway gene and a predictor gene if the sparse regression coefficient corresponding to the predictor gene is non-zero. A...
DBTMEE (http://dbtmee.hgc.jp/) is a searchable and browsable database designed to manipulate gene expression information from our ultralarge-scale whole-transcriptome analysis of mouse early embryos. Because technical challenges such as biological sample collection have made integrative approaches with multiple public analytical datasets indispensable for studying embryogenesis, we intend DBTMEE to be an integrated gateway for the research community. To that end, we combined the gene expression profile with various public resources. Users can thereby extensively investigate molecular characteristics among totipotent, pluripotent and differentiated cells while taking genetic and epigenetic characteristics into consideration. We have also designed user-friendly web interfaces that enable users to access the data quickly and easily. DBTMEE will help to promote our understanding of the enigmatic fertilization dynamics.
Comparative sequence analysis was carried out for the regions adjacent to experimentally validated transcriptional start sites (TSSs), using 3324 pairs of human and mouse genes. We aligned the upstream putative promoter sequences over the 1-kb proximal regions and found that the sequence conservation could not be extended beyond positions that were, on average, 510 bp upstream of the TSSs. This discontinuity in sequence conservation revealed a “block” structure in about one-third of the putative promoter regions. Consistently, we also observed that G+C content and CpG frequency were significantly different inside and outside the blocks. Within the blocks, the sequence identity was uniformly 65% regardless of their length. About 90% of the previously characterized transcription factor binding sites were located within those blocks. In 46% of the blocks, the 5′ ends were bounded by interspersed repetitive elements, some of which may have nucleated the genomic rearrangements. The ...
The innate immune response involves protein–protein interactions, deoxyribonucleic acid (DNA)–protein interactions and signaling cascades. So far, thousands of protein–protein interactions have been curated as a static interaction map. However, protein–protein interactions involved in the innate immune response are dynamic. We recorded the dynamics in the interactome during the innate immune response by combining gene expression data of lipopolysaccharide (LPS)-stimulated dendritic cells with protein–protein interaction data. We identified the differences in the interactome during the innate immune response by constructing differential networks and identifying protein modules that were up-/down-regulated at each stage of the response. For each protein complex, we identified enriched biological processes and pathways. In addition, we identified core interactions that are conserved throughout the innate immune response and their enriched gene ontology terms and pathways. We defined t...
Papers by Kenta Nakai