The DataGrid group of the BioGrid project [1], which we have joined, aims to establish a methodology that makes it easy to choose target proteins during drug discovery. To this end, the group has been developing intelligent database services that seamlessly federate various biological databases. As one component of these services, we have been developing a real-time system for mapping cDNAs to genomes. When choosing a target protein for drug discovery, the exon-intron structure of the corresponding gene and information about its expression may also be required. The system we are developing is intended to let users obtain such sequence information immediately. Many mapping methods have already been published, and excellent gene databases such as Ensembl [2] exist. What distinguishes our system from these is that the user can complete the whole genome-analysis process, including homology search, mapping calculation and retrieval of detailed data, simply by entering a query sequence.
Interleukin-18 (IL-18) is one of the pivotal cytokines controlling the defense mechanism known as inflammation. As a first step toward developing proteins that control the IL-18 level, we initiated a study of IL-18-binding proteins (IL-18BPs). Twenty-four IL-18BP family members, 11 from vertebrates and 13 from chordopoxviruses, were retrieved from the NCBI database. Eight of these vertebrate IL-18BPs and two of the chordopoxvirus IL-18BPs were identified here and characterized as new members of the IL-18BP family. Their IL-18-binding domains were aligned, and the distribution of highly conserved critical amino acid residues was analyzed and used to construct a phylogenetic tree. From this tree we inferred that at least two independent events created two different ancestral viral IL-18BP genes by retroposition of IL-18BP genes from the vertebrate lineage. These two events are estimated to have occurred after an ancient mammalian IL-18BP gene diverged from that of birds, and before the mammalian IL-18BP gene diverged into the human, ungulate and rodent IL-18BP genes. Moreover, our results suggest that IL-18BP and interleukin-1 receptor type II (IL-1R2) share a common ancestral gene, which diverged into the IL-18BP and IL-1R2 genes during the fish period.
Complete genome sequences of more than 30 organisms have now been determined. When many complete genome sequences become available, one of the first questions is which regions are conserved among them. For this purpose, however, most existing tools are unsuitable, because they cannot handle sequences as large as complete genomes or, even when they can, they are often too slow. We have developed a software tool
Since the first complete sequence of a bacterial whole genome was determined, more than 30 bacterial genomes have been completely sequenced and extensively analyzed. What is the most conserved sequence preserved across species in bacterial whole genomes? Many hurdles stand in the way of answering this apparently simple question. However, the whole genome sequences of more than 30 eubacterial and archaeal species have now been determined, and we are able to answer it. In phylogenetic analysis, the most widely used molecule is small-subunit ribosomal RNA (rRNA), because rRNA is structurally very stable and of fundamental importance, and so occurs in all organisms [1]. However, which sequence or gene is the most conserved across the bacterial whole genomes of various species could not be determined analytically until many whole genome sequences had been obtained and compared with one another. When we search for the most conserved sequence in whole bacterial genomes among various species, it is insufficie...
We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types relevant to the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress in addressing ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of bacterial genome sequences. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of three second-generation sequencers, the Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq), and one third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, whose 5-Mb genome comprises two circular chromosomes. We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was...
Spontaneously hypertensive rats (SHR) and stroke-prone SHR (SHRSP) are frequently used as model rats not only in studies of essential hypertension and stroke, but also in studies of attention deficit hyperactivity disorder (ADHD). Normotensive Wistar-Kyoto rats (WKY) are typically used as controls in these studies. In this study, using these rats, we aimed to identify the genes causing hypertension and stroke, as well as the genes involved in ADHD. Because adrenal gland products can directly influence cardiovascular, endocrine and sympathetic nervous system functions, we examined gene expression profiles in the adrenal glands of the three rat strains using genome-wide microarray technology at 3 and 6 weeks of age, a period in which the rats are considered to be in a pre-hypertensive state. Gene expression profiles were compared between SHR and WKY and between SHRSP and SHR. A total of 353 genes showing more than a 4-fold increase or more than a 4-fold decrease in expressi...
Genetic reassortment plays a vital role in the evolution of the influenza virus and has historically been linked with the emergence of pandemic strains. Reassortment is believed to occur when a single host, typically swine, is simultaneously infected with multiple influenza strains. Reassorted viral strains with novel gene combinations tend to evade the immune systems of other host species easily, satisfying the basic requirements of a virus with pandemic potential. It is therefore vital to continuously monitor the genetic content of circulating influenza strains and watch for new reassortants. We present a new approach to identifying reassortants in large data sets of influenza whole genome nucleotide sequences and report the results of the first comprehensive search for reassortants across all published influenza A genomic data. Of the 52 well-supported candidate reassortants we found, 35 are reported here for the first time, while our analysis method offers new insi...
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not require transferring entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate them into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project, and researchers in emerging areas where a standard data exchange format is not well established, to an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the com...
The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure that interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real-world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system ...
BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Science (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. The theme of BioHackathon 2010 was the…