Background: We present the draft genome sequence of Dysdera silvatica, a nocturnal ground-dwelling spider from a genus that has undergone a remarkable adaptive radiation in the Canary Islands. Results: The draft assembly was obtained using short (Illumina) and long (PacBio and Nanopore) sequencing reads. Our de novo assembly (1.36 Gb), which represents 80% of the genome size estimated by flow cytometry (1.7 Gb), contains a high fraction of interspersed repetitive elements (53.8%). Assembly completeness, assessed with BUSCO and core eukaryotic genes, ranges from 90% to 96%. Functional annotation based on both ab initio and evidence-based information (including D. silvatica RNA sequencing) yielded a total of 48,619 protein-coding sequences, of which 36,398 (74.9%) carry the molecular hallmarks of known protein domains or show sequence similarity to Swiss-Prot sequences. The D. silvatica assembly is the first representative of the superfamily Dysderoidea, and just the second available...
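The two percentages in this abstract are simple ratios of figures it reports. The short sketch below recomputes them from those figures only; the variable names and the `percent` helper are illustrative, not taken from the paper.

```python
# Figures quoted in the Dysdera silvatica abstract.
assembly_gb = 1.36        # de novo assembly size (Gb)
flow_cytometry_gb = 1.7   # genome size estimated by flow cytometry (Gb)
protein_coding = 48_619   # annotated protein-coding sequences
with_support = 36_398     # sequences with known domains or Swiss-Prot similarity


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


assembly_fraction = percent(assembly_gb, flow_cytometry_gb)   # share of estimated genome size
supported_fraction = percent(with_support, protein_coding)    # share of genes with functional support
print(assembly_fraction, supported_fraction)  # prints: 80.0 74.9
```

Both values match the rounded figures stated in the abstract (80% and 74.9%).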
Gene clusters of recently duplicated genes are hotbeds for evolutionary change. However, our understanding of how mutational mechanisms and evolutionary forces shape the structural and functional evolution of these clusters is hindered by the high sequence identity among the copies, which typically results in their inaccurate representation in genome assemblies. The presumed testis-specific, chimeric gene Sdic originated and tandemly expanded in Drosophila melanogaster, contributing to increased male-male competition. Using various types of massively parallel sequencing data, we studied the organization, sequence evolution, and functional attributes of the different Sdic copies. By leveraging long-read sequencing data, we uncovered differences in both copy number and copy order relative to the currently accepted annotation of the Sdic region. Despite evidence of pervasive gene conversion affecting the Sdic copies, we also detected signatures of two episodes of diversifying selection, which have contributed to the evolution of a variety of C-termini and miRNA binding site compositions. Expression analyses involving RNA-seq datasets from 59 different biological conditions revealed distinctive expression breadths among the copies, with three copies being transcribed in females, raising the possibility of a sexually antagonistic effect. Phenotypic assays using Sdic knock-out strains indicated that, should this antagonistic effect exist, it does not compromise female fertility. Our results strongly suggest that the genome consolidation of the Sdic gene cluster is more the result of a quick exploration of different paths of molecular tinkering by different copies than of a mere dosage increase, which could be a recurrent evolutionary outcome in the presence of persistent sexual selection.
The development of molecular markers is one of the most important challenges in phylogenetic and genome-wide population genetics studies, especially in studies of non-model organisms. A highly promising approach for obtaining suitable markers is the use of genomic partitioning strategies for the simultaneous discovery and genotyping of a large number of markers. Unfortunately, not all markers obtained from these strategies provide enough information to address multiple evolutionary questions at a reasonable taxonomic resolution. We have developed DOMINO (Development Of Molecular markers In Non-model Organisms), a bioinformatics tool for informative marker development from both next-generation sequencing (NGS) data and pre-computed sequence alignments. The application combines popular NGS tools with new utilities in a highly versatile pipeline specifically designed to discover or select personalized markers at different levels of taxonomic resolution. These markers can be used directly to study the taxa surveyed for their design, used for downstream PCR amplification across a broader taxonomic scope, or exploited as templates for bait design in target DNA enrichment techniques. We conducted an exhaustive evaluation of the performance of DOMINO via computer simulations and illustrate its utility for finding informative markers in an empirical dataset. DOMINO is freely available from www.ub.edu/softevol/domino. Contact: elsanchez@ub.edu or jrozas@ub.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
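One common way to judge how informative a candidate marker region is at a given taxonomic level is to count variable and parsimony-informative sites along an alignment. The sketch below is a generic illustration of that idea under stated assumptions; it is not DOMINO's actual scoring procedure, and the functions `informative_sites` and `rank_windows`, as well as the window and step parameters, are hypothetical.

```python
from collections import Counter


def informative_sites(alignment: list[str]) -> tuple[int, int]:
    """Count variable and parsimony-informative columns in an alignment.

    A column is variable if it holds at least two distinct nucleotides, and
    parsimony-informative if at least two nucleotides each occur in at least
    two sequences. Gaps ('-') and ambiguous bases (e.g. 'N') are ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        if len(counts) >= 2:
            variable += 1
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return variable, informative


def rank_windows(alignment: list[str], window: int, step: int) -> list[tuple[int, int, int]]:
    """Score sliding windows as (start, variable, informative), best first.

    Hypothetical ranking: windows are ordered by informative sites, then by
    variable sites, both descending.
    """
    scores = []
    for start in range(0, len(alignment[0]) - window + 1, step):
        sub = [seq[start:start + window] for seq in alignment]
        variable, informative = informative_sites(sub)
        scores.append((start, variable, informative))
    return sorted(scores, key=lambda s: (s[2], s[1]), reverse=True)


aln = ["ACGTACGTAA", "ACGTACGTTA", "ACCTACGATA", "ACCTACGATT"]
print(informative_sites(aln))           # (4, 2)
print(rank_windows(aln, window=5, step=5))
```

In this toy alignment the second window ranks first because it contains the same number of parsimony-informative sites but more variable sites overall.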
Insect Biochemistry and Molecular Biology, Jan 11, 2016
Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly has a total size of 419.4 Mbp. Repetitive sequences account for 25.8% of the assembled genome. The official gene set comprises 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for ins...
Coffee is a valuable beverage crop due to its characteristic flavor, aroma, and the stimulating effects of caffeine. We generated a high-quality draft genome of the species Coffea canephora, which displays a conserved chromosomal gene order among asterid angiosperms. Although it shows no sign of the whole-genome triplication identified in Solanaceae species such as tomato, the genome includes several species-specific gene family expansions, among them N-methyltransferases (NMTs) involved in caffeine production, defense-related genes, and alkaloid and flavonoid enzymes involved in secondary compound synthesis. Comparative analyses of caffeine NMTs demonstrate that these genes expanded through sequential tandem duplications independently of genes from cacao and tea, suggesting that caffeine in eudicots is of polyphyletic origin.
This work reports the molecular characterization of a novel single-stranded RNA virus, identified by next-generation sequencing on the Illumina platform in a field grapevine isolate of the plant-pathogenic fungus Botrytis. Comparison of the viral sequence against the NCBI database showed strong identity with the RNA-dependent RNA polymerases (RdRps) of plant-pathogenic viruses in the genus Ourmiavirus; the novel virus was therefore named Botrytis ourmia-like virus (BOLV). BOLV has a single open reading frame of 2169 nucleotides, which encodes a protein of 722 amino acids containing conserved domains of plant RNA virus RdRps, such as the highly conserved GDD active-site motif. Our analyses showed that BOLV is phylogenetically closer to the fungal Narnavirus and the plant Ourmiavirus than to Mitovirus of the family Narnaviridae. Hence, we proposed that BOLV might represent a link between fungal viruses of the family Narnaviridae and the plant ourmiaviruses.
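The ORF and protein lengths in this abstract are consistent with each other under the standard codon arithmetic: 2169 nt is exactly 723 codons, and if the last codon is the stop codon, 722 of them encode amino acids. The sketch below makes that check explicit; the assumption that the reported ORF length includes the terminal stop codon is ours, and `protein_length` is a hypothetical helper.

```python
def protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_nt nucleotides.

    Assumes a complete ORF (length divisible by 3). If includes_stop is True,
    the final codon is a stop codon and is not translated into an amino acid.
    """
    if orf_nt % 3:
        raise ValueError("a complete ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


print(protein_length(2169))  # 722, matching the BOLV protein length
```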
The funnel-web spider Macrothele calpeiana is a charismatic mygalomorph of great interest for basic, applied, and translational research. Nevertheless, the current scarcity of genomic and transcriptomic data for this species clearly limits research on this non-model organism. To overcome this limitation, we launched the first tissue-specific enriched RNA-seq analysis in this species using a subtractive hybridization approach, with two main objectives: to characterize the specific transcriptome of the putative chemosensory appendages (palps and first pair of legs), and to provide a new set of DNA markers for further phylogenetic studies. We have characterized the set of transcripts specifically expressed in putative chemosensory tissues of this species, many of which show features shared by chemosensory system genes. Among the specific candidates, we identified some members of the iGluR and NPC2 families. Moreover, we demonstrated the utility of these newly generated data as molecular markers by inferring the phylogenetic position of M. calpeiana within the phylogenetic tree of Mygalomorphs. Our results provide novel resources for researchers interested in spider molecular biology and systematics, which can help expand our knowledge of the evolutionary processes underlying fundamental biological questions, such as species invasion or the origin and maintenance of biodiversity.
Next-generation sequencing (NGS) methods for transcriptome analysis (RNA-seq) have become increasingly accessible in recent years and are of great interest to many biological disciplines, including, e.g., evolutionary biology, ecology, biomedicine, and computational biology. Although virtually any research group can now obtain RNA-seq data, only a few have the bioinformatics knowledge and computing facilities required for transcriptome analysis. Here, we present TRUFA (TRanscriptome User-Friendly Analysis), an open informatics platform offering a web-based interface that generates the outputs commonly used in de novo RNA-seq analysis and comparative transcriptomics. TRUFA provides a comprehensive service that dynamically performs raw read cleaning, transcript assembly, annotation, and expression quantification. Because such analyses are computationally intensive, TRUFA is highly parallelized and benefits from access to high-performance computing resources. The complete TRUFA pipeline was validated using four previously published transcriptomic data sets. For these example datasets, TRUFA's results were globally similar to those of the original studies, and it performed particularly well on the green tea dataset. The platform allows RNA-seq data to be analyzed in a fast, robust, and user-friendly manner. Accounts on TRUFA are provided free of charge upon request at https://trufa.ifca.es.
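Expression quantification in de novo RNA-seq pipelines is often reported as transcripts per million (TPM): read counts are first normalized by transcript length, then rescaled so that all values sum to one million. The sketch below shows that generic calculation only; it is not TRUFA's code, it uses raw transcript length where dedicated quantifiers typically use an effective length, and the `tpm` function and toy inputs are hypothetical.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Convert per-transcript read counts to transcripts per million (TPM).

    Simplification: uses raw transcript length in bp rather than an
    effective length corrected for fragment size.
    """
    # Reads per kilobase for each transcript.
    rates = {t: counts[t] / (lengths_bp[t] / 1000) for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    # Rescale so that the values sum to one million.
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Equal counts, but t2 is twice as long, so t1 gets twice the TPM of t2.
print(tpm({"t1": 100, "t2": 100}, {"t1": 1000, "t2": 2000}))
```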
A region of approximately 1.6 kb encompassing the ribosomal protein 49 gene (rp49) has been sequenced and compared in nine species of the obscura group of Drosophila: four species belonging to the obscura subgroup, three to the pseudoobscura subgroup, and two to ...