Abstract
High-throughput sequencing of mRNA has made the deep and efficient probing of transcriptomes more affordable. However, the vast number of short RNA-seq reads makes de novo transcriptome assembly an algorithmic challenge. In this work, we present IsoTree, a novel framework for transcript reconstruction in the absence of reference genomes. Unlike most de novo assembly methods, which build a de Bruijn graph or splicing graph by connecting k-mers (sets of overlapping substrings generated from reads), IsoTree constructs the splicing graph by connecting reads directly. For each splicing graph, IsoTree applies an iterative mixed integer linear programming scheme to build a prefix tree, called an isoform tree. Each path from the root node of the isoform tree to a leaf node represents a plausible transcript candidate, and these candidates are then pruned using paired-end read information. Experiments showed that IsoTree achieves better recall on both paired-end and single-end reads, and better precision on paired-end reads, than other leading transcript assembly programs, including Cufflinks, StringTie and BinPacker.
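To make the isoform-tree idea concrete, the following is a minimal sketch of a prefix tree whose root-to-leaf paths are read off as transcript candidates. It is an illustration only, not IsoTree's implementation: the mixed integer linear programming step that decides which paths enter the tree and the paired-end pruning step are both omitted, and the node labels (`e1`, `e2`, ...) and the function names `build_prefix_tree` and `root_to_leaf_paths` are hypothetical.

```python
from typing import Dict, List, Sequence


class TrieNode:
    """One node of the prefix tree; children are keyed by splicing-graph node label."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}


def build_prefix_tree(paths: Sequence[Sequence[str]]) -> TrieNode:
    """Insert each path (a sequence of node labels) so that shared prefixes share nodes."""
    root = TrieNode()
    for path in paths:
        node = root
        for label in path:
            node = node.children.setdefault(label, TrieNode())
    return root


def root_to_leaf_paths(root: TrieNode) -> List[List[str]]:
    """Enumerate every root-to-leaf path; each one is a transcript candidate."""
    results: List[List[str]] = []

    def walk(node: TrieNode, prefix: List[str]) -> None:
        if not node.children:
            if prefix:  # skip the empty path of an empty tree
                results.append(list(prefix))
            return
        for label in sorted(node.children):
            prefix.append(label)
            walk(node.children[label], prefix)
            prefix.pop()

    walk(root, [])
    return results


# Hypothetical paths through a splicing graph (labels are made up for illustration).
candidate_paths = [
    ["e1", "e2", "e4"],
    ["e1", "e3", "e4"],
    ["e1", "e2"],  # a proper prefix of the first path, so it does not end at a leaf
]
tree = build_prefix_tree(candidate_paths)
candidates = root_to_leaf_paths(tree)
for candidate in candidates:
    print(" -> ".join(candidate))
```

Because the tree stores shared prefixes once, the two candidates beginning with `e1` share a single node below the root, and a path that is only a prefix of a longer one does not yield a separate candidate in this sketch.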
This work is supported by the National Natural Science Foundation of China under Grants No. 61672325 and No. 61472222.
© 2017 Springer International Publishing AG
Cite this paper
Zhao, J., Feng, H., Zhu, D., Zhang, C., Xu, Y. (2017). IsoTree: De Novo Transcriptome Assembly from RNA-Seq Reads. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_7
DOI: https://doi.org/10.1007/978-3-319-59575-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59574-0
Online ISBN: 978-3-319-59575-7
eBook Packages: Computer Science, Computer Science (R0)