Abstract
Over the past few decades, the dangers of mycosis have attracted widespread concern. With advances in sequencing technology, the effective analysis of fungal sequencing data has become an active research topic. Although the volume of fungal sequencing data keeps growing, approaches for the identification and functional annotation of fungal chromosomal genomes remain scarce. To address this gap, this paper first examines approaches for identifying and annotating fungal genomes from short and long reads produced on multiple platforms, such as Illumina and PacBio. It then presents PFGI, an automated bioinformatics pipeline for the identification and annotation task. An experimental evaluation on real-world data from the European Nucleotide Archive (ENA) shows that PFGI offers a user-friendly way to perform fungal identification and annotation from sequencing data, and that it can produce results accurate to the species level (97% sequence identity).
Cite this article
Liu, J., Sun, JL. & Liu, YZ. Effective Identification and Annotation of Fungal Genomes. J. Comput. Sci. Technol. 36, 248–260 (2021). https://doi.org/10.1007/s11390-021-0856-4
DOI: https://doi.org/10.1007/s11390-021-0856-4