Abstract
Gene fusion is a genomic alteration in which two genes are juxtaposed after a break event to form a new hybrid gene, which can contribute to cancer development and progression. However, identifying gene fusions is not trivial, as it requires managing and processing very large amounts of data: genomic data (particularly DNA and RNA) can reach up to 300 GB per sample. Furthermore, specific software and hardware architectures are required to process this type of data correctly. Although many tools are available for detecting gene fusions, systematic workflows that are free and easily usable even by non-specialists are still scarce.
This paper presents an integrated system for identifying gene fusions in RNA and DNA genomic samples, focusing on hardware and software architectural aspects. The proposed workflow is easy to use, scalable, and highly reproducible. It includes five gene fusion detection tools: three mainly intended for RNA samples (EricScript, Arriba, and FusionCatcher) and two for DNA samples (INTEGRATE and GeneFuse). The workflow runs on servers and relies on Nextflow (a DSL for data-driven computational pipelines), Docker containers, and Conda virtual environments.
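As a rough illustration of the tool grouping described above, the following Python sketch maps a sample type to the detection tools the abstract lists for it. Only the tool names and their RNA/DNA grouping come from the abstract; the dictionary layout, the `select_tools` function, and its error handling are hypothetical and are not taken from the FusionFlow code base, which is built on Nextflow rather than plain Python.

```python
from typing import Dict, List

# Tool grouping as stated in the abstract ("mainly intended for" each sample type).
# The data structure itself is an assumption made for illustration.
TOOLS_BY_SAMPLE_TYPE: Dict[str, List[str]] = {
    "RNA": ["EricScript", "Arriba", "FusionCatcher"],
    "DNA": ["INTEGRATE", "GeneFuse"],
}


def select_tools(sample_type: str) -> List[str]:
    """Return the fusion detection tools listed for a sample type (hypothetical helper)."""
    key = sample_type.strip().upper()
    if key not in TOOLS_BY_SAMPLE_TYPE:
        raise ValueError(f"Unsupported sample type: {sample_type!r} (expected 'RNA' or 'DNA')")
    # Return a copy so callers cannot mutate the shared configuration.
    return list(TOOLS_BY_SAMPLE_TYPE[key])


if __name__ == "__main__":
    for sample_type in ("RNA", "DNA"):
        print(sample_type, "->", ", ".join(select_tools(sample_type)))
```

In a real pipeline, each selected tool would presumably run as its own containerised step; that orchestration is omitted here.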
Funding
This study was funded by the European Union’s Horizon 2020 research and innovation programme DECIDER under Grant Agreement 965193.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Citarrella, F., Bontempo, G., Lovino, M., Ficarra, E. (2022). FusionFlow: An Integrated System Workflow for Gene Fusion Detection in Genomic Samples. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_8
DOI: https://doi.org/10.1007/978-3-031-15743-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15742-4
Online ISBN: 978-3-031-15743-1
eBook Packages: Computer Science, Computer Science (R0)