RabbitTrim: Highly Optimized Trimming of Illumina Sequencing Data on Multi-core Platforms

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2024)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14955)

Abstract

Trimmomatic is a de facto standard trimmer for Illumina sequencing data. However, limited by its sub-optimal implementation, it cannot fully exploit the computational power of common multi-core platforms. Therefore, we propose RabbitTrim, a highly optimized implementation of Trimmomatic based on efficient I/O strategies, parallel (de)compression engines, block-based memory pools, bitwise operations and vectorization techniques. On a 48-core Intel server, RabbitTrim achieves speedups between 1.5x and 3.3x over Trimmomatic when processing plain FASTQ files, and between 3.7x and 8.0x when processing gzip-compressed FASTQ files. Overall, RabbitTrim is able to process 101 GB of gzip-compressed sequencing data in only 5 min, while Trimmomatic requires at least 21 min. The source code is available at https://github.com/RabbitBio/RabbitTrim.
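
To give a rough idea of what "bitwise operations" can mean in a trimmer, the following minimal sketch counts mismatches between two equal-length DNA strings by packing each base into two bits and comparing the packed values with XOR. It is an illustration only and is not taken from RabbitTrim or Trimmomatic: the 2-bit codes, the function names `pack` and `mismatches`, and the decision to ignore `N` bases are all assumptions of this sketch.

```python
# Hypothetical 2-bit codes for the four bases (an assumption of this sketch;
# ambiguous bases such as N are deliberately not handled).
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODES[base]
    return value


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length DNA strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0
    # After XOR, a 2-bit group is non-zero exactly where the bases differ.
    diff = pack(a) ^ pack(b)
    # Mask with the low bit of every 2-bit group set (binary 0101...01).
    low_bits = int("01" * len(a), 2)
    # Fold the high bit of each group onto its low bit, then keep low bits only,
    # so each mismatching position contributes exactly one set bit.
    collapsed = (diff | (diff >> 1)) & low_bits
    return bin(collapsed).count("1")
```

For example, `mismatches("ACGT", "ACGA")` compares four bases with a single XOR and a bit count instead of four character comparisons. A native implementation would typically apply the same idea to fixed-width machine words.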

Acknowledgement

This work is partially supported by NSFC Grant 62102231; the Shandong Provincial Natural Science Foundation (ZR2021QF089); and the Engineering Research Center of Digital Media Technology, Ministry of Education, China.

Corresponding authors

Correspondence to Zekun Yin, Xin Li or Weiguo Liu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, M. et al. (2024). RabbitTrim: Highly Optimized Trimming of Illumina Sequencing Data on Multi-core Platforms. In: Peng, W., Cai, Z., Skums, P. (eds) Bioinformatics Research and Applications. ISBRA 2024. Lecture Notes in Computer Science, vol 14955. Springer, Singapore. https://doi.org/10.1007/978-981-97-5131-0_3

  • DOI: https://doi.org/10.1007/978-981-97-5131-0_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5130-3

  • Online ISBN: 978-981-97-5131-0

  • eBook Packages: Computer Science, Computer Science (R0)
