DOI: 10.1145/3581784.3607094
research-article
Open access

Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU

Published: 11 November 2023
    Abstract

    Dedicated accelerator hardware has become essential for processing AI-based workloads, leading to the rise of novel accelerator architectures. Furthermore, fundamental differences in memory architecture and parallelism have made these accelerators targets for scientific computing.
    The sequence alignment problem is fundamental in bioinformatics. We have implemented the X-Drop algorithm, a heuristic method for pairwise alignment that reduces the search space, on the Graphcore Intelligence Processing Unit (IPU) accelerator. The X-Drop algorithm has an irregular computational pattern, which makes it difficult to accelerate because the work is hard to balance across parallel processing units.
    Here, we introduce a graph-based partitioning scheme and a queue-based batch system to improve load balancing. Our implementation achieves a 10× speedup over a state-of-the-art GPU implementation and up to a 4.65× speedup over a CPU implementation. In addition, we introduce a memory-restricted X-Drop algorithm that reduces the memory footprint by 55× and makes efficient use of the IPU's limited low-latency SRAM. This optimization further improves strong scaling by 3.6×.
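
    To make the idea behind X-Drop concrete, the following is a minimal sketch of the generic pruning rule: a dynamic-programming cell is dropped once its score falls more than a threshold below the best score seen so far, and the extension stops when a whole row has been dropped. It is an illustration only, not the authors' IPU implementation and not their memory-restricted variant. The function name `xdrop_extend`, the linear gap model, and the scoring values are assumptions chosen for the example.

```python
NEG_INF = float("-inf")

def xdrop_extend(a, b, x, match=1, mismatch=-1, gap=-1):
    """Extend an alignment from the start of a and b with X-Drop pruning.

    Returns (best_score, cells_computed). Hypothetical helper for illustration.
    """
    n = len(b)
    best = 0
    cells = 0

    # Row 0: leading gaps only; stop once the score is more than x below best.
    prev = {}
    for j in range(n + 1):
        score = j * gap
        if score < best - x:
            break
        prev[j] = score
        cells += 1

    for i in range(1, len(a) + 1):
        if not prev:
            break  # every cell of the previous row was pruned: stop early
        cur = {}
        lo, max_prev = min(prev), max(prev)
        for j in range(lo, n + 1):
            cand = NEG_INF
            if j in prev:                    # a[i-1] aligned to a gap
                cand = prev[j] + gap
            if j - 1 in prev:                # a[i-1] aligned to b[j-1]
                sub = match if a[i - 1] == b[j - 1] else mismatch
                cand = max(cand, prev[j - 1] + sub)
            if j - 1 in cur:                 # b[j-1] aligned to a gap
                cand = max(cand, cur[j - 1] + gap)
            if cand == NEG_INF:
                if j > max_prev:
                    break                    # nothing further right is reachable
                continue
            cells += 1
            best = max(best, cand)
            if cand >= best - x:
                cur[j] = cand                # keep only cells within x of best
        prev = cur
    return best, cells
```

    With two similar sequences and a small `x`, the live cells stay in a narrow band around the main diagonal, so far fewer cells are computed than in the full matrix. With dissimilar sequences, the extension terminates after a few rows. The band width varies from pair to pair, which is one way to see why the workload is irregular.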

    Cited By

    • (2024) iPuma: High-Performance Sequence Alignment on the Graphcore IPU. ISC High Performance 2024 Research Paper Proceedings (39th International Conference), pp. 1-11. DOI: 10.23919/ISC.2024.10528941. Online publication date: May 2024.

    Published In

    SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2023
    1428 pages
    ISBN:9798400701092
    DOI:10.1145/3581784
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 328
    • Downloads (last 6 weeks): 62
