DOI: 10.1145/3132402.3132406
Research article (public access)

AIM: accelerating computational genomics through scalable and noninvasive accelerator-interposed memory

Published: 02 October 2017

Abstract

Computational genomics plays an important role in health care, but it is computationally challenging: most genomics applications use large data sets and are both computation-intensive and memory-intensive. Recent approaches with on-chip hardware accelerators can boost computing capability and energy efficiency, but they are limited by the memory requirements of accelerators when processing workloads such as computational genomics. In this paper, we propose accelerator-interposed memory (AIM) as a means of scalable and noninvasive near-memory acceleration. To avoid the high memory access latency and bandwidth limitation of CPU-side acceleration, we design accelerators as a separate package, called an AIM module, and physically place an AIM module between each DRAM DIMM and the conventional memory bus network. Experimental results for genomics applications confirm the benefits of AIM. Owing to its much lower memory access latency and scalable memory bandwidth, the noninvasive AIM design achieves much better performance scalability than CPU-side acceleration as the memory system scales up. With 16 instances of accelerators and DIMMs in the system, AIM achieves up to 3.7x better performance than CPU-side acceleration.
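
The bandwidth argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's evaluation methodology and does not reproduce its results; it only assumes that CPU-side accelerators share a fixed set of memory channels while each AIM module reads its own DIMM directly. Every name and number in it (`SystemConfig`, the 12.8 GB/s figures, the four channels) is a hypothetical placeholder chosen for illustration, and latency, compute time, and inter-DIMM communication are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemConfig:
    """Hypothetical memory-system parameters (illustrative, not from the paper)."""
    num_dimms: int                # one accelerator instance per DIMM
    dimm_bandwidth_gbs: float     # assumed bandwidth one DIMM can supply
    num_channels: int             # assumed number of CPU memory channels
    channel_bandwidth_gbs: float  # assumed bandwidth of one channel


def cpu_side_bandwidth(cfg: SystemConfig) -> float:
    """Aggregate bandwidth when accelerators sit next to the CPU.

    All traffic crosses the shared memory channels, so adding DIMMs
    stops helping once the channels are saturated.
    """
    supply = cfg.num_dimms * cfg.dimm_bandwidth_gbs
    channel_limit = cfg.num_channels * cfg.channel_bandwidth_gbs
    return min(supply, channel_limit)


def aim_bandwidth(cfg: SystemConfig) -> float:
    """Aggregate bandwidth when each accelerator is interposed at its DIMM.

    Each module reads its local DIMM without crossing the shared bus,
    so bandwidth grows linearly with the number of DIMMs.
    """
    return cfg.num_dimms * cfg.dimm_bandwidth_gbs


def bandwidth_ratio(cfg: SystemConfig) -> float:
    """Ratio of AIM to CPU-side aggregate bandwidth under this toy model."""
    return aim_bandwidth(cfg) / cpu_side_bandwidth(cfg)


if __name__ == "__main__":
    for dimms in (1, 2, 4, 8, 16):
        cfg = SystemConfig(dimms, 12.8, 4, 12.8)
        print(f"{dimms:2d} DIMMs: CPU-side {cpu_side_bandwidth(cfg):6.1f} GB/s, "
              f"AIM {aim_bandwidth(cfg):6.1f} GB/s, ratio {bandwidth_ratio(cfg):.1f}x")
```

Under these assumed parameters the two designs match until the shared channels saturate, after which only the interposed design keeps scaling. The ratio is a bandwidth-only bound for a fully memory-bound workload, so it should not be read as a prediction of the measured speedup.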


Published In

MEMSYS '17: Proceedings of the International Symposium on Memory Systems
October 2017
409 pages
ISBN: 9781450353359
DOI: 10.1145/3132402

Publisher

Association for Computing Machinery
New York, NY, United States

Funding Sources

  • Intel
  • UCLA Institute for Digital Research and Education Postdoc Fellowship
  • C-FAR, one of
  • IBM Research Almaden
  • Huawei
  • Fujitsu Labs
  • NSF
  • Mentor Graphics
  • Google
  • Baidu

Conference

MEMSYS 2017
