AIM: accelerating computational genomics through scalable and noninvasive accelerator-interposed memory

Authors:

Farnoosh Javadi,

Glenn Reinman

MEMSYS '17: Proceedings of the International Symposium on Memory Systems

Pages 3 - 14

https://doi.org/10.1145/3132402.3132406

Published: 02 October 2017

Abstract

Computational genomics plays an important role in health care, but it is computationally challenging: most genomics applications use large data sets and are both computation-intensive and memory-intensive. Recent approaches with on-chip hardware accelerators can boost computing capability and energy efficiency, but they are limited by the accelerators' memory requirements when processing workloads such as computational genomics. In this paper, we propose accelerator-interposed memory (AIM) as a means of scalable and noninvasive near-memory acceleration. To avoid the high memory access latency and bandwidth limitation of CPU-side acceleration, we design the accelerators as a separate package, called an AIM module, and physically place an AIM module between each DRAM DIMM and the conventional memory bus network. Experimental results for genomics applications confirm the benefits of AIM. Owing to its much lower memory access latency and scalable memory bandwidth, the noninvasive AIM design scales much better in performance than CPU-side acceleration as the memory system scales up. When there are 16 instances of accelerators and DIMMs in the system, AIM achieves up to 3.7x better performance than CPU-side acceleration.
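The bandwidth-scaling argument in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not the paper's evaluation methodology. It assumes that CPU-side accelerators share a fixed number of memory channels, while each interposed accelerator reads only the DIMM it sits in front of. The class name, function names, channel count, and bandwidth figures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySystem:
    """Toy description of a DIMM-based memory system (all values hypothetical)."""
    num_dimms: int                # DIMMs installed, one accelerator per DIMM
    num_channels: int             # memory channels shared by CPU-side accelerators
    dimm_bandwidth_gbs: float     # bandwidth one DIMM can deliver locally (GB/s)
    channel_bandwidth_gbs: float  # bandwidth of one shared memory channel (GB/s)


def cpu_side_bandwidth(system: MemorySystem) -> float:
    """Aggregate bandwidth when every accelerator sits on the CPU side.

    All traffic crosses the shared channels, so adding DIMMs to a channel
    adds capacity but no bandwidth in this simplified model.
    """
    return system.num_channels * system.channel_bandwidth_gbs


def interposed_bandwidth(system: MemorySystem) -> float:
    """Aggregate bandwidth when each accelerator reads only its own DIMM."""
    return system.num_dimms * system.dimm_bandwidth_gbs


def bandwidth_ratio(system: MemorySystem) -> float:
    """Interposed bandwidth divided by CPU-side bandwidth."""
    return interposed_bandwidth(system) / cpu_side_bandwidth(system)


if __name__ == "__main__":
    # Assumed configuration: 4 channels, 12.8 GB/s per channel and per DIMM.
    for dimms in (4, 8, 16):
        system = MemorySystem(dimms, 4, 12.8, 12.8)
        print(dimms, cpu_side_bandwidth(system),
              interposed_bandwidth(system), bandwidth_ratio(system))
```

In this model the CPU-side bandwidth stays flat as DIMMs are added, while the interposed bandwidth grows linearly with the DIMM count. The printed ratio is a property of the assumed numbers only and is not meant to reproduce the 3.7x performance figure reported in the abstract, which also depends on latency and on the applications measured.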

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
2. Hardware
  1. Emerging technologies
    1. Memory and dense storage

Published In

MEMSYS '17: Proceedings of the International Symposium on Memory Systems

October 2017

409 pages

ISBN:9781450353359

DOI:10.1145/3132402

General Chair:
Bruce Jacob
University of Maryland

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Intel
UCLA Institute for Digital Research and Education Postdoc Fellowship
C-FAR, one of
IBM Research Almaden
Huawei
Fujitsu Labs
NSF
Mentor Graphics
Google
Baidu

Conference

MEMSYS 2017

MEMSYS 2017: The International Symposium on Memory Systems, 2017

October 2 - 5, 2017

Alexandria, Virginia
