DOI: 10.1145/3123939.3124553
research-article
Public Access

Summarizer: trading communication with computing near storage

Published: 14 October 2017

Abstract

Modern data center solid state drives (SSDs) integrate multiple general-purpose embedded cores to manage the flash translation layer, garbage collection, wear-leveling, and other functions that improve the performance and reliability of SSDs. As the performance of these cores steadily improves, there are opportunities to repurpose them to perform application-driven computations on stored data, with the aim of reducing communication between the host processor and the SSD. Reducing host-SSD bandwidth demand cuts down I/O time, which is a bottleneck for many applications operating on large data sets. However, embedded core performance is still significantly lower than that of the host processor, because low-power ("wimpy") embedded cores are generally used within SSDs for cost reasons. There is therefore a trade-off between the computation overhead of near-SSD processing and the reduction in communication overhead to the host system.
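The trade-off described above can be illustrated with a back-of-envelope model. This is only a sketch under stated assumptions, not the paper's evaluation methodology: it assumes transfer and computation do not overlap, and every name and number (`link_bw`, `host_rate`, `ssd_rate`, `output_ratio`) is hypothetical.

```python
# Back-of-envelope model of the computation/communication trade-off.
# All parameter names and values are illustrative assumptions, not
# measurements from the paper. Transfer and compute are assumed not to overlap.

def host_only_time(data_bytes, link_bw, host_rate):
    """Ship all data over the host-SSD link, then process it on the host."""
    return data_bytes / link_bw + data_bytes / host_rate


def ssd_only_time(data_bytes, link_bw, ssd_rate, output_ratio):
    """Process inside the SSD, then ship only the reduced result to the host.

    output_ratio is the fraction of the input size that remains after
    in-SSD processing (for example, 0.01 for a highly selective filter).
    """
    return data_bytes / ssd_rate + output_ratio * data_bytes / link_bw


def offload_wins(data_bytes, link_bw, host_rate, ssd_rate, output_ratio):
    """Return True when in-SSD processing is faster than host-only processing."""
    return (ssd_only_time(data_bytes, link_bw, ssd_rate, output_ratio)
            < host_only_time(data_bytes, link_bw, host_rate))


if __name__ == "__main__":
    # Hypothetical 1 GB scan over a 1 GB/s link with a 4 GB/s host core.
    for ssd_rate in (0.5e9, 2e9):
        print(ssd_rate, offload_wins(1e9, 1e9, 4e9, ssd_rate, 0.01))
```

In this toy model, offloading pays off only when the time saved on the link exceeds the extra time spent computing on the slower embedded core, which is why both the core speed and the output reduction ratio matter.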
In this work, we design a set of application programming interfaces (APIs) that the host application can use to offload a data-intensive task to the SSD processor. We describe how these APIs can be implemented by simple modifications to the existing Non-Volatile Memory Express (NVMe) command interface between the host and the SSD processor. We then quantify the computation versus communication trade-offs for near-storage computing using applications from two important domains, namely data analytics and data integration. Using a fully functional SSD evaluation platform, we perform a design space exploration of our proposed approach by varying the bandwidth and computation capabilities of the SSD processor. We evaluate static and dynamic approaches for dividing the work between the host and the SSD processor, and show that our design may improve performance by up to 20% compared to processing at the host processor only, and by up to 6X compared to processing at the SSD processor only.
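To make the idea of dividing work between the host and the SSD processor concrete, here is a minimal sketch of one possible dynamic policy: each page is greedily given to whichever processor would finish it earliest. This is a hypothetical illustration, not the scheme evaluated in the paper; `dynamic_split`, `host_cost`, and `ssd_cost` are assumed names, and the per-page costs are invented.

```python
# Greedy earliest-finish division of pages between host and SSD.
# Hypothetical sketch: the policy, names, and costs are assumptions,
# not the dynamic approach evaluated in the paper.

def dynamic_split(num_pages, host_cost, ssd_cost):
    """Assign each page to the processor that would finish it first.

    host_cost: assumed seconds per page on the host (transfer + compute).
    ssd_cost:  assumed seconds per page on the SSD core (compute only).
    Returns (pages_on_host, pages_on_ssd, makespan_seconds).
    """
    host_busy = 0.0   # time at which the host becomes free
    ssd_busy = 0.0    # time at which the SSD core becomes free
    host_pages = 0
    ssd_pages = 0
    for _ in range(num_pages):
        if host_busy + host_cost <= ssd_busy + ssd_cost:
            host_busy += host_cost
            host_pages += 1
        else:
            ssd_busy += ssd_cost
            ssd_pages += 1
    return host_pages, ssd_pages, max(host_busy, ssd_busy)


if __name__ == "__main__":
    # Hypothetical costs: the SSD core is 4x slower per page than the host path.
    print(dynamic_split(10, 1.0, 4.0))
```

With these assumed costs, using both processors finishes sooner than using either one alone, because the slow SSD core still absorbs a share of the pages while the host works on the rest.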


Information & Contributors

Information

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
October 2017
850 pages
ISBN:9781450349529
DOI:10.1145/3123939
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SSD
  2. dynamic workload offloading
  3. near data processing
  4. storage systems

Qualifiers

  • Research-article

Conference

MICRO-50
Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 285
  • Downloads (Last 6 weeks): 43
Reflects downloads up to 12 Jan 2025

Cited By

  • (2024) ScalaCache. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 1185-1202. DOI: 10.5555/3691992.3692064. Online publication date: 10-Jul-2024
  • (2024) Explorations and Exploitation for Parity-based RAIDs with Ultra-fast SSDs. ACM Transactions on Storage 20:1, pp. 1-32. DOI: 10.1145/3627992. Online publication date: 30-Jan-2024
  • (2024) An Analytical Model-based Capacity Planning Approach for Building CSD-based Storage Systems. ACM Transactions on Embedded Computing Systems 23:6, pp. 1-25. DOI: 10.1145/3623677. Online publication date: 11-Sep-2024
  • (2024) SAQO: Empowering Computational Storage Device for Efficient SQL Query Acceleration. 2024 International Conference on Networking, Architecture and Storage (NAS), pp. 1-4. DOI: 10.1109/NAS63802.2024.10781372. Online publication date: 9-Nov-2024
  • (2024) Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1474-1488. DOI: 10.1109/MICRO61859.2024.00108. Online publication date: 2-Nov-2024
  • (2024) MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 660-677. DOI: 10.1109/ISCA59077.2024.00054. Online publication date: 29-Jun-2024
  • (2024) PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 340-353. DOI: 10.1109/ISCA59077.2024.00033. Online publication date: 29-Jun-2024
  • (2024) HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 825-838. DOI: 10.1109/IPDPS57955.2024.00078. Online publication date: 27-May-2024
  • (2024) VarVE: Bringing SIMD Performance to Variable-Width Values. 2024 IEEE 42nd International Conference on Computer Design (ICCD), pp. 418-425. DOI: 10.1109/ICCD63220.2024.00070. Online publication date: 18-Nov-2024
  • (2024) DockerSSD: Containerized In-Storage Processing and Hardware Acceleration for Computational SSDs. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 379-394. DOI: 10.1109/HPCA57654.2024.00036. Online publication date: 2-Mar-2024
