research-article

WiscSort: External Sorting for Byte-Addressable Storage

Published: 01 May 2023
Abstract

    We present WiscSort, a new approach to high-performance concurrent sorting for existing and future byte-addressable storage (BAS) devices. WiscSort carefully reduces writes, exploits random reads by splitting keys and values during sorting, and performs interference-aware scheduling with thread pool sizing to avoid I/O bandwidth degradation. We introduce the BRAID model, which captures the unique characteristics of BAS devices. Many state-of-the-art sorting systems do not comply with the BRAID model and deliver suboptimal performance, whereas WiscSort demonstrates the effectiveness of complying with BRAID. We show that WiscSort is 2-7× faster than competing approaches on a standard sort benchmark. We evaluate the effectiveness of key-value separation across different key and value sizes and compare our concurrency optimizations with various other concurrency models. Finally, we emulate generic BAS devices and show that our techniques perform well across various combinations of hardware properties.
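
The key-value separation idea summarized above can be illustrated with a small in-memory sketch. It is not WiscSort's implementation, which is a concurrent external sort. The sketch assumes fixed-size 100-byte records with a 10-byte key (the gensort layout used by the sort benchmark) and uses a Python `bytes` object as a stand-in for a file on a BAS device. The function name `kv_separated_sort` is hypothetical. Only compact (key, offset) pairs are sorted; each full record is then gathered with one random read and written once, sequentially.

```python
KEY_SIZE = 10      # assumption: gensort-style 10-byte keys
RECORD_SIZE = 100  # assumption: 100-byte records (10-byte key + 90-byte value)


def kv_separated_sort(data: bytes) -> bytes:
    """Sort fixed-size records by key, moving only (key, offset) pairs.

    Hypothetical helper for illustration; `data` stands in for a file of
    records on byte-addressable storage.
    """
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("input length must be a multiple of RECORD_SIZE")

    # Pass 1: sequential scan that extracts a compact (key, offset) index.
    index = [(data[off:off + KEY_SIZE], off)
             for off in range(0, len(data), RECORD_SIZE)]

    # Sort only the small index; the 90-byte values never move here.
    # Ties on equal keys are broken by offset, so the order is deterministic.
    index.sort()

    # Pass 2: gather each full record with one random read and append it
    # to the output, so every record is written exactly once, sequentially.
    out = bytearray()
    for _key, off in index:
        out += data[off:off + RECORD_SIZE]
    return bytes(out)


if __name__ == "__main__":
    import os

    records = os.urandom(RECORD_SIZE * 1000)  # 1,000 random records
    result = kv_separated_sort(records)
    keys = [result[i:i + KEY_SIZE] for i in range(0, len(result), RECORD_SIZE)]
    print(keys == sorted(keys))  # True
```

The trade-off in this sketch is a second pass of random reads in exchange for sorting only 10-byte keys plus offsets instead of moving whole 100-byte records; that trade is attractive only when random reads are cheap, as the abstract argues they are on BAS devices.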

    Cited By

    • (2024) Sorting on Byte-Addressable Storage: The Resurgence of Tree Structure. Proceedings of the VLDB Endowment 17, 6 (1487-1500). DOI: 10.14778/3648160.3648185. Online publication date: 3-May-2024

    Information

    Published In

    Proceedings of the VLDB Endowment, Volume 16, Issue 9
    May 2023
    330 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 May 2023
    Published in PVLDB Volume 16, Issue 9


    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 63
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 26 Jul 2024
