DOI: 10.1145/3361525.3361537
Research article, Middleware conference proceedings

Scalable Data-structures with Hierarchical, Distributed Delegation

Published: 09 December 2019

Abstract

Scaling data-structures to the increasing number of cores in modern systems is challenging. The quest for scalability is complicated by the non-uniform memory access (NUMA) of multi-socket machines, which often prevents the effective use of data-structures that span memory localities. Conventional shared-memory data-structures, whether built on efficient non-blocking or lock-based implementations, inevitably suffer from cache-coherency overheads and non-local memory accesses between sockets. Multi-socket systems are common in cloud hardware, and many products are pushing shared-memory systems to greater scales, making the ability to scale data-structures all the more pressing.
In this paper, we present the Distributed, Delegated Parallel Sections (DPS) runtime system. DPS uses message passing to move computation on portions of data-structures between memory localities. Within each locality, it leverages efficient shared-memory implementations to harness parallelism. Through a series of data-structure scalability evaluations and an adaptation of memcached, we show that DPS enables strong data-structure scalability. DPS improves throughput by more than a factor of 3.1 and decreases tail latency by 23x for memcached.
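The abstract does not show DPS's interface, so the following is only a minimal, hypothetical Python sketch of the general idea it describes. It is not the authors' runtime or API. The sketch assumes that each "locality" owns one shard of a key-value map and that keys are assigned to localities by hashing. A caller in the owning locality accesses the shard directly through shared memory. A caller elsewhere delegates the operation as a message to a server thread in the owning locality. The names `Locality`, `DelegatedMap`, `execute`, and `owner` are invented for illustration. The per-shard lock stands in for the "efficient shared-memory implementations" mentioned above, and the hierarchical aspect of DPS is not modeled.

```python
import queue
import threading


class Locality:
    """Hypothetical stand-in for one memory locality (e.g. a NUMA socket)."""

    def __init__(self, lid):
        self.lid = lid
        self.shard = {}               # portion of the data-structure owned here
        self.lock = threading.Lock()  # stand-in for an efficient local implementation
        self.inbox = queue.Queue()    # message channel for delegated operations
        self.server = threading.Thread(target=self._serve, daemon=True)
        self.server.start()

    def execute(self, op, key, value=None):
        """Run one operation directly on this locality's shard."""
        with self.lock:
            if op == "put":
                self.shard[key] = value
                return None
            if op == "get":
                return self.shard.get(key)
            raise ValueError(f"unknown op {op!r}")

    def _serve(self):
        # Server loop: execute delegated requests and send each result back.
        while True:
            msg = self.inbox.get()
            if msg is None:           # shutdown signal
                break
            op, key, value, reply = msg
            reply.put(self.execute(op, key, value))

    def stop(self):
        self.inbox.put(None)
        self.server.join()


class DelegatedMap:
    """Hypothetical map partitioned across localities with delegation."""

    def __init__(self, n_localities):
        self.localities = [Locality(i) for i in range(n_localities)]

    def owner(self, key):
        # Assumed partitioning scheme: hash the key to pick its owning locality.
        return self.localities[hash(key) % len(self.localities)]

    def _call(self, caller_lid, op, key, value=None):
        target = self.owner(key)
        if target.lid == caller_lid:
            return target.execute(op, key, value)  # local: shared-memory access
        reply = queue.Queue(maxsize=1)
        target.inbox.put((op, key, value, reply))  # remote: delegate via message
        return reply.get()                         # wait for the owner's reply

    def put(self, caller_lid, key, value):
        self._call(caller_lid, "put", key, value)

    def get(self, caller_lid, key):
        return self._call(caller_lid, "get", key)

    def stop(self):
        for loc in self.localities:
            loc.stop()
```

For example, with `m = DelegatedMap(2)`, the call `m.put(0, 1, "one")` is delegated to locality 1, which owns integer key 1. A later `m.get(1, 1)` is then served locally. This models only the control flow. CPython threads do not pin to sockets and are serialized by the GIL, so the sketch says nothing about NUMA performance.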

Cited By

  • (2022) A Dynamic Distributed Deterministic Load-Balancer for Decentralized Hierarchical Infrastructures. Algorithms 15:3 (96). DOI: 10.3390/a15030096. Online publication date: 18-Mar-2022.
  • (2021) Practical Principle of Least Privilege for Secure Embedded Systems. 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS), 1-13. DOI: 10.1109/RTAS52030.2021.00009. Online publication date: May-2021.
  • (2021) Sharing non-cache-coherent memory with bounded incoherence. Concurrency and Computation: Practice and Experience 34:2. DOI: 10.1002/cpe.6414. Online publication date: Jun-2021.

    Published In

    Middleware '19: Proceedings of the 20th International Middleware Conference
    December 2019
    342 pages
    ISBN:9781450370097
    DOI:10.1145/3361525
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. NUMA locality
    2. concurrent data-structure
    3. delegation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Middleware '19

    Acceptance Rates

    Overall Acceptance Rate 203 of 948 submissions, 21%
