research-article

Intermediate Value Linearizability: A Quantitative Correctness Criterion

Published: 18 April 2023
    Abstract

    Big data processing systems often employ batched updates and data sketches to estimate properties of large data sets. For example, a CountMin sketch approximates the frequencies at which elements occur in a data stream, and a batched counter counts events in batches. This article focuses on correctness criteria for concurrent implementations of such objects. Specifically, we consider quantitative objects, whose return values are from an ordered domain, with particular emphasis on (ε,δ)-bounded objects that estimate a numerical quantity with an error of at most ε with probability at least 1 − δ.
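For CountMin, the (ε,δ) parameters translate directly into the sketch's dimensions. The following is a minimal sketch, assuming the standard parameterization from Cormode and Muthukrishnan's CountMin analysis, in which a point estimate exceeds the true frequency by at most ε·N with probability at least 1 − δ, where N is the total count of the stream. The helper name `countmin_dimensions` is hypothetical.

```python
import math
from typing import Tuple


def countmin_dimensions(eps: float, delta: float) -> Tuple[int, int]:
    """Hypothetical helper: return (width, depth) for a CountMin sketch.

    Assumes the standard bounds: width = ceil(e / eps) counters per row makes
    the overcount of one row exceed eps * N with probability at most 1/e, and
    depth = ceil(ln(1 / delta)) independent rows push the probability that
    every row fails (so the minimum fails) down to at most delta.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    width = math.ceil(math.e / eps)          # counters per hash row
    depth = math.ceil(math.log(1 / delta))   # number of independent hash rows
    return width, depth


print(countmin_dimensions(0.01, 0.01))  # (272, 5)
```

Under this parameterization, halving ε doubles the width, while shrinking δ only adds rows logarithmically.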
    The de facto correctness criterion for concurrent objects is linearizability. Intuitively, under linearizability, when a read overlaps an update, it must return the object’s value either before the update or after it. Consider, for example, a single batched increment operation that counts three new events, bumping a batched counter’s value from 7 to 10. In a linearizable implementation of the counter, a read overlapping this update must return either 7 or 10. We observe, however, that in typical use cases, any intermediate value between 7 and 10 would also be acceptable. To capture this additional degree of freedom, we propose Intermediate Value Linearizability (IVL), a new correctness criterion that relaxes linearizability to allow returning intermediate values, for instance, 8 in the example above. Roughly speaking, IVL allows reads to return any value that is bounded between two return values that are legal under linearizability.
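The difference between the two criteria in the 7-to-10 counter example can be stated as a pair of predicates. This is a simplified sketch that covers only a single read overlapping a single update whose pre- and post-update values are known; the full IVL definition is stated over entire concurrent histories. The function names are illustrative.

```python
def linearizable_read_ok(before: int, after: int, returned: int) -> bool:
    # Linearizability: the read takes effect either before or after the update.
    return returned in (before, after)


def ivl_read_ok(before: int, after: int, returned: int) -> bool:
    # IVL: any value bounded by the two linearizable return values is allowed.
    lo, hi = min(before, after), max(before, after)
    return lo <= returned <= hi


# A batched increment of 3 moves the counter from 7 to 10.
for v in range(6, 12):
    print(v, linearizable_read_ok(7, 10, v), ivl_read_ok(7, 10, v))
```

Only 7 and 10 pass the first check, while every value from 7 through 10 passes the second; 6 and 11 fail both.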
    A key feature of IVL is that we can prove that concurrent IVL implementations of (ε,δ)-bounded objects are themselves (ε,δ)-bounded. To illustrate the power of this result, we give a straightforward and efficient concurrent implementation of an (ε,δ)-bounded CountMin sketch, which is IVL (albeit not linearizable).
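One way to see why a CountMin sketch tolerates non-atomic updates: an update increments one counter per row, counters never decrease, and a query returns the minimum over rows. A query that runs partway through an update therefore sees each row either before or after its increment, so its result lies between the estimates before and after the update. The sketch below simulates that interleaving sequentially with a Python generator. It illustrates the argument under these assumptions and is not the article's concurrent implementation; the salted built-in `hash` is a stand-in for the pairwise-independent hash family a real CountMin sketch uses.

```python
import random


class CountMin:
    """Minimal CountMin sketch, for illustration only."""

    def __init__(self, width: int, depth: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        # One salt per row; salted built-in hash stands in for a hash family.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _col(self, row: int, item: int) -> int:
        return hash((self.salts[row], item)) % self.width

    def update_steps(self, item: int, count: int = 1):
        """Generator: add `count` to one row per step, pausing after each
        row so that other operations can be interleaved mid-update."""
        for r in range(len(self.rows)):
            self.rows[r][self._col(r, item)] += count
            yield

    def update(self, item: int, count: int = 1) -> None:
        for _ in self.update_steps(item, count):
            pass

    def query(self, item: int) -> int:
        return min(self.rows[r][self._col(r, item)] for r in range(len(self.rows)))


cm = CountMin(width=4, depth=3, seed=1)
for item in range(20):        # background stream; the small width forces collisions
    cm.update(item)

target = 5
before = cm.query(target)
observed = []
for _ in cm.update_steps(target, 3):    # a batched update of 3 occurrences
    observed.append(cm.query(target))   # a query between row increments
after = cm.query(target)

assert all(before <= v <= after for v in observed)
print(before, observed, after)
```

Because each row's counter for the target grows by exactly 3, `after` equals `before + 3`; with hash collisions, a mid-update query can return a value strictly between the two, which a linearizable sketch would not allow.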
    We present four examples of IVL objects, each showcasing a different way of using IVL. The first is a simple wait-free IVL batched counter with O(1) step complexity for updates. The second considers an (ε,δ)-bounded CountMin sketch and further shows how to relax IVL using the notion of r-relaxation. Our third example is a non-atomic iterator over a data structure. In this example, we augment the data structure with an auxiliary history variable state that includes “tombstones” for items deleted from the data structure; here, IVL semantics are required at the augmented level. Finally, using a priority queue, we show that some objects require IVL to be paired with other correctness criteria; indeed, a natural correctness notion for a concurrent priority queue is IVL coupled with sequential consistency.
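The batched counter can be illustrated with a minimal single-threaded simulation, assuming a design in which each of n processes owns one single-writer slot, an update adds its whole batch to the caller's own slot (a constant number of steps), and a read sums the slots one at a time. This design is consistent with the O(1) update bound stated above, but it is an assumption here and not necessarily the article's pseudocode; the `after_slot` hook is hypothetical and exists only to inject other operations between the read's steps.

```python
from typing import Callable, List, Optional


class BatchedCounter:
    """Sketch of an IVL batched counter over n single-writer slots."""

    def __init__(self, n: int):
        self.slots: List[int] = [0] * n   # slot i is written only by process i

    def update(self, pid: int, amount: int) -> None:
        # O(1): reads and writes only the caller's own slot.
        self.slots[pid] += amount

    def read(self, after_slot: Optional[Callable[[int], None]] = None) -> int:
        # O(n) non-atomic collect: reads the slots one at a time.
        total = 0
        for i in range(len(self.slots)):
            total += self.slots[i]
            if after_slot is not None:
                after_slot(i)             # simulation hook: run other ops here
        return total


c = BatchedCounter(3)
c.update(0, 4)
c.update(1, 3)                            # the counter's value is now 7


def interleave(i: int) -> None:
    if i == 0:                            # after the reader has read slot 0:
        c.update(0, 1)                    # p0 adds 1 (value 8), and then
        c.update(1, 2)                    # p1 adds 2 (value 10)


result = c.read(after_slot=interleave)
print(result)  # 9
```

The read returns 9: it misses p0's increment but sees p1's, even though p0's update completed before p1's began. A linearizable counter could only return 7, 8, or 10 for this read, whereas 9 lies between 7 and 10, which IVL permits.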
    Last, we show that IVL allows for inherently cheaper implementations than linearizability does. In particular, we show a lower bound of Ω(n), where n is the number of processes, on the step complexity of the update operation of any wait-free linearizable batched counter built from single-writer multi-reader registers; this is more expensive than the O(1) update of our IVL implementation.

    Published In

    Journal of the ACM, Volume 70, Issue 2
    April 2023
    329 pages
    ISSN: 0004-5411
    EISSN: 1557-735X
    DOI: 10.1145/3587260

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 April 2023
    Online AM: 22 February 2023
    Accepted: 19 January 2023
    Revised: 28 November 2022
    Received: 23 May 2021
    Published in JACM Volume 70, Issue 2

    Author Tags

    1. Concurrency
    2. concurrent objects
    3. linearizability

    Qualifiers

    • Research-article
