
arXiv:2011.07921 [pdf, other]

Towards a General Framework for ML-based Self-tuning Databases

Authors: Thomas Schmied, Diego Didona, Andreas Döring, Thomas Parnell, Nikolas Ioannou

Abstract: Machine learning (ML) methods have recently emerged as an effective way to perform automated parameter tuning of databases. State-of-the-art approaches include Bayesian optimization (BO) and reinforcement learning (RL). In this work, we describe our experience when applying these methods to a database not yet studied in this context: FoundationDB. Firstly, we describe the challenges we faced, such as unknown valid ranges of configuration parameters and combinations of parameter values that result in invalid runs, and how we mitigated them. While these issues are typically overlooked, we argue that they are a crucial barrier to the adoption of ML self-tuning techniques in databases, and thus deserve more attention from the research community. Secondly, we present experimental results obtained when tuning FoundationDB using ML methods. Unlike prior work in this domain, we also compare with the simplest of baselines: random search. Our results show that, while BO and RL methods can improve the throughput of FoundationDB by up to 38%, random search is a highly competitive baseline, finding a configuration that is only 4% worse than the vastly more complex ML methods. We conclude that future work in this area may want to focus more on randomized, model-free optimization algorithms.

Submitted 27 April, 2021; v1 submitted 16 November, 2020; originally announced November 2020.
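
The random-search baseline mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' tuning harness: the parameter names, the `toy_throughput` objective, and the convention of returning `None` for an invalid run are all assumptions made for illustration. The sketch samples configurations uniformly from given ranges, skips invalid runs, and keeps the best valid configuration.

```python
import random


def random_search(objective, param_ranges, n_trials, seed=0):
    """Uniformly sample configurations and keep the best valid one.

    `objective` returns a score to maximize, or None for an invalid run
    (an assumed convention, not taken from the paper).
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {
            name: rng.uniform(low, high)
            for name, (low, high) in param_ranges.items()
        }
        score = objective(config)
        if score is None:  # invalid parameter combination: skip it
            continue
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_throughput(config):
    """Hypothetical stand-in for a benchmark run of a database."""
    if config["a"] + config["b"] > 1.5:
        return None  # pretend this combination makes the run fail
    return -((config["a"] - 0.3) ** 2) - ((config["b"] - 0.7) ** 2)


best_config, best_score = random_search(
    toy_throughput, {"a": (0.0, 1.0), "b": (0.0, 1.0)}, n_trials=200, seed=1
)
```

In a real setting each call to the objective would be a full benchmark run, so the number of trials, rather than the search logic, dominates the cost.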

arXiv:2006.04658 [pdf, other]

Toward a Better Understanding and Evaluation of Tree Structures on Flash SSDs

Authors: Diego Didona, Nikolas Ioannou, Radu Stoica, Kornilios Kourtis

Abstract: Solid-state drives (SSDs) are extensively used to deploy persistent data stores, as they provide low latency random access, high write throughput, high data density, and low cost. Tree-based data structures are widely used to build persistent data stores, and indeed they lie at the backbone of many of the data management systems used in production and research today. In this paper, we show that benchmarking a persistent tree-based data structure on an SSD is a complex process, which may easily incur subtle pitfalls that can lead to an inaccurate performance assessment. At a high level, these pitfalls stem from the interaction of complex software running on complex hardware. On one hand, tree structures implement internal operations that have nontrivial effects on performance. On the other hand, SSDs employ firmware logic to deal with the idiosyncrasies of the underlying flash memory, which are well known to lead to complex performance dynamics. We identify seven benchmarking pitfalls using RocksDB and WiredTiger, two widespread implementations of an LSM-Tree and a B+Tree, respectively. We show that such pitfalls can lead to incorrect measurements of key performance indicators, hinder the reproducibility and the representativeness of the results, and lead to suboptimal deployments in production environments. We also provide guidelines on how to avoid these pitfalls to obtain more reliable performance measurements, and to perform more thorough and fair comparison among different design points.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:1905.02119 [pdf, ps, other]

Lynceus: Cost-efficient Tuning and Provisioning of Data Analytic Jobs

Authors: Maria Casimiro, Diego Didona, Paolo Romano, Luís Rodrigues, Willy Zwaenepoel, David Garlan

Abstract: Modern data analytic and machine learning jobs find in the cloud a natural deployment platform to satisfy their notoriously large resource requirements. Yet, to achieve cost efficiency, it is crucial to identify a deployment configuration that satisfies user-defined QoS constraints (e.g., on execution time), while avoiding unnecessary over-provisioning. This paper introduces Lynceus, a new approach for the optimization of cloud-based data analytic jobs that improves over state-of-the-art approaches by enabling significant cost savings both in terms of the final recommended configuration and of the optimization process used to recommend configurations. Unlike existing solutions, Lynceus optimizes in a joint fashion both the cloud-related and the application-level parameters. This allows for a reduction of the cost of recommended configurations by up to 3.7x at the 90th percentile with respect to existing approaches, which treat the optimization of cloud-related and application-level parameters as two independent problems. Further, Lynceus reduces the cost of the optimization process (i.e., the cloud cost incurred for testing configurations) by up to 11x. Such an improvement is achieved thanks to two mechanisms: i) a timeout approach which allows aborting the exploration of configurations that are deemed suboptimal, while still extracting useful information to guide future explorations and to improve its predictive model - differently from recent works, which either incur the full cost for testing suboptimal configurations or are unable to extract any knowledge from aborted runs; ii) a long-sighted and budget-aware technique that determines which configurations to test by predicting the long-term impact of each exploration - unlike state-of-the-art approaches for the optimization of cloud jobs, which adopt greedy optimization methods.

Submitted 20 January, 2020; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: This updated version features a novel extension of our approach: the timeout mechanism. Additionally, we improved the write-up of the paper, fruit of the collaboration with Professor David Garlan and Carnegie Mellon University

arXiv:1903.09106 [pdf, other]

Distributed Transactional Systems Cannot Be Fast

Authors: Diego Didona, Panagiota Fatourou, Rachid Guerraoui, Jingjing Wang, Willy Zwaenepoel

Abstract: We prove that no fully transactional system can provide fast read transactions (including read-only ones that are considered the most frequent in practice). Specifically, to achieve fast read transactions, the system has to give up support of transactions that write more than one object. We prove this impossibility result for distributed storage systems that are causally consistent, i.e., they are not required to ensure any strong form of consistency. Therefore, our result holds also for any system that ensures a consistency level stronger than causal consistency, e.g., strict serializability. The impossibility result holds even for systems that store only two objects (and support at least two servers and at least four clients). It also holds for systems that are partially replicated. Our result justifies the design choices of state-of-the-art distributed transactional systems and insists that system designers should not put more effort to design fully-functional systems that support both fast read transactions and ensure causal or any stronger form of consistency.

Submitted 10 April, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

arXiv:1902.09327 [pdf, other]

PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication

Authors: Kristina Spirovska, Diego Didona, Willy Zwaenepoel

Abstract: Geo-replicated data platforms are at the backbone of several large-scale online services. Transactional Causal Consistency (TCC) is an attractive consistency level for building such platforms. TCC avoids many anomalies of eventual consistency, eschews the synchronization costs of strong consistency, and supports interactive read-write transactions. Partial replication is another attractive design choice for building geo-replicated platforms, as it increases the storage capacity and reduces update propagation costs. This paper presents PaRiS, the first TCC system that supports partial replication and implements non-blocking parallel read operations, whose latency is paramount for the performance of read-intensive applications. PaRiS relies on a novel protocol to track dependencies, called Universal Stable Time (UST). By means of a lightweight background gossip process, UST identifies a snapshot of the data that has been installed by every DC in the system. Hence, transactions can consistently read from such a snapshot on any server in any replication site without having to block. Moreover, PaRiS requires only one timestamp to track dependencies and define transactional snapshots, thereby achieving resource efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment composed of up to 10 replication sites. We show that PaRiS scales well with the number of DCs and partitions, while being able to handle larger data-sets than existing solutions that assume full replication. We also demonstrate a performance gain of non-blocking reads vs. a blocking alternative (up to 1.47x higher throughput with 5.91x lower latency for read-dominated workloads and up to 1.46x higher throughput with 20.56x lower latency for write-heavy workloads).

Submitted 25 February, 2019; originally announced February 2019.

arXiv:1803.06341 [pdf, other]

Distributed Transactions: Dissecting the Nightmare

Authors: Diego Didona, Rachid Guerraoui, Jingjing Wang, Willy Zwaenepoel

Abstract: Many distributed storage systems are transactional and a lot of work has been devoted to optimizing their performance, especially the performance of read-only transactions that are considered the most frequent in practice. Yet, the results obtained so far are rather disappointing, and some of the design decisions seem contrived. This paper contributes to explaining this state of affairs by proving intrinsic limitations of transactional storage systems, even those that need not ensure strong consistency but only causality. We first consider general storage systems where some transactions are read-only and some also involve write operations. We show that even read-only transactions cannot be "fast": their operations cannot be executed within one round-trip message exchange between a client seeking an object and the server storing it. We then consider systems (as sometimes implemented today) where all transactions are read-only, i.e., updates are performed as individual operations outside transactions. In this case, read-only transactions can indeed be "fast", but we prove that they need to be "visible". They induce inherent updates on the servers, which in turn impact their overall performance.

Submitted 16 March, 2018; originally announced March 2018.

arXiv:1803.04237 [pdf, other]

Causal Consistency and Latency Optimality: Friend or Foe?

Authors: Diego Didona, Rachid Guerraoui, Jingjing Wang, Willy Zwaenepoel

Abstract: Causal consistency is an attractive consistency model for replicated data stores. It is provably the strongest model that tolerates partitions, it avoids the long latencies associated with strong consistency, and, especially when using read-only transactions, it prevents many of the anomalies of weaker consistency models. Recent work has shown that causal consistency allows "latency-optimal" read-only transactions that are nonblocking, single-version and single-round in terms of communication. On the surface, this latency optimality is very appealing, as the vast majority of applications are assumed to have read-dominated workloads. In this paper, we show that such "latency-optimal" read-only transactions induce an extra overhead on writes; this overhead is so high that performance is actually jeopardized, even in read-dominated workloads. We show this result from a practical and a theoretical angle. First, we present a protocol that implements "almost latency-optimal" ROTs but does not impose on the writes any of the overhead of latency-optimal protocols. In this protocol, ROTs are nonblocking, one version and can be configured to use either two or one and a half rounds of client-server communication. We experimentally show that this protocol not only provides better throughput, as expected, but also surprisingly better latencies for all but the lowest loads and most read-heavy workloads. Then, we prove that the extra overhead imposed on writes by latency-optimal read-only transactions is inherent, i.e., it is not an artifact of the design we consider, and cannot be avoided by any implementation of latency-optimal read-only transactions. We show in particular that this overhead grows linearly with the number of clients.

Submitted 12 March, 2018; originally announced March 2018.

arXiv:1802.00696 [pdf, other]

Size-aware Sharding For Improving Tail Latencies in In-memory Key-value Stores

Authors: Diego Didona, Willy Zwaenepoel

Abstract: This paper introduces the concept of size-aware sharding to improve tail latencies for in-memory key-value stores, and describes its implementation in the Minos key-value store. Tail latencies are crucial in distributed applications with high fan-out ratios, because overall response time is determined by the slowest response. Size-aware sharding distributes requests for keys to cores according to the size of the item associated with the key. In particular, requests for small and large items are sent to disjoint subsets of cores. Size-aware sharding improves tail latencies by avoiding head-of-line blocking, in which a request for a small item gets queued behind a request for a large item. Alternative size-unaware approaches to sharding, such as keyhash-based sharding, request dispatching and stealing do not avoid head-of-line blocking, and therefore exhibit worse tail latencies. The challenge in implementing size-aware sharding is to maintain high throughput by avoiding the cost of software dispatching and by achieving load balancing between different cores. Minos uses hardware dispatch for all requests for small items, which form the very large majority of all requests. It achieves load balancing by adapting the number of cores handling requests for small and large items to their relative presence in the workload. We compare Minos to three state-of-the-art designs of in-memory KV stores. Compared to its closest competitor, Minos achieves a 99th percentile latency that is up to two orders of magnitude lower. Put differently, for a given value for the 99th percentile latency equal to 10 times the mean service time, Minos achieves a throughput that is up to 7.4 times higher.

Submitted 2 February, 2018; originally announced February 2018.
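
The routing idea in the abstract above (small and large items served by disjoint sets of cores, with the split adapted to the workload) can be sketched as follows. This is a simplified software illustration, not the Minos implementation, which the abstract says relies on hardware dispatch for small items. The function names, the size threshold, and the proportional split rule are assumptions made for illustration.

```python
import zlib


def split_cores(n_cores, large_load_share):
    """Partition core ids into disjoint (small_cores, large_cores) lists.

    The proportional rule below is an assumed policy; it always leaves
    at least one core on each side.
    """
    n_large = round(n_cores * large_load_share)
    n_large = max(1, min(n_cores - 1, n_large))
    cores = list(range(n_cores))
    return cores[:-n_large], cores[-n_large:]


def pick_core(key, item_size, small_cores, large_cores, size_threshold):
    """Route a request by item size first, then by key hash within the pool."""
    pool = small_cores if item_size <= size_threshold else large_cores
    return pool[zlib.crc32(key) % len(pool)]


small_cores, large_cores = split_cores(n_cores=8, large_load_share=0.25)
core_for_small = pick_core(b"user:42", 64, small_cores, large_cores, 1024)
core_for_large = pick_core(b"user:42", 1 << 20, small_cores, large_cores, 1024)
```

Because the two pools never overlap, a request for a small item cannot queue behind a request for a large one on the same core, which is the head-of-line blocking the abstract describes.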

arXiv:1702.04263 [pdf, other]

Okapi: Causally Consistent Geo-Replication Made Faster, Cheaper and More Available

Authors: Diego Didona, Kristina Spirovska, Willy Zwaenepoel

Abstract: Okapi is a new causally consistent geo-replicated key-value store. Okapi leverages two key design choices to achieve high performance. First, it relies on hybrid logical/physical clocks to achieve low latency even in the presence of clock skew. Second, Okapi achieves higher resource efficiency and better availability, at the expense of a slight increase in update visibility latency. To this end, Okapi implements a new stabilization protocol that uses a combination of vector and scalar clocks and makes a remote update visible when its delivery has been acknowledged by every data center. We evaluate Okapi with different workloads on Amazon AWS, using three geographically distributed regions and 96 nodes. We compare Okapi with two recent approaches to causal consistency, Cure and GentleRain. We show that Okapi delivers up to two orders of magnitude better performance than GentleRain and that Okapi achieves up to 3.5x lower latency and a 60% reduction of the meta-data overhead with respect to Cure.

Submitted 14 February, 2017; originally announced February 2017.

arXiv:1411.7910 [pdf, other]

A Flexible Framework for Accurate Simulation of Cloud In-Memory Data Stores

Authors: Pierangelo Di Sanzo, Francesco Quaglia, Bruno Ciciani, Alessandro Pellegrini, Diego Didona, Paolo Romano, Roberto Palmieri, Sebastiano Peluso

Abstract: In-memory (transactional) data stores are recognized as a first-class data management technology for cloud platforms, thanks to their ability to match the elasticity requirements imposed by the pay-as-you-go cost model. On the other hand, defining the well-suited amount of cache servers to be deployed, and the degree of in-memory replication of slices of data, in order to optimize reliability/availability and performance tradeoffs, is far from being a trivial task. Yet, it is an essential aspect of the provisioning process of cloud platforms, given that it has an impact on how well cloud resources are actually exploited. To cope with the issue of determining optimized configurations of cloud in-memory data stores, in this article we present a flexible simulation framework offering skeleton simulation models that can be easily specialized in order to capture the dynamics of diverse data grid systems, such as those related to the specific protocol used to provide data consistency and/or transactional guarantees. Besides its flexibility, another peculiar aspect of the framework lies in that it integrates simulation and machine-learning (black-box) techniques, the latter being essentially used to capture the dynamics of the data-exchange layer (e.g. the message passing layer) across the cache servers. This is a relevant aspect when considering that the actual data-transport/networking infrastructure on top of which the data grid is deployed might be unknown, hence being not feasible to be modeled via white-box (namely purely simulative) approaches. We also provide an extended experimental study aimed at validating instances of simulation models supported by our framework against execution dynamics of real data grid systems deployed on top of either private or public cloud infrastructures.

Submitted 28 November, 2014; originally announced November 2014.

Comments: 34 pages

arXiv:1410.5102 [pdf, other]

On Bootstrapping Machine Learning Performance Predictors via Analytical Models

Authors: Diego Didona, Paolo Romano

Abstract: Performance modeling typically relies on two antithetic methodologies: white box models, which exploit knowledge of a system's internals and capture its dynamics using analytical approaches, and black box techniques, which infer relations among the input and output variables of a system based on the evidence gathered during an initial training phase. In this paper we investigate a technique, which we name Bootstrapping, which aims at reconciling these two methodologies and at compensating the cons of the one with the pros of the other. We thoroughly analyze the design space of this gray box modeling technique, and identify a number of algorithmic and parametric trade-offs which we evaluate via two realistic case studies, a Key-Value Store and a Total Order Broadcast service.

Submitted 19 October, 2014; originally announced October 2014.

Comments: 11 pages

Showing 1–11 of 11 results for author: Didona, D