Search | arXiv e-print repository

Efficiently Processing Large Relational Joins on GPUs

Authors: Bowen Wu, Dimitrios Koutsoukos, Gustavo Alonso

Abstract: With the growing interest in Machine Learning (ML), Graphic Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, with growing interest in employing them in a database context. In this paper, we explore and analyze the implementation of relational joins… ▽ More With the growing interest in Machine Learning (ML), Graphic Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, with growing interest in employing them in a database context. In this paper, we explore and analyze the implementation of relational joins on GPUs from an end-to-end perspective, meaning that we take result materialization into account. We conduct a comprehensive performance study of state-of-the-art GPU-based join algorithms over diverse synthetic workloads and TPC-H/TPC-DS benchmarks. Without being restricted to the conventional setting where each input relation has only one key and one non-key with all attributes being 4-bytes long, we investigate the effect of various factors (e.g., input sizes, number of non-key columns, skewness, data types, match ratios, and number of joins) on the end-to-end throughput. Furthermore, we propose a technique called "Gather-from-Transformed-Relations" (GFTR) to reduce the long-ignored yet high materialization cost in GPU-based joins. The experimental evaluation shows significant performance improvements from GFTR, with throughput gains of up to 2.3 times over previous work. The insights gained from the performance study not only advance the understanding of GPU-based joins but also introduce a structured approach to selecting the most efficient GPU join algorithm based on the input relation characteristics. △ Less

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2112.00425 [pdf, other]

How to use Persistent Memory in your Database

Authors: Dimitrios Koutsoukos, Raghav Bhartia, Ana Klimovic, Gustavo Alonso

Abstract: Persistent or Non Volatile Memory (PMEM or NVM) has recently become commercially available under several configurations with different purposes and goals. Despite the attention to the topic, we are not aware of a comprehensive empirical analysis of existing relational database engines under different PMEM configurations. Such a study is important to understand the performance implications of the v… ▽ More Persistent or Non Volatile Memory (PMEM or NVM) has recently become commercially available under several configurations with different purposes and goals. Despite the attention to the topic, we are not aware of a comprehensive empirical analysis of existing relational database engines under different PMEM configurations. Such a study is important to understand the performance implications of the various hardware configurations and how different DB engines can benefit from them. To this end, we analyze three different engines (PostgreSQL, MySQL, and SQLServer) under common workloads (TPC-C and TPC-H) with all possible PMEM configurations supported by Intel's Optane NVM devices (PMEM as persistent memory in AppDirect mode and PMEM as volatile memory in Memory mode). Our results paint a complex picture and are not always intuitive due to the many factors involved. Based on our findings, we provide insights on how the different engines behave with PMEM and which configurations and queries perform best. Our results show that using PMEM as persistent storage usually speeds up query execution, but with some caveats as the I/O path is not fully optimized. Additionally, using PMEM in Memory mode does not offer any performance advantage despite the larger volatile memory capacity. Through the extensive coverage of engines and parameters, we provide an important starting point for exploiting PMEM in databases and tuning relational engines to take advantage of this new technology. △ Less

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2106.07102 [pdf, other]

Farview: Disaggregated Memory with Operator Off-loading for Database Engines

Authors: Dario Korolija, Dimitrios Koutsoukos, Kimberly Keeton, Konstantin Taranov, Dejan Milojičić, Gustavo Alonso

Abstract: Cloud deployments disaggregate storage from compute, providing more flexibility to both the storage and compute layers. In this paper, we explore disaggregation by taking it one step further and applying it to memory (DRAM). Disaggregated memory uses network attached DRAM as a way to decouple memory from CPU. In the context of databases, such a design offers significant advantages in terms of maki… ▽ More Cloud deployments disaggregate storage from compute, providing more flexibility to both the storage and compute layers. In this paper, we explore disaggregation by taking it one step further and applying it to memory (DRAM). Disaggregated memory uses network attached DRAM as a way to decouple memory from CPU. In the context of databases, such a design offers significant advantages in terms of making a larger memory capacity available as a central pool to a collection of smaller processing nodes. To explore these possibilities, we have implemented Farview, a disaggregated memory solution for databases, operating as a remote buffer cache with operator offloading capabilities. Farview is implemented as an FPGA-based smart NIC making DRAM available as a disaggregated, network attached memory module capable of performing data processing at line rate over data streams to/from disaggregated memory. Farview supports query offloading using operators such as selection, projection, aggregation, regular expression matching and encryption. In this paper we focus on analytical queries and demonstrate the viability of the idea through an extensive experimental evaluation of Farview under different workloads. Farview is competitive with a local buffer cache solution for all the workloads and outperforms it in a number of cases, proving that a smart disaggregated memory can be a viable alternative for databases deployed in cloud environments. △ Less

Submitted 13 June, 2021; originally announced June 2021.

Comments: 12 pages

arXiv:2004.03488 [pdf, other]

Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms

Authors: Dimitrios Koutsoukos, Ingo Müller, Renato Marroquín, Ana Klimovic, Gustavo Alonso

Abstract: The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasing… ▽ More The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasingly required due to the speed at which algorithms and hardware evolve. To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e.,composable building blocks resembling traditional database operators but at a finer granularity. To demonstrate the advantages of our approach, we use Modularis to build a distributed query processing system supporting relational queries running on an RDMA cluster, a serverless cloud platform, and a smart storage engine. Modularis requires minimal code changes to execute queries across these three diverse hardware platforms, showing that the sub-operator approach reduces the amount and complexity of the code. In fact, changes in the platform affect only sub-operators that depend on the underlying hardware. We show the end-to-end performance of Modularis by comparing it with a framework for SQL processing (Presto), a commercial cluster database (SingleStore), as well as Query-as-a-Service systems (Athena, BigQuery). Modularis outperforms all these systems, proving that the design and architectural advantages of a modular design can be achieved without degrading performance. We also compare Modularis with a hand-optimized implementation of a join for RDMA clusters. We show that Modularis has the advantage of being easily extensible to a wider range of join variants and group by queries, all of which are not supported in the hand-tuned join. △ Less

Submitted 29 September, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Accepted at PVLDB vol. 14

arXiv:2004.01908 [pdf, other]

The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis

Authors: Ingo Müller, Renato Marroquín, Dimitrios Koutsoukos, Mike Wawrzoniak, Sabir Akhadov, Gustavo Alonso

Abstract: Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement only o… ▽ More Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement only one analysis/platform combination, leading to repeated implementation effort -- and a plethora of semi-compatible tools for data scientists. In this paper, we propose the "Collection Virtual Machine" (or CVM) -- an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future. △ Less

Submitted 8 April, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

Comments: This paper is currently under review at DaMoN'20

arXiv:1606.06699 [pdf, other]

Resilient Supervisory Control of Autonomous Intersections in the Presence of Sensor Attacks

Authors: Amin Ghafouri, Xenofon D. Koutsoukos

Abstract: Cyber-physical systems (CPS), such as autonomous vehicles crossing an intersection, are vulnerable to cyber-attacks and their safety-critical nature makes them a target for malicious adversaries. This paper studies the problem of supervisory control of autonomous intersections in the presence of sensor attacks. Sensor attacks are performed when an adversary gains access to the transmission channel… ▽ More Cyber-physical systems (CPS), such as autonomous vehicles crossing an intersection, are vulnerable to cyber-attacks and their safety-critical nature makes them a target for malicious adversaries. This paper studies the problem of supervisory control of autonomous intersections in the presence of sensor attacks. Sensor attacks are performed when an adversary gains access to the transmission channel and corrupts the measurements before they are received by the decision-making unit. We show that the supervisory control system is vulnerable to sensor attacks that can cause collision or deadlock among vehicles. To improve the system resilience, we introduce a detector in the control architecture and focus on stealthy attacks that cannot be detected but are capable of compromising safety. We then present a resilient supervisory control system that is safe, non-deadlocking, and maximally permissive, despite the presence of disturbances, uncontrolled vehicles, and sensor attacks. Finally, we demonstrate how the resilient supervisor works by considering illustrative examples. △ Less

Submitted 21 June, 2016; originally announced June 2016.

Comments: 8 pages, submitted to 55th IEEE Conference on Decision and Control (CDC 2016)

Showing 1–6 of 6 results for author: Koutsoukos, D