Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
Volume 16, Issue 9May 2023
Reflects downloads up to 17 Oct 2024Bibliometrics
Skip Table Of Content Section
Neighborhood-Based Hypergraph Core Decomposition

We propose neighborhood-based core decomposition: a novel way of decomposing hypergraphs into hierarchical neighborhood-cohesive subhypergraphs. Alternative approaches to decomposing hypergraphs, e.g., reduction to clique or bipartite graphs, are not ...

Temporal SIR-GN: Efficient and Effective Structural Representation Learning for Temporal Graphs

Node representation learning (NRL) generates numerical vectors (embeddings) for the nodes of a graph. Structural NRL specifically assigns similar node embeddings for those nodes that exhibit similar structural roles. This is in contrast with its ...

What Modern NVMe Storage Can Do, and How to Exploit it: High-Performance I/O for High-Performance Storage Engines

NVMe SSDs based on flash are cheap and offer high throughput. Combining several of these devices into a single server enables 10 million I/O operations per second or more. Our experiments show that existing out-of-memory database systems and storage ...

WiscSort: External Sorting for Byte-Addressable Storage

We present WiscSort, a new approach to high-performance concurrent sorting for existing and future byte-addressable storage (BAS) devices. WiscSort carefully reduces writes, exploits random reads by splitting keys and values during sorting, and performs ...

Text Indexing for Long Patterns: Anchors are All you Need

In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such ...

The FastLanes Compression Layout: Decoding > 100 Billion Integers per Second with Scalar Code

The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC and columnar database formats, in multiple ways. In this paper, we significantly accelerate decoding of all common Light-Weight Compression (LWC) schemes: DICT, FOR,...

VeriBench: Analyzing the Performance of Database Systems with Verifiability

Database systems are paying more attention to data security in recent years. Immutable systems such as blockchains, verifiable databases, and ledger databases are equipped with various verifiability mechanisms to protect data. Such systems often adopt ...

Towards Designing and Learning Piecewise Space-Filling Curves

To index multi-dimensional data, space-filling curves (SFCs) have been used to map the data to one dimension, and then a one-dimensional indexing method such as the B-tree is used to index the mapped data. The existing SFCs all adopt a single mapping ...

MiniGraph: Querying Big Graphs with a Single Machine

This paper presents MiniGraph, an out-of-core system for querying big graphs with a single machine. As opposed to previous single-machine graph systems, MiniGraph proposes a pipelined architecture to overlap I/O and CPU operations, and improves multi-...

BICE: Exploring Compact Search Space by Using Bipartite Matching and Cell-Wide Verification

Subgraph matching is the problem of searching for all embeddings of a query graph in a data graph, and subgraph query processing (also known as subgraph search) is to find all the data graphs that contain a query graph as subgraphs. Extensive research ...

Maximal D-Truss Search in Dynamic Directed Graphs

Community search (CS) aims at personalized subgraph discovery which is the key to understanding the organisation of many real-world networks. CS in undirected networks has attracted significant attention from researchers, including many solutions for ...

DILI: A Distribution-Driven Learned Index

Targeting in-memory one-dimensional search keys, we propose a novel DIstribution-driven Learned Index tree (DILI), where a concise and computation-efficient linear regression model is used for each node. An internal node's key range is equally divided ...

Pre-Trained Embeddings for Entity Resolution: An Experimental Analysis

Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving language models to improve effectiveness. This is applied to both main steps of ER, i.e., blocking and matching. Several pre-trained embeddings have been tested, ...

Decoupled Graph Neural Networks for Large Dynamic Graphs

Real-world graphs, such as social networks, financial transactions, and recommendation systems, often demonstrate dynamic behavior. This phenomenon, known as graph stream, involves the dynamic changes of nodes and the emergence and disappearance of ...

Adaptive Indexing of Objects with Spatial Extent

Can we quickly explore large multidimensional data in main memory? Adaptive indexing responds to this need by building an index incrementally, in response to queries; in its default form, it indexes a single attribute or, in the presence of several ...

LEON: A New Framework for ML-Aided Query Optimization

Query optimization has long been a fundamental yet challenging topic in the database field. With the prosperity of machine learning (ML), some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizer. However, ...

TiQuE: Improving the Transactional Performance of Analytical Systems for True Hybrid Workloads

Transactions have been a key issue in database management for a long time and there are a plethora of architectures and algorithms to support and implement them. The current state-of-the-art is focused on storage management and is tightly coupled with ...

Seiden: Revisiting Query Processing in Video Database Systems

State-of-the-art video database management systems (VDBMSs) often use lightweight proxy models to accelerate object retrieval and aggregate queries. The key assumption underlying these systems is that the proxy model is an order of magnitude faster than ...

Extract-Transform-Load for Video Streams

Social media, self-driving cars, and traffic cameras produce video streams at large scales and cheap cost. However, storing and querying video at such scales is prohibitively expensive. We propose to treat large-scale video analytics as a data ...

research-article
Pando: Enhanced Data Skipping with Logical Data Partitioning

With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant ...

Cracking-Like Join for Trusted Execution Environments

Data processing on non-trusted infrastructures, such as the public cloud, has become increasingly popular, despite posing risks to data privacy. However, the existing cloud DBMSs either lack sufficient privacy guarantees or underperform. In this paper, ...

Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules

The capabilities of quantum computers, such as the number of supported qubits and maximum circuit depth, have grown exponentially in recent years. Commercially relevant applications that take advantage of quantum computing are expected to be available ...

SDPipe: A Semi-Decentralized Framework for Heterogeneity-Aware Pipeline-parallel Training

The increasing size of both deep learning models and training data necessitates the ability to scale out model training through pipeline-parallel training, which combines pipelined model parallelism and data parallelism. However, most of them assume an ...

LRU-C: Parallelizing Database I/Os for Flash SSDs

The conventional database buffer managers have two inherent sources of I/O serialization: read stall and mutex conflict. The serialized I/O makes storage and CPU under-utilized, limiting transaction throughput and latency. Such harm stands out on flash ...

Why Not Yet: Fixing a Top-k Ranking that is Not Fair to Individuals

This work considers why-not questions in the context of top-k queries and score-based ranking functions. Following the popular linear scalarization approach for multi-objective optimization, we study rankings based on the weighted sum of multiple ...

Subjects

Currently Not Available

Comments