Reflects downloads up to 15 Oct 2024
research-article
Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems
Abstract

Modern machine learning (ML) systems commonly use stochastic gradient descent (SGD) to train ML models. However, SGD relies on random data order to converge, which usually requires a full data shuffle. For in-DB ML systems and deep learning ...

research-article
Discovering approximate implicit domain orders through order dependencies
Abstract

Most real-world data come with explicitly defined domain orders, e.g., lexicographic for strings. Our goal is to discover implicit domain orders that we do not already know, e.g., that the order of months in the Chinese Lunar calendar is Corner...

research-article
Data distribution tailoring revisited: cost-efficient integration of representative data
Abstract

Data scientists often develop data sets for analysis by drawing upon available data sources. A major challenge is ensuring that the data set used for analysis adequately represents relevant demographic groups or other variables. Whether data is ...

research-article
Lero: applying learning-to-rank in query optimizer
Abstract

In recent studies, machine learning techniques have been employed to support or enhance cost-based query optimizers in DBMS. Although these approaches have shown superiority in certain benchmarks, they also suffer from certain drawbacks. These ...

research-article
Hyper-distance oracles in hypergraphs
Abstract

We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first ...

research-article
Efficient cryptanalysis of an encrypted database supporting data interoperability
Abstract

In an encrypted database, all data items stored at the server are encrypted and some operations can be performed directly over ciphertexts. Most existing encrypted database schemes cannot support data interoperability, that is, they cannot handle ...

research-article
Similarity-driven and task-driven models for diversity of opinion in crowdsourcing markets
Abstract

The recent boom in crowdsourcing has opened up a new avenue for utilizing human intelligence in the realm of data analysis. This innovative approach provides a powerful means for connecting online workers to tasks that cannot effectively be done ...

research-article
Efficient algorithms for reachability and path queries on temporal bipartite graphs
Abstract

Bipartite graphs are naturally used to model relationships between two types of entities, such as people-location, user-post, and investor-stock. When modeling real-world applications like disease outbreaks, edges are often enriched with temporal ...

research-article
Efficient and effective algorithms for densest subgraph discovery and maintenance
Abstract

The densest subgraph problem (DSP) is of great significance due to its wide applications in different domains. Meanwhile, diverse requirements in various applications lead to different density variants for DSP. Unfortunately, existing DSP ...

research-article
Parallelization of butterfly counting on hierarchical memory
Abstract

Butterfly (a cyclic graph motif) counting is a fundamental task with many applications in graph analysis, which aims at computing the number of butterflies in a large graph. With the rapid growth of graph data, it is more and more challenging to ...

research-article
A survey on hybrid transactional and analytical processing
Abstract

To provide applications with the ability to analyze fresh data and eliminate the time-consuming ETL workflow, hybrid transactional and analytical (HTAP) systems have been developed to serve online transaction processing and online analytical ...

research-article
Minimum motif-cut: a workload-aware RDF graph partitioning strategy
Abstract

In designing a distributed RDF system, it is quite common to divide an RDF graph into subgraphs, called partitions, which are then distributed. Graph partitioning in general and RDF graph partitioning in particular are challenging problems. In ...

research-article
GPU-based butterfly counting
Abstract

When dealing with large bipartite graphs, butterfly counting is a crucial and time-consuming operation. Graphics processing units (GPUs) are widely used parallel heterogeneous devices that can significantly boost performance for data science ...

research-article
Flexible grouping of linear segments for highly accurate lossy compression of time series data
Abstract

Approximating a series of timestamped data points through a sequence of line segments with a maximum error guarantee is a fundamental data compression problem, termed Piecewise Linear Approximation (PLA). As the demand for analyzing large ...

research-article
Survey of vector database management systems
Abstract

There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search a staggering half century and more. ...

research-article
FedST: secure federated shapelet transformation for time series classification
Abstract

This paper explores how to build a shapelet-based time series classification (TSC) model in the federated learning (FL) scenario, that is, using more data from multiple owners without actually sharing the data. We propose FedST, a novel federated ...

research-article
FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs
Abstract

Modern cloud-native OLAP databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. ...

research-article
Open benchmark for filtering techniques in entity resolution
Abstract

Entity Resolution identifies entity profiles that represent the same real-world object. A brute-force approach that considers all pairs of entities suffers from quadratic time complexity. To ameliorate this issue, filtering techniques reduce the ...

research-article
WavingSketch: an unbiased and generic sketch for finding top-k items in data streams
Abstract

Finding top-k items in data streams is a fundamental problem in data mining. Unbiased estimation is well acknowledged as an elegant and important property for top-k algorithms. In this paper, we propose a novel sketch algorithm, called ...

research-article
FICOM: an effective and scalable active learning framework for GNNs on semi-supervised node classification
Abstract

Active learning for graph neural networks (GNNs) aims to select B nodes to label for the best possible GNN performance. Carefully selected labeled nodes can help improve GNN performance, which has motivated a line of research. Unfortunately, ...

research-article
AutoCTS++: zero-shot joint neural architecture and hyperparameter search for correlated time series forecasting
Abstract

Sensors in cyber-physical systems often capture interconnected processes and thus emit correlated time series (CTS), the forecasting of which enables important applications. Recent deep learning based forecasting methods show strong capabilities ...
