Volume 10, Issue 11, August 2017
Publisher: VLDB Endowment
ISSN: 2150-8097
research-article
Memory management techniques for large-scale persistent-main-memory systems

Storage Class Memory (SCM) is a novel class of memory technologies that promise to revolutionize database architectures. SCM is byte-addressable and exhibits latencies similar to those of DRAM, while being non-volatile. Hence, SCM could replace both ...

research-article
Trajectory similarity join in spatial networks

The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus,...

research-article
HoloClean: holistic data repairs with probabilistic inference

We introduce HoloClean, a framework for holistic data repairing driven by probabilistic inference. HoloClean unifies qualitative data repairing, which relies on integrity constraints or external data sources, with quantitative data repairing methods, ...

research-article
Caribou: intelligent distributed storage

The ever increasing amount of data being handled in data centers causes an intrinsic inefficiency: moving data around is expensive in terms of bandwidth, latency, and power consumption, especially given the low computational complexity of many database ...

research-article
Towards linear algebra over normalized data

Providing machine learning (ML) over relational data is a mainstream requirement for data analytics systems. While almost all ML tools require the input data to be presented as a single table, many datasets are multi-table. This forces data scientists ...

research-article
Comparative evaluation of big-data systems on scientific image analytics workloads

Scientific discoveries are increasingly driven by analyzing large volumes of image data. Many new libraries and specialized database management systems (DBMSs) have emerged to support such tasks. It is unclear how well these systems support real-world ...

research-article
Revenue maximization in incentivized social advertising

Incentivized social advertising, an emerging marketing model, provides monetization opportunities not only to the owners of the social networking platforms but also to their influential users by offering a "cut" on the advertising revenue. We consider a ...

research-article
SquirrelJoin: network-aware distributed join processing with lazy partitioning

To execute distributed joins in parallel on compute clusters, systems partition and exchange data records between workers. With large datasets, workers spend a considerable amount of time transferring data over the network. When compute clusters are ...

research-article
I've seen "enough": incrementally improving visualizations to support rapid decision making

Data visualization is an effective mechanism for identifying trends, insights, and anomalies in data. On large datasets, however, generating visualizations can take a long time, delaying the extraction of insights, hampering decision making, and ...

research-article
Minimal on-road time route scheduling on time-dependent graphs

On time-dependent graphs, fastest path query is an important problem and has been well studied. It focuses on minimizing the total travel time (waiting time + on-road time) but does not allow waiting on any intermediate vertex if the FIFO property is ...

research-article
A holistic view of stream partitioning costs

Stream processing has become the dominant processing model for monitoring and real-time analytics. Modern Parallel Stream Processing Engines (pSPEs) have made it feasible to increase the performance in both monitoring and analytical queries by ...

research-article
Truss-based community search: a truss-equivalence based indexing approach

We consider the community search problem defined upon a large graph G: given a query vertex q in G, find all the densely connected subgraphs of G that contain q. As an online, query-dependent variant of the well-known ...

research-article
Query optimization for dynamic imputation

Missing values are common in data analysis and present a usability challenge. Users are forced to pick between removing tuples with missing values or creating a cleaned version of their data by applying a relatively expensive imputation strategy. Our ...

research-article
In search of an entity resolution OASIS: optimal asymptotic sequential importance sampling

Entity resolution (ER) presents unique challenges for evaluation methodology. While crowdsourcing platforms acquire ground truth, sound approaches to sampling must drive labelling efforts. In ER, extreme class imbalance between matching and non-matching ...

research-article
Flexible online task assignment in real-time spatial data

The popularity of Online To Offline (O2O) service platforms has spurred the need for online task assignment in real-time spatial data, where streams of spatially distributed tasks and workers are matched in real time such that the total number of ...

research-article
A forward scan based plane sweep algorithm for parallel interval joins

The interval join is a basic operation that finds application in temporal, spatial, and uncertain databases. Although a number of centralized and distributed algorithms have been proposed for the efficient evaluation of interval joins, classic plane ...

research-article
ASAP: prioritizing attention via time series smoothing

Time series visualization of streaming telemetry (i.e., charting of key metrics such as server load over time) is increasingly prevalent in modern data platforms and applications. However, many existing systems simply plot the raw data streams as they ...

research-article
Knowledge verification for long-tail verticals

Collecting structured knowledge for real-world entities has become a critical task for many applications. A big gap between the knowledge in existing knowledge repositories and the knowledge in the real world is the knowledge on tail verticals (i.e., ...

research-article
SkyGraph: retrieving regions of interest using skyline subgraph queries

Several services today are annotated with points of interest (PoIs) such as "coffee shop", "park", etc. A region of interest (RoI) is a neighborhood that contains PoIs relevant to the user. In this paper, we study the scenario where a user wants to ...

research-article
Reverse engineering aggregation queries

Query reverse engineering seeks to re-generate the SQL query that produced a given query output table from a given database. In this paper, we solve this problem for OLAP queries with group-by and aggregation. We develop a novel three-phase algorithm ...

research-article
LDA*: a robust and large-scale topic modeling system

We present LDA*, a system that has been deployed in one of the largest Internet companies to fulfil their requirements of "topic modeling as an internal service": relying on thousands of machines, engineers in different sectors submit their data, some ...

research-article
Social hash partitioner: a scalable distributed hypergraph partitioner

We design and implement a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k - 1)-cut metric, by optimizing a novel objective called ...

research-article
On sampling from massive graph streams

We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive graph streams. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to accomplish various ...

research-article
Pyramid sketch: a sketch framework for frequency estimation of data streams

Sketch is a probabilistic data structure, and is used to store and query the frequency of any item in a given multiset. Due to its high memory efficiency, it has been applied to various fields in computer science, such as stream database, network ...

research-article
Reconciling skyline and ranking queries

Traditionally, skyline and ranking queries have been treated separately as alternative ways of discovering interesting data in potentially large datasets. While ranking queries adopt a specific scoring function to rank tuples, skyline queries return the ...

research-article
CleanM: an optimizable query language for unified scale-out data cleaning

Data cleaning has become an indispensable part of data analysis due to the increasing amount of dirty data. Data scientists spend most of their time preparing dirty data before it can be used for data analysis. At the same time, the existing tools that ...

research-article
Distributed trajectory similarity search

Mobile and sensing devices have already become ubiquitous. They have made tracking moving objects an easy task. As a result, mobile applications like Uber and many IoT projects have generated massive amounts of trajectory data that can no longer be ...

research-article
Runtime optimization of join location in parallel data management systems

Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the data storage ...

research-article
Stitching web tables for improving matching quality

HTML tables on web pages ("web tables") cover a wide variety of topics. Data from web tables can thus be useful for tasks such as knowledge base completion or ad hoc table extension. Before table data can be used for these tasks, the tables must be ...

research-article
DigitHist: a histogram-based data summary with tight error bounds

We propose DigitHist, a histogram summary for selectivity estimation on multi-dimensional data with tight error bounds. By combining multi-dimensional and one-dimensional histograms along regular grids of different resolutions, DigitHist provides an ...
