- research-article, June 2021
Correlation Sketches for Approximate Join-Correlation Queries
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1531–1544
https://doi.org/10.1145/3448016.3458456
The increasing availability of structured datasets, from Web tables and open-data portals to enterprise data, opens up opportunities to enrich analytics and improve machine learning models through relational data augmentation. In this paper, we ...
- research-article, June 2021
Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 541–553
https://doi.org/10.1145/3448016.3457548
In this paper, we study the problem of aligning all-pair near-duplicate passages in two long texts. A passage is a sequence of consecutive words in a text. It can begin and end with any word in the text, whether around a period or not. Due to the high ...
- research-article, June 2021
Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1014–1022
https://doi.org/10.1145/3448016.3457546
The use of deep learning models for forecasting the resource consumption patterns of SQL queries has recently been a popular area of study. While these models have demonstrated promising accuracy, training them over large scale industry workloads are ...
- tutorial, June 2021
A Deep Dive into Deep Learning Approaches for Text-to-SQL Systems
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2846–2851
https://doi.org/10.1145/3448016.3457543
Data is a prevalent part of every business and scientific domain, but its explosive volume and increasing complexity make data querying challenging even for experts. For this reason, numerous text-to-SQL systems have been developed that enable querying ...
- research-article, June 2021
Adaptive Rule Discovery for Labeling Text Data
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2217–2225
https://doi.org/10.1145/3448016.3457334
Creating and collecting labeled data is one of the major bottlenecks in machine learning pipelines and the emergence of automated feature generation techniques such as deep learning, which typically requires a lot of training data, has further ...
- research-article, June 2021
Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)
- Gaurav Gupta,
- Minghao Yan,
- Benjamin Coleman,
- Bryce Kille,
- R. A. Leo Elworth,
- Tharun Medini,
- Todd Treangen,
- Anshumali Shrivastava
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2226–2234
https://doi.org/10.1145/3448016.3457333
DNA sequencing, especially of microbial genomes and metagenomes, has been at the core of recent research advances in large-scale comparative genomics. The data deluge has resulted in exponential growth in genomic datasets over the past years and has ...
- research-article, June 2021
Proportionality in Spatial Keyword Search
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 885–897
https://doi.org/10.1145/3448016.3457309
More often than not, spatial objects are associated with some context, in the form of text, descriptive tags (e.g. points of interest, flickr photos), or linked entities in semantic graphs (e.g. Yago2, DBpedia). Hence, location-based retrieval should be ...
- research-article, June 2021
Marrying Top-k with Skyline Queries: Relaxing the Preference Input while Producing Output of Controllable Size
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1317–1330
https://doi.org/10.1145/3448016.3457299
The two most common paradigms to identify records of preference in a multi-objective setting rely either on dominance (e.g., the skyline operator) or on a utility function defined over the records' attributes (typically, using a top-k query). Despite ...
- research-article, June 2021
PathEnum: Towards Real-Time Hop-Constrained s-t Path Enumeration
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1758–1770
https://doi.org/10.1145/3448016.3457290
We study the hop-constrained s-t path enumeration (HcPE) problem, which takes a graph G, two distinct vertices s, t and a hop constraint k as input, and outputs all paths from s to t whose length is at most k. The state-of-the-art algorithms suffer from ...
- research-article, June 2021
TENET: Joint Entity and Relation Linking with Coherence Relaxation
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1142–1155
https://doi.org/10.1145/3448016.3457280
The joint entity and relation linking task aims to connect the noun phrases (resp., relational phrases) extracted from natural language documents to the entities (resp., predicates) in general knowledge bases (KBs). This task benefits numerous ...
- research-article, June 2021
A-Tree: A Dynamic Data Structure for Efficiently Indexing Arbitrary Boolean Expressions
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 817–829
https://doi.org/10.1145/3448016.3457266
Efficiently evaluating a large number of arbitrary Boolean expressions is needed in many applications such as advertising exchanges, complex event processing, and publish/subscribe systems. However, most solutions can support only conjunctive Boolean ...
- research-article, June 2021
Versatile Equivalences: Speeding up Subgraph Query Processing and Subgraph Matching
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 925–937
https://doi.org/10.1145/3448016.3457265
Subgraph query processing (also known as subgraph search) and subgraph matching are fundamental graph problems in many application domains. A lot of efforts have been made to develop practical solutions for these problems. Despite the efforts, existing ...
- research-article, June 2021
Shedding Light on Opaque Application Queries
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 912–924
https://doi.org/10.1145/3448016.3457252
We investigate a new query reverse-engineering problem of unmasking SQL queries hidden within database applications. The diverse use-cases for this problem range from resurrecting legacy code to query rewriting. As a first step in addressing the ...
- research-article, June 2021
Bidirectionally Densifying LSH Sketches with Empty Bins
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 830–842
https://doi.org/10.1145/3448016.3452833
As an efficient tool for approximate similarity computation and search, Locality Sensitive Hashing (LSH) has been widely used in many research areas including databases, data mining, information retrieval, and machine learning. Classical LSH methods ...
- research-article, June 2021
On m-Impact Regions and Standing Top-k Influence Problems
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1784–1796
https://doi.org/10.1145/3448016.3452832
In this paper, we study the m-impact region problem (mIR). In a context where users look for available products with top-k queries, mIR identifies the part of the product space that attracts the most user attention. Specifically, mIR determines the kind ...
- research-article, June 2021
Why Not Match: On Explanations of Event Pattern Queries
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 1705–1717
https://doi.org/10.1145/3448016.3452818
Queries over event data are posed in a form of event patterns, for example, to retrieve the flights from IAH to LGA without a stopover. If the expected answer is not returned, one may ask why not, also known as explanations of non-answers. Analogous to ...
- short-paper, June 2021
QuTE: Answering Quantity Queries from Web Tables
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2740–2744
https://doi.org/10.1145/3448016.3452763
Quantities are financial, technological, physical and other measures that denote relevant properties of entities, such as revenue of companies, energy efficiency of cars or distance and brightness of stars and galaxies. Queries with filter conditions on ...
- short-paper, June 2021
INCA: Inconsistency-Aware Data Profiling and Querying
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2745–2749
https://doi.org/10.1145/3448016.3452760
When exploring and querying inconsistent data, inconsistency measures referring to constraint violations can help the user to quantify the quality of the underlying data and query results. We showcase INCA, a system that allows the user to execute data ...
- short-paper, June 2021
Crosstown Foundry: A Scalable Data-driven Journalism Platform for Hyper-local News
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2765–2769
https://doi.org/10.1145/3448016.3452751
Generating hyper-local news at scale is challenging because publicly available data is not provided at the desired spatial and temporal granularity. Besides, there is a lack of automated analytical and publishing tools. Crosstown Foundry, which is being ...
- short-paper, June 2021
Boomerang: Proactive Insight-Based Recommendations for Guiding Conversational Data Analysis
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, Pages 2750–2754
https://doi.org/10.1145/3448016.3452748
Natural-language interfaces are gaining popularity due to their potential to democratize access to data and insights by making the interaction with data more natural and accessible for a wide range of business users. To fully embrace the goal of ...