- research-article, June 2016
Automatic Entity Recognition and Typing in Massive Text Data
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 2235–2239
https://doi.org/10.1145/2882903.2912567
In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, product reviews, to a wide range of textual information from social media. ...
- research-article, June 2016
ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 2117–2120
https://doi.org/10.1145/2882903.2899409
Databases can be corrupted with various errors such as missing, incorrect, or inconsistent values. Increasingly, modern data analysis pipelines involve Machine Learning, and the effects of dirty data can be difficult to debug. Dirty data is often sparse, ...
- research-article, June 2016
CLAMS: Bringing Quality to Data Lakes
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 2089–2092
https://doi.org/10.1145/2882903.2899391
With the increasing incentive of enterprises to ingest as much data as they can in what is commonly referred to as "data lakes", and with the recent development of multiple technologies to support this "load-first" paradigm, the new environment presents ...
- research-article, June 2016
QFix: Demonstrating Error Diagnosis in Query Histories
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 2177–2180
https://doi.org/10.1145/2882903.2899388
An increasing number of applications in all aspects of society rely on data. Despite the long line of research in data cleaning and repairs, data correctness has been an elusive goal. Errors in the data can be extremely disruptive, and are detrimental ...
- research-article, June 2016
PrivateClean: Data Cleaning and Differential Privacy
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 937–951
https://doi.org/10.1145/2882903.2915248
Recent advances in differential privacy make it possible to guarantee user privacy while preserving the main characteristics of the data. However, most differential privacy mechanisms assume that the underlying dataset is clean. This paper explores the ...
- research-article, June 2016
Interactive and Deterministic Data Cleaning
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 893–907
https://doi.org/10.1145/2882903.2915242
We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses SQL update queries as the language to repair data. Falcon does not rely on the existence of a set of pre-defined data quality rules. On the contrary, it ...
- research-article, June 2016
Sequential Data Cleaning: A Statistical Approach
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 909–924
https://doi.org/10.1145/2882903.2915233
Errors are prevalent in data sequences, such as GPS trajectories or sensor readings. Existing methods on cleaning sequential data employ a constraint on value changing speeds and perform constraint-based repairing. While such speed constraints are ...
- research-article, June 2016
Extracting Databases from Dark Data with DeepDive
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 847–859
https://doi.org/10.1145/2882903.2904442
DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data --- scientific ...
- research-article, June 2016
SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 281–293
https://doi.org/10.1145/2882903.2882957
We analyze the workload from a multi-year deployment of a database-as-a-service platform targeting scientists and data scientists with minimal database experience. Our hypothesis was that relatively minor changes to the way databases are delivered can ...
- research-article, June 2016
Automatic Generation of Normalized Relational Schemas from Nested Key-Value Data
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 295–310
https://doi.org/10.1145/2882903.2882924
Self-describing key-value data formats such as JSON are becoming increasingly popular as application developers choose to avoid the rigidity imposed by the relational model. Database systems designed for these self-describing formats, such as MongoDB, ...
- research-article, June 2016
Learning-Based Cleansing for Indoor RFID Data
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 925–936
https://doi.org/10.1145/2882903.2882907
RFID is widely used for object tracking in indoor environments, e.g., airport baggage tracking. Analyzing RFID data offers insight into the underlying tracking systems as well as the associated business processes. However, the inherent uncertainty in ...