- research-article, March 2023
Eastwood-Tidy: C Linting for Automated Code Style Assessment in Programming Courses
SIGCSE 2023: Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1, March 2023, Pages 799–805
https://doi.org/10.1145/3545945.3569817
Computer Science students receive significant instruction towards writing functioning code that correctly satisfies requirements. Auto-graders have been shown effective at scalably running student code and determining whether the code correctly ...
- proceeding, February 2023
- Article, January 2023
A Survey of Data Challenges Across a Modernizing Bureaucracy: A New Perspective on Examining Old Government Problems
Heterogeneous Data Management, Polystores, and Analytics for Healthcare, Sep 2022, Pages 10–23
https://doi.org/10.1007/978-3-031-23905-2_2
Abstract: The introduction and increasing popularity of artificial intelligence (AI) and machine learning (ML) technologies allow organizations to gain valuable insights from their copious amounts of data. However, legacy organizations often struggle to ...
- proceeding, August 2021
- research-article, July 2021
DICE: data discovery by example
- El Kindi Rezig,
- Anshul Bhandari,
- Anna Fariha,
- Benjamin Price,
- Allan Vanterpool,
- Vijay Gadepally,
- Michael Stonebraker
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 12, Pages 2819–2822
https://doi.org/10.14778/3476311.3476353
In order to conduct analytical tasks, data scientists often need to find relevant data from an avalanche of sources (e.g., data lakes, large organizational databases). This effort is typically made in an ad hoc, non-systematic manner, which makes it a ...
- research-article, July 2021
Horizon: scalable dependency-driven data cleaning
- El Kindi Rezig,
- Mourad Ouzzani,
- Walid G. Aref,
- Ahmed K. Elmagarmid,
- Ahmed R. Mahmood,
- Michael Stonebraker
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 11, Pages 2546–2554
https://doi.org/10.14778/3476249.3476301
A large class of data repair algorithms rely on integrity constraints to detect and repair errors. A well-studied class of constraints is Functional Dependencies (FDs, for short). Although there has been an increased interest in developing general data ...
- Article, September 2020
Towards Data Discovery by Example
- El Kindi Rezig,
- Allan Vanterpool,
- Vijay Gadepally,
- Benjamin Price,
- Michael Cafarella,
- Michael Stonebraker
Heterogeneous Data Management, Polystores, and Analytics for Healthcare, Sep 2020, Pages 66–71
https://doi.org/10.1007/978-3-030-71055-2_6
Abstract: Data scientists today have to query an avalanche of multi-source data (e.g., data lakes, company databases) for diverse analytical tasks. Data discovery is labor-intensive as users have to find the right tables, and the combination thereof to ...
- research-article, August 2020
Debugging large-scale data science pipelines using dagger
- El Kindi Rezig,
- Ashrita Brahmaroutu,
- Nesime Tatbul,
- Mourad Ouzzani,
- Nan Tang,
- Timothy Mattson,
- Samuel Madden,
- Michael Stonebraker
Proceedings of the VLDB Endowment (PVLDB), Volume 13, Issue 12, Pages 2993–2996
https://doi.org/10.14778/3415478.3415527
Data pipelines are the new code. Consequently, data scientists need new tools to support the often time-consuming process of debugging their pipelines. We introduce Dagger, an end-to-end system to debug and mitigate data-centric errors in data pipelines,...
- research-article, August 2019
Data Civilizer 2.0: a holistic framework for data preparation and analytics
- El Kindi Rezig,
- Lei Cao,
- Michael Stonebraker,
- Giovanni Simonini,
- Wenbo Tao,
- Samuel Madden,
- Mourad Ouzzani,
- Nan Tang,
- Ahmed K. Elmagarmid
Proceedings of the VLDB Endowment (PVLDB), Volume 12, Issue 12, Pages 1954–1957
https://doi.org/10.14778/3352063.3352108
Data scientists spend over 80% of their time (1) parameter-tuning machine learning models and (2) iterating between data cleaning and machine learning model execution. While there are existing efforts to support the first requirement, there is currently ...
- research-article, July 2019
Towards an End-to-End Human-Centric Data Cleaning Framework
HILDA '19: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, July 2019, Article No.: 1, Pages 1–7
https://doi.org/10.1145/3328519.3329133
Data Cleaning refers to the process of detecting and fixing errors in the data. Human involvement is instrumental at several stages of this process such as providing rules or validating computed repairs. There is a plethora of data cleaning algorithms ...
- research-article, August 2015
Tornado: a distributed spatio-textual stream processing system
- Ahmed R. Mahmood,
- Ahmed M. Aly,
- Thamir Qadah,
- El Kindi Rezig,
- Anas Daghistani,
- Amgad Madkour,
- Ahmed S. Abdelhamid,
- Mohamed S. Hassan,
- Walid G. Aref,
- Saleh Basalamah
Proceedings of the VLDB Endowment (PVLDB), Volume 8, Issue 12, Pages 2020–2023
https://doi.org/10.14778/2824032.2824126
The widespread use of location-aware devices together with the increased popularity of micro-blogging applications (e.g., Twitter) led to the creation of large streams of spatio-textual data. In order to serve real-time applications, the processing of ...
- article, July 2013
Leveraging human experts' knowledge to detect and publish compositions of Semantic Web services in a repository
International Journal of Business Information Systems (IJBIS), Volume 14, Issue 1, July 2013, Pages 83–95
https://doi.org/10.1504/IJBIS.2013.055548
Web services have added a considerable abstraction level to interact with applications regardless of their environment. Semantic Web services have augmented web services with rigorous models to describe web services' functionalities and how they ...
- demonstration, June 2011
U-MAP: a system for usage-based schema matching and mapping
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, June 2011, Pages 1287–1290
https://doi.org/10.1145/1989323.1989478
This demo shows how usage information buried in query logs can play a central role in data integration and data exchange. More specifically, our system U-Map uses query logs to generate correspondences between the attributes of two different schemas and ...