DOI: 10.1145/2588555.2594532
demonstration

ABS: a system for scalable approximate queries with accuracy guarantees

Published: 18 June 2014
    Abstract

    Approximate Query Processing (AQP) based on sampling is critical for supporting timely and cost-effective analytics over big data. To be applied successfully, AQP must be accompanied by reliable estimates of the quality of the approximate answers produced from samples. The two main techniques used in the past for this purpose are (i) closed-form analytic error estimation and (ii) the bootstrap method. Approach (i) is extremely efficient but lacks generality, whereas (ii) is general but suffers from high computational overhead. Our recently introduced Analytical Bootstrap method combines the strengths of both approaches and provides the basis for our ABS system, which will be demonstrated at the conference. The ABS system models the bootstrap with a probabilistic relational model and extends relational algebra with operations on probabilistic relations to predict the distributions of AQP results. As a result, ABS computes bootstrap-based quality measures for a general class of SQL queries several orders of magnitude faster than the standard simulation-based bootstrap. In this demo, we will demonstrate the generality, automaticity, and ease of use of the ABS system, and its superior performance over the traditional approaches described above.
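
The contrast between the two traditional techniques named in the abstract can be illustrated with a minimal sketch. It is not the ABS system or the Analytical Bootstrap. It only estimates the standard error of a sample mean in two ways: with the closed-form formula s / sqrt(n), and with the simulation-based bootstrap, which resamples with replacement many times. The function names (`closed_form_stderr`, `bootstrap_stderr`, `make_sample`), the synthetic Gaussian data, and the resample count are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def closed_form_stderr(sample):
    """Closed-form estimate: standard error of the mean is s / sqrt(n).

    Cheap (one pass over the sample), but the formula is specific to the mean.
    """
    return statistics.stdev(sample) / math.sqrt(len(sample))


def bootstrap_stderr(sample, num_resamples=1000, seed=0):
    """Simulation-based bootstrap estimate of the same quantity.

    General (any statistic could replace fmean), but it recomputes the
    statistic once per resample, so the cost grows with num_resamples.
    """
    rng = random.Random(seed)
    n = len(sample)
    resampled_means = []
    for _ in range(num_resamples):
        resample = rng.choices(sample, k=n)  # n draws with replacement
        resampled_means.append(statistics.fmean(resample))
    return statistics.stdev(resampled_means)


def make_sample(size=500, seed=42):
    """Hypothetical stand-in for a sample drawn from a large table."""
    rng = random.Random(seed)
    return [rng.gauss(100.0, 15.0) for _ in range(size)]


if __name__ == "__main__":
    sample = make_sample()
    print("closed-form stderr:", closed_form_stderr(sample))
    print("bootstrap stderr:  ", bootstrap_stderr(sample))
```

For the mean, the two estimates should roughly agree. The bootstrap gets there by evaluating the statistic on every resample, which is the overhead the abstract attributes to approach (ii).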

    Published In

    SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
    June 2014
    1645 pages
    ISBN: 9781450323765
    DOI: 10.1145/2588555

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximate query processing
    2. bootstrap
    3. error estimation

    Qualifiers

    • Demonstration

    Conference

    SIGMOD/PODS'14

    Acceptance Rates

    SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 16
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 12 Aug 2024.

    Cited By

    • (2022) Approximate Query Processing with Error Guarantees. Big-Data-Analytics in Astronomy, Science, and Engineering, pages 268-278. DOI: 10.1007/978-3-030-96600-3_20. Online publication date: 18-Feb-2022
    • (2021) SEIZE: Runtime Inspection for Parallel Dataflow Systems. IEEE Transactions on Parallel and Distributed Systems, 32:4, pages 842-854. DOI: 10.1109/TPDS.2020.3035170. Online publication date: 1-Apr-2021
    • (2020) Turbocharging Geospatial Visualization Dashboards via a Materialized Sampling Cube Approach. 2020 IEEE 36th International Conference on Data Engineering (ICDE), pages 1165-1176. DOI: 10.1109/ICDE48307.2020.00105. Online publication date: Apr-2020
    • (2019) Gapprox: using Gallup approach for approximation in Big Data processing. Journal of Big Data, 6:1. DOI: 10.1186/s40537-019-0185-4. Online publication date: 26-Feb-2019
    • (2019) Wander Join and XDB. ACM Transactions on Database Systems, 44:1, pages 1-41. DOI: 10.1145/3284551. Online publication date: 29-Jan-2019
    • (2019) Statistical algorithmic profiling for randomized approximate programs. Proceedings of the 41st International Conference on Software Engineering, pages 608-618. DOI: 10.1109/ICSE.2019.00071. Online publication date: 25-May-2019
    • (2019) CS*: Approximate Query Processing on Big Data using Scalable Join Correlated Sample Synopsis. 2019 IEEE International Conference on Big Data (Big Data), pages 583-592. DOI: 10.1109/BigData47090.2019.9006440. Online publication date: Dec-2019
    • (2019) SnappyData. Encyclopedia of Big Data Technologies, pages 1522-1531. DOI: 10.1007/978-3-319-77525-8_258. Online publication date: 20-Feb-2019
    • (2018) VerdictDB. Proceedings of the 2018 International Conference on Management of Data, pages 1461-1476. DOI: 10.1145/3183713.3196905. Online publication date: 27-May-2018
    • (2018) Demonstration of VerdictDB, the Platform-Independent AQP System. Proceedings of the 2018 International Conference on Management of Data, pages 1665-1668. DOI: 10.1145/3183713.3193538. Online publication date: 27-May-2018