Research article (Open access)

NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks

Published: 30 May 2023

Abstract

Range aggregate queries (RAQs) are an integral part of many real-world applications, which often require fast, approximate answers. Recent work has studied answering RAQs with machine learning (ML) models, where a model of the data is learned and then used to answer the queries. However, there is no theoretical understanding of why and when such ML-based approaches perform well. Furthermore, because these approaches model the data, they fail to capitalize on any query-specific information to improve performance in practice. In this paper, we focus on modeling "queries" rather than data and train neural networks to learn the query answers. This change of focus allows us to study our ML approach theoretically and to provide a distribution- and query-dependent error bound for neural networks answering RAQs. We confirm our theoretical results by developing NeuroSketch, a neural network framework for answering RAQs in practice. An extensive experimental study on real-world, TPC-benchmark, and synthetic datasets shows that NeuroSketch answers RAQs multiple orders of magnitude faster than state-of-the-art methods, and with better accuracy.
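The central idea above, learning the mapping from query parameters to query answers instead of modeling the data, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: a hypothetical one-attribute table, a hypothetical RAQ `SELECT AVG(y) FROM D WHERE c <= x <= c + r` parameterized by `(c, r)`, a hypothetical uniform query workload, and a single small ReLU network trained with a hand-written Adam loop in NumPy. It is not the NeuroSketch implementation, which adds further machinery not shown here; exact answers are used only to label training queries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy table D: predicate attribute x in [0, 1], measure attribute y.
n_rows = 50_000
x = rng.random(n_rows)
y = np.sin(6.0 * x) + 0.1 * rng.standard_normal(n_rows)

# Exact RAQ: SELECT AVG(y) FROM D WHERE c <= x <= c + r.
# Sorting x once and keeping prefix sums of y gives exact answers per query.
order = np.argsort(x)
xs = x[order]
prefix = np.concatenate(([0.0], np.cumsum(y[order])))

def exact_avg(c, r):
    lo = np.searchsorted(xs, c, side="left")
    hi = np.searchsorted(xs, c + r, side="right")
    return (prefix[hi] - prefix[lo]) / np.maximum(hi - lo, 1)

def sample_queries(m):
    """Hypothetical workload: r ~ U[0.05, 0.3], c ~ U[0, 1 - r]."""
    r = rng.uniform(0.05, 0.3, m)
    c = rng.uniform(0.0, 1.0, m) * (1.0 - r)
    return np.stack([c, r], axis=1)

# Training set: query parameters -> exact answers (the "query function").
Q_train = sample_queries(4000)
t_train = exact_avg(Q_train[:, 0], Q_train[:, 1])[:, None]
mu, sd = Q_train.mean(axis=0), Q_train.std(axis=0)  # feature standardization

# Small ReLU MLP 2 -> 64 -> 64 -> 1 with He initialization.
sizes = [2, 64, 64, 1]
params = []
for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
    params += [rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in),
               np.zeros(fan_out)]

def forward(Q):
    """Return the activations of every layer; the last entry is the prediction."""
    acts = [(Q - mu) / sd]
    n_layers = len(params) // 2
    for layer in range(n_layers):
        z = acts[-1] @ params[2 * layer] + params[2 * layer + 1]
        acts.append(z if layer == n_layers - 1 else np.maximum(z, 0.0))
    return acts

def mse_grads(Q, t):
    """Backpropagate the mean squared error through the MLP."""
    acts = forward(Q)
    delta = 2.0 * (acts[-1] - t) / len(Q)  # dMSE / d(output)
    g = [None] * len(params)
    for layer in reversed(range(len(params) // 2)):
        g[2 * layer] = acts[layer].T @ delta
        g[2 * layer + 1] = delta.sum(axis=0)
        if layer > 0:  # propagate through the ReLU of the previous layer
            delta = (delta @ params[2 * layer].T) * (acts[layer] > 0)
    return g

# Adam on mini-batches of training queries.
lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8
m_s = [np.zeros_like(p) for p in params]
v_s = [np.zeros_like(p) for p in params]
step = 0
for epoch in range(200):
    perm = rng.permutation(len(Q_train))
    for start in range(0, len(perm), 128):
        idx = perm[start:start + 128]
        g = mse_grads(Q_train[idx], t_train[idx])
        step += 1
        for k, p in enumerate(params):
            m_s[k] = b1 * m_s[k] + (1 - b1) * g[k]
            v_s[k] = b2 * v_s[k] + (1 - b2) * g[k] ** 2
            m_hat = m_s[k] / (1 - b1 ** step)
            v_hat = v_s[k] / (1 - b2 ** step)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)  # in-place update of params[k]

# Answer unseen queries with one forward pass and compare to the exact answers.
Q_test = sample_queries(1000)
t_test = exact_avg(Q_test[:, 0], Q_test[:, 1])
pred = forward(Q_test)[-1][:, 0]
mae = np.mean(np.abs(pred - t_test))
baseline_mae = np.mean(np.abs(t_train.mean() - t_test))  # always predict the mean
print(f"MLP MAE: {mae:.4f}  (constant-prediction MAE: {baseline_mae:.4f})")
```

In this sketch, answering a query costs one fixed-size forward pass whose time does not depend on the number of rows, which gives some intuition for the speedups reported above; how accurate the answers are depends on how well the network learns the query function over the chosen workload distribution.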

Supplemental Material

Presentation videos (two MP4 files)


    Published In

    Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1
    May 2023
    2807 pages
    EISSN: 2836-6573
    DOI: 10.1145/3603164
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 May 2023
    Published in PACMMOD Volume 1, Issue 1


    Author Tags

    1. approximate query processing
    2. machine learning
    3. theory of learned databases

    Qualifiers

    • Research-article


    Bibliometrics

    • Downloads (last 12 months): 311
    • Downloads (last 6 weeks): 41
    Reflects downloads up to 09 Nov 2024

    Cited By

    • (2024) PairwiseHist: Fast, Accurate and Space-Efficient Approximate Query Processing with Data Compression. Proceedings of the VLDB Endowment 17, 6, 1432-1445. DOI: 10.14778/3648160.3648181. Online publication date: 3-May-2024
    • (2024) ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality Estimation. Proceedings of the ACM on Management of Data 2, 1, 1-27. DOI: 10.1145/3639300. Online publication date: 26-Mar-2024
    • (2024) A Neural Database for Answering Aggregate Queries on Incomplete Relational Data (Extended Abstract). 2024 IEEE 40th International Conference on Data Engineering (ICDE), 5703-5704. DOI: 10.1109/ICDE60146.2024.00483. Online publication date: 13-May-2024
    • (2024) Self-adaptive smoothing model for cardinality estimation. The Computer Journal. DOI: 10.1093/comjnl/bxae117. Online publication date: 11-Nov-2024
    • (2023) Supporting Pandemic Preparedness with Privacy Enhancing Technology. 2023 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), 34-43. DOI: 10.1109/TPS-ISA58951.2023.00014. Online publication date: 1-Nov-2023
    • (2023) A Framework for Learned Approximate Query Processing for Tabular Data with Trajectory. 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), 1122-1124. DOI: 10.1109/ICTC58733.2023.10392323. Online publication date: 11-Oct-2023

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media