research-article

Open access

NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks

Authors:

Sepanta Zeighami,

Vatsal Sharan

Proceedings of the ACM on Management of Data, Volume 1, Issue 1

Article No.: 100, Pages 1 - 26

https://doi.org/10.1145/3588954

Published: 30 May 2023

Abstract

Range aggregate queries (RAQs) are an integral part of many real-world applications, where fast, approximate answers to the queries are often desired. Recent work has studied answering RAQs using machine learning (ML) models, where a model of the data is learned to answer the queries. However, there is no theoretical understanding of why and when the ML-based approaches perform well. Furthermore, since the ML approaches model the data, they fail to capitalize on any query-specific information to improve performance in practice. In this paper, we focus on modeling "queries" rather than data and train neural networks to learn the query answers. This change of focus allows us to theoretically study our ML approach and to provide a distribution- and query-dependent error bound for neural networks when answering RAQs. We confirm our theoretical results by developing NeuroSketch, a neural network framework to answer RAQs in practice. An extensive experimental study on real-world, TPC-benchmark, and synthetic datasets shows that NeuroSketch answers RAQs multiple orders of magnitude faster than the state of the art and with better accuracy.
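
As a rough illustration of the query-modeling idea described in the abstract, the sketch below trains a small network that maps a range query (lo, hi) directly to its aggregate answer. It is not the NeuroSketch implementation: the synthetic Beta-distributed data, the normalized COUNT aggregate, the one-hidden-layer NumPy network, the function names (`exact_count`, `sample_queries`, `train_query_model`), and every hyperparameter are assumptions chosen only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D dataset (an assumption, not from the paper): skewed values in (0, 1).
data = np.sort(rng.beta(2.0, 5.0, size=10_000))

def exact_count(lo, hi):
    """Exact RAQ answer: fraction of records x with lo <= x < hi."""
    return (np.searchsorted(data, hi) - np.searchsorted(data, lo)) / data.size

def sample_queries(n):
    """Draw n random ranges (lo <= hi) and label them with exact answers."""
    q = np.sort(rng.random((n, 2)), axis=1)
    return q, exact_count(q[:, 0], q[:, 1])

def train_query_model(n_train=2000, hidden=32, epochs=500, lr=0.02):
    """Fit a one-hidden-layer ReLU network on (query, answer) pairs
    with full-batch gradient descent on mean squared error."""
    Q, y = sample_queries(n_train)
    W1 = rng.normal(0.0, 0.3, (2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.3, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        H = np.maximum(Q @ W1 + b1, 0.0)          # hidden activations
        err = (H @ W2 + b2).ravel() - y           # prediction error
        losses.append(float(np.mean(err ** 2)))
        g_out = (2.0 * err / n_train)[:, None]    # dLoss/dPrediction
        g_hid = g_out @ W2.T
        g_hid[H <= 0.0] = 0.0                     # ReLU gradient mask
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (Q.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(q):
        """Answer queries from the model alone, without touching the data."""
        q = np.atleast_2d(q)
        return (np.maximum(q @ W1 + b1, 0.0) @ W2 + b2).ravel()

    return predict, losses

predict, losses = train_query_model()
query = np.array([0.1, 0.5])
print("exact :", exact_count(query[0], query[1]))
print("model :", predict(query)[0])
```

Once trained, `predict` answers a query with two small matrix products and never reads the data, which is the property a query model of this kind relies on for speed. How accurate the answers are depends on the query workload and the training budget, and this toy makes no claim about either.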

Index Terms

1. Information systems
  1. Data management systems

Published In

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1

May 2023

2807 pages

EISSN: 2836-6573

DOI: 10.1145/3603164

Editor: Divyakant Agrawal, UC Santa Barbara, United States

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Total Citations: 5
Total Downloads: 485
Downloads (Last 12 months): 311
Downloads (Last 6 weeks): 41

Reflects downloads up to 09 Nov 2024
Cited By

Hurst A, Lucani D, Zhang Q. (2024). PairwiseHist: Fast, Accurate and Space-Efficient Approximate Query Processing with Data Compression. Proceedings of the VLDB Endowment 17:6 (1432-1445). Online publication date: 3-May-2024.
https://dl.acm.org/doi/10.14778/3648160.3648181

Kim K, Lee S, Kim I, Han W. (2024). ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality Estimation. Proceedings of the ACM on Management of Data 2:1 (1-27). Online publication date: 26-Mar-2024.
https://dl.acm.org/doi/10.1145/3639300

Zeighami S, Seshadri R, Shahabi C. (2024). A Neural Database for Answering Aggregate Queries on Incomplete Relational Data (Extended Abstract). 2024 IEEE 40th International Conference on Data Engineering (ICDE) (5703-5704). Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00483

Lin Y, Zhang Y, Yang Y, Li Y, Zhang J. (2024). Self-adaptive smoothing model for cardinality estimation. The Computer Journal. Online publication date: 11-Nov-2024.
https://doi.org/10.1093/comjnl/bxae117

Liu R, Zeighami S, Lin H, Shahabi C, Cao Y, Takagi S, Konishi Y, Yoshikawa M, Xiong L. (2023). Supporting Pandemic Preparedness with Privacy Enhancing Technology. 2023 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA) (34-43). Online publication date: 1-Nov-2023.
https://doi.org/10.1109/TPS-ISA58951.2023.00014

Nam K, Kim S, Park C, Nam T, Lee T. (2023). A Framework for Learned Approximate Query Processing for Tabular Data with Trajectory. 2023 14th International Conference on Information and Communication Technology Convergence (ICTC) (1122-1124). Online publication date: 11-Oct-2023.
https://doi.org/10.1109/ICTC58733.2023.10392323
