Reinforced approximate exploratory data analysis

Authors:

Arjun Kashettiwar

AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence

Article No.: 860, Pages 7660 - 7669

https://doi.org/10.1609/aaai.v37i6.25929

Published: 07 February 2023

Abstract

Exploratory data analytics (EDA) is a sequential decision-making process in which analysts choose each subsequent query, based on the previous queries and their results, in the hope that it will lead to interesting insights. Data processing systems often execute queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield different magnitudes of latency reduction. The optimal choice of sampling strategy often depends on the particular context of the analysis flow and on the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling in interactive data exploration settings, where sampling introduces approximation errors. We propose a Deep Reinforcement Learning (DRL) based framework that optimizes sample selection in order to keep the analysis and insight generation flow intact. Evaluations with three real datasets show that our technique preserves the original insight generation flow while improving interaction latency compared to baseline methods.
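To illustrate the claim that different downsampling strategies preserve different statistics, the following is a minimal sketch and not the method proposed in the paper. It compares uniform and stratified sampling on a hypothetical toy table; the column names (`carrier`, `delay`), the group sizes, and the helper functions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def uniform_sample(rows, fraction, rng):
    """Draw a fixed fraction of rows uniformly, without replacement."""
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def stratified_sample(rows, key, per_group, rng):
    """Draw up to `per_group` rows from every group so rare groups survive."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def mean(rows, value):
    """Unweighted mean of one numeric column."""
    return sum(row[value] for row in rows) / len(rows)


# Hypothetical toy data: one very frequent group and one very rare group.
rng = random.Random(0)
rows = [{"carrier": "AA", "delay": rng.gauss(10, 2)} for _ in range(10000)]
rows += [{"carrier": "ZZ", "delay": rng.gauss(60, 2)} for _ in range(5)]

uni = uniform_sample(rows, 0.01, rng)
strat = stratified_sample(rows, "carrier", 50, rng)

true_mean = mean(rows, "delay")
print("groups in uniform sample:   ", sorted({r["carrier"] for r in uni}))
print("groups in stratified sample:", sorted({r["carrier"] for r in strat}))
print("overall mean error, uniform:   ", abs(mean(uni, "delay") - true_mean))
print("overall mean error, stratified:", abs(mean(strat, "delay") - true_mean))
```

In this sketch the stratified sample always contains both groups, so a per-group query still returns a row for the rare carrier, but its unweighted overall mean is pulled toward that carrier unless the rows are reweighted. The uniform sample tracks the overall mean closely but will usually miss the rare group. Which trade-off is acceptable depends on the query the analyst issues next.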

Cited By

Hurst, A.; Lucani, D.; and Zhang, Q. 2024. PairwiseHist: Fast, Accurate and Space-Efficient Approximate Query Processing with Data Compression. Proceedings of the VLDB Endowment, 17(6): 1432-1445. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.14778/3648160.3648181

Published In

AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence

February 2023

16496 pages

ISBN: 978-1-57735-880-0

Copyright © 2023 Association for the Advancement of Artificial Intelligence.

Sponsors

Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

Research-article
Research
Refereed limited
