DOI: 10.1609/aaai.v37i6.25929
research-article

Reinforced approximate exploratory data analysis

Published: 07 February 2023

    Abstract

    Exploratory data analytics (EDA) is a sequential decision-making process in which analysts choose each subsequent query, based on the previous queries and their results, in the hope that it leads to interesting insights. Data processing systems often execute queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield different magnitudes of latency reduction. The optimal choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling, and the approximation errors it introduces, in interactive data exploration settings. We propose a Deep Reinforcement Learning (DRL) based framework that optimizes sample selection in order to keep the analysis and insight generation flow intact. Evaluations with 3 real datasets show that our technique can preserve the original insight generation flow while improving the interaction latency, compared to baseline methods.
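
The claim that different downsampling strategies preserve different statistics can be illustrated with a small sketch. This is not the paper's framework or its sampling strategies; it is a toy comparison, on synthetic data, of uniform sampling against per-group (stratified) sampling for a group-by mean. The dataset, the group names `common` and `rare`, and all function names are assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_rows(seed=0):
    """Synthetic (group, value) rows: one large group and one rare group."""
    rng = random.Random(seed)
    rows = [("common", rng.gauss(100, 10)) for _ in range(10_000)]
    rows += [("rare", rng.gauss(500, 10)) for _ in range(20)]
    return rows


def uniform_sample(rows, k, rng):
    """Draw k rows uniformly at random, without replacement."""
    return rng.sample(rows, k)


def stratified_sample(rows, per_group, rng):
    """Draw up to per_group rows from every group, so no group is lost."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def group_means(rows):
    """Compute the mean value per group (a simple group-by aggregate)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    rows = make_rows()
    rng = random.Random(1)
    print("full data :", group_means(rows))
    print("uniform   :", group_means(uniform_sample(rows, 20, rng)))
    print("stratified:", group_means(stratified_sample(rows, 10, rng)))
```

In this toy setting both samples contain 20 rows, so their cost is comparable, but a uniform sample will usually miss the rare group entirely, while the stratified sample always reports both groups. Which behaviour matters depends on what the analyst asks next, which is the context dependence the abstract refers to.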

    Cited By

    • (2024) PairwiseHist: Fast, Accurate and Space-Efficient Approximate Query Processing with Data Compression. Proceedings of the VLDB Endowment 17:6 (1432-1445). DOI: 10.14778/3648160.3648181. Online publication date: 1-Feb-2024

    Published In

    AAAI'23/IAAI'23/EAAI'23: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence
    February 2023
    16496 pages
    ISBN: 978-1-57735-880-0

    Sponsors

    • Association for the Advancement of Artificial Intelligence

    Publisher

    AAAI Press

    Qualifiers

    • Research-article
    • Research
    • Refereed limited
