Abstract
Exploratory Data Analysis (EDA) is the tedious activity of interactively analyzing a dataset to extract insights. Many approaches to supporting EDA have recently been proposed, and they all rely on interestingness measures to score the importance of insights. This paper surveys and categorizes the interestingness measures proposed in the literature for approaches that aim to automate EDA. The lessons learned from this survey point to promising research directions.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chanson, A., Labroche, N., Marcel, P., Peralta, V., Vassiliadis, P. (2025). Interestingness Measures for Exploratory Data Analysis: a Survey. In: Tekli, J., et al. New Trends in Database and Information Systems. ADBIS 2024. Communications in Computer and Information Science, vol 2186. Springer, Cham. https://doi.org/10.1007/978-3-031-70421-5_2
DOI: https://doi.org/10.1007/978-3-031-70421-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70420-8
Online ISBN: 978-3-031-70421-5
eBook Packages: Computer Science, Computer Science (R0)