Interestingness Measures for Exploratory Data Analysis: a Survey

  • Conference paper
New Trends in Database and Information Systems (ADBIS 2024)

Abstract

Exploratory Data Analysis (EDA) is the tedious activity of interactively analyzing a dataset to extract insights. Many approaches aiming to support EDA have recently been proposed. They all rely on interestingness measures to score the importance of insights. This paper surveys and categorizes the interestingness measures proposed in the literature for approaches that aim to automate EDA. The lessons learned from this survey point to promising research directions.

Notes

  1. [13] proposes peculiarity, surprise, diversity and presentation (some of them with different names) and [15, 20] propose relevance, novelty, peculiarity and surprise.

  2. Called surprise in [12], but reclassified here since it does not refer to a user’s belief.

References

  1. Abuzaid, F., Kraft, P., et al.: DIFF: a relational interface for large-scale data explanation. VLDB J. 30(1), 45–70 (2021)

  2. Amer-Yahia, S., Marcel, P., et al.: Data narration for the people: challenges and opportunities. In: EDBT, pp. 855–858. OpenProceedings.org (2023)

  3. Bie, T.D., Raedt, L.D., et al.: Automating data science. Commun. ACM 65(3), 76–87 (2022)

  4. Chanson, A., Crulis, B., et al.: Profiling user belief in BI exploration for measuring subjective interestingness. In: DOLAP, CEUR Proceedings, vol. 2324 (2019)

  5. Chanson, A., Labroche, N., et al.: Automatic generation of comparison notebooks for interactive data exploration. In: EDBT, pp. 2:274–2:284 (2022)

  6. Dadvar, V., Golab, L., et al.: Exploring data using patterns: a survey. Inf. Syst. 108, 101985 (2022)

  7. De Bie, T.: Subjective interestingness in exploratory data mining. IDA 8207, 19–31 (2013)

  8. Ding, R., Han, S., et al.: QuickInsights: quick and automatic discovery of insights from multi-dimensional data. In: Proceedings of SIGMOD, pp. 317–332 (2019)

  9. El, O.B., Milo, T., et al.: ATENA: an autonomous system for data exploration based on deep reinforcement learning. In: CIKM, pp. 2873–2876 (2019)

  10. El, O.B., Milo, T., et al.: Automatically generating data exploration sessions using deep reinforcement learning. In: SIGMOD, pp. 1527–1537 (2020)

  11. Francia, M., Golfarelli, M., et al.: Assess queries for interactive analysis of data cubes. In: EDBT (2021)

  12. Francia, M., Marcel, P., et al.: Enhancing cubes with models to describe multidimensional data. Inf. Syst. Front. 24(1) (2021)

  13. Geng, L., Hamilton, H.J.: Interestingness measures for data mining: a survey. ACM Comput. Surv. 38(3), 9 (2006)

  14. Gkesoulis, D., Vassiliadis, P., et al.: CineCubes: aiding data workers gain insights from OLAP queries. Inf. Syst. 53, 60–86 (2015)

  15. Gkitsakis, D., Kaloudis, S., et al.: Cube query interestingness: novelty, relevance, peculiarity and surprise. Inf. Syst. 123, 102381 (2024)

  16. Idreos, S., Papaemmanouil, O., et al.: Overview of data exploration techniques. In: SIGMOD, pp. 277–281. ACM (2015)

  17. Kaminskas, M., Bridge, D.: Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems. TiiS 7(1), 2:1–2:42 (2017)

  18. Ma, P., Ding, R., et al.: MetaInsight: automatic discovery of structured knowledge for exploratory data analysis. In: Proceedings of SIGMOD, pp. 1262–1274 (2021)

  19. Ma, P., Ding, R., et al.: XInsight: explainable data analysis through the lens of causality. Proc. ACM Manag. Data 1(2) (2023)

  20. Marcel, P., Peralta, V., Vassiliadis, P.: A framework for learning cell interestingness from cube explorations. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds.) ADBIS 2019. LNCS, vol. 11695, pp. 425–440. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28730-6_26

  21. Milo, T., Somech, A.: Automating exploratory data analysis via machine learning: an overview. In: SIGMOD (2020)

  22. Patil, Y., Amer-Yahia, S., et al.: Designing the evaluation of operator-enabled interactive data exploration in VALIDE. In: HILDA@SIGMOD, pp. 4:1–4:7 (2022)

  23. Personnaz, A., Amer-Yahia, S., et al.: DORA THE EXPLORER: exploring very large data with interactive deep reinforcement learning. In: CIKM (2021)

  24. Razmadze, K., Amsterdamer, Y., et al.: SubTab: data exploration with informative sub-tables. In: SIGMOD, pp. 2369–2372 (2022)

  25. Sarawagi, S.: Explaining differences in multidimensional aggregates. In: Proceedings VLDB, pp. 42–53 (1999)

  26. Sarawagi, S.: User-adaptive exploration of multidimensional data. In: VLDB, pp. 307–316 (2000)

  27. Sarawagi, S., Agrawal, R., Megiddo, N.: Discovery-driven exploration of OLAP data cubes. In: Schek, H.-J., Alonso, G., Saltor, F., Ramos, I. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 168–182. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0100984

  28. Sathe, G., Sarawagi, S.: Intelligent rollups in multidimensional OLAP data. In: Proceedings VLDB, pp. 531–540 (2001)

  29. Shi, D., Xu, X., et al.: Calliope: automatic visual data story generation from a spreadsheet. TVCG 27(2), 453–463 (2021)

  30. Siddiqui, T., Chaudhuri, S., et al.: COMPARE: accelerating groupwise comparison in relational databases for data analytics. In: VLDB, vol. 14, no. 11, pp. 2419–2431 (2021)

  31. Sintos, S., Agarwal, P.K., et al.: Selecting data to clean for fact checking: minimizing uncertainty vs. maximizing surprise. Proc. VLDB Endow. 12(13), 2408–2421 (2019)

  32. Somech, A., Milo, T., et al.: Predicting “what is interesting” by mining interactive-data-analysis session logs. In: EDBT (2019)

  33. Tang, B., Han, S., et al.: Extracting top-k insights from multi-dimensional data. In: SIGMOD (2017)

  34. Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley (1977)

  35. Wang, Y., Sun, Z., et al.: DataShot: automatic generation of fact sheets from tabular data. TVCG 26(1), 895–905 (2020)

  36. Youngmann, B., Amer-Yahia, S., et al.: Guided exploration of data summaries. Proc. VLDB Endow. 15(9), 1798–1807 (2022)

  37. Zgraggen, E., Zhao, Z., et al.: Investigating the effect of the multiple comparisons problem in visual analysis. In: Proceedings of CHI, p. 479 (2018)

Author information

Corresponding author

Correspondence to Patrick Marcel.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chanson, A., Labroche, N., Marcel, P., Peralta, V., Vassiliadis, P. (2025). Interestingness Measures for Exploratory Data Analysis: a Survey. In: Tekli, J., et al. New Trends in Database and Information Systems. ADBIS 2024. Communications in Computer and Information Science, vol 2186. Springer, Cham. https://doi.org/10.1007/978-3-031-70421-5_2

  • DOI: https://doi.org/10.1007/978-3-031-70421-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70420-8

  • Online ISBN: 978-3-031-70421-5

  • eBook Packages: Computer Science, Computer Science (R0)
