
Maximal exceptions with minimal descriptions

Published: 01 September 2010

Abstract

We introduce a new approach to Exceptional Model Mining. Our algorithm, called EMDM, is an iterative method that alternates between Exception Maximisation and Description Minimisation. As a result, it finds maximally exceptional models with minimal descriptions. Exceptional Model Mining was recently introduced by Leman et al. (2008) as a generalisation of Subgroup Discovery. Instead of considering a single target attribute, it allows for multiple 'model' attributes on which models are fitted. If the model for a subgroup is substantially different from the model for the complete database, it is regarded as an exceptional model. To measure exceptionality, we propose two information-theoretic measures: one is based on the Kullback-Leibler divergence, the other on the Krimp compression algorithm. We show how compression can be used for exception maximisation with these measures, and how classification can be used for description minimisation. Experiments show that our approach efficiently identifies subgroups that are both exceptional and interesting.
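To make the idea of a KL-based exceptionality measure concrete, here is a minimal Python sketch under simplifying assumptions of our own: the model attributes are binary, the "model" of a set of records is simply the empirical distribution over its tuples of model-attribute values, and a subgroup's exceptionality is the Kullback-Leibler divergence of its distribution from that of the complete database. This is a simplified stand-in, not the exact measure defined in the paper, and it covers neither the Krimp-based measure nor the EMDM search itself. The toy data, the `habitat` attribute, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy database: each record pairs description attributes (used to
# define subgroups) with a tuple of binary model attributes (on which models are fitted).
DATABASE = [
    ({"habitat": "forest"}, (1, 1)),
    ({"habitat": "forest"}, (1, 1)),
    ({"habitat": "forest"}, (1, 0)),
    ({"habitat": "steppe"}, (0, 0)),
    ({"habitat": "steppe"}, (0, 1)),
    ({"habitat": "steppe"}, (0, 0)),
]


def empirical_distribution(model_rows):
    """Relative frequency of each distinct tuple of model-attribute values."""
    counts = Counter(model_rows)
    n = len(model_rows)
    return {value: c / n for value, c in counts.items()}


def kl_divergence(p, q):
    """D_KL(p || q) in bits; requires q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def exceptionality(database, description):
    """KL divergence (in bits) of the subgroup selected by `description` from the database.

    `description` is a predicate over the description attributes only. Because the
    subgroup is a subset of the database, every tuple seen in the subgroup also has
    positive probability under the database distribution, so the result is finite.
    """
    all_rows = [model for _, model in database]
    sub_rows = [model for desc, model in database if description(desc)]
    if not sub_rows:
        raise ValueError("description selects an empty subgroup")
    return kl_divergence(empirical_distribution(sub_rows),
                         empirical_distribution(all_rows))


def is_forest(desc):
    """Hypothetical one-condition subgroup description: habitat = forest."""
    return desc["habitat"] == "forest"


if __name__ == "__main__":
    print(exceptionality(DATABASE, is_forest))  # 1.0
```

On this toy data the subgroup described by `habitat = forest` scores 1.0 bit, while the trivial description that selects every record scores 0, since its distribution coincides with that of the database.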

References

[1]
Andritsos P, Tsaparas P, Miller RJ, Sevcik KC (2004) LIMBO: scalable clustering of categorical data. In: Proceedings of the EDBT, pp 124-146.
[2]
Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[3]
Cohen WW (1995) Fast effective rule induction. In: Proceedings of the ICML'95, pp 115-123.
[4]
Garriga GC, Heikinheimo H, Seppänen JK (2007) Cross-mining binary and numerical attributes. In: Proceedings of the ICDM'07, pp 481-486.
[5]
Heikinheimo H, Fortelius M, Eronen J, Mannila H (2007) Biogeography of European land mammals shows environmentally distinct and spatially coherent clusters. J Biogeogr 34(6):1053-1064.
[6]
Klösgen W (2002) Subgroup discovery, chapter 16.3. Oxford University Press, Oxford.
[7]
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79-86.
[8]
van Leeuwen M, Vreeken J, Siebes A (2006) Compression picks the item sets that matter. In: Proceedings of the ECML PKDD'06, pp 585-592.
[9]
van Leeuwen M, Bonchi F, Sigurbjörnsson B, Siebes A (2009) Compressing tags to find interesting media groups. In: Proceedings of the CIKM'09, pp 1147-1156.
[10]
Leman D, Feelders A, Knobbe A (2008) Exceptional model mining. In: Proceedings of the ECML/PKDD'08, vol 2, pp 1-16.
[11]
Mitchell-Jones AJ, Amori G, Bogdanowicz W, Krystufek B, Reijnders PJH, Spitzenberger F, Stubbe M, Thissen JBM, Vohralik V, Zima J (1999) The atlas of European mammals. Academic Press, London.
[12]
Rissanen J (1978) Modeling by shortest data description. Automatica 14(1):465-471.
[13]
Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the SDM'06, pp 393-404.
[14]
Slonim N, Tishby N (1999) Agglomerative information bottleneck. In: Proceedings of the NIPS'99, pp 617-623.
[15]
Tsoumakas G, Vilcek J, Spyromitros L (2010) MULAN: a Java library for multi-label learning. http://mulan.sourceforge.net/
[16]
Umek L, Zupan B, Toplak M, Morin A, Chauchat J-H, Makovec G, Smrke D (2009) Subgroup discovery in data sets with multi-dimensional responses: A method and a case study in traumatology. In: Proceedings of AIME'09, pp 265-274.
[17]
Warner HR, Toronto AF, Veasey LR, Stephenson R (1961) A mathematical model for medical diagnosis, application to congenital heart disease. J Am Med Assoc 177:177-184.
[18]
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco.

Published In

Data Mining and Knowledge Discovery, Volume 21, Issue 2
September 2010
123 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

1. Exceptional Model Mining
2. Information theory
3. Subgroup Discovery
