Abstract
In the domain of data preparation for supervised classification, filter methods for variable ranking are time-efficient. However, their intrinsic univariate limitation prevents them from detecting redundancies or constructive interactions between variables. This paper introduces a new method to automatically, rapidly and reliably extract the classificatory information of a pair of input variables. It is based on a simultaneous partitioning of the domains of both input variables, into intervals in the numerical case and into groups of categories in the categorical case. The resulting input data grid makes it possible to quantify the joint information between the two input variables and the output variable. The best joint partitioning is found by maximizing a Bayesian model selection criterion. Intensive experiments demonstrate the benefits of the approach, especially significant improvements in classification accuracy.
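To make the data grid idea concrete, here is a minimal Python sketch, not the paper's algorithm: it partitions two numeric inputs into equal-frequency intervals, quantifies the joint information between the resulting grid cell and the class label via mutual information, and selects a grid size with a crude BIC-style penalty standing in for the paper's exact Bayesian criterion. All names (equal_freq_bins, penalized_score) and the scoring choices are illustrative assumptions.

```python
import numpy as np

def equal_freq_bins(x, k):
    """Assign each value of a numeric variable to one of k equal-frequency intervals."""
    # Interior quantile cut points; ties in x may merge bins, acceptable for a sketch.
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def grid_mutual_information(x1, x2, y, k1, k2):
    """Mutual information (in nats) between the (k1 x k2) grid cell and the class label."""
    cell = equal_freq_bins(x1, k1) * k2 + equal_freq_bins(x2, k2)
    mi = 0.0
    for c in np.unique(cell):
        p_cell = np.mean(cell == c)
        for cls in np.unique(y):
            p_joint = np.mean((cell == c) & (y == cls))
            if p_joint > 0:
                mi += p_joint * np.log(p_joint / (p_cell * np.mean(y == cls)))
    return mi

def penalized_score(x1, x2, y, k1, k2):
    """Fit minus a crude BIC-style complexity penalty on the number of grid cells.

    This penalty is only a placeholder: the paper derives an exact Bayesian
    (MODL) model selection criterion rather than using a generic penalty."""
    n = len(y)
    return n * grid_mutual_information(x1, x2, y, k1, k2) \
           - 0.5 * (k1 * k2 - 1) * np.log(n)

# Toy usage: an XOR-style interaction, invisible to any univariate filter.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(size=1000), rng.uniform(size=1000)
y = ((x1 > 0.5) ^ (x2 > 0.5)).astype(int)
best = max(((k1, k2) for k1 in range(1, 6) for k2 in range(1, 6)),
           key=lambda g: penalized_score(x1, x2, y, *g))
print("selected grid:", best)   # typically (2, 2) for this toy data
```

A brute-force scan over grid sizes suffices for this toy example; on real data the model space is far larger, which is why a principled criterion and an efficient search over joint partitions matter.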
Cite this article
Boullé, M. Optimum simultaneous discretization with data grid models in supervised classification: a Bayesian model selection approach. Adv Data Anal Classif 3, 39–61 (2009). https://doi.org/10.1007/s11634-009-0038-7