Abstract
We propose a new framework for classification rule mining in quantitative data sets, grounded in Bayesian theory and requiring no univariate preprocessing of attributes. We introduce a space of rule models and a prior distribution over this model space, which yields a parameter-free evaluation criterion for classification rules. We show that the new criterion identifies interesting classification rules while being highly resilient to spurious patterns. We also develop a parameter-free algorithm that efficiently mines locally optimal classification rules. The mined rules are then used directly as new features in a classification process based on a selective naive Bayes classifier. The resulting classifier demonstrates higher inductive performance than state-of-the-art rule-based classifiers.
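The final step of the abstract, recoding mined rules as new features for a naive Bayes classifier, can be sketched as follows. This is a minimal illustration, not the paper's method: the interval-based rule bodies, the toy dataset, and the plain Bernoulli naive Bayes (the paper uses a *selective* naive Bayes) are all illustrative assumptions.

```python
import math
from collections import defaultdict

def rule_features(x, rules):
    """Map a numeric instance x to binary features: 1 if a rule body
    (a conjunction of interval conditions on attributes) covers x."""
    return [int(all(lo <= x[a] < hi for a, (lo, hi) in body.items()))
            for body in rules]

# Two hypothetical mined rules over attributes 0 and 1 (interval bodies).
rules = [
    {0: (0.0, 5.0)},                  # rule 1: x0 in [0, 5)
    {0: (5.0, 10.0), 1: (2.0, 8.0)},  # rule 2: x0 in [5,10) and x1 in [2,8)
]

# Tiny labelled quantitative dataset, illustrative only.
data = [([1.0, 3.0], 'a'), ([2.0, 9.0], 'a'),
        ([6.0, 4.0], 'b'), ([7.0, 5.0], 'b')]

# Bernoulli naive Bayes over the rule features, with Laplace smoothing.
counts = defaultdict(lambda: [0] * len(rules))  # class -> per-rule coverage counts
class_n = defaultdict(int)                       # class -> instance count
for x, y in data:
    class_n[y] += 1
    for j, f in enumerate(rule_features(x, rules)):
        counts[y][j] += f

def predict(x):
    """Return the class maximizing log P(y) + sum_j log P(f_j | y)."""
    feats = rule_features(x, rules)
    n = len(data)
    best, best_score = None, float('-inf')
    for y in class_n:
        score = math.log(class_n[y] / n)
        for j, f in enumerate(feats):
            p1 = (counts[y][j] + 1) / (class_n[y] + 2)  # Laplace-smoothed
            score += math.log(p1 if f else 1 - p1)
        if score > best_score:
            best, best_score = y, score
    return best

print(predict([1.5, 4.0]))  # instance covered by rule 1 only
```

The design point carried over from the paper is only the feature recoding: each instance is described by which mined rules cover it, and the downstream classifier operates on that binary representation rather than on the raw quantitative attributes.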
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Gay, D., Boullé, M. (2012). A Bayesian Approach for Classification Rule Mining in Quantitative Databases. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science(), vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_16
Print ISBN: 978-3-642-33485-6
Online ISBN: 978-3-642-33486-3