Article

On the discovery of significant statistical quantitative rules

Authors:

Balaji Padmanabhan,

Alexander Tuzhilin

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 374 - 383

https://doi.org/10.1145/1014052.1014094

Published: 22 August 2004

Abstract

In this paper we study market share rules, i.e., rules that have a certain market share statistic associated with them. Such rules are particularly relevant for business decision making. Motivated by market share rules, we consider statistical quantitative rules (SQ rules): quantitative rules in which the RHS can be any statistic computed for the segment satisfying the LHS of the rule. Building on prior work, we present a statistical approach for learning all significant SQ rules, i.e., SQ rules for which a desired statistic lies outside a confidence interval computed for that rule. In particular, we show how resampling techniques can be used effectively to learn significant rules. Since our method tests the significance of a large number of rules in parallel, it is susceptible to learning a certain number of "false" rules. To address this, we present a technique that can determine the number of significant SQ rules that can be expected by chance alone, and suggest that this number can be used to determine a "false discovery rate" for the learning procedure. We apply our methods to online consumer purchase data and report the results.
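The abstract describes the procedure only at a high level. The Python sketch below shows one way such a test might be set up; it is not the authors' implementation. It assumes that the confidence interval for a rule comes from the distribution of the statistic over random segments of the same size drawn without replacement from all records, that the number of rules expected by chance is estimated by shuffling the target attribute across records and repeating the search, and that the false discovery rate is approximated as that chance count divided by the number of rules actually found. The function names, the parameters `n_resamples`, `alpha`, `min_support` and `n_shuffles`, and the toy data are all hypothetical.

```python
# Sketch under assumed design choices (resampling null + shuffle baseline);
# this is an illustration, not the paper's code.
import random
import statistics
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Record = Mapping[str, object]
Predicate = Callable[[Record], bool]            # the LHS of an SQ rule
Statistic = Callable[[Sequence[float]], float]  # the RHS statistic


def null_interval(values: Sequence[float], size: int, stat: Statistic,
                  rng: random.Random, n_resamples: int = 1000,
                  alpha: float = 0.05) -> Tuple[float, float]:
    """Empirical (1 - alpha) interval of `stat` over random segments of
    `size` records drawn without replacement from the whole population."""
    draws = sorted(stat(rng.sample(values, size)) for _ in range(n_resamples))
    lo = draws[int(alpha / 2 * (n_resamples - 1))]
    hi = draws[int((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


def significant_rules(records: Sequence[Record], target: Sequence[float],
                      rules: Dict[str, Predicate], stat: Statistic,
                      rng: random.Random, min_support: int = 10,
                      **interval_kw) -> List[str]:
    """Names of rules whose segment statistic lies outside its null interval."""
    found = []
    for name, lhs in rules.items():
        segment = [t for r, t in zip(records, target) if lhs(r)]
        if len(segment) < min_support or len(segment) == len(target):
            continue  # too small to test, or the "segment" is everyone
        lo, hi = null_interval(target, len(segment), stat, rng, **interval_kw)
        observed = stat(segment)
        if observed < lo or observed > hi:
            found.append(name)
    return found


def expected_by_chance(records: Sequence[Record], target: Sequence[float],
                       rules: Dict[str, Predicate], stat: Statistic,
                       rng: random.Random, n_shuffles: int = 10,
                       **kw) -> float:
    """Average number of rules flagged after shuffling `target` across
    records, which destroys any real link between the LHS and the statistic."""
    counts = []
    for _ in range(n_shuffles):
        shuffled = list(target)
        rng.shuffle(shuffled)
        counts.append(len(significant_rules(records, shuffled, rules, stat,
                                            rng, **kw)))
    return statistics.fmean(counts)


# Made-up toy data: customers in group A spend far more than those in B.
records = [{"group": "A", "parity": i % 2, "spend": 100.0 + i % 5}
           for i in range(50)]
records += [{"group": "B", "parity": i % 2, "spend": 10.0 + i % 5}
            for i in range(150)]
spend = [float(r["spend"]) for r in records]
rules: Dict[str, Predicate] = {
    "group=A": lambda r: r["group"] == "A",
    "group=B": lambda r: r["group"] == "B",
    "parity=0": lambda r: r["parity"] == 0,  # mean spend equals the overall mean
}

rng = random.Random(42)
found = significant_rules(records, spend, rules, statistics.fmean, rng)
false_expected = expected_by_chance(records, spend, rules, statistics.fmean, rng)
fdr = min(1.0, false_expected / len(found)) if found else 0.0
print(found, false_expected, round(fdr, 2))
```

Because each null segment has the same size as the real one, the interval is wider for small segments, so a rule covering few records needs a larger deviation before it counts as significant. In the toy data, `group=A` and `group=B` are flagged, while `parity=0` is not, since its mean spend equals the overall mean.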

Cited By

Cecconi A, De Giacomo G, Di Ciccio C, Maggi F, Mendling J (2022). Measuring the interestingness of temporal logic behavioral specifications in process mining. Information Systems 107:C. DOI: 10.1016/j.is.2021.101920. Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.1016/j.is.2021.101920
Zhang T, Feng H, Chen W, Chen Z, Zheng W, Luo X, Huang W, Tung A (2021). ChartNavigator: An Interactive Pattern Identification and Annotation Framework for Charts. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2021.3094236. Online publication date: 2021.
https://doi.org/10.1109/TKDE.2021.3094236
Henriques R, Madeira S (2018). BSig. Data Mining and Knowledge Discovery 32:1, pp. 124-161. DOI: 10.1007/s10618-017-0521-2. Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.1007/s10618-017-0521-2

Index Terms

Information systems > Information systems applications > Data mining


Information

Published In


KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

August 2004

874 pages

ISBN: 1581138881

DOI: 10.1145/1014052

General Chairs: Won Kim (Cyber Database Solutions), Ronny Kohavi (Amazon.com)

Program Chairs: Johannes Gehrke (Cornell University), William DuMouchel (AT&T Labs Research)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2004

Qualifiers

Article

Conference

KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 22-25, 2004

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Total citations: 45

Total downloads: 745

Downloads (last 12 months): 11

Downloads (last 6 weeks): 1

Reflects downloads up to 01 Nov 2024.

Citations

Sethi R, Shekar B (2018). Subjective Interestingness in Association Rule Mining: A Theoretical Analysis. Digital Business, pp. 375-389. DOI: 10.1007/978-3-319-93940-7_15. Online publication date: 27-Jul-2018.
https://doi.org/10.1007/978-3-319-93940-7_15
Gao C, Yao Y (2017). Actionable Strategies in Three-Way Decisions with Rough Sets. Rough Sets, pp. 183-199. DOI: 10.1007/978-3-319-60840-2_13. Online publication date: 22-Jun-2017.
https://doi.org/10.1007/978-3-319-60840-2_13
Zhang A, Shi W, Webb G (2016). Mining significant association rules from uncertain data. Data Mining and Knowledge Discovery 30:4, pp. 928-963. DOI: 10.1007/s10618-015-0446-6. Online publication date: 1-Jul-2016.
https://dl.acm.org/doi/10.1007/s10618-015-0446-6
Sahoo J, Das A, Goswami A (2015). An efficient approach for mining association rules from high utility itemsets. Expert Systems with Applications: An International Journal 42:13, pp. 5754-5778. DOI: 10.1016/j.eswa.2015.02.051. Online publication date: 1-Aug-2015.
https://dl.acm.org/doi/10.1016/j.eswa.2015.02.051
Cunjin X, Wanjiao S, Lijuan Q, Qing D, Xiaoyang W (2015). A mutual-information-based mining method for marine abnormal association rules. Computers & Geosciences 76:C, pp. 121-129. DOI: 10.1016/j.cageo.2014.12.001. Online publication date: 1-Mar-2015.
https://dl.acm.org/doi/10.1016/j.cageo.2014.12.001
Shaharanee I, Jamil J (2015). A Framework for Interestingness Measures for Association Rules with Discrete and Continuous Attributes Based on Statistical Validity. Artificial Intelligence in Theory and Practice IV, pp. 119-128. DOI: 10.1007/978-3-319-25261-2_11. Online publication date: 25-Sep-2015.
https://doi.org/10.1007/978-3-319-25261-2_11
Sengstock C, Gertz M (2013). Spatial Itemset Mining. Proceedings of the 17th East European Conference on Advances in Databases and Information Systems - Volume 8133, pp. 148-161. DOI: 10.1007/978-3-642-40683-6_12. Online publication date: 1-Sep-2013.
https://dl.acm.org/doi/10.1007/978-3-642-40683-6_12
