DOI: 10.1145/1014052.1014094

On the discovery of significant statistical quantitative rules

Published: 22 August 2004

Abstract

In this paper we study market share rules: rules that have a certain market share statistic associated with them. Such rules are particularly relevant for decision making from a business perspective. Motivated by market share rules, we consider statistical quantitative rules (SQ rules), which are quantitative rules in which the RHS can be any statistic computed for the segment satisfying the LHS of the rule. Building on prior work, we present a statistical approach for learning all significant SQ rules, i.e., SQ rules for which a desired statistic lies outside a confidence interval computed for this rule. In particular, we show how resampling techniques can be used effectively to learn significant rules. Because our method considers the significance of a large number of rules in parallel, it is susceptible to learning a certain number of "false" rules. To address this, we present a technique that can determine the number of significant SQ rules that can be expected by chance alone, and suggest that this number can be used to determine a "false discovery rate" for the learning procedure. We apply our methods to online consumer purchase data and report the results.
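
The abstract does not spell out the resampling procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a rule's LHS is given as a boolean mask over the records, that the RHS statistic is the mean of a numeric attribute, and that the null distribution comes from drawing random segments of the same size. The names `null_interval`, `is_significant`, and `expected_false_discoveries` are hypothetical.

```python
import random
from statistics import mean


def null_interval(values, segment_size, statistic=mean,
                  n_resamples=1000, alpha=0.05, rng=None):
    """Percentile interval of `statistic` over randomly drawn segments.

    Assumption: under the null hypothesis the segment is no different
    from a random subset of the same size, so each resample draws
    `segment_size` values without replacement from the whole data set.
    """
    rng = rng or random.Random(0)
    stats = sorted(statistic(rng.sample(values, segment_size))
                   for _ in range(n_resamples))
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


def is_significant(values, in_segment, statistic=mean,
                   n_resamples=1000, alpha=0.05, rng=None):
    """Check whether the segment's statistic falls outside the null interval.

    `in_segment` is a boolean mask (the rule's LHS); it is assumed to
    select at least one record.
    """
    segment = [v for v, keep in zip(values, in_segment) if keep]
    observed = statistic(segment)
    lo, hi = null_interval(values, len(segment), statistic,
                           n_resamples, alpha, rng)
    return observed < lo or observed > hi, observed, (lo, hi)


def expected_false_discoveries(values, masks, n_shuffles=20, seed=0, **kwargs):
    """Average number of rules flagged when the RHS attribute is shuffled.

    Shuffling breaks any real link between LHS and RHS, so every rule
    flagged on shuffled data is a chance discovery.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_shuffles):
        shuffled = values[:]
        rng.shuffle(shuffled)
        counts.append(sum(is_significant(shuffled, m, rng=rng, **kwargs)[0]
                          for m in masks))
    return mean(counts)
```

Under these assumptions, a rough false discovery rate would be the value returned by `expected_false_discoveries` divided by the number of rules flagged as significant on the unshuffled data.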



    Published In

    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN:1581138881
    DOI:10.1145/1014052

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 August 2004

    Author Tags

    1. market share rules
    2. nonparametric methods
    3. resampling
    4. rule discovery
    5. statistical quantitative rules

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
