A statistical framework for mining substitution rules

Teng, Wei-Guang; Hsieh, Ming-Jyh; Chen, Ming-Syan

doi:10.1007/s10115-003-0142-5

Published: 01 February 2005

Volume 7, pages 158–178, (2005)

Knowledge and Information Systems

Wei-Guang Teng¹,
Ming-Jyh Hsieh¹ &
Ming-Syan Chen¹

Abstract

In this paper, a new mining capability, called mining of substitution rules, is explored. A substitution refers to the choice made by a customer to replace the purchase of some items with that of others. Like the mining of association rules, the mining of substitution rules in a transaction database yields valuable knowledge for market prediction, user behaviour analysis and decision support. The process of mining substitution rules can be decomposed into two procedures. The first procedure identifies concrete itemsets among a large number of frequent itemsets, where a concrete itemset is a frequent itemset whose items are statistically dependent. The second procedure generates the substitution rules. We first derive theoretical properties for the model of substitution rule mining and devise a technique based on the induction of positive itemset supports to improve the efficiency of support counting for negative itemsets. Then, in light of these properties, the SRM (substitution rule mining) algorithm is designed and implemented to discover substitution rules efficiently while attaining good statistical significance. Empirical studies are performed to evaluate the performance of the proposed SRM algorithm. The results show that the SRM algorithm executes efficiently and produces substitution rules of high quality.
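
The abstract names two ingredients without giving their details: counting supports of negative itemsets from positive itemset supports, and testing whether the items of a frequent itemset are statistically dependent. The following is a minimal sketch of both ideas, not the SRM algorithm itself. It assumes the standard inclusion-exclusion identity for the first and a Pearson chi-square test on a 2x2 table for the second; the function names and the toy transactions are hypothetical.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
transactions = [frozenset(t) for t in (
    {"coffee", "milk"}, {"coffee", "milk"}, {"coffee"},
    {"tea", "milk"}, {"tea"}, {"tea", "milk"},
    {"coffee", "tea"}, {"milk"},
)]

def support(db, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= t for t in db) / len(db)

def negative_support(db, positive, negated):
    """Support of 'all of `positive` present and none of `negated` present'.

    Assumption: uses inclusion-exclusion over subsets S of `negated`,
    sum of (-1)^|S| * support(positive | S), so only supports of
    positive itemsets are ever counted.
    """
    positive = frozenset(positive)
    negated = sorted(negated)
    total = 0.0
    for k in range(len(negated) + 1):
        for subset in combinations(negated, k):
            total += (-1) ** k * support(db, positive | set(subset))
    return total

def chi_square_2x2(db, a, b):
    """Pearson chi-square statistic for presence/absence of items a and b.

    Assumption: the abstract does not say which dependence test is used;
    chi-square on a 2x2 contingency table is one common choice.
    """
    n = len(db)
    pa = sum(a in t for t in db) / n
    pb = sum(b in t for t in db) / n
    stat = 0.0
    for has_a in (True, False):
        for has_b in (True, False):
            observed = sum((a in t) == has_a and (b in t) == has_b for t in db)
            expected = n * (pa if has_a else 1 - pa) * (pb if has_b else 1 - pb)
            if expected > 0:  # skip empty cells (item in all or no transactions)
                stat += (observed - expected) ** 2 / expected
    return stat

# Support of "coffee bought, tea not bought", from positive supports only.
coffee_without_tea = negative_support(transactions, {"coffee"}, {"tea"})
# Dependence between coffee and tea; with 1 degree of freedom, a value
# above 3.84 would indicate dependence at the 5% level.
coffee_tea_chi2 = chi_square_2x2(transactions, "coffee", "tea")
```

On the toy data, `coffee_without_tea` equals the directly counted fraction of transactions that contain coffee but not tea (3 of 8). Only positive itemsets are counted, which matches the motivation given in the abstract, although the paper's own induction technique may differ from this identity.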

Author information

Authors and Affiliations

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
Wei-Guang Teng, Ming-Jyh Hsieh & Ming-Syan Chen

Corresponding author

Correspondence to Ming-Syan Chen.

About this article

Cite this article

Teng, WG., Hsieh, MJ. & Chen, MS. A statistical framework for mining substitution rules. Knowl Inf Syst 7, 158–178 (2005). https://doi.org/10.1007/s10115-003-0142-5

Received: 09 December 2002
Revised: 10 February 2003
Accepted: 20 June 2003
Published: 01 February 2005
Issue Date: February 2005
DOI: https://doi.org/10.1007/s10115-003-0142-5
