DOI: 10.1145/1065167.1065215

Relative risk and odds ratio: a data mining perspective

Published: 13 June 2005 Publication History

Abstract

We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies for demonstrating associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is the prospective study design, whose analysis is based on the concept of "relative risk": what fraction of the exposed (i.e., having the pattern) or unexposed (i.e., lacking the pattern) individuals have the phenotype (i.e., the class label)? The second is the retrospective design, whose analysis is based on the concept of "odds ratio": the odds that a case has been exposed to a risk factor are compared to the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not previously been studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level, and we compare these algorithms. We further demonstrate that the most efficient of them can mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which satisfy considerably simpler conditions.
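To make the two measures concrete, the following minimal sketch computes relative risk and odds ratio for a single pattern from a 2x2 contingency table, using the standard textbook definitions. It is an illustration only, not the mining algorithms of the paper: the function names, the record layout (a set of items paired with a class label), and the toy data are all assumptions made for this example.

```python
from fractions import Fraction


def contingency(records, pattern, positive_label):
    """Count the 2x2 table for one pattern (hypothetical helper).

    records: iterable of (items, label) pairs, where items is a set.
    Returns (a, b, c, d):
      a = exposed with the phenotype,    b = exposed without it,
      c = unexposed with the phenotype,  d = unexposed without it.
    """
    a = b = c = d = 0
    for items, label in records:
        exposed = pattern <= items          # record contains the whole pattern
        positive = label == positive_label  # record carries the class label
        if exposed and positive:
            a += 1
        elif exposed:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def relative_risk(a, b, c, d):
    """Risk among the exposed divided by risk among the unexposed.

    Raises ZeroDivisionError if a group is empty or the unexposed risk is 0.
    """
    return Fraction(a, a + b) / Fraction(c, c + d)


def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c); raises ZeroDivisionError if b*c == 0."""
    return Fraction(a * d, b * c)


# Toy data, invented for illustration only.
records = [
    ({"x", "y"}, "case"),
    ({"x"}, "case"),
    ({"x"}, "control"),
    ({"y"}, "case"),
    ({"y"}, "control"),
    ({"z"}, "control"),
]
table = contingency(records, {"x"}, "case")
rr = relative_risk(*table)
oratio = odds_ratio(*table)
```

Exact fractions are used here so that small tables give exact ratios; a practical implementation would also need a policy for zero cells, which this sketch simply reports as an error.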


Published In

PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2005
388 pages
ISBN:1595930620
DOI:10.1145/1065167
  • General Chair: Georg Gottlob
  • Program Chair: Foto Afrati

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS05

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%
