DOI: 10.1145/2623330.2630812
tutorial

Statistically sound pattern discovery

Published: 24 August 2014

Abstract

Pattern discovery is a core data mining activity. Initial approaches were dominated by the frequent pattern discovery paradigm, in which only patterns that occur frequently in the data were explored. Now that this paradigm has been thoroughly researched and its limitations are well understood, it is giving way to a new one, which can be called statistically sound pattern discovery. In this paradigm, the main impetus is to discover statistically significant patterns: patterns that are unlikely to have occurred by chance and are likely to hold in future data. The new paradigm thus provides strict control over false discoveries and overfitting.
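
To make "unlikely to have occurred by chance" concrete for a single pattern, here is a minimal sketch. It assumes binary transaction data, a rule X → A, and Fisher's exact test. Fisher's test is only one common choice; the tutorial discusses several tests and schools. The function name and the counts are hypothetical illustrations, not taken from the tutorial.

```python
from math import comb


def fisher_upper_tail(n: int, f_x: int, f_a: int, f_xa: int) -> float:
    """One-sided Fisher's exact test p-value for a hypothetical rule X -> A.

    n    -- total number of transactions
    f_x  -- number of transactions containing X
    f_a  -- number of transactions containing A
    f_xa -- number of transactions containing both X and A

    Under the null hypothesis that X and A are independent (margins fixed),
    the number of X-and-A transactions follows a hypergeometric distribution;
    the p-value is the probability of observing at least f_xa of them.
    """
    if not (0 <= f_x <= n and 0 <= f_a <= n and 0 <= f_xa <= min(f_x, f_a)):
        raise ValueError("inconsistent counts")
    # Sum hypergeometric probabilities from the observed count up to the maximum.
    # comb(k, r) returns 0 when r > k, so impossible cells contribute nothing.
    tail = sum(
        comb(f_a, i) * comb(n - f_a, f_x - i)
        for i in range(f_xa, min(f_x, f_a) + 1)
    )
    return tail / comb(n, f_x)


if __name__ == "__main__":
    # Hypothetical counts: 100 transactions, X in 20, A in 30, both in 15
    # (about 6 co-occurrences would be expected under independence).
    print(fisher_upper_tail(100, 20, 30, 15))
```

A small p-value says the observed co-occurrence is hard to explain by chance alone. It does not, by itself, control false discoveries when many rules are tested; that is the multiple testing problem.
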
This tutorial covers both classic and cutting-edge research topics on pattern discovery combined with statistical significance testing. We start with an advanced introduction to the relevant forms of statistical significance testing, including different schools and alternative models, their underlying assumptions, practical issues, and limitations. We then discuss their application to problems specific to data mining, including the evaluation of nested patterns, the multiple testing problem, algorithmic strategies, and real-world considerations. We present the current state-of-the-art solutions and explore in detail how this approach to pattern discovery can deliver efficient and effective discovery of small sets of interesting patterns.
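
Pattern discovery usually tests a very large number of candidate patterns, so per-pattern p-values need a correction. The sketch below shows one standard family-wise error rate correction: Holm's step-down refinement of the Bonferroni correction. It is an illustrative choice and not necessarily the approach the tutorial recommends. The function name and the p-values are hypothetical.

```python
def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm-Bonferroni step-down procedure (illustrative sketch).

    Returns, for each input p-value in its original position, whether the
    corresponding hypothesis is rejected while keeping the family-wise
    error rate at most alpha.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one test fails, all larger p-values are retained as well.
            break
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for four candidate patterns.
    print(holm_reject([0.01, 0.015, 0.03, 0.005]))
```

With these four values, plain Bonferroni (threshold 0.05/4 = 0.0125) would reject only the patterns with p = 0.005 and p = 0.01. Holm's procedure rejects all four. In pattern mining, m can be the number of all candidate patterns, so such corrections can become very conservative.
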

Supplementary Material

Part 1 of 3 (p1976-sidebyside1.mp4)
Part 2 of 3 (p1976-sidebyside2.mp4)
Part 3 of 3 (p1976-sidebyside3.mp4)

Cited By

  • (2019) A tutorial on statistically sound pattern discovery. Data Mining and Knowledge Discovery 33(2):325-377. DOI: 10.1007/s10618-018-0590-x. Online publication date: 1 Mar 2019.
  • (2019) Evaluation Measures for Extended Association Rules Based on Distributed Representations. Primate Life Histories, Sex Roles, and Adaptability, pp. 305-313. DOI: 10.1007/978-3-030-15035-8_29. Online publication date: 15 Mar 2019.
  • (2018) Evaluation Measures for Frequent Itemsets Based on Distributed Representations. 2018 Sixth International Symposium on Computing and Networking (CANDAR), pp. 153-159. DOI: 10.1109/CANDAR.2018.00028. Online publication date: Nov 2018.
  • (2017) Extraction of Characteristic Frequent Visual Patterns by Distributed Representation. 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA), pp. 525-530. DOI: 10.1109/WAINA.2017.71. Online publication date: Mar 2017.

    Published In

    KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2014
    2028 pages
    ISBN:9781450329569
    DOI:10.1145/2623330
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. association mining
    2. pattern discovery
    3. statistics

    Qualifiers

    • Tutorial

    Conference

    KDD '14

    Acceptance Rates

    KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%