DOI: 10.1145/2480362.2480387
Research article

Out-of-bag discriminative graph mining

Published: 18 March 2013

Abstract

In class-labeled graph databases, each graph is associated with one of a finite set of classes, which induces associations between classes and the subgraphs occurring in the database graphs. Subgraphs with strong class associations are called discriminative subgraphs. In this work, discriminative subgraphs are mined repeatedly on bootstrap samples of a graph database in order to improve the estimation of subgraph associations. The number of times a subgraph occurs in graphs associated with each class (its support values) is recorded over the out-of-bag instances of the bootstrap process. We investigate sample mean and maximum likelihood estimation for approximating the true underlying support from these empirical values. Applying the methods to publicly available toxicological databases and validating support values, class bias, and class significance, we show that both significantly improve on single runs of discriminative graph mining. In toxicology, the detection of subgraphs (fragments of chemical structure) that induce toxicity is a major goal. Besides yielding statistically validated subgraph associations, the proposed methods produce far fewer subgraphs than ordinary discriminative graph mining, whose large output is often a bottleneck in applying computational models to such databases and hinders interpretation of the results.
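
The out-of-bag bookkeeping described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions: each graph is reduced to a set of pattern identifiers, the hypothetical `frequent_patterns` function stands in for a real discriminative subgraph miner, and the per-class estimate is a plain sample mean over the bootstrap rounds in which a pattern was mined. The maximum likelihood variant mentioned in the abstract is not covered here.

```python
import random
from collections import defaultdict


def frequent_patterns(in_bag_graphs, in_bag_labels, min_count=2):
    """Hypothetical stand-in miner: keep patterns seen in >= min_count in-bag graphs."""
    counts = defaultdict(int)
    for graph in in_bag_graphs:
        for pattern in graph:
            counts[pattern] += 1
    return [p for p, c in counts.items() if c >= min_count]


def oob_mean_support(graphs, labels, mine, n_rounds=50, seed=0):
    """Sample-mean estimate of per-class support, measured on out-of-bag graphs.

    graphs: list of sets of pattern ids (an assumed simplification of real graphs)
    labels: list of class labels, aligned with graphs
    mine:   callable(in_bag_graphs, in_bag_labels) -> iterable of pattern ids
    """
    rng = random.Random(seed)
    n = len(graphs)
    classes = sorted(set(labels))
    totals = defaultdict(lambda: defaultdict(int))  # pattern -> class -> summed support
    rounds_seen = defaultdict(int)                  # pattern -> rounds that mined it

    for _ in range(n_rounds):
        # Bootstrap sample: n indices drawn with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        # Out-of-bag instances: graphs that were never drawn in this round.
        oob = sorted(set(range(n)) - set(in_bag))
        patterns = set(mine([graphs[i] for i in in_bag],
                            [labels[i] for i in in_bag]))
        for pattern in patterns:
            rounds_seen[pattern] += 1
            for i in oob:
                if pattern in graphs[i]:
                    totals[pattern][labels[i]] += 1

    # Average each pattern's support over the rounds in which it was mined.
    return {p: {c: totals[p][c] / rounds_seen[p] for c in classes}
            for p in rounds_seen}


# Toy data, invented purely for illustration.
graphs = [{"a", "b"}, {"a"}, {"a", "c"}, {"b", "c"}, {"c"}, {"b"}]
labels = ["toxic", "toxic", "toxic", "nontoxic", "nontoxic", "nontoxic"]
est = oob_mean_support(graphs, labels, frequent_patterns)
for pattern in sorted(est):
    print(pattern, est[pattern])
```

Dividing by the number of rounds in which a pattern was mined, rather than by the total number of rounds, is a design choice of this sketch. Because out-of-bag graphs play no part in mining during a given round, the recorded support values are not biased by the pattern selection of that round.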

    Published In

    SAC '13: Proceedings of the 28th Annual ACM Symposium on Applied Computing
    March 2013
    2124 pages
    ISBN:9781450316569
    DOI:10.1145/2480362

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. ensemble methods
    2. graph mining
    3. out-of-bag estimation

    Conference

    SAC '13
    March 18 - 22, 2013
    Coimbra, Portugal

    Acceptance Rates

    SAC '13 Paper Acceptance Rate 255 of 1,063 submissions, 24%;
    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
