research-article

RDFRules: Making RDF rule mining easier and even more efficient

Published: 01 January 2021
    Abstract

    AMIE+ is a state-of-the-art algorithm for learning rules from RDF knowledge graphs (KGs). Based on association rule learning, AMIE+ constituted a breakthrough in speed on large data compared to the previous generation of ILP-based systems. In this paper, we present several algorithmic extensions to AMIE+ that make it faster, as well as support for data pre-processing and model post-processing, which covers the linked data mining process more comprehensively than the original AMIE+ implementation. The main contributions relate to performance: (1) a top-k approach, which addresses the combinatorial explosion that often results from a hand-set minimum support threshold; (2) a grammar that allows users to define fine-grained patterns, reducing the size of the search space; and (3) faster projection binding, which reduces the number of repeated calculations. Other enhancements include the ability to mine across multiple graphs, support for discretizing continuous values, and the selection of the most representative rules using established rule pruning and clustering algorithms. Benchmarks show reductions in mining time of up to several orders of magnitude compared to AMIE+. An open-source implementation is available under the name RDFRules at https://github.com/propi/rdfrules.
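    The idea behind the top-k approach in (1) can be illustrated with a minimal Python sketch. It is not the RDFRules implementation: rules are restricted to the hypothetical toy form `(?a body ?b) => (?a head ?b)`, support is an absolute count of (subject, object) pairs, and the function names are illustrative. Instead of a fixed minimum support, the pruning threshold is the support of the current k-th best rule, so any candidate whose upper bound cannot beat it is skipped before its join is computed.

```python
import heapq
from collections import defaultdict
from itertools import permutations


def index_by_predicate(triples):
    """Map each predicate to the set of (subject, object) pairs it connects."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[p].add((s, o))
    return index


def top_k_rules(triples, k):
    """Return up to k toy rules `(?a body ?b) => (?a head ?b)` with the highest support.

    Support is the number of (a, b) pairs for which both body and head hold.
    Rather than a hand-set minimum support, the pruning threshold is the
    support of the current k-th best rule, so it rises as mining proceeds.
    """
    if k <= 0:
        return []
    index = index_by_predicate(triples)
    heap = []  # min-heap of (support, body, head); heap[0] holds the k-th best
    for body, head in sorted(permutations(index, 2)):
        threshold = heap[0][0] if len(heap) == k else 0
        # A rule cannot be supported by more pairs than its body or its head has.
        upper_bound = min(len(index[body]), len(index[head]))
        if upper_bound <= threshold:
            continue  # cannot enter the top-k, so skip the (expensive) join
        support = len(index[body] & index[head])
        if support == 0:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (support, body, head))
        elif support > heap[0][0]:
            heapq.heapreplace(heap, (support, body, head))
    return sorted(heap, reverse=True)


# Hypothetical toy knowledge graph.
triples = [
    ("alice", "livesIn", "prague"), ("alice", "bornIn", "prague"),
    ("bob", "livesIn", "brno"), ("bob", "bornIn", "brno"),
    ("carol", "livesIn", "prague"), ("carol", "bornIn", "vienna"),
    ("dave", "worksIn", "prague"), ("dave", "livesIn", "prague"),
]
for support, body, head in top_k_rules(triples, k=2):
    print(f"(?a {body} ?b) => (?a {head} ?b)  support={support}")
```

    With k = 2, the two `bornIn`/`livesIn` rules (support 2) fill the heap, and the three `worksIn` candidates examined afterwards are pruned by their upper bound of 1 without being joined. No minimum support had to be guessed up front.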

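    Discretization matters because each distinct numeric literal would otherwise be a separate constant, so rules mentioning specific values of, say, a hypothetical `age` predicate rarely reach a support threshold. The abstract does not state which discretization methods RDFRules provides; the following sketch assumes equal-frequency binning, one common choice, and replaces each numeric object of the chosen predicate with a half-open interval label. All names are illustrative.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return n_bins - 1 cut points that split the sorted values into bins of
    roughly equal size (ties can make some bins uneven)."""
    ordered = sorted(values)
    if not ordered:
        return []
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(triples, predicate, n_bins):
    """Replace numeric objects of `predicate` with half-open interval labels."""
    numeric = [o for _, p, o in triples if p == predicate]
    cuts = equal_frequency_cut_points(numeric, n_bins)
    bounds = [float("-inf")] + cuts + [float("inf")]
    result = []
    for s, p, o in triples:
        if p == predicate:
            b = bisect_right(cuts, o)  # number of cut points <= o
            o = f"[{bounds[b]}, {bounds[b + 1]})"
        result.append((s, p, o))
    return result


# Hypothetical toy data: one numeric predicate, one non-numeric one.
triples = [
    ("alice", "age", 25), ("bob", "age", 31), ("carol", "age", 40),
    ("dave", "age", 52), ("eve", "age", 67), ("frank", "age", 70),
    ("alice", "livesIn", "prague"),
]
for triple in discretize(triples, "age", n_bins=3):
    print(triple)
```

    With three bins, the six ages fall two per interval, `[-inf, 40)`, `[40, 67)` and `[67, inf)`, while the `livesIn` triple passes through unchanged; the interval labels can then appear as constants in mined rules.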
    Cited By

    • (2024) Inductive autoencoder for efficiently compressing RDF graphs. Information Sciences: an International Journal, Vol. 662, Issue C. DOI: 10.1016/j.ins.2024.120210. Online publication date: 1-Mar-2024.

          Information

          Published In

          Semantic Web, Volume 12, Issue 4
          Advancing Agriculture through Semantic Data Management
          2021
          162 pages
          ISSN:1570-0844
          EISSN:2210-4968

          Publisher

          IOS Press

          Netherlands

          Author Tags

          1. Rule mining
          2. rule learning
          3. exploratory data analysis
          4. machine learning
          5. inductive logic programming
