RDF shape induction using knowledge base profiling

Authors:

Nandana Mihindukulasooriya,

Mohammad Rifat Ahmmad Rashid,

Giuseppe Rizzo,

Raúl García-Castro,

Marco Torchiano

SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing

Pages 1952 - 1959

https://doi.org/10.1145/3167132.3167341

Published: 09 April 2018

Abstract

Knowledge Graphs (KGs) are becoming the core of most artificial intelligence and cognitive applications. Popular KGs such as DBpedia and Wikidata have chosen the RDF data model to represent their data. Despite its advantages, RDF data poses challenges, for example in data validation. Ontologies for specifying domain conceptualizations in RDF data are designed for entailment rather than validation, and most ontologies lack the granular information needed for validating constraints. Recent work on RDF Shapes and the standardization of languages such as SHACL and ShEx provide better mechanisms for representing integrity constraints for RDF data. However, manually creating constraints for large KGs is still a tedious task. In this paper, we present a data-driven approach for inducing integrity constraints for RDF data using data profiling. Those constraints can be combined into RDF Shapes and used to validate RDF graphs. Our method uses machine learning techniques to automatically generate RDF Shapes, with profiled RDF data as features. In the experiments, the proposed approach achieved 97% precision in deriving RDF Shapes with cardinality constraints for a subset of DBpedia data.
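
To make the idea of deriving cardinality constraints from profiled data concrete, the following is a minimal sketch and not the authors' implementation. It profiles how many values each instance of a class has for each property, then applies a simple threshold rule as a stand-in for the trained classifier described in the abstract. The function names, the `tolerance` parameter, and the `ex:` example data are all hypothetical; only the SHACL terms `sh:NodeShape`, `sh:targetClass`, `sh:property`, `sh:path`, `sh:minCount`, and `sh:maxCount` come from the SHACL vocabulary.

```python
from collections import Counter, defaultdict


def profile_cardinalities(triples, entity_class, rdf_type="rdf:type"):
    """Per property, list how many values each instance of entity_class has."""
    instances = {s for s, p, o in triples if p == rdf_type and o == entity_class}
    counts = defaultdict(Counter)  # property -> subject -> number of values
    for s, p, o in triples:
        if s in instances and p != rdf_type:
            counts[p][s] += 1
    # Instances that never use a property contribute a cardinality of 0.
    return {
        p: [per_subject.get(s, 0) for s in sorted(instances)]
        for p, per_subject in counts.items()
    }


def induce_cardinality(values, tolerance=0.0):
    """Threshold rule (an assumption, not the paper's classifier).

    Returns (min_count, max_count); max_count None means unbounded.
    """
    n = len(values)
    missing = sum(1 for v in values if v == 0) / n
    multiple = sum(1 for v in values if v > 1) / n
    min_count = 1 if missing <= tolerance else 0
    max_count = 1 if multiple <= tolerance else None
    return (min_count, max_count)


def to_shacl(shape_name, entity_class, constraints):
    """Render the induced constraints as a SHACL node shape in Turtle syntax."""
    header = f"{shape_name} a sh:NodeShape ;\n    sh:targetClass {entity_class} ;\n"
    props = []
    for prop, (min_count, max_count) in sorted(constraints.items()):
        parts = [f"sh:path {prop}"]
        if min_count:
            parts.append(f"sh:minCount {min_count}")
        if max_count is not None:
            parts.append(f"sh:maxCount {max_count}")
        props.append("    sh:property [ " + " ; ".join(parts) + " ]")
    return header + " ;\n".join(props) + " ."


# Hypothetical toy data: three persons, all with one birth date,
# but with zero, one, or two names.
triples = [
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:a", "ex:birthDate", "1900"),
    ("ex:a", "ex:name", "A"),
    ("ex:a", "ex:name", "Alpha"),
    ("ex:b", "rdf:type", "ex:Person"),
    ("ex:b", "ex:birthDate", "1950"),
    ("ex:c", "rdf:type", "ex:Person"),
    ("ex:c", "ex:birthDate", "1970"),
    ("ex:c", "ex:name", "C"),
]

profile = profile_cardinalities(triples, "ex:Person")
constraints = {p: induce_cardinality(v) for p, v in profile.items()}
shape = to_shacl("ex:PersonShape", "ex:Person", constraints)
print(shape)
```

On noisy data a strict rule like this would rarely yield a useful constraint, which is one reason to treat the profiled statistics as features for a learned model rather than applying a fixed threshold.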

Index Terms

RDF shape induction using knowledge base profiling
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
2. Information systems
  1. Data management systems
    1. Information integration
      1. Data cleaning

Published In

SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing

April 2018

2327 pages

ISBN: 9781450351911

DOI: 10.1145/3167132

Conference Chairs:
Hisham M. Haddad (Kennesaw State University),
Roger L. Wainwright (University of Tulsa),
Richard Chbeir (University of Pau & Pays Adour, France)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Ministerio de Economía, Industria y Competitividad

Conference

SAC 2018

Sponsor:

SIGAPP

SAC 2018: Symposium on Applied Computing

April 9 - 13, 2018

Pau, France

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Citations

Cited By

Carriero V, Groth P, Presutti V (2024). Empirical ontology design patterns and shapes from Wikidata. Semantic Web, 1-25. Online publication date: 20-Mar-2024.
https://doi.org/10.3233/SW-243613
Rabbani K, Lissandrini M, Bonifati A, Hose K (2024). Transforming RDF Graphs to Property Graphs using Standardized Schemas. Proceedings of the ACM on Management of Data 2:6, 1-25. Online publication date: 20-Dec-2024.
https://dl.acm.org/doi/10.1145/3698817
Seifer P, Hernández D, Lämmel R, Staab S, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). From Shapes to Shapes: Inferring SHACL Shapes for Results of SPARQL CONSTRUCT Queries. Proceedings of the ACM Web Conference 2024, 2064-2074. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645550
Duan X, Chaves-Fraga D, Derom O, Dimou A (2024). SCOOP All the Constraints’ Flavours for Your Knowledge Graph. The Semantic Web, 217-234. Online publication date: 19-May-2024.
https://doi.org/10.1007/978-3-031-60635-9_13
Burse R, Bertolotto M, McArdle G (2023). Enhancing Semantic Web Technologies Using Lexical Auditing Techniques for Quality Assurance of Biomedical Ontologies. BioMedInformatics 3:4, 962-984. Online publication date: 1-Nov-2023.
https://doi.org/10.3390/biomedinformatics3040059
Rabbani K, Lissandrini M, Hose K (2023). Extraction of Validating Shapes from Very Large Knowledge Graphs. Proceedings of the VLDB Endowment 16:5, 1023-1032. Online publication date: 6-Mar-2023.
https://dl.acm.org/doi/10.14778/3579075.3579078
Duan X, Chaves-Fraga D, Dimou A (2023). XSD2SHACL: Capturing RDF Constraints from XML Schema. Proceedings of the 12th Knowledge Capture Conference 2023, 214-222. Online publication date: 5-Dec-2023.
https://dl.acm.org/doi/10.1145/3587259.3627565
Zhao X, Li F, Yang H (2023). RDFS (c) Schema Inconsistency Checking Based on a Key Instance and Its Query Rewriting. IEEE Access 11, 6122-6132. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3234084
Felin R, Faron C, Tettamanzi A (2023). A Framework to Include and Exploit Probabilistic Information in SHACL Validation Reports. The Semantic Web, 91-104. Online publication date: 28-May-2023.
https://dl.acm.org/doi/10.1007/978-3-031-33455-9_6
Candela G (2023). An automatic data quality approach to assess semantic data from cultural heritage institutions. Journal of the Association for Information Science and Technology 74:7, 866-878. Online publication date: 21-Apr-2023.
https://dl.acm.org/doi/10.1002/asi.24761