DOI: 10.1145/3167132.3167341
research-article

RDF shape induction using knowledge base profiling

Published: 09 April 2018

Abstract

Knowledge Graphs (KGs) are becoming the core of most artificial intelligence and cognitive applications. Popular KGs such as DBpedia and Wikidata have chosen the RDF data model to represent their data. Despite its advantages, RDF data poses challenges, for example in data validation. Ontologies that specify domain conceptualizations for RDF data are designed for entailment rather than validation, and most of them lack the granular information needed for validating constraints. Recent work on RDF Shapes and the standardization of languages such as SHACL and ShEx provide better mechanisms for representing integrity constraints for RDF data. However, manually creating constraints for large KGs is still a tedious task. In this paper, we present a data-driven approach for inducing integrity constraints for RDF data using data profiling. These constraints can be combined into RDF Shapes and used to validate RDF graphs. Our method uses machine learning techniques to automatically generate RDF Shapes, with profiled RDF data as features. In the experiments, the proposed approach achieved 97% precision in deriving RDF Shapes with cardinality constraints for a subset of DBpedia data.
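
To make the idea of inducing cardinality constraints from profiled data concrete, the following is a minimal sketch and not the method of the paper. It assumes a toy in-memory list of triples with made-up `ex:` IRIs, profiles how many values each instance of a class has for a property, and then applies a simple threshold rule as a stand-in for the trained classifier mentioned in the abstract. The function names, the feature names, and the `tolerance` parameter are all hypothetical; only the SHACL terms (`sh:NodeShape`, `sh:targetClass`, `sh:property`, `sh:path`, `sh:minCount`, `sh:maxCount`) come from the SHACL vocabulary.

```python
from collections import Counter

# Toy triples (subject, predicate, object). All IRIs are invented for illustration.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:birthDate", "1980-01-01"),
    ("ex:bob", "ex:birthDate", "1975-05-05"),
    ("ex:carol", "ex:birthDate", "1990-09-09"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
]


def profile_cardinality(triples, cls, prop):
    """Profile how many distinct values of `prop` each instance of `cls` has."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    if not instances:
        raise ValueError(f"no instances of {cls} found")
    # Deduplicate (subject, object) pairs so repeated triples are counted once.
    pairs = {(s, o) for s, p, o in triples if p == prop and s in instances}
    counts = Counter(s for s, _ in pairs)
    per_instance = [counts.get(s, 0) for s in instances]
    total = len(per_instance)
    return {
        "instances": total,
        "share_at_least_one": sum(c >= 1 for c in per_instance) / total,
        "share_at_most_one": sum(c <= 1 for c in per_instance) / total,
    }


def induce_cardinality(features, tolerance=0.95):
    """Threshold rule standing in for a learned model (an assumption of this sketch).

    Returns (min_count, max_count); max_count is None when unbounded.
    """
    min_count = 1 if features["share_at_least_one"] >= tolerance else 0
    max_count = 1 if features["share_at_most_one"] >= tolerance else None
    return min_count, max_count


def to_shacl(cls, prop, min_count, max_count):
    """Render the induced constraint as a SHACL node shape in Turtle syntax."""
    lines = [f"    sh:path {prop} ;"]
    if min_count > 0:
        lines.append(f"    sh:minCount {min_count} ;")
    if max_count is not None:
        lines.append(f"    sh:maxCount {max_count} ;")
    body = "\n".join(lines)
    return (
        f"[] a sh:NodeShape ;\n"
        f"  sh:targetClass {cls} ;\n"
        f"  sh:property [\n{body}\n  ] ."
    )


if __name__ == "__main__":
    for prop in ("ex:birthDate", "ex:knows"):
        feats = profile_cardinality(TRIPLES, "ex:Person", prop)
        lo, hi = induce_cardinality(feats)
        print(to_shacl("ex:Person", prop, lo, hi))
```

A tolerance below 1.0 lets the rule ignore a small share of noisy instances, which is one reason a data-driven approach might be preferred over strict observation of minimum and maximum counts. A real pipeline would replace the threshold rule with a classifier trained on such profiling features.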

Published In

SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
April 2018
2327 pages
ISBN: 9781450351911
DOI: 10.1145/3167132
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDF shape
  2. data quality
  3. knowledge base
  4. machine learning

Qualifiers

  • Research-article

Funding Sources

  • Ministerio de Economía, Industria y Competitividad

Conference

SAC 2018
SAC 2018: Symposium on Applied Computing
April 9 - 13, 2018
Pau, France

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2024) Empirical ontology design patterns and shapes from Wikidata. Semantic Web, 1-25. DOI: 10.3233/SW-243613. Online publication date: 20-Mar-2024
  • (2024) From Shapes to Shapes: Inferring SHACL Shapes for Results of SPARQL CONSTRUCT Queries. Proceedings of the ACM Web Conference 2024, 2064-2074. DOI: 10.1145/3589334.3645550. Online publication date: 13-May-2024
  • (2024) SCOOP All the Constraints’ Flavours for Your Knowledge Graph. The Semantic Web, 217-234. DOI: 10.1007/978-3-031-60635-9_13. Online publication date: 19-May-2024
  • (2023) Enhancing Semantic Web Technologies Using Lexical Auditing Techniques for Quality Assurance of Biomedical Ontologies. BioMedInformatics 3:4, 962-984. DOI: 10.3390/biomedinformatics3040059. Online publication date: 1-Nov-2023
  • (2023) Extraction of Validating Shapes from Very Large Knowledge Graphs. Proceedings of the VLDB Endowment 16:5, 1023-1032. DOI: 10.14778/3579075.3579078. Online publication date: 6-Mar-2023
  • (2023) XSD2SHACL: Capturing RDF Constraints from XML Schema. Proceedings of the 12th Knowledge Capture Conference 2023, 214-222. DOI: 10.1145/3587259.3627565. Online publication date: 5-Dec-2023
  • (2023) RDFS (c) Schema Inconsistency Checking Based on a Key Instance and Its Query Rewriting. IEEE Access 11, 6122-6132. DOI: 10.1109/ACCESS.2023.3234084. Online publication date: 2023
  • (2023) A Framework to Include and Exploit Probabilistic Information in SHACL Validation Reports. The Semantic Web, 91-104. DOI: 10.1007/978-3-031-33455-9_6. Online publication date: 28-May-2023
  • (2023) An automatic data quality approach to assess semantic data from cultural heritage institutions. Journal of the Association for Information Science and Technology 74:7, 866-878. DOI: 10.1002/asi.24761. Online publication date: 21-Apr-2023
  • (2022) Learning SHACL shapes from knowledge graphs. Semantic Web 14:1, 101-121. DOI: 10.3233/SW-223063. Online publication date: 30-Nov-2022
