DOI: 10.1145/3502223.3502247
poster
Open access

Knowledge Graph Curation: A Practical Framework

Published: 24 January 2022

Abstract

Knowledge Graphs (KGs) have been shown to be very important for applications such as personal assistants, question-answering systems, and search engines. Therefore, it is crucial to ensure their high quality. However, KGs inevitably contain errors, duplicates, and missing values. When KGs are not curated, these defects may hinder their adoption and utility in business applications, because low-quality KGs produce low-quality applications built on top of them. In this vision paper, we propose a practical knowledge graph curation framework for improving the quality of KGs. First, we define a set of quality metrics for assessing the status of KGs. Second, we describe the verification and validation of KGs as cleaning tasks. Third, we present duplicate detection and knowledge fusion strategies for enriching KGs. Furthermore, we give insights and directions toward a better architecture for curating KGs.
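
To make the assessment and enrichment steps named in the abstract more concrete, here is a minimal Python sketch over a toy set of triples. It is not the framework proposed in the paper: the completeness metric, the name-normalization rule for duplicate detection, the first-value-wins fusion policy, and all function names and data are assumptions chosen for illustration only. The cleaning step (verification and validation) is not covered.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, predicate, object) triples.
# All identifiers and values are hypothetical illustration data.
TRIPLES = [
    ("e1", "name", "Hotel Alpenhof"),
    ("e1", "phone", "+43 512 000"),
    ("e2", "name", "hotel  alpenhof"),
    ("e3", "name", "Cafe Central"),
    ("e3", "phone", "+43 1 111"),
]


def group_by_subject(triples):
    """Collect each subject's properties into a dict (last value wins)."""
    entities = defaultdict(dict)
    for subject, predicate, obj in triples:
        entities[subject][predicate] = obj
    return dict(entities)


def completeness(entities, required):
    """Assessment: share of entities that carry every required property."""
    if not entities:
        return 0.0
    complete = sum(
        1 for props in entities.values() if all(r in props for r in required)
    )
    return complete / len(entities)


def normalize(text):
    """Lowercase and collapse whitespace so near-identical names compare equal."""
    return " ".join(text.lower().split())


def duplicate_candidates(entities, key="name"):
    """Duplicate detection: group entity ids whose normalized key value matches."""
    buckets = defaultdict(list)
    for entity_id, props in entities.items():
        if key in props:
            buckets[normalize(props[key])].append(entity_id)
    return [sorted(ids) for ids in buckets.values() if len(ids) > 1]


def fuse(entities, ids):
    """Fusion: merge properties of duplicates; the first value seen is kept."""
    merged = {}
    for entity_id in ids:
        for predicate, obj in entities[entity_id].items():
            merged.setdefault(predicate, obj)
    return merged


if __name__ == "__main__":
    entities = group_by_subject(TRIPLES)
    print(completeness(entities, ["name", "phone"]))
    for ids in duplicate_candidates(entities):
        print(ids, fuse(entities, ids))
```

A real curation pipeline would likely replace exact matching on normalized names with a learned or weighted similarity measure, and would resolve conflicting values during fusion with an explicit policy rather than keeping the first value.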


      Published In

      IJCKG '21: Proceedings of the 10th International Joint Conference on Knowledge Graphs
      December 2021
      204 pages
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Knowledge graph assessment
      2. Knowledge graph cleaning
      3. Knowledge graph curation
      4. Knowledge graph enrichment

      Qualifiers

      • Poster
      • Research
      • Refereed limited

      Conference

      IJCKG'21
