Ontological Pathfinding

Authors:

Daisy Zhe Wang,

Soumitra Siddharth Johri

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

Pages 835 - 846

https://doi.org/10.1145/2882903.2882954

Published: 14 June 2016

Abstract

Recent years have seen a drastic rise in the construction of web-scale knowledge bases (e.g., Freebase, YAGO, DBPedia). These knowledge bases store structured information about real-world people, places, organizations, etc. However, due to limitations of human knowledge and information extraction algorithms, these knowledge bases are still far from complete. In this paper, we study the problem of mining first-order inference rules to facilitate knowledge expansion. We propose the Ontological Pathfinding algorithm (OP) that scales to web-scale knowledge bases via a series of parallelization and optimization techniques: a relational knowledge base model to apply inference rules in batches, a new rule mining algorithm that parallelizes the join queries, a novel partitioning algorithm to break the mining tasks into smaller independent sub-tasks, and a pruning strategy to eliminate unsound and resource-consuming rules before applying them. Combining these techniques, we develop the first rule mining system that scales to Freebase, the largest public knowledge base with 112 million entities and 388 million facts. We mine 36,625 inference rules in 34 hours; no existing approach achieves this scale.
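
To make the idea of applying inference rules in batches through join queries more concrete, here is a minimal sketch. It is not the OP algorithm itself: the toy facts, the single rule shape head(x, y) <- body1(x, z), body2(z, y), the names `FACTS`, `RULES`, `apply_rules`, and `score`, and the support/confidence definitions are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (predicate, subject, object) triples.
FACTS = {
    ("bornIn", "alice", "paris"),
    ("bornIn", "bob", "lyon"),
    ("cityOf", "paris", "france"),
    ("cityOf", "lyon", "france"),
    ("nationality", "alice", "france"),
}

# Hypothetical rule table; each row (head, body1, body2) encodes
# head(x, y) <- body1(x, z), body2(z, y).
RULES = [("nationality", "bornIn", "cityOf")]


def apply_rules(facts, rules):
    """Return {rule: set of (x, y) pairs derived by joining the two body atoms}."""
    by_pred = defaultdict(list)  # predicate -> [(subject, object)]
    for pred, subj, obj in facts:
        by_pred[pred].append((subj, obj))

    derived = {}
    for rule in rules:
        _head, body1, body2 = rule
        # Index body2 on its subject so the join on z is a hash lookup.
        index = defaultdict(list)
        for z, y in by_pred[body2]:
            index[z].append(y)
        derived[rule] = {(x, y) for x, z in by_pred[body1] for y in index[z]}
    return derived


def score(facts, rule, pairs):
    """Support = derived facts already in the KB; confidence = support / derived."""
    head = rule[0]
    support = sum((head, x, y) in facts for x, y in pairs)
    confidence = support / len(pairs) if pairs else 0.0
    return support, confidence
```

On this toy data the rule derives two candidate facts, one of which already appears in the knowledge base, so `score` returns a support of 1 and a confidence of 0.5. The unmatched pair `("bob", "france")` is the kind of new fact that knowledge expansion would contribute; a pruning step of the sort the abstract mentions would presumably discard rules whose scores fall below some threshold before applying them.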

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

June 2016

2300 pages

ISBN:9781450335317

DOI:10.1145/2882903

General Chairs: Fatma Özcan (IBM Research, USA), Georgia Koutrika (HP Labs, USA)

Program Chair: Sam Madden (Massachusetts Institute of Technology, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data

Sponsor: SIGMOD

June 26 - July 1, 2016

San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

Total Citations: 38
Total Downloads: 1,332
Downloads (Last 12 months): 155
Downloads (Last 6 weeks): 26

Reflects downloads up to 11 Sep 2024

Cited By

Li H, Liu H, Wang Y, Xin G, Wei Y (2023). DegreEmbed: Incorporating entity embedding into logic rule learning for knowledge graph reasoning. Semantic Web 14:6 (1099-1119). DOI: 10.3233/SW-233413. Online publication date: 13-Dec-2023.
Liu X, Lu H, Li H (2023). Intelligent generation method of emergency plan for hydraulic engineering based on knowledge graph – take the South-to-North Water Diversion Project as an example. LHB 108:1. DOI: 10.1080/27678490.2022.2153629. Online publication date: 18-Jan-2023.
Schramm S, Wehner C, Schmid U (2023). Comprehensible Artificial Intelligence on Knowledge Graphs. Web Semantics: Science, Services and Agents on the World Wide Web 79:C. DOI: 10.1016/j.websem.2023.100806. Online publication date: 1-Dec-2023.
Xu Z, Ye P, Li J, Chen H, Zhang W (2023). Differentiable learning of rules with constants in knowledge graph. Knowledge-Based Systems 275:C. DOI: 10.1016/j.knosys.2023.110686. Online publication date: 5-Sep-2023.
Qiu Q, Wang B, Ma K, Lü H, Tao L, Xie Z (2023). A Practical Approach to Constructing a Geological Knowledge Graph: A Case Study of Mineral Exploration Data. Journal of Earth Science 34:5 (1374-1389). DOI: 10.1007/s12583-023-1809-3. Online publication date: 18-Oct-2023.
Meilicke C, Chekol M, Betz P, Fink M, Stuckeschmidt H (2023). Anytime bottom-up rule learning for large-scale knowledge graph completion. The VLDB Journal 33:1 (131-161). DOI: 10.1007/s00778-023-00800-5. Online publication date: 16-Jun-2023.
Qudus U, Röder M, Kirrane S, Ngomo A (2023). TemporalFC: A Temporal Fact Checking Approach over Knowledge Graphs. The Semantic Web – ISWC 2023 (465-483). DOI: 10.1007/978-3-031-47240-4_25. Online publication date: 27-Oct-2023.
Jin L, Yao Z, Chen M, Chen H, Zhang W (2023). A Comprehensive Study on Knowledge Graph Embedding over Relational Patterns Based on Rule Learning. The Semantic Web – ISWC 2023 (290-308). DOI: 10.1007/978-3-031-47240-4_16. Online publication date: 27-Oct-2023.
Qudus U, Röder M, Saleem M, Ngonga Ngomo A (2022). HybridFC: A Hybrid Fact-Checking Approach for Knowledge Graphs. The Semantic Web – ISWC 2022 (462-480). DOI: 10.1007/978-3-031-19433-7_27. Online publication date: 16-Oct-2022.
Abedjan Z, Golab L, Naumann F, Papenbrock T (2022). Data Profiling. Online publication date: 25-Feb-2022.