DOI: 10.1145/2882903.2882954
research-article
Public Access

Ontological Pathfinding

Published: 14 June 2016

Abstract

Recent years have seen a drastic rise in the construction of web-scale knowledge bases (e.g., Freebase, YAGO, DBpedia). These knowledge bases store structured information about real-world people, places, organizations, etc. However, due to limitations of human knowledge and information extraction algorithms, these knowledge bases are still far from complete. In this paper, we study the problem of mining first-order inference rules to facilitate knowledge expansion. We propose the Ontological Pathfinding algorithm (OP) that scales to web-scale knowledge bases via a series of parallelization and optimization techniques: a relational knowledge base model to apply inference rules in batches, a new rule mining algorithm that parallelizes the join queries, a novel partitioning algorithm to break the mining tasks into smaller independent sub-tasks, and a pruning strategy to eliminate unsound and resource-consuming rules before applying them. Combining these techniques, we develop the first rule mining system that scales to Freebase, the largest public knowledge base with 112 million entities and 388 million facts. We mine 36,625 inference rules in 34 hours; no existing approach achieves this scale.
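To make the idea of applying a first-order inference rule through a join more concrete, the following is a minimal sketch in Python. It is not the OP algorithm from the paper: it only illustrates, on a toy in-memory fact set, how a two-atom rule of the form body1(x, z) AND body2(z, y) -> head(x, y) could be evaluated with a single hash join and then scored. The facts, the predicate names (bornIn, cityIn, nationality), the function names, and the support/confidence definitions used here are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy knowledge base of (predicate, subject, object) facts.
# All entities and predicates are hypothetical examples.
FACTS = {
    ("bornIn", "alice", "paris"),
    ("bornIn", "bob", "lyon"),
    ("bornIn", "carol", "berlin"),
    ("cityIn", "paris", "france"),
    ("cityIn", "lyon", "france"),
    ("cityIn", "berlin", "germany"),
    ("nationality", "alice", "france"),
    ("nationality", "carol", "germany"),
}


def apply_rule(facts, body1, body2, head):
    """Apply body1(x, z) AND body2(z, y) -> head(x, y) with one hash join."""
    # Build side: index body2 facts by subject, which is the join variable z.
    by_subject = defaultdict(set)
    for pred, subj, obj in facts:
        if pred == body2:
            by_subject[subj].add(obj)
    # Probe side: look up the object of every body1 fact in the index.
    inferred = set()
    for pred, x, z in facts:
        if pred == body1:
            for y in by_subject.get(z, ()):
                inferred.add((head, x, y))
    return inferred


def score_rule(facts, body1, body2, head):
    """Return (support, confidence) under the assumed definitions:
    support = inferred facts already in the KB,
    confidence = support / all inferred facts."""
    inferred = apply_rule(facts, body1, body2, head)
    support = len(inferred & facts)
    confidence = support / len(inferred) if inferred else 0.0
    return support, confidence


inferred = apply_rule(FACTS, "bornIn", "cityIn", "nationality")
support, confidence = score_rule(FACTS, "bornIn", "cityIn", "nationality")
new_facts = inferred - FACTS  # candidate facts that would expand the KB
print(sorted(new_facts))
print(support, round(confidence, 2))
```

In this toy setting the rule derives three nationality facts, two of which are already present, so the one remaining fact is a candidate for knowledge expansion. A low-confidence rule would be a natural candidate for pruning before it is applied, which is the kind of filtering the abstract describes at a much larger scale.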

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. first-order logic
  3. knowledge bases
  4. scalability

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 155
  • Downloads (last 6 weeks): 26
Reflects downloads up to 11 Sep 2024

Cited By

  • (2023) DegreEmbed: Incorporating entity embedding into logic rule learning for knowledge graph reasoning. Semantic Web, 14:6 (1099-1119). DOI: 10.3233/SW-233413. Online publication date: 13-Dec-2023
  • (2023) Intelligent generation method of emergency plan for hydraulic engineering based on knowledge graph – take the South-to-North Water Diversion Project as an example. LHB, 108:1. DOI: 10.1080/27678490.2022.2153629. Online publication date: 18-Jan-2023
  • (2023) Comprehensible Artificial Intelligence on Knowledge Graphs. Web Semantics: Science, Services and Agents on the World Wide Web, 79:C. DOI: 10.1016/j.websem.2023.100806. Online publication date: 1-Dec-2023
  • (2023) Differentiable learning of rules with constants in knowledge graph. Knowledge-Based Systems, 275:C. DOI: 10.1016/j.knosys.2023.110686. Online publication date: 5-Sep-2023
  • (2023) A Practical Approach to Constructing a Geological Knowledge Graph: A Case Study of Mineral Exploration Data. Journal of Earth Science, 34:5 (1374-1389). DOI: 10.1007/s12583-023-1809-3. Online publication date: 18-Oct-2023
  • (2023) Anytime bottom-up rule learning for large-scale knowledge graph completion. The VLDB Journal, 33:1 (131-161). DOI: 10.1007/s00778-023-00800-5. Online publication date: 16-Jun-2023
  • (2023) TemporalFC: A Temporal Fact Checking Approach over Knowledge Graphs. The Semantic Web – ISWC 2023 (465-483). DOI: 10.1007/978-3-031-47240-4_25. Online publication date: 27-Oct-2023
  • (2023) A Comprehensive Study on Knowledge Graph Embedding over Relational Patterns Based on Rule Learning. The Semantic Web – ISWC 2023 (290-308). DOI: 10.1007/978-3-031-47240-4_16. Online publication date: 27-Oct-2023
  • (2022) HybridFC: A Hybrid Fact-Checking Approach for Knowledge Graphs. The Semantic Web – ISWC 2022 (462-480). DOI: 10.1007/978-3-031-19433-7_27. Online publication date: 16-Oct-2022
  • (2022) Data Profiling. Online publication date: 25-Feb-2022
