
Query-driven on-the-fly knowledge base construction

Published: 01 September 2017

Abstract

Today's openly available knowledge bases, such as DBpedia, Yago, Wikidata or Freebase, capture billions of facts about the world's entities. However, even the largest among these (i) are still limited in up-to-date coverage of what happens in the real world, and (ii) miss out on many relevant predicates that precisely capture the wide variety of relationships among entities. To overcome both of these limitations, we propose a novel approach to build on-the-fly knowledge bases in a query-driven manner. Our system, called QKBfly, supports analysts and journalists as well as question answering on emerging topics, by dynamically acquiring relevant facts as timely and comprehensively as possible. QKBfly is based on a semantic-graph representation of sentences, by which we perform three key IE tasks, namely named-entity disambiguation, co-reference resolution and relation extraction, in a light-weight and integrated manner. In contrast to Open IE, our output is canonicalized. In contrast to traditional IE, we capture more predicates, including ternary and higher-arity ones. Our experiments demonstrate that QKBfly can build high-quality, on-the-fly knowledge bases that can readily be deployed, e.g., for the task of ad-hoc question answering.
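To make the contrast with Open IE concrete, the following is a minimal sketch of what canonicalized, higher-arity output could look like. It is not QKBfly's implementation: the `Fact` class, the `canonicalize` function, the alias dictionaries, and the identifiers such as `Brad_Pitt` and `act_in` are all hypothetical and chosen only for illustration. A real system would obtain the mappings through named-entity disambiguation and co-reference resolution rather than a hard-coded lookup.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical alias tables (assumption): stand-ins for the output of
# entity disambiguation and predicate canonicalization.
ENTITY_ALIASES: Dict[str, str] = {
    "brad pitt": "Brad_Pitt",
    "pitt": "Brad_Pitt",
    "troy": "Troy_(film)",
}
PREDICATE_ALIASES: Dict[str, str] = {
    "played in": "act_in",
    "starred in": "act_in",
}


@dataclass(frozen=True)
class Fact:
    """One predicate with two or more arguments (binary, ternary, ...)."""
    predicate: str
    args: Tuple[str, ...]


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore surface variation."""
    return " ".join(text.lower().split())


def canonicalize(surface_predicate: str, surface_args: Tuple[str, ...]) -> Fact:
    """Map surface strings to canonical identifiers where an alias is known.

    Arguments without a known alias are kept unchanged as literals;
    unknown predicates are kept in normalized form.
    """
    key = _norm(surface_predicate)
    predicate = PREDICATE_ALIASES.get(key, key)
    args = tuple(ENTITY_ALIASES.get(_norm(a), a) for a in surface_args)
    return Fact(predicate, args)


# Two different surface forms collapse to the same canonical binary fact.
binary_a = canonicalize("starred in", ("Pitt", "Troy"))
binary_b = canonicalize("played in", ("Brad Pitt", "Troy"))

# A ternary fact keeps the role as a third argument instead of dropping it.
ternary = canonicalize("played in", ("Brad Pitt", "Achilles", "Troy"))
```

In this sketch an Open IE system would emit the two binary extractions as distinct string triples, whereas the canonicalized facts compare equal and can be merged in a knowledge base.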

Published In

Proceedings of the VLDB Endowment  Volume 11, Issue 1
Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, Brazil
September 2017
120 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Published in PVLDB Volume 11, Issue 1

Qualifiers

  • Research-article

Cited By

  • (2024) Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning. Proceedings of the ACM Web Conference 2024, pages 2304-2314. DOI: 10.1145/3589334.3645700. Online publication date: 13-May-2024
  • (2024) KartGPS: Knowledge Base Update with Temporal Graph Pattern-based Semantic Rules. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 5075-5087. DOI: 10.1109/ICDE60146.2024.00105. Online publication date: 13-May-2024
  • (2024) Cost-Aware Outdated Facts Correction in the Knowledge Bases. Database Systems for Advanced Applications, pages 257-272. DOI: 10.1007/978-981-97-5562-2_17. Online publication date: 2-Jul-2024
  • (2023) A BERT-enhanced Graph Neural Network for Knowledge Base Population. 2023 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 81-84. DOI: 10.1109/BigComp57234.2023.00021. Online publication date: Feb-2023
  • (2023) CLART: A cascaded lattice-and-radical transformer network for Chinese medical named entity recognition. Heliyon, 9:10, e20692. DOI: 10.1016/j.heliyon.2023.e20692. Online publication date: Oct-2023
  • (2023) A discovery system for narrative query graphs: entity-interaction-aware document retrieval. International Journal on Digital Libraries, 25:1, pages 3-24. DOI: 10.1007/s00799-023-00356-3. Online publication date: 24-Apr-2023
  • (2022) Demonstrating ASET: Ad-hoc Structured Exploration of Text Collections. Proceedings of the 2022 International Conference on Management of Data, pages 2393-2396. DOI: 10.1145/3514221.3520174. Online publication date: 10-Jun-2022
  • (2022) Tree-KGQA: An Unsupervised Approach for Question Answering Over Knowledge Graphs. IEEE Access, 10, pages 50467-50478. DOI: 10.1109/ACCESS.2022.3173355. Online publication date: 2022
  • (2022) Leveraging Multi-source knowledge for Chinese clinical named entity recognition via relational graph convolutional network. Journal of Biomedical Informatics, 128:C. DOI: 10.1016/j.jbi.2022.104035. Online publication date: 1-Apr-2022
  • (2022) WIP - SKOD: A Framework for Situational Knowledge on Demand. Heterogeneous Data Management, Polystores, and Analytics for Healthcare, pages 154-166. DOI: 10.1007/978-3-030-33752-0_11. Online publication date: 27-Dec-2022
