research-article

HOFD: An Outdated Fact Detector for Knowledge Bases

Authors:

Chengliang Chai,

Xiang Yu

IEEE Transactions on Knowledge and Data Engineering, Volume 35, Issue 10

Pages 10775 - 10789

https://doi.org/10.1109/TKDE.2023.3248223

Published: 01 October 2023

Abstract

Knowledge bases (KBs), which store high-quality information, are crucial for many applications, such as enhancing search results and serving as external sources for data cleaning. Because information changes rapidly, most KBs contain outdated facts, so keeping KBs up to date is important. Prior work has investigated using reference data (such as new facts extracted from the news) to detect outdated facts in KBs. However, existing approaches can cover only a small percentage of the facts in a KB. In this paper, we propose HOFD, a novel human-in-the-loop approach for outdated fact detection in KBs. HOFD trains a binary classifier, using features such as a fact's historical update frequency and update time, to compute the likelihood that a fact in a KB is outdated. Then, HOFD interacts with humans to verify whether a fact with high likelihood is indeed outdated. In addition, HOFD uses logical rules to detect more outdated facts based on human feedback. The outdated facts detected by the logical rules are also fed back to further train the ML model, as a form of data augmentation. Extensive experiments on real-world KBs, such as Yago and DBpedia, show the effectiveness of our solution.
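The loop described in the abstract (score facts, ask a human about the high-scoring ones, propagate confirmed answers through rules, reuse the results as training labels) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Fact` fields, the hand-picked logistic weights, the 0.5 threshold, and the `toy_rule` are all hypothetical choices made for illustration, and a real system would learn the classifier from labeled data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    updates_per_year: float    # assumed feature: historical update frequency
    years_since_update: float  # assumed feature: time since the last update


# Hypothetical, hand-picked weights; a real classifier would learn them.
WEIGHTS = (0.8, 0.5)
BIAS = -2.0


def outdated_likelihood(fact: Fact) -> float:
    """Logistic score in (0, 1); higher means more likely outdated."""
    z = (BIAS
         + WEIGHTS[0] * fact.updates_per_year
         + WEIGHTS[1] * fact.years_since_update)
    return 1.0 / (1.0 + math.exp(-z))


def detect(facts, ask_human, rule, threshold=0.5):
    """Verify high-scoring facts with a human, then propagate via a rule."""
    confirmed, inferred = set(), set()
    for fact in sorted(facts, key=outdated_likelihood, reverse=True):
        if outdated_likelihood(fact) < threshold:
            break  # all remaining facts score lower still
        if fact in inferred:
            continue  # already covered by a rule, so no question is needed
        if ask_human(fact):
            confirmed.add(fact)
            inferred.update(
                g for g in facts if g not in confirmed and rule(fact, g)
            )
    return confirmed, inferred


def toy_rule(outdated: Fact, other: Fact) -> bool:
    # Hypothetical rule: an outdated playsFor fact also invalidates the
    # same subject's teammateOf facts.
    return (outdated.relation == "playsFor"
            and other.relation == "teammateOf"
            and other.subject == outdated.subject)


a = Fact("Alice", "playsFor", "TeamA", 1.5, 3.0)
b = Fact("Alice", "teammateOf", "Bob", 0.2, 1.0)
c = Fact("Paris", "capitalOf", "France", 0.0, 10.0)
really_outdated = {a}  # stands in for the human's judgment

confirmed, inferred = detect([a, b, c], lambda f: f in really_outdated, toy_rule)

# Confirmed and rule-inferred facts become positive labels for retraining.
augmented_labels = [(f, 1) for f in confirmed | inferred]
```

In this toy run the stale but stable fact `c` scores highest and is rejected by the simulated human, which is the case the verification step exists for, while `b` scores below the threshold and is reached only through the rule.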



ISSN: 1041-4347


1041-4347 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Publisher: IEEE Educational Activities Department, United States
