HOFD: An Outdated Fact Detector for Knowledge Bases
Pages 10775 - 10789
Abstract
Knowledge bases (KBs), which store high-quality information, are crucial for many applications, such as enhancing search results and serving as external sources for data cleaning. Because information changes rapidly, most KBs contain outdated facts, so keeping KBs up to date is important. Prior work has investigated using reference data (such as new facts extracted from the news) to detect outdated facts in KBs. However, existing approaches can only cover a small percentage of the facts in a KB. In this paper, we propose HOFD, a novel human-in-the-loop approach for outdated fact detection in KBs. HOFD trains a binary classifier, using features such as a fact's historical update frequency and update time, to compute the likelihood that a fact in a KB is outdated. HOFD then interacts with humans to verify whether a fact with a high likelihood is indeed outdated. In addition, HOFD uses logical rules to detect more outdated facts based on the human feedback. The outdated facts detected by the logical rules are also fed back as data augmentation to further train the ML model. Extensive experiments on real-world KBs, such as Yago and DBpedia, show the effectiveness of our solution.
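The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Fact` fields, the two numeric features, the hand-rolled logistic regression, the `job_rule` logical rule, and all data values are hypothetical stand-ins chosen only to show how the classifier, the human check, and rule-based augmentation could fit together.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    update_freq: float         # assumed feature: historical updates per year
    years_since_update: float  # assumed feature: time since the last update

def features(f):
    return [1.0, f.update_freq, f.years_since_update]  # leading 1.0 is the bias

def sigmoid(z):
    # Numerically stable form for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, f):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(f))))

def train(labeled, epochs=500, lr=0.1):
    # Plain SGD logistic regression standing in for the binary classifier.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for f, y in labeled:
            err = y - score(w, f)
            w = [wi + lr * err * xi for wi, xi in zip(w, features(f))]
    return w

def job_rule(outdated_fact, candidates):
    # Hypothetical rule: an outdated employer implies an outdated job title.
    if outdated_fact.predicate != "worksAt":
        return []
    return [g for g in candidates
            if g.subject == outdated_fact.subject and g.predicate == "hasJobTitle"]

def detect(facts, seed, ask_human, rule, budget, threshold=0.5):
    labeled, unknown, outdated = list(seed), list(facts), set()
    for _ in range(budget):
        if not unknown:
            break
        w = train(labeled)                        # retrain on all labels so far
        best = max(unknown, key=lambda f: score(w, f))
        if score(w, best) < threshold:
            break                                 # nothing looks outdated enough
        unknown.remove(best)
        is_outdated = ask_human(best)             # human verification
        labeled.append((best, 1 if is_outdated else 0))
        if is_outdated:
            outdated.add(best)
            for g in rule(best, unknown):         # rule-based detection
                unknown.remove(g)
                outdated.add(g)
                labeled.append((g, 1))            # fed back as augmentation
    return outdated

# Toy data, invented for illustration only.
seed = [(Fact("a", "worksAt", "X", 2.0, 5.0), 1),
        (Fact("b", "worksAt", "Y", 0.1, 0.5), 0),
        (Fact("c", "bornIn", "Z", 0.0, 1.0), 0),
        (Fact("d", "worksAt", "W", 3.0, 4.0), 1)]
f1 = Fact("e", "worksAt", "Q", 2.5, 6.0)
f2 = Fact("e", "hasJobTitle", "Engineer", 1.0, 3.0)
f3 = Fact("g", "bornIn", "P", 0.0, 0.3)
truth = {f1, f2}
asked = []

def ask_human(f):
    asked.append(f)          # record which facts were shown to the human
    return f in truth

detected = detect([f1, f2, f3], seed, ask_human, job_rule, budget=3)
```

In this toy run the human confirms `f1`, and the rule then marks `f2` as outdated without spending another question on it, which is the kind of saving the rule step is meant to provide.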
Information
Published In
1041-4347 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Publisher
IEEE Educational Activities Department
United States
Publication History
Published: 01 October 2023
Qualifiers
- Research-article