research-article

HOFD: An Outdated Fact Detector for Knowledge Bases

Published: 01 October 2023

    Abstract

    Knowledge bases (KBs), which store high-quality information, are crucial for many applications, such as enhancing search results and serving as external sources for data cleaning. Because information changes rapidly, most KBs contain outdated facts, so it is important to keep KBs up to date. Prior work has investigated using reference data (such as new facts extracted from the news) to detect outdated facts in KBs. However, existing approaches can only cover a small percentage of the facts in a KB. In this paper, we propose HOFD, a novel human-in-the-loop approach for outdated fact detection in KBs. HOFD trains a binary classifier, using features such as the historical update frequency and update time of a fact, to compute the likelihood that a fact in a KB is outdated. Then, HOFD interacts with humans to verify whether a fact with a high likelihood is indeed outdated. HOFD also uses logical rules to detect more outdated facts based on human feedback. The outdated facts detected by the logical rules are fed back to further train the ML model, serving as data augmentation. Extensive experiments on real-world KBs, such as Yago and DBpedia, show the effectiveness of our solution.
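
    The loop described in the abstract (score facts with a classifier, ask a human about high-scoring facts, propagate confirmed answers through rules, retrain) might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the `Fact` fields, the hand-written logistic regression, the `INVERSE` rule table, the threshold, and the toy data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    updates_per_year: float    # assumed encoding of "historical update frequency"
    years_since_update: float  # assumed encoding of "update time"


def features(fact):
    # Bias term plus the two assumed features.
    return [1.0, fact.updates_per_year, fact.years_since_update]


def score(w, fact):
    # Logistic model: estimated likelihood that the fact is outdated.
    z = sum(wi * xi for wi, xi in zip(w, features(fact)))
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled, epochs=500, lr=0.1):
    # Plain batch gradient descent on the logistic loss (stand-in classifier).
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for fact, y in labeled:
            err = score(w, fact) - y
            for i, xi in enumerate(features(fact)):
                grad[i] += err * xi
        w = [wi - lr * gi / len(labeled) for wi, gi in zip(w, grad)]
    return w


# Hypothetical rule: if (x, r, y) is outdated, so is its inverse (y, r_inv, x).
INVERSE = {"headOf": "ledBy", "ledBy": "headOf"}


def infer_by_rule(outdated, kb):
    inv = INVERSE.get(outdated.relation)
    return [g for g in kb
            if g.relation == inv
            and g.subject == outdated.obj
            and g.obj == outdated.subject]


def detect_outdated(kb, seed_labels, ask_human, threshold=0.5, rounds=3):
    labels = dict(seed_labels)  # fact -> 1 (outdated) or 0 (current)
    for _ in range(rounds):
        w = train(list(labels.items()))
        candidates = [f for f in kb
                      if f not in labels and score(w, f) >= threshold]
        if not candidates:
            break
        for f in candidates:
            if f in labels:  # already labeled by a rule earlier in this round
                continue
            labels[f] = 1 if ask_human(f) else 0
            if labels[f] == 1:
                # Rule-derived labels join the training set for the next round.
                for g in infer_by_rule(f, kb):
                    labels.setdefault(g, 1)
    return {f for f, y in labels.items() if y == 1}


# Toy data (invented for illustration only).
kb = [
    Fact("Alice", "headOf", "Acme", 2.0, 4.0),
    Fact("Acme", "ledBy", "Alice", 0.1, 0.1),
    Fact("Paris", "capitalOf", "France", 0.0, 0.3),
]
seed = {
    Fact("Bob", "playsFor", "TeamA", 2.0, 3.0): 1,
    Fact("Carol", "worksAt", "LabB", 1.5, 4.0): 1,
    Fact("Rome", "capitalOf", "Italy", 0.1, 0.5): 0,
    Fact("Dan", "bornIn", "Oslo", 0.2, 0.2): 0,
}
asked = []


def ask_human(fact):
    # Stand-in for a human annotator: records the question, answers from a fixed truth.
    asked.append(fact)
    return fact == kb[0]


found = detect_outdated(kb, seed, ask_human)
```

    In this toy run the `ledBy` fact has features that look current, so the classifier alone would be unlikely to flag it; it is marked outdated only because the human confirmed the matching `headOf` fact and the inverse rule fired, without a second question to the annotator.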

    Published In

    IEEE Transactions on Knowledge and Data Engineering, Volume 35, Issue 10
    Oct. 2023
    1088 pages

    Publisher

    IEEE Educational Activities Department

    United States
