Knowledge-based trust: estimating the trustworthiness of web sources

Editors: Chen Li, Volker Markl

Authors: Evgeniy Gabrilovich, Camillo Lugaresi, Wei Zhang

Proceedings of the VLDB Endowment, Volume 8, Issue 9

Pages 938 - 949

https://doi.org/10.14778/2777598.2777603

Published: 01 May 2015

Abstract

The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.

The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model.

We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.
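To make the idea of a correctness-based trust score concrete, here is a minimal sketch of iterative truth discovery in Python. It is a simplified, single-layer illustration and not the multi-layer model described in the abstract: it does not separate extraction errors from source errors. The function name `estimate_trust`, its parameters, and the toy claims are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def estimate_trust(claims, iterations=20, initial_accuracy=0.8):
    """Estimate per-source accuracy from (source, item, value) claims.

    Simplified illustration only: alternates between accuracy-weighted
    voting on values and re-estimating each source's accuracy.
    """
    claims = list(claims)
    sources = {source for source, _, _ in claims}
    accuracy = {source: initial_accuracy for source in sources}
    belief = {}

    for _ in range(iterations):
        # Step 1: score each candidate value of an item by the summed
        # accuracy of the sources that assert it, then normalize per item.
        votes = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            votes[item][value] += accuracy[source]

        belief = {}
        for item, by_value in votes.items():
            total = sum(by_value.values())
            for value, score in by_value.items():
                # Fall back to a uniform belief if all weights are zero.
                belief[(item, value)] = (
                    score / total if total > 0 else 1.0 / len(by_value)
                )

        # Step 2: a source's accuracy is the mean belief in what it asserts.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for source, item, value in claims:
            sums[source] += belief[(item, value)]
            counts[source] += 1
        accuracy = {source: sums[source] / counts[source] for source in sources}

    return accuracy, belief


# Hypothetical toy data: sources A and B agree everywhere; C disagrees on two items.
toy_claims = [
    ("A", "item1", "x"), ("A", "item2", "y"), ("A", "item3", "z"),
    ("B", "item1", "x"), ("B", "item2", "y"), ("B", "item3", "z"),
    ("C", "item1", "x"), ("C", "item2", "w"), ("C", "item3", "v"),
]
toy_accuracy, toy_belief = estimate_trust(toy_claims)
```

In this toy run, the source whose values conflict with the majority ends up with a lower score than the two sources that agree. The paper's approach goes further by jointly inferring whether a wrong value came from the web source itself or from the extraction process.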

Published In

Proceedings of the VLDB Endowment, Volume 8, Issue 9

May 2015, 76 pages

ISSN: 2150-8097

Editors: Chen Li (University of California, Irvine), Volker Markl (TU Berlin)

Publisher: VLDB Endowment
