research-article

From Appearance to Essence: Comparing Truth Discovery Methods without Using Ground Truth

Authors: Xiu Susie Fang, Wei Emma Zhang, Anne H. H. Ngu, Jian Yang

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 11, Issue 6

Article No.: 74, Pages 1 - 24

https://doi.org/10.1145/3411749

Published: 11 September 2020

Abstract

Truth discovery has been widely studied in recent years as a fundamental means of resolving conflicts in multi-source data. Although many truth discovery methods have been proposed based on different considerations and intuitions, investigations show that no single method consistently outperforms the others. To select the right truth discovery method for a specific application scenario, it is essential to evaluate and compare the performance of different methods. A drawback of current research efforts is that they commonly assume that some ground truth is available for evaluating methods. However, ground truth may be very limited or even impossible to obtain, rendering the evaluation biased. In this article, we present CompTruthHyp, a generic approach for comparing the performance of truth discovery methods without using ground truth. In particular, our approach calculates the probability of the observations in a dataset based on the output of each method. These probabilities are then ranked to reflect the performance of the methods. We review and compare 12 representative truth discovery methods and consider both single-valued and multi-valued objects. Empirical studies on both real-world and synthetic datasets demonstrate the effectiveness of our approach for comparing truth discovery methods.
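
As a rough illustration of the idea in the abstract (not the CompTruthHyp algorithm itself, whose details are not given on this page), the sketch below scores each method by the log-probability of the observed claims under that method's output and then ranks the methods by score. The names `log_likelihood`, `rank_methods`, and `n_false`, the toy claims, and the simple observation model are all assumptions made for this example: a source is assumed to state the true value with probability equal to its estimated reliability, and otherwise to pick uniformly among `n_false` wrong values.

```python
import math


def log_likelihood(claims, truths, reliability, n_false=10):
    """Log-probability of the observed claims under one method's output.

    claims:      list of (source, obj, value) observations.
    truths:      dict obj -> value that the method labels as true.
    reliability: dict source -> estimated probability that the source is right.
    n_false:     assumed number of wrong values per object (hypothetical).
    """
    total = 0.0
    for source, obj, value in claims:
        # Clamp so that a reliability of exactly 0 or 1 never yields log(0).
        p = min(max(reliability[source], 1e-6), 1 - 1e-6)
        if value == truths[obj]:
            total += math.log(p)
        else:
            total += math.log((1 - p) / n_false)
    return total


def rank_methods(claims, outputs, n_false=10):
    """Return method names ordered from most to least likely output."""
    scores = {
        name: log_likelihood(claims, truths, reliability, n_false)
        for name, (truths, reliability) in outputs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): three sources, two single-valued objects.
claims = [
    ("s1", "a", "x"), ("s2", "a", "x"), ("s3", "a", "y"),
    ("s1", "b", "u"), ("s2", "b", "u"), ("s3", "b", "v"),
]
# Each method outputs (predicted truths, estimated source reliabilities).
outputs = {
    "method_A": ({"a": "x", "b": "u"}, {"s1": 0.9, "s2": 0.9, "s3": 0.1}),
    "method_B": ({"a": "y", "b": "v"}, {"s1": 0.9, "s2": 0.9, "s3": 0.1}),
}

print(rank_methods(claims, outputs))  # ['method_A', 'method_B']
```

In this toy case, method_A's output agrees with the two sources it rates as reliable, so the observations are more probable under it. No labeled truth is consulted at any point; only the claims and each method's own output are used.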

Cited By

Wang H, Liu A, Xiong N, Zhang S, and Wang T. 2024. TVD-RA: A Truthful Data Value Discovery-Based Reverse Auction Incentive System for Mobile Crowdsensing. IEEE Internet of Things Journal 11, 4 (2024), 5826--5839. Online publication date: 15-Feb-2024.
https://doi.org/10.1109/JIOT.2023.3308072

van Dijk M and Geurtsen S. 2023. Mapping Irrigated Areas in China Using a Synergy Approach. Water 15, 9 (2023), 1666. Online publication date: 25-Apr-2023.
https://doi.org/10.3390/w15091666

Tang J, Fan K, Yin P, Qu Z, Liu A, Xiong N, Wang T, Dong M, and Zhang S. 2023. DLFTI: A deep learning based fast truth inference mechanism for distributed spatiotemporal data in mobile crowd sensing. Information Sciences 644 (2023), 119245. Online publication date: Oct-2023.
https://doi.org/10.1016/j.ins.2023.119245

Index Terms

From Appearance to Essence: Comparing Truth Discovery Methods without Using Ground Truth
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
  2. Information systems applications
    1. Data mining

Published In

ACM Transactions on Intelligent Systems and Technology Volume 11, Issue 6

Survey Paper and Regular Paper

December 2020

237 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3424135

Editor:
Yu Zheng
JD Digits, China

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 September 2020

Accepted: 01 July 2020

Revised: 01 July 2020

Received: 01 January 2020

Published in TIST Volume 11, Issue 6

Qualifiers

Research-article
Research
Refereed

Funding Sources

Australian Research Council (ARC)
Discovery Project

Bibliometrics

Total Citations: 3
Total Downloads: 215
Downloads (Last 12 months): 21
Downloads (Last 6 weeks): 2

Reflects downloads up to 09 Nov 2024
