Computational Fact Checking through Query Perturbations

Published: 09 January 2017

Abstract

Our media is saturated with claims of “facts” made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, for example, is a claim “cherry-picking”? This article proposes a framework that models claims based on structured data as parameterized queries. Intuitively, with its choice of the parameter setting, a claim presents a particular (and potentially biased) view of the underlying data. A key insight is that we can learn a lot about a claim by “perturbing” its parameters and seeing how its conclusion changes. For example, a claim is not robust if small perturbations to its parameters can change its conclusions significantly. This framework allows us to formulate practical fact-checking tasks—reverse-engineering vague claims, and countering questionable claims—as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of “meta” algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.
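
To make the perturbation idea concrete, here is a minimal sketch of a claim modeled as a parameterized query whose parameters are then varied slightly. It is an illustration under stated assumptions, not the article's algorithms: the yearly counts in `DATA`, the window-comparison form of the claim, and the names `claim_query`, `perturbations`, and `fraction_agreeing` are all hypothetical.

```python
from itertools import product

# Hypothetical yearly counts, invented purely for illustration.
DATA = {2001: 100, 2002: 104, 2003: 98, 2004: 101,
        2005: 97, 2006: 120, 2007: 99, 2008: 102}


def window_sum(end_year, width):
    """Sum of the `width` years ending at `end_year` (inclusive)."""
    return sum(DATA[y] for y in range(end_year - width + 1, end_year + 1))


def claim_query(end_year, width):
    """Parameterized claim: ratio of the latest window to the one just before it."""
    return window_sum(end_year, width) / window_sum(end_year - width, width)


def perturbations(end_year, width, max_shift=1):
    """Yield nearby parameter settings that keep both windows inside DATA."""
    first, last = min(DATA), max(DATA)
    for de, dw in product(range(-max_shift, max_shift + 1), repeat=2):
        e, w = end_year + de, width + dw
        if w >= 1 and e <= last and e - 2 * w + 1 >= first:
            yield e, w


def fraction_agreeing(end_year, width, threshold=1.0):
    """Share of nearby settings on the same side of `threshold` as the original claim."""
    original = claim_query(end_year, width) > threshold
    results = [claim_query(e, w) > threshold
               for e, w in perturbations(end_year, width)]
    return sum(r == original for r in results) / len(results)


# Claim: "the count rose in 2006 compared with the year before".
print(claim_query(2006, 1))        # ratio for the original parameter setting
print(fraction_agreeing(2006, 1))  # how often nearby settings reach the same conclusion
```

In this toy data, the original setting reports an increase, yet only half of the neighboring settings agree. A result like that suggests the claim depends heavily on its exact choice of parameters.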

    Published In

    ACM Transactions on Database Systems  Volume 42, Issue 1
    Invited Paper from ICDT 2014, Invited Paper from EDBT 2015, Regular Papers and Technical Correspondence
    March 2017
    263 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/3015779

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 January 2017
    Accepted: 01 September 2016
    Revised: 01 May 2016
    Received: 01 June 2015
    Published in TODS Volume 42, Issue 1

    Author Tags

    1. Sensitivity analysis
    2. computational journalism
    3. fact checking

    Qualifiers

    • Research-article
    • Research
    • Refereed
