Surrogate Scoring Rules

Published: 15 February 2023

Abstract

Strictly proper scoring rules (SPSR) are incentive compatible for eliciting information about random variables from strategic agents when the principal can reward agents after the realization of the random variables. They also quantify the quality of elicited information, with more accurate predictions receiving higher scores in expectation. In this article, we extend such scoring rules to settings in which a principal elicits private probabilistic beliefs but has access only to agents’ reports. We name our solution Surrogate Scoring Rules (SSR). SSR are built on a bias-correction step and an error-rate estimation procedure for a reference answer defined using agents’ reports. We show that, with limited information about the prior distribution of the random variables, SSR in a multi-task setting recover SPSR in expectation, as if the principal had access to the ground truth. A salient feature of SSR is therefore that they quantify the quality of information despite the lack of ground truth, just as SPSR do when ground truth is available. As a by-product, SSR induce dominant uniform strategy truthfulness in reporting. Our method is verified both theoretically and empirically using data collected from real human forecasters.
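
To make the bias-correction step concrete: for a binary event, a reference answer built from peers’ reports acts as a noisy proxy for the ground truth. If the proxy’s error rates are known (the paper estimates them from multi-task reports; here they are taken as given), a corrected score can be defined whose expectation over the noisy proxy equals a strictly proper score against the true outcome. The Python sketch below illustrates this debiasing identity for the Brier score; the function names and the known-error-rate assumption are illustrative, not the authors’ implementation.

    def brier_score(p, y):
        # Negative Brier score: strictly proper, higher is better.
        return -(p - y) ** 2

    def surrogate_score(p, z, e0, e1, score=brier_score):
        # Bias-corrected score against a noisy reference label z in {0, 1},
        # where e0 = P(z=1 | y=0), e1 = P(z=0 | y=1), and e0 + e1 < 1.
        # By construction, E_z[surrogate_score(p, z) | y] = score(p, y).
        e = (e0, e1)
        num = (1 - e[1 - z]) * score(p, z) - e[z] * score(p, 1 - z)
        return num / (1 - e0 - e1)

    # Sanity check of the debiasing identity when the true outcome is y = 1:
    p, e0, e1 = 0.7, 0.1, 0.2
    avg = (1 - e1) * surrogate_score(p, 1, e0, e1) + e1 * surrogate_score(p, 0, e0, e1)
    assert abs(avg - brier_score(p, 1)) < 1e-12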

References

[1]
Adam Altmejd, Anna Dreber, Eskil Forsell, Juergen Huber, Taisuke Imai, Magnus Johannesson, Michael Kirchler, Gideon Nave, and Colin Camerer. 2019. Predicting the replicability of social science lab experiments. PLOS One 14, 12 (2019).
[2]
Dana Angluin and Philip Laird. 1988. Learning from noisy examples. Machine Learning 2, 4 (1988), 343–370.
[3]
Pavel Atanasov, Phillip Rescober, Eric Stone, Samuel A. Swift, Emile Servan-Schreiber, Philip Tetlock, Lyle Ungar, and Barbara Mellers. 2016. Distilling the wisdom of crowds: Prediction markets vs. prediction polls. Management Science 63, 3 (2016), 691–706.
[4]
Glenn W. Brier. 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review 78, 1 (1950), 1–3.
[5]
Tom Bylander. 1994. Learning linear threshold functions in the presence of classification noise. In Proceedings of the 7th Annual Conference on Computational Learning Theory. ACM, 340–347.
[6]
Anirban Dasgupta and Arpita Ghosh. 2013. Crowdsourced judgement elicitation with endogenous proficiency. In Proceedings of the 22nd International Conference on World Wide Web. 319–330.
[7]
Darrell Duffie and Jun Pan. 1997. An overview of value at risk. Journal of Derivatives 4, 3 (1997), 7–49.
[8]
Alexander Frankel and Emir Kamenica. 2019. Quantifying information and uncertainty. American Economic Review 109, 10 (2019), 3650–80.
[9]
Benoît Frénay and Michel Verleysen. 2014. Classification in the presence of label noise: A survey. IEEE Transactions on Neural Networks and Learning Systems 25, 5 (2014), 845–869.
[10]
Jeffrey A. Friedman, Joshua D. Baker, Barbara A. Mellers, Philip E. Tetlock, and Richard Zeckhauser. 2018. The value of precision in probability assessment: Evidence from a large-scale geopolitical forecasting tournament. International Studies Quarterly 62, 2 (2018), 410–422.
[11]
Rafael Frongillo and Jens Witkowski. 2016. A geometric method to construct minimal peer prediction mechanisms. In 13th AAAI Conference on Artificial Intelligence.
[12]
Tilmann Gneiting and Adrian E. Raftery. 2005. Weather forecasting with ensemble methods. Science 310, 5746 (2005), 248–249.
[13]
Tilmann Gneiting and Adrian E. Raftery. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 477 (2007), 359–378.
[14]
Naman Goel and Boi Faltings. 2019. Deep Bayesian trust: A dominant and fair incentive mechanism for crowd. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 1996–2003.
[15]
Suzanne Hoogeveen, Alexandra Sarafoglou, and Eric-Jan Wagenmakers. 2019. Laypeople can predict which social science studies replicate. (2019).
[16]
IARPA. 2019. Hybrid Forecasting Competition. Retrieved October 13, 2022 from https://www.iarpa.gov/index.php/research-programs/hfc?id=661.
[17]
Victor Richmond Jose, Robert F. Nau, and Robert L. Winkler. 2006. Scoring Rules, Generalized Entropy and Utility Maximization. (2006). Working Paper, Fuqua School of Business, Duke University.
[18]
Radu Jurca and Boi Faltings. 2007. Collusion-resistant, incentive-compatible feedback payments. In Proceedings of the 8th ACM Conference on Electronic Commerce. ACM, 200–209.
[19]
Radu Jurca and Boi Faltings. 2009. Mechanisms for making crowds truthful. Journal of Artificial Intelligence Research 34 (2009), 209–253.
[20]
Yuqing Kong. 2020. Dominantly truthful multi-task peer prediction with a constant number of tasks. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2398–2411.
[21]
Yuqing Kong, Katrina Ligett, and Grant Schoenebeck. 2016. Putting peer prediction under the micro (economic) scope and making truth-telling focal. In International Conference on Web and Internet Economics. Springer, 251–264.
[22]
Yuqing Kong and Grant Schoenebeck. 2018. Water from two rocks: Maximizing the mutual information. In Proceedings of the 2018 ACM Conference on Economics and Computation. 177–194.
[23]
Yuqing Kong and Grant Schoenebeck. 2019. An information theoretic framework for designing information elicitation mechanisms that reward truth-telling. ACM Transactions on Economics and Computation 7, 1 (2019), 2.
[24]
Yuqing Kong, Grant Schoenebeck, Biaoshuai Tao, and Fang-Yi Yu. 2020. Information elicitation mechanisms for statistical estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 2095–2102.
[25]
Yang Liu and Mingyan Liu. 2015. An online learning approach to improving the quality of crowd-sourcing. In Proceedings of the 2015 ACM SIGMETRICS (Portland, Oregon). ACM, New York, NY, 217–230.
[26]
John McCarthy. 1956. Measures of the value of information. PNAS: Proceedings of the National Academy of Sciences of the United States of America 42, 9 (1956), 654–655.
[27]
Aditya Menon, Brendan Van Rooyen, Cheng Soon Ong, and Bob Williamson. 2015. Learning from corrupted binary labels via class-probability estimation. In International Conference on Machine Learning. 125–134.
[28]
Nolan Miller, Paul Resnick, and Richard Zeckhauser. 2005. Eliciting informative feedback: The peer-prediction method. Management Science 51, 9 (2005), 1359–1373.
[29]
Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, and Ambuj Tewari. 2013. Learning with noisy labels. In Advances in Neural Information Processing Systems. 1196–1204.
[30]
Matthew Parry et al. 2016. Linear scoring rules for probabilistic binary classification. Electronic Journal of Statistics 10, 1 (2016), 1596–1607.
[31]
Dražen Prelec. 2004. A Bayesian truth serum for subjective data. Science 306, 5695 (2004), 462–466.
[32]
Dražen Prelec, H. Sebastian Seung, and John McCoy. 2017. A solution to the single-question crowd wisdom problem. Nature 541, 7638 (2017), 532.
[33]
Goran Radanovic and Boi Faltings. 2013. A robust Bayesian truth serum for non-binary signals. In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI’13).
[34]
Goran Radanovic and Boi Faltings. 2014. Incentives for truthful information elicitation of continuous signals. In 28th AAAI Conference on Artificial Intelligence.
[35]
Goran Radanovic, Boi Faltings, and Radu Jurca. 2016. Incentives for effort in crowdsourcing using the peer truth serum. ACM Transactions on Intelligent Systems and Technology 7, 4 (2016), 48.
[36]
Blake Riley. 2014. Minimum truth serums with optional predictions. In Proceedings of the 4th Workshop on Social Computing and User Generated Content (SC’14).
[37]
Leonard J. Savage. 1971. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association 66, 336 (1971), 783–801.
[38]
Grant Schoenebeck and Fang-Yi Yu. 2020. Learning and strongly truthful multi-task peer prediction: A variational approach. arXiv preprint arXiv:2009.14730 (2020).
[39]
Grant Schoenebeck and Fang-Yi Yu. 2020. Two strongly truthful mechanisms for three heterogeneous agents answering one question. In International Conference on Web and Internet Economics. Springer, 119–132.
[40]
Clayton Scott. 2015. A rate of convergence for mixture proportion estimation, with application to learning from noisy labels. In AISTATS.
[41]
Clayton Scott, Gilles Blanchard, Gregory Handy, Sara Pozzi, and Marek Flaska. 2013. Classification with asymmetric label noise: Consistency and maximal denoising. In COLT. 489–511.
[42]
Victor Shnayder, Arpit Agarwal, Rafael Frongillo, and David C. Parkes. 2016. Informed truthfulness in multi-task peer prediction. In Proceedings of the 2016 ACM Conference on Economics and Computation. ACM, 179–196.
[43]
Philip E. Tetlock, Barbara A. Mellers, Nick Rohrbaugh, and Eva Chen. 2014. Forecasting tournaments: Tools for increasing transparency and improving the quality of debate. Current Directions in Psychological Science 23, 4 (2014), 290–295.
[44]
Brendan van Rooyen and Robert C. Williamson. 2015. Learning in the presence of corruption. arXiv preprint:1504.00091 (2015).
[45]
Juntao Wang, Yang Liu, and Yiling Chen. 2019. Forecast aggregation via peer prediction. arXiv preprint arXiv:1910.03779 (2019).
[46]
Robert L. Winkler. 1969. Scoring rules and the evaluation of probability assessors. Journal of the American Statistical Association 64, 327 (1969), 1073–1078.
[47]
Jens Witkowski, Pavel Atanasov, Lyle H. Ungar, and Andreas Krause. 2017. Proper proxy scoring rules. In 31st AAAI Conference on Artificial Intelligence.
[48]
J. Witkowski and D. C. Parkes. 2012. Peer prediction without a common prior. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC’12). ACM, 964–981.
[49]
Jens Witkowski and David C. Parkes. 2012. A robust Bayesian truth serum for small populations. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI’12).
[50]
Jens Witkowski and David C. Parkes. 2013. Learning the prior in minimal peer prediction. In Proceedings of the 3rd Workshop on Social Computing and User Generated Content at the ACM Conference on Electronic Commerce, Vol. 14.

Published In

ACM Transactions on Economics and Computation, Volume 10, Issue 3
September 2022, 115 pages
ISSN: 2167-8375
EISSN: 2167-8383
DOI: 10.1145/3572855

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 15 February 2023
Online AM: 08 October 2022
Accepted: 30 September 2022
Revised: 10 September 2022
Received: 15 August 2020
Published in TEAC Volume 10, Issue 3

Author Tags

  1. Strictly proper scoring rules
  2. information elicitation without verification
  3. peer prediction
  4. dominant strategy incentive compatibility
  5. information calibration

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Science Foundation (NSF)
  • Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA)
  • Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center Pacific (SSC Pacific)

Cited By

  • (2024) Eliciting Informative Text Evaluations with Large Language Models. Proceedings of the 25th ACM Conference on Economics and Computation, 582-612. https://doi.org/10.1145/3670865.3673532
  • (2024) Dominantly Truthful Peer Prediction Mechanisms with a Finite Number of Tasks. Journal of the ACM 71, 2, 1-49. https://doi.org/10.1145/3638239
  • (2024) On Truthful Item-Acquiring Mechanisms for Reward Maximization. Proceedings of the ACM Web Conference 2024, 25-35. https://doi.org/10.1145/3589334.3645345
  • (2024) DLC: Dynamic Loss Correction for Cross-Domain Remotely Sensed Segmentation. IEEE Transactions on Geoscience and Remote Sensing 62, 1-14. https://doi.org/10.1109/TGRS.2024.3402127
  • (2023) The importance of human-labeled data in the era of LLMs. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 7026-7032. https://doi.org/10.24963/ijcai.2023/802
  • (2023) Game-theoretic mechanisms for eliciting accurate information. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 6601-6609. https://doi.org/10.24963/ijcai.2023/740
