Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3328526.3329589acmconferencesArticle/Chapter ViewAbstractPublication PagesecConference Proceedingsconference-collections
research-article
Open access

A Marketplace for Data: An Algorithmic Solution

Published: 17 June 2019 Publication History

Abstract

In this work, we aim to design a data marketplace; a robust real-time matching mechanism to efficiently buy and sell training data for Machine Learning tasks. While the monetization of data and pre-trained models is an essential focus of industry today, there does not exist a market mechanism to price training data and match buyers to sellers while still addressing the associated (computational and other) complexity. The challenge in creating such a market stems from the very nature of data as an asset: (i) it is freely replicable; (ii) its value is inherently combinatorial due to correlation with signal in other data; (iii) prediction tasks and the value of accuracy vary widely; (iv) usefulness of training data is difficult to verify a priori without first applying it to a prediction task. As our main contributions we: (i) propose a mathematical model for a two-sided data market and formally define the key associated challenges; (ii) construct algorithms for such a market to function and analyze how they meet the challenges defined. We highlight two technical contributions: (i) a new notion of "fairness" required for cooperative games with freely replicable goods; (ii) a truthful, zero regret mechanism to auction a class of combinatorial goods based on utilizing Myerson's payment function and the Multiplicative Weights algorithm. These might be of independent interest.

Supplementary Material

MP4 File (p701-agarwal.mp4)

References

[1]
Bill Aiello, Yuval Ishai, and Omer Reingold. 2001. Priced Oblivious Transfer: How to Sell Digital Goods. In International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 119--135.
[2]
Sanjeev Arora, Elad Hazan, and Satyen Kale. 2012. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing, Vol. 8, 6 (2012), 121--164.
[3]
Peter Auer. 2003. Using Confidence Bounds for Exploitation-exploration Trade-offs. Journal of Machine Learning Research, Vol. 3 (March 2003), 397--422. http://dl.acm.org/citation.cfm?id=944919.944941
[4]
Moshe Babaioff, Robert Kleinberg, and Renato Paes Leme. 2012. Optimal Mechanisms for Selling Information. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC '12). ACM, New York, NY, USA, 92--109.
[5]
Yoram Bachrach, Evangelos Markakis, Ezra Resnick, Ariel D. Procaccia, Jeffrey S. Rosenschein, and Amin Saberi. 2010. Approximating power indices: theoretical and empirical analysis. Autonomous Agents and Multi-Agent Systems, Vol. 20, 2 (01 Mar 2010), 105--122.
[6]
Eric Balkanski, Umar Syed, and Sergei Vassilvitskii. 2017. Statistical Cost Sharing. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 6221--6230. http://papers.nips.cc/paper/7202-statistical-cost-sharing.pdf
[7]
Dirk Bergemann, Alessandro Bonatti, and Alex Smolin. 2018. The Design and Price of Information. American Economic Review, Vol. 108, 1 (January 2018), 1--48.
[8]
Bernard Caillaud and Bruno Jullien. 2003. Chicken & Egg: Competition Among Intermediation Service Providers. The RAND Journal of Economics, Vol. 34, 2 (2003), 309--328. http://www.jstor.org/stable/1593720
[9]
Javier Castro, Daniel Gómez, and Juan Tejada. 2009. Polynomial Calculation of the Shapley Value Based on Sampling. Computer and Operations Research, Vol. 36, 5 (May 2009), 1726--1730.
[10]
Rachel Cummings, Katrina Ligett, Aaron Roth, Zhiwei Steven Wu, and Juba Ziani. 2015. Accuracy for Sale: Aggregating Data with a Variance Constraint. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science (ITCS '15). ACM, New York, NY, USA, 317--324.
[11]
Constantinos Daskalakis. 2015. Multi-item Auctions Defying Intuition? SIGecom Exch., Vol. 14, 1 (Nov. 2015), 41--75.
[12]
C. Daskalakis and V. Syrgkanis. 2016. Learning in Auctions: Regret is Hard, Envy is Easy. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). 219--228.
[13]
Arpita Ghosh and Aaron Roth. 2011. Selling Privacy at Auction. In Proceedings of the 12th ACM Conference on Electronic Commerce (EC '11). ACM, New York, NY, USA, 199--208.
[14]
Renato Gomes. 2014. Optimal Auction Design in Two-Sided Markets. The RAND Journal of Economics, Vol. 45, 2 (2014), 248--272.
[15]
A Ardeshir Goshtasby. 2012. Similarity and Dissimilarity Measures. In Image Registration . Springer, 7--66.
[16]
Robin Hanson. 2012. Logarithmic markets coring rules for modular combinatorial information aggregation. The Journal of Prediction Markets, Vol. 1, 1 (2012), 3--15.
[17]
Elad Hazan et almbox. 2016. Introduction to online convex optimization. Foundations and Trends® in Optimization, Vol. 2, 3--4 (2016), 157--325.
[18]
Daniel P Heyman and Matthew J Sobel. 2004. Stochastic Models in Operations Research, Volume II. Stochastic Optimization. Vol. 2. Courier Corporation.
[19]
Katrina Ligett and Aaron Roth. 2012. Take It or Leave It: Running a Survey When Privacy Comes at a Cost. In Internet and Network Economics, Paul W. Goldberg (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 378--391.
[20]
De Liu and Jianqing Chen. 2006. Designing online auctions with past performance information. Decision Support Systems, Vol. 42, 3 (2006), 1307--1320.
[21]
Yungao Ma, Nengmin Wang, Ada Che, Yufei Huang, and Jinpeng Xu. 2013. The Bullwhip Effect on Product Orders and Inventory: A Perspective of Demand Forecasting Techniques. International Journal of Production Research, Vol. 51, 1 (2013), 281--302.
[22]
Sasan Maleki, Long Tran-Thanh, Greg Hines, Talal Rahwan, and Alex Rogers. 2013. Bounding the Estimation Error of Sampling-based Shapley Value Approximation With/Without Stratifying. CoRR, Vol. abs/1306.4265 (2013).
[23]
I Mann and LS Shapley. 1952. Values for large games IV: Evaluating the electoral college exactly . Technical Report. RAND Corp Santa Monica CA.
[24]
Aranyak Mehta et almbox. 2013. Online matching and ad allocation. Foundations and Trends® in Theoretical Computer Science, Vol. 8, 4 (2013), 265--368.
[25]
Roger B Myerson. 1981. Optimal auction design. Mathematics of operations research, Vol. 6, 1 (1981), 58--73.
[26]
John G Riley and William F Samuelson. 1981. Optimal Auctions. The American Economic Review, Vol. 71, 3 (1981), 381--392.
[27]
Jean-Charles Rochet and Jean Tirole. 2003. Platform Competition in Two-Sided Markets. Journal of the European Economic Association, Vol. 1, 4 (2003), 990--1029.
[28]
Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. 2018. Adversarially Robust Generalization Requires More Data. In Advances in Neural Information Processing Systems 31. Curran Associates, Inc., 5019--5031. http://papers.nips.cc/paper/7749-adversarially-robust-generalization-requires-more-data.pdf
[29]
LS Shapley. 1952. A VALUE FOR N-PERSON GAMES . Technical Report. RAND Corp Santa Monica CA.
[30]
Hal R Varian. 2009. Online Ad Auctions. American Economic Review, Vol. 99, 2 (2009), 430--34.
[31]
Justin Wolfers and Eric Zitzewitz. 2004. Prediction markets. Journal of economic perspectives, Vol. 18, 2 (2004), 107--126.
[32]
Weinan Zhang, Shuai Yuan, and Jun Wang. 2014. Optimal Real-time Bidding for Display Advertising. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14). ACM, New York, NY, USA, 1077--1086.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
EC '19: Proceedings of the 2019 ACM Conference on Economics and Computation
June 2019
947 pages
ISBN:9781450367929
DOI:10.1145/3328526
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 June 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data marketplaces
  2. online combinatorial auctions
  3. shapley value
  4. value of data

Qualifiers

  • Research-article

Funding Sources

  • MIT Institute for Data Systems and Society (IDSS) Thomson Reuters Fellowship
  • MIT Institute for Data Systems and Society (IDSS) WorldQuant Fellowship

Conference

EC '19
Sponsor:
EC '19: ACM Conference on Economics and Computation
June 24 - 28, 2019
AZ, Phoenix, USA

Acceptance Rates

EC '19 Paper Acceptance Rate 106 of 382 submissions, 28%;
Overall Acceptance Rate 664 of 2,389 submissions, 28%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,011
  • Downloads (Last 6 weeks)120
Reflects downloads up to 01 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Bubbly DataSSRN Electronic Journal10.2139/ssrn.4790765Online publication date: 2024
  • (2024)Counterfactual Explanation of Shapley Value in Data CoalitionsProceedings of the VLDB Endowment10.14778/3681954.368200417:11(3332-3345)Online publication date: 1-Jul-2024
  • (2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 1-Feb-2024
  • (2024)Integrated utility and optimizing pricing of data productsSCIENTIA SINICA Informationis10.1360/SSI-2023-0277Online publication date: 28-Oct-2024
  • (2024)A Profit-Maximizing Data Marketplace with Differentially Private Federated Learning under Price CompetitionProceedings of the ACM on Management of Data10.1145/36771272:4(1-27)Online publication date: 30-Sep-2024
  • (2024)Artificial Intelligence for Web 3.0: A Comprehensive SurveyACM Computing Surveys10.1145/365728456:10(1-39)Online publication date: 14-May-2024
  • (2024)Data Acquisition for Improving Model ConfidenceProceedings of the ACM on Management of Data10.1145/36549342:3(1-25)Online publication date: 30-May-2024
  • (2024)Social Welfare Maximization for Federated Learning with Network EffectsProceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing10.1145/3641512.3686394(131-140)Online publication date: 14-Oct-2024
  • (2024)Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple GamesProceedings of the ACM on Management of Data10.1145/36393112:1(1-28)Online publication date: 26-Mar-2024
  • (2024)EcoVal: An Efficient Data Valuation Framework for Machine LearningProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3672068(2866-2875)Online publication date: 25-Aug-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media