research-article

Open access

A Marketplace for Data: An Algorithmic Solution

Authors:

Munther Dahleh,

Tuhin SarkarAuthors Info & Claims

EC '19: Proceedings of the 2019 ACM Conference on Economics and Computation

Pages 701 - 726

https://doi.org/10.1145/3328526.3329589

Published: 17 June 2019 Publication History

Abstract

In this work, we aim to design a data marketplace; a robust real-time matching mechanism to efficiently buy and sell training data for Machine Learning tasks. While the monetization of data and pre-trained models is an essential focus of industry today, there does not exist a market mechanism to price training data and match buyers to sellers while still addressing the associated (computational and other) complexity. The challenge in creating such a market stems from the very nature of data as an asset: (i) it is freely replicable; (ii) its value is inherently combinatorial due to correlation with signal in other data; (iii) prediction tasks and the value of accuracy vary widely; (iv) usefulness of training data is difficult to verify a priori without first applying it to a prediction task. As our main contributions we: (i) propose a mathematical model for a two-sided data market and formally define the key associated challenges; (ii) construct algorithms for such a market to function and analyze how they meet the challenges defined. We highlight two technical contributions: (i) a new notion of "fairness" required for cooperative games with freely replicable goods; (ii) a truthful, zero regret mechanism to auction a class of combinatorial goods based on utilizing Myerson's payment function and the Multiplicative Weights algorithm. These might be of independent interest.

Supplementary Material

MP4 File (p701-agarwal.mp4)

Download
1564.81 MB

References

[1]

Bill Aiello, Yuval Ishai, and Omer Reingold. 2001. Priced Oblivious Transfer: How to Sell Digital Goods. In International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 119--135.

Digital Library

[2]

Sanjeev Arora, Elad Hazan, and Satyen Kale. 2012. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing, Vol. 8, 6 (2012), 121--164.

[3]

Peter Auer. 2003. Using Confidence Bounds for Exploitation-exploration Trade-offs. Journal of Machine Learning Research, Vol. 3 (March 2003), 397--422. http://dl.acm.org/citation.cfm?id=944919.944941

Digital Library

[4]

Moshe Babaioff, Robert Kleinberg, and Renato Paes Leme. 2012. Optimal Mechanisms for Selling Information. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC '12). ACM, New York, NY, USA, 92--109.

Digital Library

[5]

Yoram Bachrach, Evangelos Markakis, Ezra Resnick, Ariel D. Procaccia, Jeffrey S. Rosenschein, and Amin Saberi. 2010. Approximating power indices: theoretical and empirical analysis. Autonomous Agents and Multi-Agent Systems, Vol. 20, 2 (01 Mar 2010), 105--122.

[6]

Eric Balkanski, Umar Syed, and Sergei Vassilvitskii. 2017. Statistical Cost Sharing. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 6221--6230. http://papers.nips.cc/paper/7202-statistical-cost-sharing.pdf

Digital Library

[7]

Dirk Bergemann, Alessandro Bonatti, and Alex Smolin. 2018. The Design and Price of Information. American Economic Review, Vol. 108, 1 (January 2018), 1--48.

[8]

Bernard Caillaud and Bruno Jullien. 2003. Chicken & Egg: Competition Among Intermediation Service Providers. The RAND Journal of Economics, Vol. 34, 2 (2003), 309--328. http://www.jstor.org/stable/1593720

[9]

Javier Castro, Daniel Gómez, and Juan Tejada. 2009. Polynomial Calculation of the Shapley Value Based on Sampling. Computer and Operations Research, Vol. 36, 5 (May 2009), 1726--1730.

Digital Library

[10]

Rachel Cummings, Katrina Ligett, Aaron Roth, Zhiwei Steven Wu, and Juba Ziani. 2015. Accuracy for Sale: Aggregating Data with a Variance Constraint. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science (ITCS '15). ACM, New York, NY, USA, 317--324.

Digital Library

[11]

Constantinos Daskalakis. 2015. Multi-item Auctions Defying Intuition? SIGecom Exch., Vol. 14, 1 (Nov. 2015), 41--75.

Digital Library

[12]

C. Daskalakis and V. Syrgkanis. 2016. Learning in Auctions: Regret is Hard, Envy is Easy. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). 219--228.

[13]

Arpita Ghosh and Aaron Roth. 2011. Selling Privacy at Auction. In Proceedings of the 12th ACM Conference on Electronic Commerce (EC '11). ACM, New York, NY, USA, 199--208.

Digital Library

[14]

Renato Gomes. 2014. Optimal Auction Design in Two-Sided Markets. The RAND Journal of Economics, Vol. 45, 2 (2014), 248--272.

[15]

A Ardeshir Goshtasby. 2012. Similarity and Dissimilarity Measures. In Image Registration . Springer, 7--66.

[16]

Robin Hanson. 2012. Logarithmic markets coring rules for modular combinatorial information aggregation. The Journal of Prediction Markets, Vol. 1, 1 (2012), 3--15.

[17]

Elad Hazan et almbox. 2016. Introduction to online convex optimization. Foundations and Trends® in Optimization, Vol. 2, 3--4 (2016), 157--325.

Digital Library

[18]

Daniel P Heyman and Matthew J Sobel. 2004. Stochastic Models in Operations Research, Volume II. Stochastic Optimization. Vol. 2. Courier Corporation.

[19]

Katrina Ligett and Aaron Roth. 2012. Take It or Leave It: Running a Survey When Privacy Comes at a Cost. In Internet and Network Economics, Paul W. Goldberg (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 378--391.

Digital Library

[20]

De Liu and Jianqing Chen. 2006. Designing online auctions with past performance information. Decision Support Systems, Vol. 42, 3 (2006), 1307--1320.

Digital Library

[21]

Yungao Ma, Nengmin Wang, Ada Che, Yufei Huang, and Jinpeng Xu. 2013. The Bullwhip Effect on Product Orders and Inventory: A Perspective of Demand Forecasting Techniques. International Journal of Production Research, Vol. 51, 1 (2013), 281--302.

[22]

Sasan Maleki, Long Tran-Thanh, Greg Hines, Talal Rahwan, and Alex Rogers. 2013. Bounding the Estimation Error of Sampling-based Shapley Value Approximation With/Without Stratifying. CoRR, Vol. abs/1306.4265 (2013).

[23]

I Mann and LS Shapley. 1952. Values for large games IV: Evaluating the electoral college exactly . Technical Report. RAND Corp Santa Monica CA.

[24]

Aranyak Mehta et almbox. 2013. Online matching and ad allocation. Foundations and Trends® in Theoretical Computer Science, Vol. 8, 4 (2013), 265--368.

Digital Library

[25]

Roger B Myerson. 1981. Optimal auction design. Mathematics of operations research, Vol. 6, 1 (1981), 58--73.

Digital Library

[26]

John G Riley and William F Samuelson. 1981. Optimal Auctions. The American Economic Review, Vol. 71, 3 (1981), 381--392.

[27]

Jean-Charles Rochet and Jean Tirole. 2003. Platform Competition in Two-Sided Markets. Journal of the European Economic Association, Vol. 1, 4 (2003), 990--1029.

[28]

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. 2018. Adversarially Robust Generalization Requires More Data. In Advances in Neural Information Processing Systems 31. Curran Associates, Inc., 5019--5031. http://papers.nips.cc/paper/7749-adversarially-robust-generalization-requires-more-data.pdf

Digital Library

[29]

LS Shapley. 1952. A VALUE FOR N-PERSON GAMES . Technical Report. RAND Corp Santa Monica CA.

[30]

Hal R Varian. 2009. Online Ad Auctions. American Economic Review, Vol. 99, 2 (2009), 430--34.

[31]

Justin Wolfers and Eric Zitzewitz. 2004. Prediction markets. Journal of economic perspectives, Vol. 18, 2 (2004), 107--126.

[32]

Weinan Zhang, Shuai Yuan, and Jun Wang. 2014. Optimal Real-time Bidding for Display Advertising. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14). ACM, New York, NY, USA, 1077--1086.

Digital Library

Cited By

Dong FLi J(2024)Bubbly DataSSRN Electronic Journal10.2139/ssrn.4790765Online publication date: 2024
https://doi.org/10.2139/ssrn.4790765
Si MPei J(2024)Counterfactual Explanation of Shapley Value in Data CoalitionsProceedings of the VLDB Endowment10.14778/3681954.368200417:11(3332-3345)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682004
Li ZDing BYao LLi YXiao XZhou J(2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.14778/3648160.3648169
Show More Cited By

Index Terms

A Marketplace for Data: An Algorithmic Solution
1. Theory of computation
  1. Theory and algorithms for application domains
    1. Algorithmic game theory and mechanism design

Recommendations

Name Your Own Price on Data Marketplaces

A novel approach to pricing on data marketplaces is proposed, which is based on the Name Your Own Price (NYOP) principle: customers suggest their own price for a (relational) data product and in return receive a custom-tailored one. The result is a fair ...
Fair profit allocation in the spectrum auction using the Shapley value
GLOBECOM'09: Proceedings of the 28th IEEE conference on Global telecommunications

Microeconomics-inspired spectrum auctions can effectively improve the spectrum utilization for wireless networks to satisfy the ever increasing service demands. Considering the spatial reuse, the bidding nodes without mutual interference are grouped as ...
Internet economics: the use of Shapley value for ISP settlement

Within the current Internet, autonomous ISPs implement bilateral agreements, with each ISP establishing agreements that suit its own local objective to maximize its profit. Peering agreements based on local views and bilateral settlements, while ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

EC '19: Proceedings of the 2019 ACM Conference on Economics and Computation

June 2019

947 pages

ISBN:9781450367929

DOI:10.1145/3328526

General Chair:
Anna Karlin
University of Washington, USA
,
Program Chairs:
Nicole Immorlica
Microsoft Research, USA
,
Ramesh Johari
Stanford University, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGecom: Special Interest Group on Economics and Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 June 2019

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

MIT Institute for Data Systems and Society (IDSS) Thomson Reuters Fellowship
MIT Institute for Data Systems and Society (IDSS) WorldQuant Fellowship

Conference

EC '19

Sponsor:

SIGecom

EC '19: ACM Conference on Economics and Computation

June 24 - 28, 2019

AZ, Phoenix, USA

Acceptance Rates

EC '19 Paper Acceptance Rate 106 of 382 submissions, 28%;

Overall Acceptance Rate 664 of 2,389 submissions, 28%

Upcoming Conference

EC '25

Sponsor:
sigecom

The 25th ACM Conference on Economics and Computation

July 7 - 11, 2025

Stanford , CA , USA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

135
Total Citations
View Citations
3,623
Total Downloads

Downloads (Last 12 months)1,107
Downloads (Last 6 weeks)126

Reflects downloads up to 28 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Dong FLi J(2024)Bubbly DataSSRN Electronic Journal10.2139/ssrn.4790765Online publication date: 2024
https://doi.org/10.2139/ssrn.4790765
Si MPei J(2024)Counterfactual Explanation of Shapley Value in Data CoalitionsProceedings of the VLDB Endowment10.14778/3681954.368200417:11(3332-3345)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682004
Li ZDing BYao LLi YXiao XZhou J(2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.14778/3648160.3648169
WANG YLI JHAO J(2024)Integrated utility and optimizing pricing of data productsSCIENTIA SINICA Informationis10.1360/SSI-2023-027754:11(2533)Online publication date: 28-Oct-2024
https://doi.org/10.1360/SSI-2023-0277
Sun PWu LWang ZLiu JLuo JJin W(2024)A Profit-Maximizing Data Marketplace with Differentially Private Federated Learning under Price CompetitionProceedings of the ACM on Management of Data10.1145/36771272:4(1-27)Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3677127
Shen MTan ZNiyato DLiu YKang JXiong ZZhu LWang WShen X(2024)Artificial Intelligence for Web 3.0: A Comprehensive SurveyACM Computing Surveys10.1145/365728456:10(1-39)Online publication date: 14-May-2024
https://dl.acm.org/doi/10.1145/3657284
Li YYu XKoudas N(2024)Data Acquisition for Improving Model ConfidenceProceedings of the ACM on Management of Data10.1145/36549342:3(1-25)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654934
Li XLuo YLuo BHuang JPapavassiliou SSchmid S(2024)Social Welfare Maximization for Federated Learning with Network EffectsProceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing10.1145/3641512.3686394(131-140)Online publication date: 14-Oct-2024
https://dl.acm.org/doi/10.1145/3641512.3686394
Luo XPei JXu CZhang WXu J(2024)Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple GamesProceedings of the ACM on Management of Data10.1145/36393112:1(1-28)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639311
Tarun AChundawat VMandal MTan HChen BKankanhalli MBaeza-Yates RBonchi F(2024)EcoVal: An Efficient Data Valuation Framework for Machine LearningProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3672068(2866-2875)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3672068
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents