DOI: 10.1145/3565011.3569054
research-article

Try before you buy: a practical data purchasing algorithm for real-world data marketplaces

Published: 06 December 2022

Abstract

Data trading is becoming increasingly popular, as evidenced by the scores of data marketplaces (DMs) that have appeared in the last few years to satisfy the demand for third-party data. Buyers, however, can only decide whether the requested price is worth paying after testing the data on their ML model. In this paper, we propose a method for optimizing data purchasing decisions. We show that if a marketplace provides potential buyers with a measure of the performance of their models on individual datasets, then they can select which datasets to buy with an efficacy that approximates that of knowing the performance of each possible combination of datasets offered by the DM. We call the resulting algorithm Try Before You Buy (TBYB) and demonstrate over synthetic and real-world datasets how TBYB can lead to near-optimal data purchasing with only O(N) instead of O(2^N) information and execution time.
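The gap between O(N) and O(2^N) evaluations can be illustrated with a small sketch. The code below is not the authors' TBYB algorithm, which the abstract does not spell out. It is a hypothetical greedy heuristic that runs one trial per dataset, ranks datasets by that trial score, and adds them in order until one fails to pay for itself, compared against exhaustive search over all subsets. The names `QUALITY`, `PRICE`, `VALUE_PER_ACCURACY`, `accuracy`, `profit`, `greedy_purchase` and `exhaustive_purchase`, as well as the diminishing-returns accuracy formula, are all assumptions made for illustration.

```python
from itertools import combinations
from math import prod

# Toy stand-in for "train the buyer's model and report its accuracy".
# The qualities, prices and formula below are illustrative assumptions.
QUALITY = {"A": 0.5, "B": 0.4, "C": 0.1, "D": 0.05}
PRICE = {"A": 10.0, "B": 10.0, "C": 10.0, "D": 10.0}
VALUE_PER_ACCURACY = 100.0  # assumed buyer value for accuracy 1.0


def accuracy(subset):
    """Diminishing-returns accuracy of a model trained on `subset`."""
    return 1.0 - prod(1.0 - QUALITY[d] for d in subset)


def profit(subset):
    """Buyer's value of the resulting model minus the total price paid."""
    return VALUE_PER_ACCURACY * accuracy(subset) - sum(PRICE[d] for d in subset)


def exhaustive_purchase(datasets):
    """Score every non-empty subset: 2^N - 1 evaluations."""
    best, best_profit, evaluations = frozenset(), 0.0, 0
    for size in range(1, len(datasets) + 1):
        for subset in combinations(datasets, size):
            evaluations += 1
            p = profit(subset)
            if p > best_profit:
                best, best_profit = frozenset(subset), p
    return best, evaluations


def greedy_purchase(datasets):
    """Hypothetical heuristic: N single-dataset trials, then grow the bundle
    in decreasing trial score until a dataset does not increase profit."""
    trial = {d: accuracy([d]) for d in datasets}  # one trial per dataset
    evaluations = len(datasets)
    ranked = sorted(datasets, key=trial.get, reverse=True)
    bought, current = [], 0.0
    for d in ranked:
        candidate = bought + [d]
        if bought:
            evaluations += 1  # one extra evaluation of the grown bundle
        p = profit(candidate)  # the first candidate reuses its trial score
        if p <= current:
            break
        bought, current = candidate, p
    return frozenset(bought), evaluations


if __name__ == "__main__":
    names = list(QUALITY)
    print("exhaustive:", exhaustive_purchase(names))
    print("greedy:    ", greedy_purchase(names))
```

On this toy instance both strategies select datasets A and B, using 15 evaluations for the exhaustive search and 6 for the greedy pass. The greedy pass needs at most 2N - 1 evaluations, and nothing in this sketch guarantees that it matches the exhaustive optimum on other inputs.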

      Published In

      DE '22: Proceedings of the 1st International Workshop on Data Economy
      December 2022
      70 pages
      ISBN:9781450399234
      DOI:10.1145/3565011
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data economy
      2. data marketplaces
      3. data purchasing
      4. value of data

      Qualifiers

      • Research-article

      Conference

      CoNEXT '22
      Cited By

      • (2025) Survey of Artificial Intelligence Model Marketplace. Future Internet 17:1 (35). DOI: 10.3390/fi17010035. Online publication date: 14-Jan-2025
      • (2024) Value is in the Eye of the Beholder: A Framework for an Equitable Graph Data Evaluation. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (467-479). DOI: 10.1145/3630106.3658919. Online publication date: 3-Jun-2024
      • (2024) Recurrent neural network for solving several novel sales models: Try-Before-You-Buy. International Journal of Machine Learning and Cybernetics. DOI: 10.1007/s13042-024-02479-8. Online publication date: 6-Dec-2024
      • (2023) Data Product-Oriented Services for Data Ecosystem. 2023 IEEE International Conference on Web Services (ICWS) (755-762). DOI: 10.1109/ICWS60048.2023.00102. Online publication date: Jul-2023
