DOI: 10.1145/3557915.3561470
research-article

Computing the relative value of spatio-temporal data in data marketplaces

Published: 22 November 2022

Abstract

Spatio-temporal information drives a plethora of intelligent transportation, smart-city and crowd-sensing applications. Data is now a valuable production factor, and data marketplaces have appeared to help individuals and enterprises bring it to market and meet the ever-growing demand. Such marketplaces can combine data from different sources to meet the requirements of different applications. In this paper we study the problem of estimating the relative value of spatio-temporal datasets combined in marketplaces for predicting transportation demand and travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and New York, we show that simplistic but popular approaches for estimating the relative value of data, such as splitting it equally among the data sources, as well as more complex ones based on volume or the "leave-one-out" heuristic, are inaccurate. Instead, more complex notions of value from economics and game theory, such as the Shapley value, need to be employed if one wishes to capture the complex effects of mixing different datasets on the accuracy of forecasting algorithms. This does not seem to be a coincidental observation related to a particular use case but rather a general trend across different use cases with different objective functions.
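
To make the contrast between the three valuation rules named in the abstract concrete, here is a minimal sketch in Python. It is not the paper's implementation: the three sources `A`, `B`, `C` and the accuracy table `TOY_ACCURACY` are hypothetical numbers chosen for illustration, with `A` and `B` deliberately redundant. The Shapley value is computed exactly by enumerating coalitions, which is only feasible for a handful of sources.

```python
from itertools import combinations
from math import factorial

# Hypothetical forecasting accuracy for each coalition of data sources.
# A and B are redundant (adding one to the other gains nothing); C adds 0.2.
TOY_ACCURACY = {
    frozenset(): 0.0,
    frozenset("A"): 0.6,
    frozenset("B"): 0.6,
    frozenset("C"): 0.2,
    frozenset("AB"): 0.6,
    frozenset("AC"): 0.8,
    frozenset("BC"): 0.8,
    frozenset("ABC"): 0.8,
}


def toy_value(coalition):
    """Value function v(S): accuracy reached with the sources in S."""
    return TOY_ACCURACY[frozenset(coalition)]


def equal_split(players, value):
    """Divide the value of the full coalition evenly among all sources."""
    share = value(frozenset(players)) / len(players)
    return {p: share for p in players}


def leave_one_out(players, value):
    """Credit each source with the loss incurred when only it is removed."""
    full = frozenset(players)
    return {p: value(full) - value(full - {p}) for p in players}


def shapley_values(players, value):
    """Exact Shapley value: weighted average marginal contribution of each
    source over all coalitions of the other sources (exponential cost)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability that exactly these k sources precede p in a
            # uniformly random ordering of all n sources.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


if __name__ == "__main__":
    sources = ["A", "B", "C"]
    print("equal split  :", equal_split(sources, toy_value))
    print("leave-one-out:", leave_one_out(sources, toy_value))
    print("Shapley      :", shapley_values(sources, toy_value))
```

In this toy setting, leave-one-out assigns no value to either redundant source, because removing one of them alone changes nothing, while the Shapley value splits their joint contribution between them and its shares add up to the accuracy of the full coalition. For larger numbers of sources, exact enumeration becomes impractical and sampling-based approximations are typically used instead.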

Published In

SIGSPATIAL '22: Proceedings of the 30th International Conference on Advances in Geographic Information Systems
November 2022
806 pages
ISBN:9781450395298
DOI:10.1145/3557915
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data marketplaces
  2. shapley value
  3. spatio-temporal data
  4. value of information

Qualifiers

  • Research-article

Conference

SIGSPATIAL '22
Acceptance Rates

Overall Acceptance Rate 257 of 1,238 submissions, 21%
