research-article
Open access

Incentive-Aware Decentralized Data Collaboration

Published: 20 June 2023
  • Abstract

    Data collaboration enables multiple parties to pool data and derive meaningful insights. However, data misuse and unlawful data collection have led individual organizations to impose precautionary measures that guard against data leakage and abuse. In response, decentralized federated learning (DFL) has emerged as an attractive paradigm that facilitates data collaboration while remaining amenable to privacy-preserving data and knowledge sharing, cost reduction, and improved prediction accuracy. Unfortunately, the participating parties in DFL tend to be heterogeneous, with skewed datasets and uneven capabilities. Inevitably, training and transmission costs and the presence of free-riders pose challenges to the adoption of, and participation in, DFL. The absence of a centralized parameter server further exacerbates the problem of evaluating the contribution of each individual party. Therefore, an effective incentive mechanism is essential to promote data collaboration.
    In this paper, we propose a novel Incentive-aware Decentralized fEderated leArning (IDEA) framework for facilitating data collaboration. Specifically, we first design a customizable reward scheme that lets heterogeneous parties optimize their respective objectives, such as higher model accuracy, communication efficiency, and computational efficiency. To reward deserving parties fairly while offering flexibility, we propose a novel multi-agent reinforcement learning (MARL) incentive mechanism, which enables heterogeneous parties to learn their own optimal collaboration policies. We then design an efficient decentralized data collaboration algorithm that supports the customizable reward scheme based on each party's objective-specific collaboration policy. We theoretically prove that the algorithm achieves a Nash equilibrium, which ensures the fairness of the corresponding rewards for parties. We conduct extensive experiments to evaluate the performance of our proposed framework against four baselines on five real-world datasets. The results show that IDEA outperforms state-of-the-art methods in terms of effectiveness, efficiency, and accumulated reward.
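
The abstract does not give the reward function or the MARL procedure, so the following is only a toy sketch of two ideas it names: a per-party reward weighted by that party's own objectives, and a Nash equilibrium as a joint choice from which no party gains by deviating alone. The party names, weights, actions, and payoff model are all hypothetical and are not taken from IDEA.

```python
from itertools import product

ACTIONS = ("share", "skip")

# Hypothetical objective weights per party (illustrative values only).
WEIGHTS = {
    "A": {"accuracy": 1.0, "communication": 0.2, "computation": 0.2},
    "B": {"accuracy": 0.8, "communication": 0.5, "computation": 0.1},
}

def reward(party, joint):
    """Toy reward: weighted accuracy gain minus weighted costs."""
    if joint[party] == "skip":
        return 0.0  # assumed: a party that skips pays nothing and gains nothing
    w = WEIGHTS[party]
    # Assumed: one unit of accuracy gain per *other* party that shares.
    peers_sharing = sum(1 for p, a in joint.items() if p != party and a == "share")
    return w["accuracy"] * peers_sharing - w["communication"] - w["computation"]

def is_nash(joint):
    """True if no party can raise its own reward by deviating alone."""
    for party in joint:
        current = reward(party, joint)
        for alt in ACTIONS:
            deviated = {**joint, party: alt}
            if reward(party, deviated) > current:
                return False
    return True

def pure_nash_equilibria():
    """Enumerate all joint actions and keep the pure Nash equilibria."""
    parties = sorted(WEIGHTS)
    return [
        dict(zip(parties, choice))
        for choice in product(ACTIONS, repeat=len(parties))
        if is_nash(dict(zip(parties, choice)))
    ]

print(pure_nash_equilibria())
```

Under these assumed payoffs the enumeration finds two pure equilibria: both parties share, or both skip. Because a party that skips receives nothing in this toy model, free-riding is never a profitable deviation; a learned policy, as in the MARL mechanism the abstract describes, would have to discover such outcomes rather than enumerate them.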

    Supplemental Material

    MP4 File
    Presentation video for SIGMOD 2023



    Published In

    Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2
    June 2023
    2310 pages
    EISSN: 2836-6573
    DOI: 10.1145/3605748
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. data collaboration
    2. decentralized learning
    3. incentive mechanism


    Article Metrics

    • Downloads (last 12 months): 524
    • Downloads (last 6 weeks): 35
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2024) Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs. Proceedings of the VLDB Endowment 17:9 (2321-2334). DOI: 10.14778/3665844.3665860. Online publication date: 1-May-2024
    • (2024) DIDS: Double Indices and Double Summarizations for Fast Similarity Search. Proceedings of the VLDB Endowment 17:9 (2198-2211). DOI: 10.14778/3665844.3665851. Online publication date: 1-May-2024
    • (2024) CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time Series. Proceedings of the VLDB Endowment 17:9 (2123-2135). DOI: 10.14778/3665844.3665845. Online publication date: 1-May-2024
    • (2024) Visualization-Aware Time Series Min-Max Caching with Error Bound Guarantees. Proceedings of the VLDB Endowment 17:8 (2091-2103). DOI: 10.14778/3659437.3659460. Online publication date: 31-May-2024
    • (2024) Performance-Based Pricing for Federated Learning via Auction. Proceedings of the VLDB Endowment 17:6 (1269-1282). DOI: 10.14778/3648160.3648169. Online publication date: 3-May-2024
    • (2024) Hybrid Prompt Learning for Generating Justifications of Security Risks in Automation Rules. ACM Transactions on Intelligent Systems and Technology. DOI: 10.1145/3675401. Online publication date: 29-Jun-2024
    • (2024) Databases in Edge and Fog Environments: A Survey. ACM Computing Surveys 56:11 (1-40). DOI: 10.1145/3666001. Online publication date: 8-Jul-2024
    • (2024) RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654970. Online publication date: 30-May-2024
    • (2024) Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join Queries. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654932. Online publication date: 30-May-2024
    • (2024) Time Series Representation for Visualization in Apache IoTDB. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI: 10.1145/3639290. Online publication date: 26-Mar-2024
