Abstract
Vertical federated learning (VFL) enables multiple participants to build a joint machine learning model over distributed features of overlapping samples. The performance of VFL models depends heavily on the quality of the participants' local data, so it is essential to measure each participant's contribution for purposes such as participant selection and reward allocation. The Shapley value is widely adopted by previous works for contribution assessment. However, computing the Shapley value in VFL requires repeatedly training models from scratch, incurring expensive computation and communication overheads. Motivated by this challenge, we ask in this paper: can we efficiently and securely perform data valuation for participants via the Shapley value in VFL?
We call this problem Vertical Federated Data Valuation and introduce VFDV-IM, a method that uses an Inheritance Mechanism to expedite Shapley value calculation by leveraging historical training records. We first propose a simple yet effective strategy that directly inherits the model trained over the entire consortium. To further optimize VFDV-IM, we propose a model ensemble approach that measures the similarity between evaluated consortia and reweights the historical models accordingly. Extensive experiments on various datasets show that VFDV-IM calculates the Shapley value efficiently while maintaining accuracy.
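To make the cost the abstract refers to concrete, the sketch below computes exact Shapley values as each player's weighted average marginal contribution over all coalitions. It is a generic illustration, not the paper's VFDV-IM method: the utility function here is a hypothetical lookup table standing in for coalition model accuracy, whereas in real VFL each utility evaluation would require retraining a model from scratch, which is precisely the overhead VFDV-IM's inheritance mechanism avoids.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values: each player's marginal contribution to
    utility, averaged over all coalitions with the standard weights."""
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values

# Toy utility table standing in for VFL model accuracy on each
# coalition's joint features; in real VFL every entry costs a full
# (communication-heavy) training run, and there are 2^n of them.
acc = {frozenset(): 0.0,
       frozenset({'A'}): 0.6, frozenset({'B'}): 0.5, frozenset({'C'}): 0.4,
       frozenset({'A', 'B'}): 0.8, frozenset({'A', 'C'}): 0.7,
       frozenset({'B', 'C'}): 0.6, frozenset({'A', 'B', 'C'}): 0.9}

vals = shapley_values(['A', 'B', 'C'], acc.__getitem__)
print(vals)  # A: 0.4, B: 0.3, C: 0.2
```

Note the efficiency property: the values sum to the grand-coalition utility (0.9), so the full model's performance is fully apportioned among participants, which is what makes the Shapley value attractive for reward allocation.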
X. Zhou and X. Yan—Equal contribution.
Acknowledgment
This work was sponsored by the Key R&D Program of Hubei Province (No. 2023BAB077, No. 2023BAB170) and the Fundamental Research Funds for the Central Universities (No. 2042023kf0219). This work was supported by Ant Group through the CCF-Ant Research Fund (CCF-AFSG RF20220001).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhou, X. et al. (2024). VFDV-IM: An Efficient and Securely Vertical Federated Data Valuation. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14850. Springer, Singapore. https://doi.org/10.1007/978-981-97-5552-3_28
DOI: https://doi.org/10.1007/978-981-97-5552-3_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5551-6
Online ISBN: 978-981-97-5552-3
eBook Packages: Computer Science, Computer Science (R0)