Toward Few-Label Vertical Federated Learning

Published: 19 June 2024
Abstract

    Federated Learning (FL) provides a novel paradigm for privacy-preserving machine learning, enabling multiple clients to collaborate on model training without sharing private data. To handle multi-source heterogeneous data, Vertical Federated Learning (VFL) has been extensively investigated. In the VFL setting, however, label information tends to be held by a single authoritative client and is very limited, which poses two challenges for model training. On the one hand, a small number of labels cannot guarantee a well-trained VFL model with informative network parameters, resulting in unclear classification decision boundaries. On the other hand, the dominant mass of unlabeled data should not be discounted, and it is worthwhile to investigate how to leverage it to improve representation learning. To address these two challenges, we first introduce a supervised contrastive loss to enhance intra-class aggregation and inter-class separation, thereby deeply exploiting the label information and improving downstream classification. Then, for the unlabeled data, we introduce a pseudo-label-guided consistency mechanism that encourages classification results to be coherent across clients, which allows the representations learned by local networks to absorb knowledge from other clients and alleviates disagreement between clients on the classification task. We conduct extensive experiments on four commonly used datasets, and the results demonstrate that our method outperforms state-of-the-art methods, with the improvement becoming more significant as the label rate decreases.
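    The supervised contrastive objective described above can be sketched in a few lines. This is a minimal NumPy illustration of the standard supervised contrastive (SupCon) formulation, not the paper's exact loss; the temperature `tau` and the toy embeddings are assumptions for illustration.

    ```python
    import numpy as np

    def sup_con_loss(z, labels, tau=0.1):
        """Supervised contrastive loss over a batch of embeddings.

        z: (N, d) embedding matrix; labels: (N,) integer class labels.
        Pulls same-class embeddings together and pushes different classes
        apart, which sharpens decision boundaries when labels are few.
        """
        z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
        sim = z @ z.T / tau                                # scaled cosine similarity
        np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
        # numerically stable log-softmax over each anchor's row
        m = sim.max(axis=1, keepdims=True)
        log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
        pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
        # mean negative log-probability of the positives for each anchor
        per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
        return per_anchor.mean()

    # Tight, well-separated class clusters give a near-zero loss;
    # the same embeddings with mixed-up labels do not.
    z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    good = sup_con_loss(z, np.array([0, 0, 1, 1]))
    bad = sup_con_loss(z, np.array([0, 1, 0, 1]))
    ```

    The gap between `good` and `bad` is the intra-class aggregation / inter-class estrangement effect the abstract refers to: anchors are rewarded only for assigning probability mass to same-class positives.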
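    The pseudo-label-guided consistency mechanism can likewise be sketched. The version below is a generic illustration under assumed design choices (averaging client predictions to form the pseudo-label, a confidence threshold of 0.9); the paper's exact aggregation and thresholding may differ.

    ```python
    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def pseudo_label_consistency(client_logits, threshold=0.9):
        """Consistency loss aligning per-client predictions on unlabeled data.

        client_logits: list of (N, C) arrays, one per client's local head.
        A shared pseudo-label is taken from the averaged prediction, and only
        confidently pseudo-labeled samples contribute to the loss.
        """
        probs = [softmax(l) for l in client_logits]
        avg = np.mean(probs, axis=0)            # aggregate prediction
        conf = avg.max(axis=1)
        pseudo = avg.argmax(axis=1)
        mask = conf >= threshold                # keep confident samples only
        if not mask.any():
            return 0.0
        # cross-entropy of each client's prediction against the pseudo-label
        total = 0.0
        for p in probs:
            total += -np.log(p[mask, pseudo[mask]] + 1e-12).mean()
        return total / len(probs)

    # Agreeing clients yield a small loss; flatly disagreeing clients produce
    # an unconfident average and are filtered out by the threshold.
    agree = pseudo_label_consistency([np.array([[5.0, 0.0]]), np.array([[4.0, 0.0]])])
    disagree = pseudo_label_consistency([np.array([[5.0, 0.0]]), np.array([[0.0, 5.0]])])
    ```

    Minimizing this loss pushes each local network toward the shared pseudo-label, which is how knowledge from other clients' representations is absorbed.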


    Cited By

    • (2024) Efficient algorithms to mine concise representations of frequent high utility occupancy patterns. Applied Intelligence 54, 5, 4012–4042. DOI:10.1007/s10489-024-05296-2. Online publication date: 18 March 2024.


    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 7
    August 2024
    505 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3613689
    • Editor:
    • Jian Pei

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 June 2024
    Online AM: 09 April 2024
    Accepted: 31 March 2024
    Revised: 24 January 2024
    Received: 01 April 2023
    Published in TKDD Volume 18, Issue 7


    Author Tags

    1. Vertical federated learning
    2. semi-supervised learning
    3. contrastive learning

    Qualifiers

    • Research-article

    Funding Sources

    • the National Key Research and Development Program of China
    • the National Natural Science Foundation of China
    • the Guangzhou Science and Technology Program
    • the Natural Science Foundation of Sichuan Province
    • Postdoctoral Fellowship Program of CPSF

