Toward Few-Label Vertical Federated Learning

Published: 19 June 2024
Abstract

    Federated Learning (FL) provides a novel paradigm for privacy-preserving machine learning, enabling multiple clients to collaborate on model training without sharing private data. To handle multi-source heterogeneous data, Vertical Federated Learning (VFL) has been extensively investigated. In the VFL setting, however, label information tends to be held by a single authoritative client and is very limited, which poses two challenges for model training. On the one hand, a small number of labels cannot guarantee a well-trained VFL model with informative network parameters, resulting in unclear classification decision boundaries. On the other hand, the dominant mass of unlabeled data should not be discounted, and it is worthwhile to investigate how to leverage it to improve representation learning. To address these two challenges, we first introduce a supervised contrastive loss to enhance intra-class aggregation and inter-class separation, thereby deeply exploiting the label information and improving downstream classification. Then, for the unlabeled data, we introduce a pseudo-label-guided consistency mechanism that encourages classification results to be coherent across clients, which allows the representations learned by local networks to absorb knowledge from other clients and alleviates disagreement between clients on the classification task. We conduct extensive experiments on four commonly used datasets, and the results demonstrate that our method outperforms state-of-the-art methods, with the improvement becoming more significant as the label rate decreases.
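    The supervised contrastive objective described above can be sketched in a few lines. This is a minimal NumPy illustration of the standard supervised contrastive (SupCon) formulation, not the paper's exact loss; the temperature `tau` and the toy embeddings are assumptions for illustration.

    ```python
    import numpy as np

    def sup_con_loss(z, labels, tau=0.1):
        """Supervised contrastive loss over a batch of embeddings.

        z: (N, d) embedding matrix; labels: (N,) integer class labels.
        Pulls same-class embeddings together and pushes different classes
        apart, which sharpens decision boundaries when labels are few.
        """
        z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
        sim = z @ z.T / tau                                # scaled cosine similarity
        np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
        # numerically stable log-softmax over each anchor's row
        m = sim.max(axis=1, keepdims=True)
        log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
        pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
        # mean negative log-probability of the positives for each anchor
        per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
        return per_anchor.mean()

    # Tight, well-separated class clusters give a near-zero loss;
    # the same embeddings with mixed-up labels do not.
    z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    good = sup_con_loss(z, np.array([0, 0, 1, 1]))
    bad = sup_con_loss(z, np.array([0, 1, 0, 1]))
    ```

    The gap between `good` and `bad` is the intra-class aggregation / inter-class estrangement effect the abstract refers to: anchors are rewarded only for assigning probability mass to same-class positives.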
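    The pseudo-label-guided consistency mechanism can likewise be sketched. The version below is a generic illustration under assumed design choices (averaging client predictions to form the pseudo-label, a confidence threshold of 0.9); the paper's exact aggregation and thresholding may differ.

    ```python
    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def pseudo_label_consistency(client_logits, threshold=0.9):
        """Consistency loss aligning per-client predictions on unlabeled data.

        client_logits: list of (N, C) arrays, one per client's local head.
        A shared pseudo-label is taken from the averaged prediction, and only
        confidently pseudo-labeled samples contribute to the loss.
        """
        probs = [softmax(l) for l in client_logits]
        avg = np.mean(probs, axis=0)            # aggregate prediction
        conf = avg.max(axis=1)
        pseudo = avg.argmax(axis=1)
        mask = conf >= threshold                # keep confident samples only
        if not mask.any():
            return 0.0
        # cross-entropy of each client's prediction against the pseudo-label
        total = 0.0
        for p in probs:
            total += -np.log(p[mask, pseudo[mask]] + 1e-12).mean()
        return total / len(probs)

    # Agreeing clients yield a small loss; flatly disagreeing clients produce
    # an unconfident average and are filtered out by the threshold.
    agree = pseudo_label_consistency([np.array([[5.0, 0.0]]), np.array([[4.0, 0.0]])])
    disagree = pseudo_label_consistency([np.array([[5.0, 0.0]]), np.array([[0.0, 5.0]])])
    ```

    Minimizing this loss pushes each local network toward the shared pseudo-label, which is how knowledge from other clients' representations is absorbed.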


    Cited By

    • (2024) Efficient algorithms to mine concise representations of frequent high utility occupancy patterns. Applied Intelligence 54, 5, 4012–4042. DOI:10.1007/s10489-024-05296-2. Online publication date: 18 March 2024.


    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 7
    August 2024
    505 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3613689
    • Editor:
    • Jian Pei

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 June 2024
    Online AM: 09 April 2024
    Accepted: 31 March 2024
    Revised: 24 January 2024
    Received: 01 April 2023
    Published in TKDD Volume 18, Issue 7


    Author Tags

    1. Vertical federated learning
    2. semi-supervised learning
    3. contrastive learning

    Qualifiers

    • Research-article

    Funding Sources

    • the National Key Research and Development Program of China
    • the National Natural Science Foundation of China
    • the Guangzhou Science and Technology Program
    • the Natural Science Foundation of Sichuan Province
    • Postdoctoral Fellowship Program of CPSF

