DOI: 10.1145/3394486.3403079

TranSlider: Transfer Ensemble Learning from Exploitation to Exploration

Published: 20 August 2020

Abstract

In transfer learning, what and where to transfer have been widely studied. Nevertheless, learned transfer strategies are at high risk of over-fitting, especially when only a few annotated instances are available in the target domain. In this paper, we introduce the concept of transfer ensemble learning, a new direction for tackling the over-fitting of transfer strategies. Intuitively, models with different transfer strategies offer various perspectives on what and where to transfer; a core problem is therefore to search over these diversely transferred models for an ensemble that achieves better generalization. Towards this end, we propose the Transferability Slider (TranSlider) for transfer ensemble learning. By gradually decreasing the transferability, we obtain a spectrum of base models ranging from pure exploitation of the source model to unconstrained exploration of the target domain. Furthermore, decreasing transferability with parameter sharing guarantees fast optimization at no additional training cost. Finally, we conduct extensive experiments with various analyses, which demonstrate that TranSlider achieves state-of-the-art performance on comprehensive benchmark datasets.
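As a reading aid, here is a minimal, hypothetical sketch (in PyTorch) of how such a transferability slider could be realized; it is an assumption in the style of L2-SP regularization toward the source weights, not the authors' implementation. The coefficient `lam` pulls the fine-tuned parameters toward the frozen source model: a large `lam` corresponds to pure exploitation of the source model, `lam = 0` to unconstrained exploration. Snapshots taken along a decreasing sweep of `lam` share one continuous training run, and their softmax outputs are averaged at test time. The names `translider_sketch` and `ensemble_predict` are invented for illustration.

```python
# Hypothetical sketch of a "transferability slider" (not the paper's code).
# Sweep lam from high (exploitation: stay near the source model) to zero
# (exploration: unconstrained fine-tuning), snapshotting a base model at
# each level and ensembling the snapshots by averaging their predictions.
import copy
import torch
import torch.nn as nn

def translider_sketch(source_model: nn.Module, loader,
                      lambdas=(10.0, 1.0, 0.1, 0.0),
                      epochs_per_stage=1, lr=1e-3, device="cpu"):
    """Return one snapshot model per transferability level in `lambdas`."""
    source_params = [p.detach().clone().to(device)
                     for p in source_model.parameters()]
    model = copy.deepcopy(source_model).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    snapshots = []
    for lam in lambdas:  # decreasing transferability
        model.train()
        for _ in range(epochs_per_stage):
            for x, y in loader:  # labeled target-domain batches
                x, y = x.to(device), y.to(device)
                optimizer.zero_grad()
                loss = criterion(model(x), y)
                # Quadratic pull toward the source parameters, scaled by lam.
                for p, p_src in zip(model.parameters(), source_params):
                    loss = loss + 0.5 * lam * (p - p_src).pow(2).sum()
                loss.backward()
                optimizer.step()
        # Parameter sharing: the next stage warm-starts from this one,
        # so the whole spectrum costs roughly a single training run.
        snapshots.append(copy.deepcopy(model).eval())
    return snapshots

def ensemble_predict(snapshots, x):
    """Average the softmax outputs of all snapshot base models."""
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=-1) for m in snapshots])
    return probs.mean(dim=0)
```

Under this reading, the warm-starting across stages is what the abstract means by parameter sharing: the base models are not trained from scratch, so the ensemble comes at essentially no additional training cost.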




Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ensemble learning
  2. exploitation
  3. exploration
  4. transfer learning

Qualifiers

  • Research-article

Funding Sources

  • Joint Research Center of Tencent and Tsinghua
  • SZSTI
  • NSFC

Conference

KDD '20

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Cited By

  • (2023) Beyond Fine-Tuning: Efficient and Effective Fed-Tuning for Mobile/Web Users. In Proceedings of the ACM Web Conference 2023, 2863-2873. DOI: 10.1145/3543507.3583212
  • (2022) A Robust Computerized Adaptive Testing Approach in Educational Question Retrieval. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 416-426. DOI: 10.1145/3477495.3531928
  • (2022) What is Market Talking about? Market-oriented Prospect Analysis for Entrepreneur Fundraising. IEEE Transactions on Knowledge and Data Engineering. DOI: 10.1109/TKDE.2022.3174336
  • (2022) A Transfer Ensemble Learning Method for Evaluating Power Transformer Health Conditions with Limited Measurement Data. IEEE Transactions on Instrumentation and Measurement. DOI: 10.1109/TIM.2022.3175268
  • (2022) Deep Domain Adaptation for Power Transformer Fault Diagnosis Based on Transfer Convolutional Neural Network. In 2022 4th International Conference on Electrical Engineering and Control Technologies (CEECT), 25-29. DOI: 10.1109/CEECT55960.2022.10030508
  • (2021) Neural Prototype Trees for Interpretable Fine-grained Image Recognition. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14928-14938. DOI: 10.1109/CVPR46437.2021.01469
  • (2021) TransJury: Towards Explainable Transfer Learning through Selection of Layers from Deep Neural Networks. In 2021 IEEE International Conference on Big Data (Big Data), 978-984. DOI: 10.1109/BigData52589.2021.9671723
