DOI: 10.1145/2623330.2623618

Research article

Supervised deep learning with auxiliary networks

Published: 24 August 2014

Abstract

Deep learning has demonstrated great potential for learning latent feature representations. Recent years have witnessed increasing enthusiasm for regularizing deep neural networks by incorporating various kinds of side information, such as user-provided labels or pairwise constraints. However, the effectiveness and parameter sensitivity of such algorithms have been major obstacles to putting them into practice. The major contribution of our work is a novel supervised deep learning algorithm distinguished by two unique traits. First, it regularizes the network construction using similarity or dissimilarity constraints between data pairs, rather than sample-specific annotations. This kind of side information is more flexible and greatly reduces the workload of annotators. Second, unlike prior work, the proposed algorithm decouples the supervision information from the intrinsic data structure. We design two heterogeneous networks, one encoding the supervision and the other the unsupervised data structure. Specifically, we term the supervision-oriented network the "auxiliary network," since it serves principally to facilitate the parameter learning of the other network and is removed when handling out-of-sample data. The two networks complement each other and are bridged by enforcing correlation between their parameters. We name the proposed algorithm SUpervision-Guided AutoencodeR (SUGAR). Compared with prior work on unsupervised deep networks and supervised learning, SUGAR better balances numerical tractability with the flexible use of supervision information. Classification results on MNIST digits and eight benchmark datasets demonstrate that SUGAR effectively improves performance through the auxiliary networks, on both shallow and deep architectures. In particular, when multiple SUGARs are stacked, performance is significantly boosted. On the selected benchmarks, our method achieves up to 11.35% relative accuracy improvement over state-of-the-art models.
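To make the construction concrete, here is a minimal, hypothetical sketch of the two-network idea in PyTorch. It illustrates the description above, not the authors' implementation: the contrastive form of the pairwise term, the squared-difference surrogate for the parameter-correlation bridge, and all names and hyperparameters (sugar_loss, alpha, beta, margin) are assumptions made for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Autoencoder(nn.Module):
        """Main network: captures intrinsic data structure; kept at test time."""
        def __init__(self, d_in, d_hid):
            super().__init__()
            self.enc = nn.Linear(d_in, d_hid)
            self.dec = nn.Linear(d_hid, d_in)

        def forward(self, x):
            h = torch.sigmoid(self.enc(x))
            return h, torch.sigmoid(self.dec(h))

    class AuxiliaryNet(nn.Module):
        """Auxiliary network: trained only on pairwise constraints;
        discarded when handling out-of-sample data."""
        def __init__(self, d_in, d_hid):
            super().__init__()
            self.enc = nn.Linear(d_in, d_hid)

        def forward(self, x):
            return torch.sigmoid(self.enc(x))

    def sugar_loss(ae, aux, x1, x2, same, alpha=1.0, beta=0.1, margin=1.0):
        # Unsupervised term: reconstruction error of the main autoencoder.
        _, r1 = ae(x1)
        _, r2 = ae(x2)
        recon = F.mse_loss(r1, x1) + F.mse_loss(r2, x2)
        # Supervised term on the auxiliary network: a contrastive-style loss
        # (an assumed form) that pulls similar pairs together and pushes
        # dissimilar pairs at least `margin` apart.
        d = (aux(x1) - aux(x2)).pow(2).sum(dim=1)
        pair = torch.where(same.bool(), d,
                           F.relu(margin - (d + 1e-12).sqrt()).pow(2)).mean()
        # Bridge term: a squared-difference surrogate for enforcing
        # correlation between the two encoders' parameters.
        bridge = (ae.enc.weight - aux.enc.weight).pow(2).sum()
        return recon + alpha * pair + beta * bridge

After training, only the main autoencoder would be retained for out-of-sample data; stacking SUGARs would repeat this construction layer by layer on the learned hidden representations.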

Supplementary Material

MP4 File (p353-sidebyside.mp4)




    Published In

    KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2014
    2028 pages
    ISBN: 9781450329569
    DOI: 10.1145/2623330
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 August 2014


    Author Tags

    1. autoencoder
    2. deep neural networks
    3. supervision

    Qualifiers

    • Research-article


    Conference

    KDD '14

    Acceptance Rates

    KDD '14 paper acceptance rate: 151 of 1,036 submissions (15%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (last 12 months): 38
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 22 Sep 2024


    Cited By

    • (2023) Micro-Supervised Disturbance Learning: A Perspective of Representation Probability Distribution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6):7542-7558. DOI: 10.1109/TPAMI.2022.3225461. Online publication date: 1-Jun-2023.
    • (2021) N-HANS: A neural network-based toolkit for in-the-wild audio enhancement. Multimedia Tools and Applications. DOI: 10.1007/s11042-021-11080-y. Online publication date: 3-Jun-2021.
    • (2020) From Depth What Can You See? Depth Completion via Auxiliary Image Reconstruction. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11303-11312. DOI: 10.1109/CVPR42600.2020.01132. Online publication date: Jun-2020.
    • (2019) Restricted Boltzmann Machines With Gaussian Visible Units Guided by Pairwise Constraints. IEEE Transactions on Cybernetics, 49(12):4321-4334. DOI: 10.1109/TCYB.2018.2863601. Online publication date: Dec-2019.
    • (2019) Integrated Semi-Supervised Model for Learning and Classification. Proceedings of the 3rd International Conference on Computer Vision and Image Processing, pages 183-195. DOI: 10.1007/978-981-32-9088-4_16. Online publication date: 1-Nov-2019.
    • (2018) Fast Convergence for Object Detection by Learning how to Combine Error Functions. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7329-7335. DOI: 10.1109/IROS.2018.8594179. Online publication date: Oct-2018.
    • (2018) Semi-supervised Deep Representation Learning for Multi-View Problems. 2018 IEEE International Conference on Big Data (Big Data), pages 56-64. DOI: 10.1109/BigData.2018.8622015. Online publication date: Dec-2018.
    • (2016) CORP: Cooperative Opportunistic Resource Provisioning for Short-Lived Jobs in Cloud Systems. 2016 IEEE International Conference on Cluster Computing (CLUSTER), pages 90-99. DOI: 10.1109/CLUSTER.2016.65. Online publication date: Sep-2016.
    • (2016) Image matting in the perception granular deep learning. Knowledge-Based Systems, 102(C):51-63. DOI: 10.1016/j.knosys.2016.03.018. Online publication date: 15-Jun-2016.
    • (2015) Online learning of deep hybrid architectures for semi-supervised categorization. Proceedings of the 2015 European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part I, pages 516-532. DOI: 10.1007/978-3-319-23528-8_32. Online publication date: 7-Sep-2015.
