Bilinear deep learning for image classification

Authors:

Sheng-hua Zhong,

Yang Liu

MM '11: Proceedings of the 19th ACM international conference on Multimedia

Pages 343 - 352

https://doi.org/10.1145/2072298.2072344

Published: 28 November 2011

Abstract

Image classification is a classical problem in multimedia content analysis. This paper proposes a novel deep learning model, the bilinear deep belief network (BDBN), for image classification. Unlike previous image classification models, BDBN aims to provide human-like judgment by following the architecture of the human visual system and the procedure of intelligent perception. Accordingly, the model faithfully reproduces the multi-layer structure of the cortex and the propagation of information through the visual areas of the brain. Unlike most existing deep models, BDBN uses a bilinear discriminant strategy to simulate the "initial guess" in human object recognition and, at the same time, to avoid falling into a bad local optimum. To preserve the natural tensor structure of image data, we propose a novel deep architecture with greedy layer-wise reconstruction and global fine-tuning. To adapt to real-world image classification tasks, we develop BDBN under a semi-supervised learning framework, which lets the deep model work well when labeled images are scarce. Comparative experiments on three standard datasets show that the proposed algorithm outperforms both representative classification models and existing deep learning techniques. More interestingly, our demonstrations show that the proposed BDBN behaves consistently with human visual perception.
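
The abstract does not state the model's equations, so the following is only a rough illustration of what preserving the matrix (second-order tensor) structure of an image can mean. It applies a generic bilinear map Y = U^T X V to an image matrix and checks that this equals a Kronecker-structured linear map on the flattened image. The function name `bilinear_project`, the matrices `U` and `V`, and all sizes are assumptions made for this sketch; it is not the BDBN architecture itself.

```python
import numpy as np


def bilinear_project(X, U, V):
    """Map an image matrix X (I x J) to a latent matrix (P x Q) as U^T X V.

    U (I x P) acts on the rows of X and V (J x Q) acts on the columns,
    so the image is never flattened into a vector.
    """
    return U.T @ X @ V


# Hypothetical sizes chosen only for illustration.
I, J, P, Q = 8, 6, 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((I, J))   # stand-in for an I x J grayscale image
U = rng.standard_normal((I, P))   # row-side projection
V = rng.standard_normal((J, Q))   # column-side projection

Y = bilinear_project(X, U, V)

# The same map written on the flattened image, using the identity
# vec(U^T X V) = (V kron U)^T vec(X) with column-major vectorization.
y_vec = np.kron(V, U).T @ X.flatten(order="F")
matches_vectorized_form = np.allclose(Y.flatten(order="F"), y_vec)

# Parameter counts: two small matrices versus one unconstrained dense map.
bilinear_params = I * P + J * Q
dense_params = (I * J) * (P * Q)
```

Under these assumed sizes the bilinear form needs far fewer parameters than an unconstrained map between the flattened image and the flattened latent matrix, which is one common motivation for keeping the two-dimensional structure.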

Index Terms

Bilinear deep learning for image classification
1. Computing methodologies
  1. Artificial intelligence
    1. Philosophical/theoretical foundations of artificial intelligence
  2. Machine learning

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia

November 2011

944 pages

ISBN:9781450306164

DOI:10.1145/2072298

General Chairs: K. Selçuk Candan (Arizona State University, USA), Sethuraman Panchanathan (Arizona State University, USA), Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Program Chairs: Hari Sundaram (Arizona State University, USA), Wu-Chi Feng (Portland State University, USA), Nicu Sebe (University of Trento, Italy)

Copyright © 2011 Authors.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '11: ACM Multimedia Conference

Sponsor: SIGMM

November 28 - December 1, 2011

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Article Metrics

Total citations: 43

Total downloads: 1,564

Downloads (last 12 months): 28

Downloads (last 6 weeks): 7

Reflects downloads up to 22 Sep 2024

Cited By

Zhao K, Cai Z, Sui Q, Wei E, Zhu K (2022). Clustering Image Search Results by Entity Disambiguation. Machine Learning and Knowledge Discovery in Databases (369-384). Online publication date: 10-Mar-2022.
https://dl.acm.org/doi/10.1007/978-3-662-44845-8_24
Kontopoulos I, Makris A, Tserpes K (2021). A Deep Learning Streaming Methodology for Trajectory Classification. ISPRS International Journal of Geo-Information 10:4 (250). Online publication date: 8-Apr-2021.
https://doi.org/10.3390/ijgi10040250
Wu D, Zhou F, Wang B, Wong C, Shui C, Zhou Y, Lao Q, Wan F (2021). On the Benefits of Two Dimensional Metric Learning. IEEE Transactions on Knowledge and Data Engineering (1-1). Online publication date: 2021.
https://doi.org/10.1109/TKDE.2021.3100353
Okewu E, Adewole P, Misra S, Maskeliunas R, Damasevicius R (2021). Artificial Neural Networks for Educational Data Mining in Higher Education: A Systematic Literature Review. Applied Artificial Intelligence 35:13 (983-1021). Online publication date: 9-Oct-2021.
https://doi.org/10.1080/08839514.2021.1922847
Gunjan V, Pathak R, Singh O (2019). Understanding Image Classification Using TensorFlow Deep Learning - Convolution Neural Network. International Journal of Hyperconnectivity and the Internet of Things 3:2 (19-37). Online publication date: Jul-2019.
https://doi.org/10.4018/IJHIoT.2019070103
Guo J, Ren T, Huang L, Bei J (2019). Saliency detection on sampled images for tag ranking. Multimedia Systems 25:1 (35-47). Online publication date: 1-Feb-2019.
https://dl.acm.org/doi/10.1007/s00530-017-0546-9
Xie X, Yang Y, Fang Z, Wang G, Zhang F, Zhang F, Liu Y, Zhang D (2018). coSense. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2:4 (1-25). Online publication date: 27-Dec-2018.
https://dl.acm.org/doi/10.1145/3287074
Elleuch M, Alimi A, Kherallah M (2018). Enhancement of Deep Architecture using Dropout/DropConnect Techniques Applied for AHR System. 2018 International Joint Conference on Neural Networks (IJCNN) (1-6). Online publication date: Jul-2018.
https://doi.org/10.1109/IJCNN.2018.8489245
Wu X, Zhang L, Li F, Wang B (2018). A Novel Model for Multi-label Image Annotation. 2018 24th International Conference on Pattern Recognition (ICPR) (1953-1958). Online publication date: Aug-2018.
https://doi.org/10.1109/ICPR.2018.8546110
Chang P, Zhang J, Hu J, Song Z (2018). A Deep Neural Network Based on ELM for Semi-supervised Learning of Image Classification. Neural Processing Letters 48:1 (375-388). Online publication date: 1-Aug-2018.
https://dl.acm.org/doi/10.1007/s11063-017-9709-0
