DOI: 10.1145/2072298.2072344
research-article

Bilinear deep learning for image classification

Published: 28 November 2011

Abstract

Image classification is a classical problem in multimedia content analysis. This paper proposes a novel deep learning model, the bilinear deep belief network (BDBN), for image classification. Unlike previous image classification models, BDBN aims to provide human-like judgment by referencing the architecture of the human visual system and the procedure of intelligent perception. To that end, the model mirrors the multi-layer structure of the cortex and the propagation of information through the visual areas of the brain. Unlike most existing deep models, BDBN uses a bilinear discriminant strategy to simulate the "initial guess" in human object recognition and, at the same time, to avoid falling into a bad local optimum. To preserve the natural tensor structure of the image data, a novel deep architecture with greedy layer-wise reconstruction and global fine-tuning is proposed. To adapt to real-world image classification tasks, we develop BDBN under a semi-supervised learning framework, which lets the deep model work well when labeled images are scarce. Comparative experiments on three standard datasets show that the proposed algorithm outperforms both representative classification models and existing deep learning techniques. More interestingly, our demonstrations show that the proposed BDBN behaves consistently with human visual perception.
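
The abstract's point about preserving the tensor structure of an image can be illustrated with a generic bilinear projection, Y = UᵀXV, which maps an image matrix to a smaller latent matrix without flattening it first. The sketch below is not the BDBN training procedure: the sizes, the function name `bilinear_project`, and the random matrices `U` and `V` are placeholders assumed for illustration, and the abstract does not specify how the actual projections are learned.

```python
import numpy as np


def bilinear_project(image, U, V):
    """Map an (I x J) image to a (P x Q) latent matrix via Y = U^T X V."""
    return U.T @ image @ V


rng = np.random.default_rng(0)
I, J = 32, 32  # input image size (hypothetical)
P, Q = 8, 8    # latent matrix size (hypothetical)

X = rng.standard_normal((I, J))  # stand-in for a grayscale image
U = rng.standard_normal((I, P))  # row-side projection (random placeholder, not learned)
V = rng.standard_normal((J, Q))  # column-side projection (random placeholder, not learned)

Y = bilinear_project(X, U, V)    # the latent representation is still a matrix

# The same map written as one linear layer on the row-major flattened image:
# its weight matrix is the Kronecker product of U and V.
W = np.kron(U, V)                # shape (I*J, P*Q)
y_flat = W.T @ X.reshape(-1)

print(Y.shape)                   # shape of the latent matrix
print(U.size + V.size, W.size)   # parameters: bilinear form vs. equivalent dense weight
```

Under these assumptions, the bilinear form needs I·P + J·Q parameters, whereas the equivalent dense weight on the flattened image has I·J·P·Q entries, which is one practical reason to keep the matrix structure.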

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia
November 2011
944 pages
ISBN:9781450306164
DOI:10.1145/2072298

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bilinear discriminant projection
  2. deep learning
  3. image classification

Qualifiers

  • Research-article

Conference

MM '11: ACM Multimedia Conference
November 28 - December 1, 2011
Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Cited By

  • (2022) Clustering Image Search Results by Entity Disambiguation. Machine Learning and Knowledge Discovery in Databases, pp. 369-384. DOI: 10.1007/978-3-662-44845-8_24. Online publication date: 10-Mar-2022
  • (2021) A Deep Learning Streaming Methodology for Trajectory Classification. ISPRS International Journal of Geo-Information 10:4 (250). DOI: 10.3390/ijgi10040250. Online publication date: 8-Apr-2021
  • (2021) On the Benefits of Two Dimensional Metric Learning. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2021.3100353. Online publication date: 2021
  • (2021) Artificial Neural Networks for Educational Data Mining in Higher Education: A Systematic Literature Review. Applied Artificial Intelligence 35:13, pp. 983-1021. DOI: 10.1080/08839514.2021.1922847. Online publication date: 9-Oct-2021
  • (2019) Understanding Image Classification Using TensorFlow Deep Learning - Convolution Neural Network. International Journal of Hyperconnectivity and the Internet of Things 3:2, pp. 19-37. DOI: 10.4018/IJHIoT.2019070103. Online publication date: Jul-2019
  • (2019) Saliency detection on sampled images for tag ranking. Multimedia Systems 25:1, pp. 35-47. DOI: 10.1007/s00530-017-0546-9. Online publication date: 1-Feb-2019
  • (2018) coSense. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2:4, pp. 1-25. DOI: 10.1145/3287074. Online publication date: 27-Dec-2018
  • (2018) Enhancement of Deep Architecture using Dropout/DropConnect Techniques Applied for AHR System. 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1-6. DOI: 10.1109/IJCNN.2018.8489245. Online publication date: Jul-2018
  • (2018) A Novel Model for Multi-label Image Annotation. 2018 24th International Conference on Pattern Recognition (ICPR), pp. 1953-1958. DOI: 10.1109/ICPR.2018.8546110. Online publication date: Aug-2018
  • (2018) A Deep Neural Network Based on ELM for Semi-supervised Learning of Image Classification. Neural Processing Letters 48:1, pp. 375-388. DOI: 10.1007/s11063-017-9709-0. Online publication date: 1-Aug-2018
