Learning image-to-class distance metric for image classification

Authors:

Zhengxiang Wang,

Liang-Tien Chia

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 4, Issue 2

Article No.: 34, Pages 1 - 22

https://doi.org/10.1145/2438653.2438669

Published: 03 April 2013

Abstract

Image-to-Class (I2C) distance is a novel distance measure for image classification that has successfully handled datasets with large intra-class variance. However, it uses Euclidean distance to measure the distance between local features in different classes, which may not be the optimal metric in real image classification problems. In this article, we propose a distance metric learning method that improves the performance of I2C distance by learning per-class Mahalanobis metrics in a large-margin framework. Combined with the learned metric for each class, our I2C distance adapts to different classes. The per-class metrics are learned simultaneously by formulating a convex optimization problem whose constraints require the I2C distance from each training image to the class it belongs to be smaller, by a large margin, than its distances to the other classes. A subgradient descent method is applied to solve this optimization problem efficiently. For efficiency and scalability to large-scale problems, we also show how to simplify the method to learn a diagonal matrix for each class. Experiments show that our learned Mahalanobis I2C distance can significantly outperform the original Euclidean I2C distance, as well as other distance metric learning methods, on several widely used image datasets, and that the simplified diagonal matrices can preserve this performance while significantly speeding up metric learning on large-scale datasets. Experiments also show that our method is able to correct the class imbalance problem, which usually biases NN-based methods toward classes containing more training images.
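
The abstract's two central quantities, the I2C distance under a per-class Mahalanobis matrix and the large-margin constraint between classes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `i2c_distance` and `hinge_violations`, the unit margin, and the choice to search nearest neighbors with plain Euclidean distance are all assumptions made for illustration. The sketch only evaluates distances and constraint violations; it does not perform the subgradient descent the article describes.

```python
import numpy as np

def i2c_distance(features, class_features, M=None):
    """Image-to-class distance: for each local feature of the image, take its
    nearest neighbor among the class's features and sum the squared distances.
    M is a (d, d) Mahalanobis matrix for this class; None means Euclidean.
    Assumption: the nearest neighbor is searched with Euclidean distance."""
    d = features.shape[1]
    if M is None:
        M = np.eye(d)
    # Pairwise differences, shape (n_image_features, n_class_features, d).
    diffs = features[:, None, :] - class_features[None, :, :]
    sq_euclid = np.einsum("ijk,ijk->ij", diffs, diffs)
    nn_idx = np.argmin(sq_euclid, axis=1)
    nn_diffs = diffs[np.arange(len(features)), nn_idx]
    # Sum over features of diff^T M diff.
    return float(np.einsum("ik,kl,il->", nn_diffs, M, nn_diffs))

def hinge_violations(dist_to_classes, true_class, margin=1.0):
    """Per-class violation of the constraint that the distance to the true
    class is smaller than the distance to every other class by `margin`."""
    dists = np.asarray(dist_to_classes, dtype=float)
    losses = np.maximum(0.0, margin + dists[true_class] - dists)
    losses[true_class] = 0.0  # no constraint of a class against itself
    return losses

# Toy data: one image with two 2-D local features, and two classes.
image = np.array([[0.0, 0.0], [1.0, 1.0]])
class_a = np.array([[0.0, 1.0], [2.0, 2.0]])
class_b = np.array([[0.0, 0.0], [1.0, 1.0]])

d_a = i2c_distance(image, class_a)
d_b = i2c_distance(image, class_b)
print(d_a, d_b)
print(hinge_violations([d_a, d_b], true_class=0))
```

Passing a diagonal matrix such as `np.diag([2.0, 0.5])` as `M` corresponds to the simplified per-class diagonal metric mentioned in the abstract, which rescales each feature dimension independently.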

Index Terms

Learning image-to-class distance metric for image classification
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Scene understanding
2. Information systems
  1. Information systems applications

Published In

ACM Transactions on Intelligent Systems and Technology Volume 4, Issue 2

Special section on agent communication, trust in multiagent systems, intelligent tutoring and coaching systems

March 2013

339 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/2438653

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 April 2013

Accepted: 01 March 2012

Revised: 01 March 2012

Received: 01 October 2010

Published in TIST Volume 4, Issue 2

Qualifiers

Research-article
Research
Refereed

Cited By

Liu X, Yang X, Wang M, Hong R (2020) Deep Neighborhood Component Analysis for Visual Similarity Modeling. ACM Transactions on Intelligent Systems and Technology 11:3 (1-15). Online publication date: 18-Apr-2020.
https://dl.acm.org/doi/10.1145/3375787

Jie X, Lin Y (2020) DSR: A Deep Learning Framework Towards Modulation Signal Retrieval. 2020 7th International Conference on Dependable Systems and Their Applications (DSA) (234-239). Online publication date: Nov-2020.
https://doi.org/10.1109/DSA51864.2020.00041

Aman E, Rawat A, Giri A, Gothwal H (2019) Content-Based Image Retrieval: A Comprehensive Study. International Journal of Scientific Research in Computer Science, Engineering and Information Technology (1073-1081). Online publication date: 20-Mar-2019.
https://doi.org/10.32628/CSEIT1952275

Alonso S, Torre Díez I, Zapiraín B (2019) Predictive, Personalized, Preventive and Participatory (4P) Medicine Applied to Telemedicine and eHealth in the Literature. Journal of Medical Systems 43:5 (1-10). Online publication date: 1-May-2019.
https://dl.acm.org/doi/10.1007/s10916-019-1279-4

Hu Z, Wen Y, Liu L, Jiang J, Hong R, Wang M, Yan S (2017) Visual Classification of Furniture Styles. ACM Transactions on Intelligent Systems and Technology 8:5 (1-20). Online publication date: 30-Jun-2017.
https://dl.acm.org/doi/10.1145/3065951

Dong Y, Tao D, Li X (2015) Nonnegative Multiresolution Representation-Based Texture Image Classification. ACM Transactions on Intelligent Systems and Technology 7:1 (1-21). Online publication date: 7-Oct-2015.
https://dl.acm.org/doi/10.1145/2738050

Wan J, Wang D, Hoi S, Wu P, Zhu J, Zhang Y, Li J, Hua K, Rui Y, Steinmetz R, Hanjalic A, Natsev A, Zhu W (2014) Deep Learning for Content-Based Image Retrieval. Proceedings of the 22nd ACM international conference on Multimedia (157-166). Online publication date: 3-Nov-2014.
https://dl.acm.org/doi/10.1145/2647868.2654948
