Abstract
Many vision tasks require a multi-class classifier that discriminates among a large number of categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled approach to large-scale multi-class classification that turns high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding consists of two steps: efficient learning of a coding matrix that scales to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness of the proposed approach.
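To make the idea of bit-by-bit decoding concrete, the following is a minimal sketch of generic output-code decoding, not the method of this paper. It assumes a small hand-written ternary coding matrix (the paper learns its matrix), one binary classifier per bit that outputs P(bit = +1 | x), and a simple length-normalised log-likelihood decoder. The names CODING_MATRIX and decode, the class labels, and the probabilities are all hypothetical.

```python
import math

# Hypothetical sparse ternary coding matrix: one codeword per class, one entry per bit.
# +1 / -1: the class is a positive / negative example for that bit's binary classifier.
#  0: the class is ignored by that bit (this is what makes the code sparse).
CODING_MATRIX = {
    "cat":   [+1, +1,  0,  0],
    "dog":   [+1, -1,  0,  0],
    "car":   [-1,  0, +1,  0],
    "truck": [-1,  0, -1, +1],
    "bus":   [-1,  0, -1, -1],
}


def decode(bit_probs, coding_matrix=CODING_MATRIX, eps=1e-12):
    """Return (best_class, scores) given P(bit = +1 | x) for every bit.

    Each class is scored by the mean log-probability that the bit
    predictors agree with the non-zero entries of its codeword.
    This normalisation is an assumption made for the sketch, so that
    codewords with different numbers of non-zero bits are comparable.
    """
    scores = {}
    for label, codeword in coding_matrix.items():
        log_lik, used = 0.0, 0
        for code, p in zip(codeword, bit_probs):
            if code == 0:
                continue  # this bit carries no information about the class
            p_match = p if code > 0 else 1.0 - p
            log_lik += math.log(max(p_match, eps))
            used += 1
        scores[label] = log_lik / used
    best = max(scores, key=scores.get)
    return best, scores


# Made-up bit-classifier outputs for a single test image.
bit_probs = [0.1, 0.5, 0.2, 0.9]
label, scores = decode(bit_probs)
print(label)
```

With five classes and four bits, each bit classifier only separates the classes that have a non-zero entry in its column, which is how output coding replaces one high-cardinality decision with several binary ones.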
Additional information
Communicated by Antonio Torralba and Alexei Efros.
Cite this article
Zhao, B., Xing, E.P. Sparse Output Coding for Scalable Visual Recognition. Int J Comput Vis 119, 60–75 (2016). https://doi.org/10.1007/s11263-015-0839-4
DOI: https://doi.org/10.1007/s11263-015-0839-4