Abstract
A large amount of multi-modal data has emerged on the Web, and efficiently exploiting such data for cross-modal retrieval has become a hot research topic. Several solutions have been proposed for this problem; however, many of them consider only the local structural information of the data and lose sight of its global structure. To overcome this limitation and enhance retrieval accuracy, we propose a multi-modal graph regularization based class center discriminant analysis for cross-modal retrieval. The core of our method is to maximize the intra-modality distance and minimize the inter-modality distance of class-center samples, which strengthens the discriminative ability of the model. Meanwhile, a multi-modal graph, consisting of the inter-modality similarity graph, the class-center intra-modality graph, and the inter-modality graph, is fused into the method to further reinforce the semantic similarity between modalities. In this way, the method considers both the local and the global structural information of the data. Experimental results on three benchmark datasets demonstrate the superiority of the proposed scheme over several state-of-the-art methods.
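To make the class-center idea concrete, the following is a minimal illustrative sketch (not the paper's actual formulation, whose details are not given here): it computes per-class mean vectors for two modalities and evaluates two terms, the distance between corresponding class centers across modalities (to be minimized) and the pairwise distance between different class centers (to be maximized). The function names and the simple squared-distance criterion are assumptions for illustration only.

```python
import numpy as np

def class_centers(X, y):
    """Mean feature vector per class for one modality.
    X: (n_samples, d) feature matrix; y: (n_samples,) class labels."""
    classes = np.unique(y)
    centers = np.stack([X[y == c].mean(axis=0) for c in classes])
    return centers, classes

def center_objective(C_img, C_txt):
    """Illustrative class-center criterion.
    inter_modality: distance between corresponding image/text class
    centers (same class, different modality) -- to be minimized.
    inter_class: pairwise distance between different class centers
    within one modality -- to be maximized for discriminability."""
    inter_modality = np.sum((C_img - C_txt) ** 2)
    diff = C_img[:, None, :] - C_img[None, :, :]
    inter_class = np.sum(diff ** 2) / 2  # each pair counted once
    return inter_modality, inter_class
```

A learned common subspace would then be chosen so that, after projection, the first term is small and the second is large, typically with a graph-Laplacian regularizer added to preserve local neighborhood structure.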
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China (Nos. 61572298, 61772322, U1836216), the Key Research and Development Foundation of Shandong Province (Nos. 2017GGX10117, 2017CXGC0703), and the Natural Science Foundation of Shandong, China (No. ZR2015PF006).
Cite this article
Zhang, M., Zhang, H., Li, J. et al. Multi-modal graph regularization based class center discriminant analysis for cross modal retrieval. Multimed Tools Appl 78, 28285–28307 (2019). https://doi.org/10.1007/s11042-019-07909-2