Abstract
The growing popularity of mobile networks and electronics has prompted intensive research on the management of multimedia data (e.g., text, images, video, and audio). This has motivated research on semi-supervised learning, which combines a small number of labeled samples with a large number of unlabeled samples by exploiting the local structure of the data distribution. Manifold regularization and pairwise constraints are two representative semi-supervised learning approaches. In this paper, we introduce a novel local-structure-preserving approach that combines manifold regularization and pairwise constraints. Specifically, we construct a new graph Laplacian that, unlike the traditional Laplacian, incorporates pairwise constraints. The proposed graph Laplacian better preserves the local geometry of the data distribution and leads to more effective recognition. On this basis, we build graph-regularized classifiers for action recognition, with support vector machines and kernel least squares as special cases. Experimental results on a multimodal human action database (CAS-YNU-MHAD) show that the proposed algorithms outperform the general algorithms.
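The idea of a constraint-aware graph Laplacian can be illustrated with a minimal sketch. This is not the construction used in the paper, whose details are not given in the abstract. It assumes a symmetric k-nearest-neighbour graph with Gaussian weights, in which must-link pairs are forced to weight 1 and cannot-link pairs to weight 0, followed by a simplified Laplacian-regularized kernel least squares fit in the spirit of manifold regularization. The function names (`rbf_kernel`, `constrained_laplacian`, `fit_laprls`) and all hyperparameters are hypothetical.

```python
import numpy as np


def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel matrix between all rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def constrained_laplacian(X, must_link, cannot_link, k=3, sigma=1.0):
    """Unnormalized Laplacian L = D - W of a kNN graph whose weights are
    overridden by pairwise constraints (an illustrative assumption)."""
    n = X.shape[0]
    W = rbf_kernel(X, sigma)
    np.fill_diagonal(W, 0.0)

    # Rank neighbours by distance, excluding each point itself.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)
    nearest = np.argsort(sq, axis=1)[:, :k]

    # Keep an edge if either endpoint lists the other among its k nearest.
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nearest.ravel()] = True
    mask = mask | mask.T
    W = np.where(mask, W, 0.0)

    # Assumed constraint rule: must-link -> weight 1, cannot-link -> weight 0.
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0

    return np.diag(W.sum(axis=1)) - W


def fit_laprls(K, L, y_labeled, gamma_a=1e-2, gamma_i=1e-1):
    """Simplified Laplacian-regularized kernel least squares.

    Assumes the labeled samples occupy the first len(y_labeled) rows of K.
    Scaling constants of the usual formulation are absorbed into
    gamma_a (ambient penalty) and gamma_i (graph penalty).
    """
    n = K.shape[0]
    n_labeled = len(y_labeled)
    J = np.zeros((n, n))
    J[:n_labeled, :n_labeled] = np.eye(n_labeled)
    Y = np.zeros(n)
    Y[:n_labeled] = y_labeled
    A = J @ K + gamma_a * np.eye(n) + gamma_i * (L @ K)
    return np.linalg.solve(A, Y)
```

Under these assumptions, scores for the training samples would be `rbf_kernel(X) @ alpha`, where `alpha` is the vector returned by `fit_laprls`. Replacing the squared loss with a hinge loss would give the support vector machine variant mentioned in the abstract. That variant is not sketched here.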
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (Grant No. 61671480) and the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (Grant Nos. 14CX02203A and YCX2017059).
Cite this article
Ma, X., Tao, D. & Liu, W. Effective human action recognition by combining manifold regularization and pairwise constraints. Multimed Tools Appl 78, 13313–13329 (2019). https://doi.org/10.1007/s11042-017-5172-1
DOI: https://doi.org/10.1007/s11042-017-5172-1