Abstract
Human action recognition is an essential research area in computer vision. However, existing methods ignore the nature of infrared spectral imaging: unlike the visible modality, which carries three channels, the approximately single-channel infrared modality emphasizes lightness contrast and loses channel information. We therefore explore channel duplication and investigate more appropriate feature representations. We propose a subspace enhancement and colorization network (S\(^2\)ECNet) for infrared video action recognition. Specifically, a subspace enhancement (S\(^2\)E) module promotes edge-contour extraction within subspaces, while a subspace colorization (S\(^2\)C) module completes the missing semantic information. Moreover, optical flow provides an effective supplement for temporal information. Experiments conducted on the infrared action recognition dataset InfAR demonstrate the competitiveness of the proposed method compared with state-of-the-art approaches.
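The channel duplication mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows, under the stated assumption that an infrared frame is effectively single-channel, how such a frame can be replicated across three channels so that backbones pretrained on three-channel visible imagery accept it. The function name `duplicate_channels` is hypothetical.

```python
import numpy as np

def duplicate_channels(ir_frame: np.ndarray) -> np.ndarray:
    """Replicate a single-channel (H, W) infrared frame into a
    three-channel (H, W, 3) array by copying it along a new axis."""
    assert ir_frame.ndim == 2, "expected a single-channel frame"
    return np.repeat(ir_frame[:, :, None], 3, axis=2)

# Stand-in for one 112x112 infrared frame.
frame = np.random.rand(112, 112).astype(np.float32)
rgb_like = duplicate_channels(frame)
print(rgb_like.shape)  # (112, 112, 3)
```

All three output channels carry identical values, which is precisely why the paper argues that richer representations (subspace enhancement and colorization) are needed beyond naive duplication.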
Acknowledgements
This work was supported in part by the Fundamental Research Funds for the Central Universities of China under Grant 191010001, and in part by the Hubei Key Laboratory of Transportation Internet of Things under Grant 2020III026GX.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Xu, L., Zhong, X., Liu, W., Zhao, S., Yang, Z., Zhong, L. (2021). Subspace Enhancement and Colorization Network for Infrared Video Action Recognition. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science(), vol 13033. Springer, Cham. https://doi.org/10.1007/978-3-030-89370-5_24
Print ISBN: 978-3-030-89369-9
Online ISBN: 978-3-030-89370-5