Joint Transferable Dictionary Learning and View Adaptation for Multi-view Human Action Recognition

Published: 05 March 2021

Abstract

Multi-view human action recognition remains a challenging problem due to large view changes. In this article, we propose a transfer-learning-based framework, the transferable dictionary learning and view adaptation (TDVA) model, for multi-view human action recognition. In the transferable dictionary learning phase, TDVA learns a set of view-specific transferable dictionaries that enable the same action observed from different views to share the same sparse representation, thereby transferring action features from the individual views into an intermediate domain. In the view adaptation phase, TDVA comprehensively analyzes the global, local, and individual characteristics of the samples and jointly learns balanced distribution adaptation, locality preservation, and discrimination preservation, aiming to transfer the sparse action features from the intermediate domain into a common domain. In other words, TDVA progressively bridges the distribution gap among actions from different views through these two phases. Experimental results on the IXMAS, ACT4², and NUCLA action datasets demonstrate that TDVA outperforms state-of-the-art methods.
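To make the first phase concrete, below is a minimal Python/NumPy sketch (not the authors' implementation) of transferable dictionary learning: each view v gets its own dictionary D_v, but all views are constrained to reuse a single sparse code matrix Z for the same set of aligned actions. The solver choices (ISTA for the shared codes, a regularized least-squares update per dictionary) and all names and parameters (`Xs`, `n_atoms`, `lam`, `n_iters`) are illustrative assumptions.

```python
# Illustrative sketch only: view-specific dictionaries D_v with one shared
# sparse code matrix Z, the core idea of the transferable dictionary
# learning phase. Not the authors' code; parameters are assumptions.
import numpy as np

def soft_threshold(A, t):
    """Element-wise soft-thresholding operator used by ISTA."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def shared_sparse_codes(Xs, Ds, lam=0.1, n_ista=50):
    """Solve min_Z 0.5 * sum_v ||X_v - D_v Z||_F^2 + lam * ||Z||_1 with ISTA,
    i.e. one code matrix Z shared by every view."""
    X = np.vstack(Xs)                      # stack views: (sum_v d_v) x n
    D = np.vstack(Ds)                      # stacked dictionary: (sum_v d_v) x k
    L = np.linalg.norm(D, 2) ** 2 + 1e-8   # Lipschitz constant of the gradient
    Z = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_ista):
        grad = D.T @ (D @ Z - X)
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z

def learn_transferable_dictionaries(Xs, n_atoms=64, lam=0.1, n_iters=20, seed=0):
    """Alternate between the shared sparse-coding step and per-view
    dictionary updates. Xs[v] is a (d_v x n) feature matrix whose columns
    are the same n action samples observed from view v."""
    rng = np.random.default_rng(seed)
    Ds = []
    for X in Xs:                           # random unit-norm initial atoms
        D = rng.standard_normal((X.shape[0], n_atoms))
        Ds.append(D / np.linalg.norm(D, axis=0, keepdims=True))
    for _ in range(n_iters):
        Z = shared_sparse_codes(Xs, Ds, lam)
        for v, X in enumerate(Xs):         # regularized least-squares update per view
            D = X @ Z.T @ np.linalg.pinv(Z @ Z.T + 1e-6 * np.eye(Z.shape[0]))
            Ds[v] = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
    return Ds, Z
```

In this sketch, the shared code matrix Z plays the role of the intermediate domain described in the abstract; the view adaptation phase would then operate on these sparse codes to align them in a common domain.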

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 2: Survey Paper and Regular Papers
    April 2021, 524 pages
    ISSN: 1556-4681
    EISSN: 1556-472X
    DOI: 10.1145/3446665
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 March 2021
    Accepted: 01 November 2020
    Revised: 01 September 2020
    Received: 01 September 2019
    Published in TKDD Volume 15, Issue 2

    Author Tags

    1. Action recognition
    2. multi-view
    3. sparse representation
    4. transfer learning

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Beijing Outstanding Young Scientists Projects
    • National Natural Science Foundation of China

    Cited By

    • (2024) Application of 3D recognition algorithm based on spatio-temporal graph convolutional network in basketball pose estimation. International Journal for Simulation and Multidisciplinary Design Optimization 15, 9. DOI: 10.1051/smdo/2024004. Online publication date: 12-Apr-2024.
    • (2024) A Study on Vision-Based Human Activity Recognition Approaches. Modeling, Simulation and Optimization, 235-248. DOI: 10.1007/978-981-99-6866-4_17. Online publication date: 20-Feb-2024.
    • (2023) Dictionary-Based Multi-View Learning With Privileged Information. IEEE Transactions on Circuits and Systems for Video Technology 34, 5, 3523-3537. DOI: 10.1109/TCSVT.2023.3247600. Online publication date: 22-Feb-2023.
    • (2023) Unsupervised video segmentation for multi-view daily action recognition. Image and Vision Computing 134, 104687. DOI: 10.1016/j.imavis.2023.104687. Online publication date: Jun-2023.
    • (2022) In-Home Older Adults' Activity Pattern Monitoring Using Depth Sensors: A Review. Sensors 22, 23, 9067. DOI: 10.3390/s22239067. Online publication date: 23-Nov-2022.
    • (2022) GAN for vision, KG for relation: A two-stage network for zero-shot action recognition. Pattern Recognition 126, 108563. DOI: 10.1016/j.patcog.2022.108563. Online publication date: Jun-2022.
    • (2022) Task-driven joint dictionary learning model for multi-view human action recognition. Digital Signal Processing 126, 103487. DOI: 10.1016/j.dsp.2022.103487. Online publication date: Jun-2022.
    • (2022) Multi-sensor human activity recognition using CNN and GRU. International Journal of Multimedia Information Retrieval 11, 2, 135-147. DOI: 10.1007/s13735-022-00234-9. Online publication date: 19-Apr-2022.
