DOI: 10.1145/2733373.2806304

Spatio-Temporal Triangular-Chain CRF for Activity Recognition

Published: 13 October 2015

Abstract

Understanding human activities in video is a fundamental problem in computer vision. In real life, human activities are composed of temporal and spatial arrangements of actions. Understanding such complex activities therefore requires not only recognizing each individual action but, more importantly, capturing their spatio-temporal relationships. This paper addresses complex activity recognition with a unified hierarchical model. We extend triangular-chain CRFs (TriCRFs) to the spatial dimension. The proposed architecture can be viewed as a spatio-temporal version of TriCRFs, in which action and activity labels are modeled jointly and their complex dependencies are exploited. Experiments show that our model produces promising results, significantly outperforming competing methods. The framework can also be applied to other kinds of structured sequential data.
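The abstract does not give the model's factorization, but for orientation it may help to recall the base triangular-chain CRF of Jeong and Lee, which couples a chain of action labels y_1, ..., y_T with a single activity label z in one conditional model. The sketch below is an assumption for exposition only, not the paper's exact spatio-temporal extension, which additionally models dependencies between actions that co-occur in space:

    p(z, y_{1:T} \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \lambda^{\top} f(y_{t-1}, y_t, x, t) \;+\; \sum_{t=1}^{T} \mu^{\top} g(z, y_t, x, t) \;+\; \nu^{\top} h(z, x) \Big)

Here the first term scores temporal transitions between adjacent action labels, the second contains the "triangular" edges tying each action label to the activity label, the third scores the activity label against global observation features, and Z(x) is the partition function. Modeling z and y_{1:T} jointly in this way lets activity-level context disambiguate individual actions and vice versa, which is the property the paper's spatio-temporal variant builds on.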


    Published In

    MM '15: Proceedings of the 23rd ACM International Conference on Multimedia
    October 2015
    1402 pages
    ISBN: 9781450334594
    DOI: 10.1145/2733373
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. activity recognition
    2. hierarchical model
    3. joint learning
    4. spatio-temporal dependencies
    5. triangular-chain crfs

    Qualifiers

    • Short-paper

    Conference

    MM '15: ACM Multimedia Conference
    October 26 - 30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 paper acceptance rate: 56 of 252 submissions, 22%
    Overall acceptance rate: 995 of 4,171 submissions, 24%

    Cited By

    • (2021) Learnable Higher-Order Representation for Action Recognition. 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9038-9045. DOI: 10.1109/ICPR48806.2021.9412963. Online publication date: 10-Jan-2021.
    • (2018) Mining Semantics-Preserving Attention for Group Activity Recognition. Proceedings of the 26th ACM International Conference on Multimedia, pp. 1283-1291. DOI: 10.1145/3240508.3240576. Online publication date: 15-Oct-2018.
    • (2016) Action Recognition with Joints-Pooled 3D Deep Convolutional Descriptors. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3324-3330. DOI: 10.5555/3061053.3061086. Online publication date: 9-Jul-2016.
    • (2016) A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1981-1990. DOI: 10.1109/CVPR.2016.218. Online publication date: Jun-2016.
