DOI: 10.1145/2733373.2806304

Spatio-Temporal Triangular-Chain CRF for Activity Recognition

Published: 13 October 2015

Abstract

Understanding human activities in video is a fundamental problem in computer vision. In real life, human activities are composed of temporal and spatial arrangements of actions. Understanding such complex activities therefore requires not only recognizing each individual action but, more importantly, capturing their spatio-temporal relationships. This paper addresses complex activity recognition with a unified hierarchical model. We extend triangular-chain CRFs (TriCRFs) to the spatial dimension. The proposed architecture can be viewed as a spatio-temporal version of TriCRFs, in which action and activity labels are modeled jointly and their complex dependencies are exploited. Experiments show that our model produces promising results, significantly outperforming competing methods. The framework can also be applied to other kinds of structured sequential data.
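The abstract does not give the model's factorization, but for orientation it may help to recall the base triangular-chain CRF of Jeong and Lee, which couples a chain of action labels y_1, ..., y_T with a single activity label z in one conditional model. The sketch below is an assumption for exposition only, not the paper's exact spatio-temporal extension, which additionally models dependencies between actions that co-occur in space:

    p(z, y_{1:T} \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \lambda^{\top} f(y_{t-1}, y_t, x, t) \;+\; \sum_{t=1}^{T} \mu^{\top} g(z, y_t, x, t) \;+\; \nu^{\top} h(z, x) \Big)

Here the first term scores temporal transitions between adjacent action labels, the second contains the "triangular" edges tying each action label to the activity label, the third scores the activity label against global observation features, and Z(x) is the partition function. Modeling z and y_{1:T} jointly in this way lets activity-level context disambiguate individual actions and vice versa, which is the property the paper's spatio-temporal variant builds on.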


    Published In

    MM '15: Proceedings of the 23rd ACM International Conference on Multimedia
    October 2015
    1402 pages
    ISBN: 9781450334594
    DOI: 10.1145/2733373
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. activity recognition
    2. hierarchical model
    3. joint learning
    4. spatio-temporal dependencies
    5. triangular-chain crfs

    Qualifiers

    • Short-paper

    Conference

    MM '15: ACM Multimedia Conference
    October 26 - 30, 2015
    Brisbane, Australia

    Acceptance Rates

    MM '15 paper acceptance rate: 56 of 252 submissions, 22%
    Overall acceptance rate: 995 of 4,171 submissions, 24%

    Cited By

    • (2021) Learnable Higher-Order Representation for Action Recognition. 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9038-9045. DOI: 10.1109/ICPR48806.2021.9412963. Online publication date: 10-Jan-2021.
    • (2018) Mining Semantics-Preserving Attention for Group Activity Recognition. Proceedings of the 26th ACM International Conference on Multimedia, pp. 1283-1291. DOI: 10.1145/3240508.3240576. Online publication date: 15-Oct-2018.
    • (2016) Action Recognition with Joints-Pooled 3D Deep Convolutional Descriptors. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 3324-3330. DOI: 10.5555/3061053.3061086. Online publication date: 9-Jul-2016.
    • (2016) A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1981-1990. DOI: 10.1109/CVPR.2016.218. Online publication date: Jun-2016.
