Pattern4Ego: Learning Egocentric Video Representation Using Cross-video Activity Patterns

Published: 07 June 2024 · DOI: 10.1145/3652583.3658010

Abstract

With the development of Embodied AI, Robotics, and Augmented Reality, videos captured from the first-person point of view, known as egocentric videos, are attracting growing interest in the Computer Vision and Robotics communities. Learning a proper representation of egocentric videos benefits diverse downstream tasks, such as action forecasting and human-object interaction understanding, which in turn support robotic planning. However, current works mostly focus on learning temporal or topological information for egocentric video representations, while activity patterns, which reveal behavioral regularities and the intentions of people or robots more explicitly, are not carefully considered. In this paper, we propose a novel framework, Pattern4Ego, that learns representations of egocentric videos using cross-video activity patterns. The framework achieves state-of-the-art performance on two representative egocentric video tasks: long-term action anticipation and context-based environment affordance.
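
The abstract gives only the high-level idea, so below is a minimal, hypothetical sketch of what "cross-video activity patterns" can mean in practice: ordered action subsequences that recur across multiple egocentric videos, mined in the spirit of PrefixSpan-style sequential pattern mining. The function name, toy action labels, and support threshold are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

def mine_activity_patterns(videos, min_support=2, max_len=3):
    # videos: one action-label sequence per egocentric video, e.g.
    #   [["open_fridge", "take_milk", "pour_milk"], ...]
    # A pattern is an ordered (not necessarily contiguous) subsequence;
    # its support is the number of distinct videos containing it.
    support = defaultdict(set)
    for vid, actions in enumerate(videos):
        seen = set()
        for length in range(2, max_len + 1):
            for idxs in combinations(range(len(actions)), length):
                seen.add(tuple(actions[i] for i in idxs))  # index order preserved
        for pattern in seen:
            support[pattern].add(vid)
    # Keep only patterns shared by at least `min_support` videos.
    return {p: len(v) for p, v in support.items() if len(v) >= min_support}

# Toy usage: both kitchen videos share the pattern (open_fridge -> pour_milk).
videos = [
    ["open_fridge", "take_milk", "pour_milk", "close_fridge"],
    ["open_fridge", "take_eggs", "pour_milk"],
]
print(mine_activity_patterns(videos))
# {('open_fridge', 'pour_milk'): 2}
```

In a full system, such mined patterns would presumably serve as cross-video supervision signals or anchors when training the video representation; the enumeration above is brute-force and only practical for short sequences, which is why dedicated miners such as PrefixSpan exist.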


Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
May 2024, 1379 pages
ISBN: 9798400706196
DOI: 10.1145/3652583

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. egocentric videos
2. representation learning
3. video understanding

Acceptance Rates

Overall acceptance rate for ICMR: 254 of 830 submissions, 31%
