DOI: 10.1145/3394171.3413659
research-article

Complementary-View Co-Interest Person Detection

Published: 12 October 2020

Abstract

Fast and accurate identification of co-interest persons, who draw the joint interest of the surrounding people, plays an important role in social scene understanding and surveillance. Previous studies mainly focus on detecting co-interest persons from a single-view video. In this paper, we study a much more realistic and challenging problem, namely co-interest person (CIP) detection from multiple temporally synchronized videos captured from complementary and time-varying views. Specifically, we use a top-view camera, mounted on a drone flying at high altitude, to obtain a global view of the whole scene and all subjects on the ground, and multiple horizontal-view cameras, worn by selected subjects, to obtain local views of their nearby persons and environment details. We present an efficient top- and horizontal-view data fusion strategy that maps multiple horizontal views into the global top view. We then propose a spatial-temporal CIP potential energy function that jointly considers intra-frame confidence and inter-frame consistency, leading to an effective Conditional Random Field (CRF) formulation. We also construct a complementary-view video dataset, which provides a benchmark for the study of multi-view co-interest person detection. Extensive experiments validate the effectiveness and superiority of the proposed method.
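
The abstract does not give the form of the energy function, so the following is only a minimal sketch of the general idea of combining intra-frame confidence with inter-frame consistency over a chain of frames, not the paper's formulation. It assumes each top-view subject keeps a fixed index across frames, that a per-frame cost `unary[t][i]` is already available (lower meaning subject `i` is more likely the CIP in frame `t`), and that consistency is a simple constant penalty for switching subjects. The names `select_cip_track`, `unary`, and `switch_penalty` are hypothetical. The minimum-energy labeling of such a chain can be found with Viterbi-style dynamic programming:

```python
def select_cip_track(unary, switch_penalty):
    """Choose one subject index per frame by minimising a chain energy.

    unary[t][i]    : assumed intra-frame cost of subject i in frame t
                     (lower = more confident that i is the CIP).
    switch_penalty : assumed inter-frame cost paid whenever the chosen
                     subject changes between consecutive frames.
    Returns (path, total_energy).
    """
    if not unary:
        return [], 0.0

    def pairwise(i, j):
        # Hypothetical consistency term: free to stay, costly to switch.
        return 0.0 if i == j else switch_penalty

    cost = list(unary[0])   # best energy of any path ending at each subject
    back = []               # back-pointers, one list per frame after the first
    for t in range(1, len(unary)):
        new_cost, pointers = [], []
        for j, u in enumerate(unary[t]):
            best_i = min(range(len(cost)),
                         key=lambda i: cost[i] + pairwise(i, j))
            new_cost.append(cost[best_i] + pairwise(best_i, j) + u)
            pointers.append(best_i)
        back.append(pointers)
        cost = new_cost

    # Trace the minimum-energy path from the last frame back to the first.
    last = min(range(len(cost)), key=lambda i: cost[i])
    total = cost[last]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, total


# Toy input: 3 frames, 3 subjects; subject 1 is marginally better in frame 1.
toy_unary = [[0.2, 0.9, 0.8],
             [0.6, 0.5, 0.9],
             [0.1, 0.8, 0.9]]
smooth_path, smooth_energy = select_cip_track(toy_unary, switch_penalty=0.5)
greedy_path, greedy_energy = select_cip_track(toy_unary, switch_penalty=0.0)
```

With a zero switch penalty the selection reduces to an independent per-frame choice; a positive penalty trades a slightly worse single-frame cost for a temporally stable detection.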

Supplementary Material

MP4 File (3394171.3413659.mp4)
Presentation Video of Complementary-View Co-Interest Person Detection.


Cited By

  • (2024) From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 863-873. DOI: 10.1109/CVPR52733.2024.00088. Online publication date: 16-Jun-2024
  • (2023) Contactless interaction recognition and interactor detection in multi-person scenes. Frontiers of Computer Science, 18:5. DOI: 10.1007/s11704-023-2418-0. Online publication date: 23-Dec-2023
  • (2023) Benchmarking the Complementary-View Multi-human Association and Tracking. International Journal of Computer Vision, 132:1, 118-136. DOI: 10.1007/s11263-023-01857-z. Online publication date: 23-Aug-2023

Information & Contributors

Information

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. co-interest person
  2. horizontal view
  3. multi-camera
  4. top view
  5. video surveillance

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • The research fund for the Tianjin Key Lab for Advanced Signal Processing, Civil Aviation University of China

Conference

MM '20
Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 21
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 01 Nov 2024

Citations

  • (2023) Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow. International Journal of Computer Vision, 131:5, 1106-1121. DOI: 10.1007/s11263-022-01744-z. Online publication date: 11-Jan-2023
  • (2022) Multi-View Multi-Human Association With Deep Assignment Network. IEEE Transactions on Image Processing, 31, 1830-1840. DOI: 10.1109/TIP.2021.3139178. Online publication date: 2022
  • (2022) Panoramic Human Activity Recognition. Computer Vision – ECCV 2022, 244-261. DOI: 10.1007/978-3-031-19772-7_15. Online publication date: 28-Oct-2022
  • (2021) Self-supervised Multi-view Multi-Human Association and Tracking. Proceedings of the 29th ACM International Conference on Multimedia, 282-290. DOI: 10.1145/3474085.3475177. Online publication date: 17-Oct-2021
  • (2021) Multiple Human Association and Tracking from Egocentric and Complementary Top Views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1. DOI: 10.1109/TPAMI.2021.3070562. Online publication date: 2021
