research-article

Complementary-View Co-Interest Person Detection

Authors:

Song Wang

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

Pages 2746 - 2754

https://doi.org/10.1145/3394171.3413659

Published: 12 October 2020

Abstract

Fast and accurate identification of co-interest persons, who draw the joint interest of surrounding people, plays an important role in social scene understanding and surveillance. Previous studies mainly focus on detecting co-interest persons from a single-view video. In this paper, we study a much more realistic and challenging problem, namely co-interest person (CIP) detection from multiple temporally synchronized videos taken from complementary and time-varying views. Specifically, we use a top-view camera, mounted on a drone flying at high altitude, to obtain a global view of the whole scene and all subjects on the ground, and multiple horizontal-view cameras, worn by selected subjects, to obtain local views of nearby persons and environment details. We present an efficient top- and horizontal-view data fusion strategy to map multiple horizontal views into the global top view. We then propose a spatial-temporal CIP potential energy function that jointly considers intra-frame confidence and inter-frame consistency, leading to an effective Conditional Random Field (CRF) formulation. We also construct a complementary-view video dataset, which provides a benchmark for the study of multi-view co-interest person detection. Extensive experiments validate the effectiveness and superiority of the proposed method.
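
The abstract does not spell out the energy function, so the following is only a minimal sketch of how intra-frame confidence and inter-frame consistency could be combined in a chain-structured model and decoded exactly with dynamic programming. The function name `decode_cip`, the constant `switch_penalty`, and the toy scores are assumptions made for illustration; they are not the paper's actual potentials or inference procedure.

```python
import numpy as np


def decode_cip(unary, switch_penalty=1.0):
    """Pick one candidate per frame on a chain model (illustrative only).

    unary: (T, N) array; unary[t, i] is a hypothetical intra-frame
           confidence that candidate i is the CIP in frame t.
    switch_penalty: hypothetical inter-frame cost paid whenever the
           selected candidate changes between consecutive frames.
    Returns a list of T candidate indices maximizing the total score.
    """
    unary = np.asarray(unary, dtype=float)
    T, N = unary.shape
    # pairwise[i, j]: score of moving from candidate i to candidate j.
    pairwise = -switch_penalty * (1.0 - np.eye(N))
    score = unary[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise   # rows: previous, cols: current
        back[t] = cand.argmax(axis=0)      # best predecessor per candidate
        score = cand.max(axis=0) + unary[t]
    # Trace the best path backwards from the final frame.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy scores: frame 1 weakly prefers candidate 1, its neighbors prefer 0.
toy = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
smooth = decode_cip(toy, switch_penalty=1.0)
greedy = decode_cip(toy, switch_penalty=0.0)
```

With the penalty set to zero the decoder reduces to a per-frame argmax, while a positive penalty suppresses the isolated switch in the middle frame, which is the intuition behind adding an inter-frame consistency term.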

Supplementary Material

MP4 File (3394171.3413659.mp4)

Presentation Video of Complementary-View Co-Interest Person Detection.


Cited By

Qian Z, Han R, Feng W, Wang S (2024). From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 863-873. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.00088

Li J, Han R, Feng W, Yan H, Wang S (2023). Contactless interaction recognition and interactor detection in multi-person scenes. Frontiers of Computer Science, 18:5. Online publication date: 23-Dec-2023.
https://doi.org/10.1007/s11704-023-2418-0

Han R, Feng W, Wang F, Qian Z, Yan H, Wang S (2023). Benchmarking the Complementary-View Multi-human Association and Tracking. International Journal of Computer Vision, 132:1, 118-136. Online publication date: 23-Aug-2023.
https://doi.org/10.1007/s11263-023-01857-z

Index Terms

Complementary-View Co-Interest Person Detection
Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision tasks
        Scene understanding


Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

October 2020

4889 pages

ISBN:9781450379885

DOI:10.1145/3394171

General Chairs:
Chang Wen Chen (Chinese University of Hong Kong, Shenzhen, China)
Rita Cucchiara (UNIMORE, Italy)
Xian-Sheng Hua (Alibaba Group, China)

Program Chairs:
Guo-Jun Qi (Futurewei Technologies, USA)
Elisa Ricci (UNITN & Fondazione Bruno Kessler, Italy)
Zhengyou Zhang (Tencent, China)
Roger Zimmermann (National University of Singapore, Singapore)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
The research fund for the Tianjin Key Lab for Advanced Signal Processing, Civil Aviation University of China

Conference

MM '20: The 28th ACM International Conference on Multimedia

Sponsor: SIGMM

October 12 - 16, 2020

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 7
Total Downloads: 174
Downloads (last 12 months): 21
Downloads (last 6 weeks): 6

Reflects downloads up to 01 Nov 2024

Citations

Cited By

Han R, Gan Y, Wang L, Li N, Feng W, Wang S (2023). Relating View Directions of Complementary-View Mobile Cameras via the Human Shadow. International Journal of Computer Vision, 131:5, 1106-1121. Online publication date: 11-Jan-2023.
https://doi.org/10.1007/s11263-022-01744-z

Han R, Wang Y, Yan H, Feng W, Wang S (2022). Multi-View Multi-Human Association With Deep Assignment Network. IEEE Transactions on Image Processing, 31, 1830-1840. Online publication date: 2022.
https://doi.org/10.1109/TIP.2021.3139178

Han R, Yan H, Li J, Wang S, Feng W, Wang S (2022). Panoramic Human Activity Recognition. Computer Vision – ECCV 2022, 244-261. Online publication date: 28-Oct-2022.
https://doi.org/10.1007/978-3-031-19772-7_15

Gan Y, Han R, Yin L, Feng W, Wang S, Shen H, Zhuang Y, Smith J, Yang Y, Cesar P, Metze F, Prabhakaran B (2021). Self-supervised Multi-view Multi-Human Association and Tracking. Proceedings of the 29th ACM International Conference on Multimedia, 282-290. Online publication date: 17-Oct-2021.
https://dl.acm.org/doi/10.1145/3474085.3475177

Han R, Feng W, Zhang Y, Zhao J, Wang S (2021). Multiple Human Association and Tracking from Egocentric and Complementary Top Views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1. Online publication date: 2021.
https://doi.org/10.1109/TPAMI.2021.3070562
