DOI: 10.1145/3343031.3356081
research-article

iQIYI Celebrity Video Identification Challenge

Published: 15 October 2019

Abstract

We held the iQIYI Celebrity Video Identification Challenge at ACM Multimedia 2019. Its purpose was to encourage research on video-based person identification. We released the iQIYI-VID-2019 dataset, which contains 200K videos of 10K celebrities. In this paper, we introduce the organization of the challenge, the dataset, the evaluation process, and the results.

Published In

ACM Conferences
MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN: 9781450368896
DOI: 10.1145/3343031
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. iqiyi vid
  2. multi-modal
  3. person identification
  4. video

Conference

MM '19

Acceptance Rates

MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 13
  • Downloads (last 6 weeks): 1
Reflects downloads up to 03 Feb 2025

Cited By

  • (2023) Improving Person Re-Identification With Multi-Cue Similarity Embedding and Propagation. IEEE Transactions on Multimedia, vol. 25, pp. 6384-6396. DOI: 10.1109/TMM.2022.3207949. Online publication date: 2023.
  • (2023) Cross-modal dynamic sentiment annotation for speech sentiment analysis. Computers and Electrical Engineering, vol. 106, article 108598. DOI: 10.1016/j.compeleceng.2023.108598. Online publication date: Mar-2023.
  • (2023) Social Relation Graph Generation on Untrimmed Video. MultiMedia Modeling, pp. 739-744. DOI: 10.1007/978-3-031-27818-1_61. Online publication date: 31-Mar-2023.
  • (2023) MMM-GCN: Multi-Level Multi-Modal Graph Convolution Network for Video-Based Person Identification. MultiMedia Modeling, pp. 3-15. DOI: 10.1007/978-3-031-27077-2_1. Online publication date: 29-Mar-2023.
  • (2022) A semi-automated system for person re-identification adaptation to cross-outfit and cross-posture scenarios. Applied Intelligence, vol. 52, no. 8, pp. 9501-9520. DOI: 10.1007/s10489-021-02896-0. Online publication date: 6-Jan-2022.
  • (2022) A comparison of deep learning models for end-to-end face-based video retrieval in unconstrained videos. Neural Computing and Applications, vol. 34, no. 10, pp. 7489-7506. DOI: 10.1007/s00521-021-06875-x. Online publication date: 5-Jan-2022.
  • (2021) Segment-Level Cross-Modal Knowledge Transfer for Speech Sentiment Analysis. 2021 IEEE 4th International Conference on Computer and Communication Engineering Technology (CCET), pp. 243-247. DOI: 10.1109/CCET52649.2021.9544303. Online publication date: 13-Aug-2021.
  • (2021) Combining cross-modal knowledge transfer and semi-supervised learning for speech emotion recognition. Knowledge-Based Systems, article 107340. DOI: 10.1016/j.knosys.2021.107340. Online publication date: Jul-2021.
  • (2021) Frame Aggregation and Multi-modal Fusion Framework for Video-Based Person Recognition. MultiMedia Modeling, pp. 75-86. DOI: 10.1007/978-3-030-67832-6_7. Online publication date: 21-Jan-2021.
  • (2020) Multi-Cue and Temporal Attention for Person Recognition in Videos. Pattern Recognition and Computer Vision, pp. 369-380. DOI: 10.1007/978-3-030-60639-8_31. Online publication date: 15-Oct-2020.
