research-article

A Hierarchical Framwork with Improved Loss for Large-scale Multi-modal Video Identification

Authors: Shichuan Zhang, Zengming Tang, Hao Pan, Xinyu Wei, and Jun Huang

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

October 2019

Pages 2539 - 2542

https://doi.org/10.1145/3343031.3356074

Published: 15 October 2019

Abstract

This paper presents our solution to the iQIYI Celebrity Video Identification Challenge. Our analysis of the iQIYI-VID-2019 dataset shows that the distribution of the dataset is highly imbalanced and that the validation and test sets contain many unlabeled samples. To address these challenges, we propose a hierarchical system that combines different models and fuses base classifiers. To cope with false detections and low-quality features in the dataset, we fuse features with a simple and reasonable strategy. To identify videos more accurately, we train the base classifiers with an improved loss function. Experimental results show that our framework performs well; in the evaluation conducted by the organizers, our final result ranks ninth online with an mAP of 88.08%.
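For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean average precision (mAP) is commonly computed for a retrieval task, where each identity is a query and the system returns a ranked list of video IDs. It is not the organizers' evaluation script: the function names, the video and identity IDs, and the choice to normalize by the full number of relevant videos are illustrative assumptions.

```python
from typing import Dict, List, Set


def average_precision(ranked_ids: List[str], relevant: Set[str]) -> float:
    """AP for one query: precision@k summed over ranks k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Assumption: divide by all relevant items, so relevant videos that
    # were never retrieved lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], ground_truth: Dict[str, Set[str]]
) -> float:
    """Mean of per-query AP over every query that has ground truth."""
    aps = [
        average_precision(rankings.get(query, []), relevant)
        for query, relevant in ground_truth.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy data: two identities, each with a ranked list of video IDs.
rankings = {"id_a": ["v1", "v2", "v3"], "id_b": ["v4", "v5"]}
ground_truth = {"id_a": {"v1", "v3"}, "id_b": {"v5"}}
score = mean_average_precision(rankings, ground_truth)
```

In this toy case the AP for `id_a` is (1/1 + 2/3) / 2 = 5/6 and the AP for `id_b` is 1/2, so the mAP is 2/3. The sketch assumes each ranked list contains no duplicate video IDs.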


Cited By

Zhang S, Zheng S, Shui Z, Li H, Yang L. (2023). Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2395-2400. Online publication date: 5-Dec-2023.
https://doi.org/10.1109/BIBM58861.2023.10385578

Kumar D, Kumar C, Seah C, Xia S, Shao M, Wen Chen C, Cucchiara R, Hua X, Qi G, Ricci E, Zhang Z, Zimmermann R. (2020). Finding Achilles' Heel. Proceedings of the 28th ACM International Conference on Multimedia, pp. 3829-3837. Online publication date: 12-Oct-2020.
https://dl.acm.org/doi/10.1145/3394171.3413531

Index Terms

A Hierarchical Framwork with Improved Loss for Large-scale Multi-modal Video Identification
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision


Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

October 2019

2794 pages

ISBN:9781450368896

DOI:10.1145/3343031

General Chairs: Laurent Amsaleg (CNRS-IRISA, France), Benoit Huet (EURECOM, France), Martha Larson (Radboud University and TU Delft, Netherlands)

Program Chairs: Guillaume Gravier (CNRS-IRISA, France), Hayley Hung (Delft University of Technology, Netherlands), Chong-Wah Ngo (City University of Hong Kong, Hong Kong), Wei Tsang Ooi (National University of Singapore, Singapore)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

MM '19

Sponsor:

SIGMM

MM '19: The 27th ACM International Conference on Multimedia

October 21 - 25, 2019

Nice, France

Acceptance Rates

MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%

Overall Acceptance Rate 995 of 4,171 submissions, 24%



