DOI: 10.1145/3343031.3356074
research-article

A Hierarchical Framework with Improved Loss for Large-scale Multi-modal Video Identification

Published: 15 October 2019
    Abstract

    This paper presents our solution for the iQIYI Celebrity Video Identification Challenge. Analyzing the iQIYI-VID-2019 dataset, we find that its distribution is highly imbalanced and that the validation and test sets contain many unlabeled samples. To address these challenges, we propose a hierarchical system that combines different models and fuses the outputs of base classifiers. To cope with false detections and low-quality features in the dataset, we use a simple, reasonable strategy to fuse features. To identify videos more accurately, we adopt an improved loss function for training the base classifiers. Experimental results show that our framework performs well: in the evaluation conducted by the organizers, our final submission ranked ninth online with an mAP of 88.08%.
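    The abstract does not spell out the feature-fusion rule or the scoring code, so what follows is only a minimal Python sketch under stated assumptions, not the authors' method. `fuse_frame_features` is a hypothetical quality-aware fusion. It assumes each video comes with per-frame face feature vectors and one detection quality score per frame. It drops frames below an assumed threshold `min_quality` as likely false or low-quality detections, then takes a quality-weighted, L2-normalised average of the remaining frames. `average_precision` and `mean_average_precision` compute standard non-interpolated mAP over identities, the kind of metric behind the reported 88.08%. The official challenge scorer may differ in details such as list truncation.

```python
import math
from typing import Dict, List, Sequence, Set


def fuse_frame_features(
    features: Sequence[Sequence[float]],
    quality: Sequence[float],
    min_quality: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[float]:
    """Hypothetical quality-aware fusion of per-frame face features into one video feature."""
    if not features or len(features) != len(quality):
        raise ValueError("expected one quality score per frame feature")
    # Drop frames whose quality is below the threshold (likely false or poor detections).
    kept = [i for i, q in enumerate(quality) if q >= min_quality]
    if not kept:
        # Fall back to the single best frame so every video still gets a feature.
        kept = [max(range(len(quality)), key=lambda i: quality[i])]
    # Quality scores act as weights; use uniform weights if they are all zero.
    weights = [max(quality[i], 0.0) for i in kept]
    if sum(weights) == 0.0:
        weights = [1.0] * len(kept)
    total = sum(weights)
    dim = len(features[0])
    fused = [0.0] * dim
    for w, i in zip(weights, kept):
        for d in range(dim):
            fused[d] += w * features[i][d] / total
    # L2-normalise so fused features are comparable across videos.
    norm = math.sqrt(sum(x * x for x in fused))
    return [x / norm for x in fused] if norm > 0.0 else fused


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP of one ranked list of video ids (assumes no duplicate ids)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, video_id in enumerate(ranked, start=1):
        if video_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, Sequence[str]],
    ground_truth: Dict[str, Set[str]],
) -> float:
    """Mean of per-identity AP; identities with no submitted ranking score 0."""
    aps = [
        average_precision(rankings.get(pid, []), videos)
        for pid, videos in ground_truth.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0
```

    For example, with frame qualities `[0.9, 0.1, 0.9]` the second frame is discarded and the other two are averaged. A ranking `["v1", "v3", "v2"]` scored against the relevant set `{"v1", "v2"}` gives AP = (1/1 + 2/3) / 2 ≈ 0.833. Filtering before averaging, rather than only down-weighting, keeps a few badly detected frames from dominating the video-level feature. This is one plausible reading of "false detections and low-quality features", not a confirmed detail.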

    Cited By

    • (2023) Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2395-2400. DOI: 10.1109/BIBM58861.2023.10385578. Online publication date: 5 December 2023.
    • (2020) Finding Achilles' Heel. Proceedings of the 28th ACM International Conference on Multimedia, pp. 3829-3837. DOI: 10.1145/3394171.3413531. Online publication date: 12 October 2020.

    Information

    Published In

    ACM Conferences
    MM '19: Proceedings of the 27th ACM International Conference on Multimedia
    October 2019
    2794 pages
    ISBN:9781450368896
    DOI:10.1145/3343031
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. feature fusion
    2. improved loss function
    3. models combination
    4. video identification

    Conference

    MM '19

    Acceptance Rates

    MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;
    Overall Acceptance Rate 995 of 4,171 submissions, 24%

