
Introduction to Big Multimodal Multimedia Data with Deep Analytics

Published: 31 March 2021
Big multimodal multimedia data can profoundly advance our understanding of humans and society, and the massive volume of data from different media sources requires feasible methods for extracting the information it contains. Recent advances in deep learning can help researchers handle big multimedia data analytics more effectively, in order to study user behavior patterns and understand their practical implications for various applications. This summary reports some recent advancements in this area. The special issue presents nine articles selected after a careful peer review process. The topics addressed range from cross-modal retrieval to deep model design, representation learning, clustering, and image processing, as well as a comprehensive survey of big multimodal multimedia data analytics.
The purpose of cross-modal retrieval is to find the relationships between samples of different modalities, and to retrieve samples of one modality with similar semantics by querying with a sample of another modality. However, existing methods often ignore the semantic correlation within the same modality among different multimodal samples. To overcome this challenge, Zhang et al. propose HCMSL, a novel hybrid cross-modal similarity learning model, which aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intramodal pairs sharing the same classification label. Two Siamese convolutional neural network models are employed to learn intramodal similarity from samples of the same modality. These intramodal similarities are fused with cross-modal similarity to construct a hybrid cross-modal similarity loss, transforming intramodal semantic correlation into cross-modal similarity to train a common subspace learning model.
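As a rough illustration of how such a hybrid objective can be formed, the sketch below fuses one cross-modal similarity term with two intramodal terms (one per modality, mirroring the role of the two Siamese branches). The cosine measure, the weighting `alpha`, and the function names are illustrative assumptions, not the authors' exact loss.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def hybrid_loss(img, txt, img_pair, txt_pair, alpha=0.5):
    # Cross-modal term: pull a matched image/text embedding pair together.
    cross = 1.0 - cosine(img, txt)
    # Intramodal terms: same-label pairs within each modality, the part
    # that existing methods are said to ignore.
    intra = (1.0 - cosine(*img_pair)) + (1.0 - cosine(*txt_pair))
    return cross + alpha * intra
```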
Xu et al. address two problems of existing Zero-Shot Cross-Modal Retrieval (ZS-CMR) models: poor generalization ability in a zero-shot setting and relatively inferior performance. To this end, they propose AAEGAN (Assembling AutoEncoder and Generative Adversarial Network), a novel method for the more realistic ZS-CMR scenario that combines the strengths of the AutoEncoder and the Generative Adversarial Network to jointly incorporate common latent space learning, knowledge transfer, and feature synthesis. In addition, a novel distribution-alignment constraint preserves the semantic compatibility between modalities while enhancing common latent space learning, which helps learn a more robust common space.
Fu et al. show that most existing graph convolutional network-based methods depend on static structural relationships in the data, so the features extracted during the convolution process lack representativeness. To resolve this, they develop the Dynamic Graph Learning Convolutional Network (DGLCN), a semisupervised dynamic graph structure learning model. Its single-layer propagation rule is obtained by optimizing a spectral dynamic graph convolution; by fusing the optimized structural information, the multilayer DGLCN can extract richer sample features and improve classification performance.
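A single-layer spectral graph-convolution propagation rule generally has the form H' = σ(Â H W), with Â a normalized adjacency matrix. The toy sketch below applies one such step with a fixed Â; in DGLCN the graph structure itself would be learned and updated during training, which is precisely the part this static sketch omits.

```python
def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(A_norm . H . W).

    A is a row-normalized adjacency (with self-loops) as a list of lists,
    H the node feature matrix, W the layer weight matrix.
    """
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]
    Z = matmul(matmul(A, H), W)
    # ReLU nonlinearity applied elementwise.
    return [[max(0.0, v) for v in row] for row in Z]
```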
Despite the good performance of low-rank coding-based representation learning in discovering and recovering the subspace structures in data, its single-layer structure means that deep hidden information cannot be obtained. Zhang et al. propose DLRF-Net, a new progressive deep latent low-rank fusion network that uncovers the deep features and clustering structures embedded in latent subspaces, obtaining deep hidden information so that the representation learning of deeper layers can discover the underlying clean subspaces. In addition, DLRF-Net is general and applicable to most existing latent low-rank representation models.
To effectively address missing features in data for Gaussian Mixture Model (GMM) clustering, Zhang et al. propose to integrate imputation and GMM clustering into a unified learning procedure. Specifically, the missing data is filled in using the result of GMM clustering, and the imputed data is then used for the next round of GMM clustering. These two steps alternately inform each other until an optimum is reached. A two-step alternating algorithm with proven convergence is designed to solve the resulting optimization problem.
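The alternating procedure can be sketched as follows on 1-D data, with hard nearest-center assignments deliberately standing in for full GMM responsibilities: cluster the currently imputed data, refill each missing entry from its assigned cluster's center, and repeat. This is a simplified stand-in, not the authors' algorithm.

```python
def cluster_with_imputation(data, iters=10):
    """Alternate imputation and two-cluster fitting; None marks a missing value."""
    observed = [x for x in data if x is not None]
    centers = [min(observed), max(observed)]
    # Initial imputation: fill missing entries with the observed mean.
    filled = [x if x is not None else sum(observed) / len(observed)
              for x in data]
    for _ in range(iters):
        # Clustering step: assign every (imputed) point to its nearest
        # center, then update the centers.
        assign = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in filled]
        for c in (0, 1):
            pts = [x for x, a in zip(filled, assign) if a == c]
            if pts:
                centers[c] = sum(pts) / len(pts)
        # Imputation step: refill missing entries with the center of the
        # cluster they were just assigned to; observed entries stay fixed.
        filled = [centers[assign[i]] if data[i] is None else data[i]
                  for i in range(len(data))]
    return filled, centers
```

The two steps negotiate exactly as described above: better imputations sharpen the clusters, and sharper clusters produce better imputations.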
The study by Zhang et al. focuses on the user credit grading problem, with the goals of anomaly detection, risk early warning, and personalized information and service recommendation for privileged users. First, three naturally ordered categories defined from user registration and behavior information formulate user credit grading as an ordinal regression problem. To avoid the fragility of KDLOR (Kernel Discriminant Learning for Ordinal Regression), they adopt a robust sampling model with a triplet metric constraint to balance the distribution while mitigating overfitting and underfitting. Finally, the improved sampling method obtains hard negative samples that enhance the robustness and effectiveness of the ordinal regression.
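A triplet metric constraint of the kind mentioned above is typically a hinge over pairwise distances: an anchor should be closer to a same-grade positive than to a different-grade negative by some margin. The minimal sketch below uses squared Euclidean distance and an illustrative margin, not the paper's exact formulation.

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge on squared Euclidean distances: zero when the anchor is
    # already closer to the positive than to the negative by `margin`.
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, d2(anchor, positive) - d2(anchor, negative) + margin)
```

Hard negatives, as used in the improved sampling method, are the negatives that make this hinge largest.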
Low light images captured in a nonuniform illumination environment are usually degraded by the scene depth and the corresponding environment lights, which results in severe loss of object information in the degraded image. Unlike existing Salient Object Detection (SOD) methods that conduct SOD directly on the original degraded images, Xu et al. eliminate the effect of low illumination by explicitly modeling the physical lighting of the environment for image enhancement. Specifically, an image enhancement approach is proposed to facilitate SOD in low light images, and a Non-Local-Block Layer captures the difference between the local content of an object and its neighboring regions. Moreover, a low light image dataset is created to evaluate SOD performance.
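One common way to model physical lighting for enhancement is Retinex-style: treat each observed intensity as reflectance times illumination, estimate the illumination by local smoothing, and divide it out. The 1-D sketch below is a generic illustration of that idea under those assumptions, not the authors' model.

```python
def enhance(pixels, eps=1e-3):
    # Estimate illumination as a 3-tap local average of intensity, then
    # recover reflectance by dividing the illumination out (clamped to 1.0).
    n = len(pixels)
    out = []
    for i in range(n):
        window = pixels[max(0, i - 1):i + 2]
        illum = sum(window) / len(window)
        out.append(min(1.0, pixels[i] / (illum + eps)))
    return out
```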
Li et al. design a lightweight Dense Connection Distillation Network for single image super-resolution by combining feature fusion units and dense connection distillation blocks that include selective cascading and dense distillation components. In every dense connection distillation block, the distillation mechanism helps reduce the number of training parameters and improves training efficiency, and the contrast-aware channel attention layer further improves the model's performance. The network exploits more useful features (edges, angles, textures, etc.) for image restoration. Experimental results on several benchmark datasets show that the proposed method achieves a better tradeoff between accuracy and efficiency.
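Contrast-aware channel attention, as used in lightweight super-resolution blocks of this kind, replaces plain average pooling with a contrast statistic (channel mean plus standard deviation) to gate each channel. The sketch below is an assumption-laden simplification: it omits the small learned bottleneck that would normally sit between the pooled statistic and the sigmoid gate.

```python
import math

def contrast_channel_attention(features):
    # `features` is a list of channels, each a flat list of activations.
    # Each channel is rescaled by a sigmoid of its contrast statistic
    # (mean + standard deviation), so high-contrast channels (edges,
    # textures) are emphasized.
    out = []
    for ch in features:
        mean = sum(ch) / len(ch)
        std = math.sqrt(sum((v - mean) ** 2 for v in ch) / len(ch))
        gate = 1.0 / (1.0 + math.exp(-(mean + std)))
        out.append([v * gate for v in ch])
    return out
```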
Finally, Wang offers a comprehensive overview of the existing state of the art in the field of multimodal multimedia data analytics, from shallow to deep spaces. Wang argues that the critical issue for the existing state of the art is how to perform multimodal collaboration, including adversarial deep multimodal collaboration, so as to better fuse complementary multimodal information. Throughout the survey, Wang further indicates that the critical components of this field are collaboration, adversarial competition, and fusion over multimodal spaces. Experimental results of state-of-the-art deep multimodal/cross-modal architectures on benchmark multimodal datasets are summarized.

Cited By

• (2022) Collusion-Tolerant Data Aggregation Method for Smart Grid. Wireless Algorithms, Systems, and Applications, pp. 314-325. DOI: 10.1007/978-3-031-19208-1_26. Online publication date: 24-Nov-2022.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1s, January 2021, 353 pages.
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3453990

Publisher

Association for Computing Machinery, New York, NY, United States


