
Introduction to Big Multimodal Multimedia Data with Deep Analytics

Published: 31 March 2021
Big multimodal multimedia data can profoundly advance our understanding of humans and society, and the massive volume of data from different media sources requires feasible methods for extracting the information it contains. Recent advances in deep learning can help researchers handle big multimedia data analytics more effectively, in order to study user behavior patterns and understand their practical implications for various applications. This summary reports some recent advancements in this area. The special issue presents nine articles selected after a careful peer review process. The topics addressed range from cross-modal retrieval to deep model design, representation learning, clustering, and image processing, as well as a comprehensive survey of big multimodal multimedia data analytics.
The purpose of cross-modal retrieval is to find the relationships between samples of different modalities, and to retrieve samples of one modality with similar semantics by querying with a sample of another modality. However, existing methods often ignore the semantic correlation within the same modality among different multimodal samples. To overcome this challenge, Zhang et al. propose HCMSL, a novel hybrid cross-modal similarity learning model, which aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intramodal pairs sharing the same classification label. Two Siamese convolutional neural network models are employed to learn intramodal similarity from samples of the same modality. These intramodal similarities are fused with cross-modal similarity to construct a hybrid cross-modal similarity loss, transforming intramodal semantic correlation into cross-modal similarity to train a common subspace learning model.
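As a rough illustration of how such a hybrid objective can be formed, the sketch below fuses one cross-modal similarity term with two intramodal terms (one per modality, mirroring the role of the two Siamese branches). The cosine measure, the weighting `alpha`, and the function names are illustrative assumptions, not the authors' exact loss.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def hybrid_loss(img, txt, img_pair, txt_pair, alpha=0.5):
    # Cross-modal term: pull a matched image/text embedding pair together.
    cross = 1.0 - cosine(img, txt)
    # Intramodal terms: same-label pairs within each modality, the part
    # that existing methods are said to ignore.
    intra = (1.0 - cosine(*img_pair)) + (1.0 - cosine(*txt_pair))
    return cross + alpha * intra
```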
Xu et al. address two problems of existing Zero-Shot Cross-Modal Retrieval (ZS-CMR) models: poor generalization ability in a zero-shot setting and relatively inferior performance. To this end, they propose AAEGAN (Assembling AutoEncoder and Generative Adversarial Network), a novel method for the more realistic ZS-CMR scenario that combines the strengths of the AutoEncoder and the Generative Adversarial Network to jointly incorporate common latent space learning, knowledge transfer, and feature synthesis. In addition, a novel distribution-alignment constraint preserves the semantic compatibility between modalities while enhancing common latent space learning, which helps learn a more robust common space.
Fu et al. show that most existing graph convolutional network-based methods depend on static structural relationships in the data, so the features extracted during the convolution process lack representativeness. To resolve this, they develop the Dynamic Graph Learning Convolutional Network (DGLCN), a semisupervised dynamic graph structure learning model. Its single-layer propagation rule is obtained by optimizing a spectral dynamic graph convolution; by fusing the optimized structural information, the multilayer DGLCN can extract richer sample features and improve classification performance.
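A single-layer spectral graph-convolution propagation rule generally has the form H' = σ(Â H W), with Â a normalized adjacency matrix. The toy sketch below applies one such step with a fixed Â; in DGLCN the graph structure itself would be learned and updated during training, which is precisely the part this static sketch omits.

```python
def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(A_norm . H . W).

    A is a row-normalized adjacency (with self-loops) as a list of lists,
    H the node feature matrix, W the layer weight matrix.
    """
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]
    Z = matmul(matmul(A, H), W)
    # ReLU nonlinearity applied elementwise.
    return [[max(0.0, v) for v in row] for row in Z]
```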
Despite the good performance of low-rank coding-based representation learning in discovering and recovering the subspace structures in data, its single-layer structure means that deep hidden information cannot be obtained. Zhang et al. propose DLRF-Net, a new progressive deep latent low-rank fusion network that uncovers the deep features and clustering structures embedded in latent subspaces, obtaining deep hidden information so that the representation learning of deeper layers can discover the underlying clean subspaces. In addition, DLRF-Net is general and applicable to most existing latent low-rank representation models.
To effectively address missing features in data for Gaussian Mixture Model (GMM) clustering, Zhang et al. propose to integrate imputation and GMM clustering into a unified learning procedure. Specifically, the missing data is filled in using the result of GMM clustering, and the imputed data is then used for the next round of GMM clustering. These two steps alternately inform each other until an optimum is reached. A two-step alternating algorithm with proven convergence is designed to solve the resulting optimization problem.
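The alternating procedure can be sketched as follows on 1-D data, with hard nearest-center assignments deliberately standing in for full GMM responsibilities: cluster the currently imputed data, refill each missing entry from its assigned cluster's center, and repeat. This is a simplified stand-in, not the authors' algorithm.

```python
def cluster_with_imputation(data, iters=10):
    """Alternate imputation and two-cluster fitting; None marks a missing value."""
    observed = [x for x in data if x is not None]
    centers = [min(observed), max(observed)]
    # Initial imputation: fill missing entries with the observed mean.
    filled = [x if x is not None else sum(observed) / len(observed)
              for x in data]
    for _ in range(iters):
        # Clustering step: assign every (imputed) point to its nearest
        # center, then update the centers.
        assign = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in filled]
        for c in (0, 1):
            pts = [x for x, a in zip(filled, assign) if a == c]
            if pts:
                centers[c] = sum(pts) / len(pts)
        # Imputation step: refill missing entries with the center of the
        # cluster they were just assigned to; observed entries stay fixed.
        filled = [centers[assign[i]] if data[i] is None else data[i]
                  for i in range(len(data))]
    return filled, centers
```

The two steps negotiate exactly as described above: better imputations sharpen the clusters, and sharper clusters produce better imputations.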
The study by Zhang et al. focuses on the user credit grading problem, with the goals of anomaly detection, risk early warning, and personalized information and service recommendation for privileged users. First, three naturally ordered categories defined from user registration and behavior information formulate user credit grading as an ordinal regression problem. To avoid the fragility of KDLOR (Kernel Discriminant Learning for Ordinal Regression), they adopt a robust sampling model with a triplet metric constraint to balance the distribution while mitigating overfitting and underfitting. Finally, the improved sampling method obtains hard negative samples that enhance the robustness and effectiveness of the ordinal regression.
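A triplet metric constraint of the kind mentioned above is typically a hinge over pairwise distances: an anchor should be closer to a same-grade positive than to a different-grade negative by some margin. The minimal sketch below uses squared Euclidean distance and an illustrative margin, not the paper's exact formulation.

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge on squared Euclidean distances: zero when the anchor is
    # already closer to the positive than to the negative by `margin`.
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, d2(anchor, positive) - d2(anchor, negative) + margin)
```

Hard negatives, as used in the improved sampling method, are the negatives that make this hinge largest.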
Low light images captured in a nonuniform illumination environment are usually degraded by the scene depth and the corresponding environment lights, which results in severe loss of object information in the degraded image. Unlike existing Salient Object Detection (SOD) methods that conduct SOD directly on the original degraded images, Xu et al. eliminate the effect of low illumination by explicitly modeling the physical lighting of the environment for image enhancement. Specifically, an image enhancement approach is proposed to facilitate SOD in low light images, and a Non-Local-Block Layer captures the difference between the local content of an object and its neighboring regions. Moreover, a low light image dataset is created to evaluate SOD performance.
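One common way to model physical lighting for enhancement is Retinex-style: treat each observed intensity as reflectance times illumination, estimate the illumination by local smoothing, and divide it out. The 1-D sketch below is a generic illustration of that idea under those assumptions, not the authors' model.

```python
def enhance(pixels, eps=1e-3):
    # Estimate illumination as a 3-tap local average of intensity, then
    # recover reflectance by dividing the illumination out (clamped to 1.0).
    n = len(pixels)
    out = []
    for i in range(n):
        window = pixels[max(0, i - 1):i + 2]
        illum = sum(window) / len(window)
        out.append(min(1.0, pixels[i] / (illum + eps)))
    return out
```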
Li et al. design a lightweight Dense Connection Distillation Network for single image super-resolution by combining feature fusion units and dense connection distillation blocks that include selective cascading and dense distillation components. In every dense connection distillation block, the distillation mechanism helps reduce the number of training parameters and improves training efficiency, and the contrast-aware channel attention layer further improves the model's performance. The network exploits more useful features (edges, angles, textures, etc.) for image restoration. Experimental results on several benchmark datasets show that the proposed method achieves a better tradeoff between accuracy and efficiency.
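Contrast-aware channel attention, as used in lightweight super-resolution blocks of this kind, replaces plain average pooling with a contrast statistic (channel mean plus standard deviation) to gate each channel. The sketch below is an assumption-laden simplification: it omits the small learned bottleneck that would normally sit between the pooled statistic and the sigmoid gate.

```python
import math

def contrast_channel_attention(features):
    # `features` is a list of channels, each a flat list of activations.
    # Each channel is rescaled by a sigmoid of its contrast statistic
    # (mean + standard deviation), so high-contrast channels (edges,
    # textures) are emphasized.
    out = []
    for ch in features:
        mean = sum(ch) / len(ch)
        std = math.sqrt(sum((v - mean) ** 2 for v in ch) / len(ch))
        gate = 1.0 / (1.0 + math.exp(-(mean + std)))
        out.append([v * gate for v in ch])
    return out
```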
Finally, Wang offers a comprehensive overview of the existing state of the art in the field of multimodal multimedia data analytics, from shallow to deep spaces. Wang argues that the critical issue for the existing state of the art is how to perform multimodal collaboration, including adversarial deep multimodal collaboration, so as to better fuse complementary multimodal information. Throughout the survey, Wang further indicates that the critical components of this field are collaboration, adversarial competition, and fusion over multimodal spaces. Experimental results of state-of-the-art deep multimodal/cross-modal architectures on benchmark multimodal datasets are summarized.

Cited By

• (2022) Collusion-Tolerant Data Aggregation Method for Smart Grid. Wireless Algorithms, Systems, and Applications, pp. 314-325. DOI: 10.1007/978-3-031-19208-1_26. Online publication date: 24-Nov-2022.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1s, January 2021, 353 pages.
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3453990

Publisher

Association for Computing Machinery, New York, NY, United States


