TOMM: Vol 19, No 2s

Volume 19, Issue 2s, April 2023

Editor:

Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1551-6857

EISSN: 1551-6865

survey

A Review on Methods and Applications in Multimodal Deep Learning

Article No.: 76, Pages 1–41, https://doi.org/10.1145/3545572

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities. Despite the ...

survey

Improved Random Grid-based Cheating Prevention Visual Cryptography Using Latin Square

Article No.: 77, Pages 1–21, https://doi.org/10.1145/3550275

Visual cryptography scheme is a method of encrypting secret image into n noiselike shares. The secret image can be reconstructed by stacking adequate shares. In the past two decades, many schemes have been proposed to realize the cheating prevention ...

survey

Video Frame Interpolation: A Comprehensive Survey

Article No.: 78, Pages 1–31, https://doi.org/10.1145/3556544

Video Frame Interpolation (VFI) is a fascinating and challenging problem in the computer vision (CV) field, aiming to generate non-existing frames between two consecutive video frames. In recent years, many algorithms based on optical flow, kernel, or ...

research-article

A Decoupled Kernel Prediction Network Guided by Soft Mask for Single Image HDR Reconstruction

Article No.: 79, Pages 1–23, https://doi.org/10.1145/3550277

Recent works on single image high dynamic range (HDR) reconstruction fail to hallucinate plausible textures, resulting in information missing and artifacts in large-scale under/over-exposed regions. In this article, a decoupled kernel prediction network ...

research-article

Point Cloud Quality Assessment: Dataset Construction and Learning-based No-reference Metric

Article No.: 80, Pages 1–26, https://doi.org/10.1145/3550274

Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, in many cases, obtaining the reference point clouds is difficult, so no-reference (NR) metrics have become a research hotspot. Few ...

research-article

Pose- and Attribute-consistent Person Image Synthesis

Article No.: 81, Pages 1–21, https://doi.org/10.1145/3554739

Person Image Synthesis aims at transferring the appearance of the source person image into a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: (1) synthesis distortion due to the ...

research-article

Scalable Color Quantization for Task-centric Image Compression

Article No.: 82, Pages 1–18, https://doi.org/10.1145/3551389

Conventional image compression techniques targeted for the perceptual quality are not generally optimized for classification tasks using deep neural networks (DNNs). To compress images for DNN inference tasks, recent studies have proposed task-centric ...

research-article

From False-Free to Privacy-Oriented Communitarian Microblogging Social Networks

Article No.: 83, Pages 1–23, https://doi.org/10.1145/3555354

Online Social Networks (OSNs) have gained enormous popularity in recent years. They provide a dynamic platform for sharing content (text messages or multimedia) and for facilitating communication between friends and acquaintances. Microblogging services ...

research-article

Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation

Article No.: 84, Pages 1–20, https://doi.org/10.1145/3555314

Few-shot segmentation aims to segment objects belonging to a specific class under the guidance of a few annotated examples. Most existing approaches follow the prototype learning paradigm and generate category prototypes by squeezing masked feature maps ...

research-article

ML-CookGAN: Multi-Label Generative Adversarial Network for Food Image Generation

Article No.: 85, Pages 1–21, https://doi.org/10.1145/3554738

Generating food images from recipe and ingredient information can be applied to many tasks such as food recommendation, recipe development, and health management. For the characteristics of food images, this paper proposes ML-CookGAN, a novel CGAN. This ...

research-article

GHOSM: Graph-based Hybrid Outline and Skeleton Modelling for Shape Recognition

Article No.: 86, Pages 1–23, https://doi.org/10.1145/3554922

An efficient and accurate shape detection model plays a major role in many research areas. With the emergence of more complex shapes in real-life applications, shape recognition models need to capture the structure with more effective features to achieve ...

research-article

Distill-DBDGAN: Knowledge Distillation and Adversarial Learning Framework for Defocus Blur Detection

Article No.: 87, Pages 1–26, https://doi.org/10.1145/3557897

Defocus blur detection (DBD) aims to segment the blurred regions from a given image affected by defocus blur. It is a crucial pre-processing step for various computer vision tasks. With the increasing popularity of small mobile devices, there is a need ...

research-article

Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning

Article No.: 88, Pages 1–18, https://doi.org/10.1145/3556978

Visual and spatial relationship detection in images has been a fast-developing research topic in the multimedia field, which learns to recognize the semantic/spatial interactions between objects in an image, aiming to compose a structured semantic ...

research-article

Robust Long-Term Tracking via Localizing Occluders

Article No.: 89, Pages 1–15, https://doi.org/10.1145/3557896

Occlusion is known as one of the most challenging factors in long-term tracking because of its unpredictable shape. Existing works devoted into the design of loss functions, training strategies or model architectures, which are considered to have not ...

research-article

Context Prior Guided Semantic Modeling for Biomedical Image Segmentation

Article No.: 90, Pages 1–19, https://doi.org/10.1145/3558520

Most state-of-the-art deep networks proposed for biomedical image segmentation are developed based on U-Net. While remarkable success has been achieved, its inherent limitations hinder it from yielding more precise segmentation. First, its receptive field ...

research-article

Open Access

A Optimized BERT for Multimodal Sentiment Analysis

Article No.: 91, Pages 1–12, https://doi.org/10.1145/3566126

Sentiment analysis of one modality (e.g., text or image) has been broadly studied. However, not much attention has been paid to the sentiment analysis of multi-modal data. As the research on and applications of multi-modal data analysis are becoming more ...

research-article

Progressive Transformer Machine for Natural Character Reenactment

Article No.: 92, Pages 1–22, https://doi.org/10.1145/3559107

Character reenactment aims to control a target person’s full-head movement by a driving monocular sequence that is made up of the driving character video. Current algorithms utilize convolution neural networks in generative adversarial networks, which ...

research-article

Is it Violin or Viola? Classifying the Instruments’ Music Pieces using Descriptive Statistics

Article No.: 93, Pages 1–22, https://doi.org/10.1145/3563218

Classifying music pieces based on their instrument sounds is pivotal for analysis and application purposes. Given its importance, techniques using machine learning have been proposed to classify violin and viola music pieces. The violin and viola are two ...

research-article

EiMOL: A Secure Medical Image Encryption Algorithm based on Optimization and the Lorenz System

Article No.: 94, Pages 1–19, https://doi.org/10.1145/3561513

Nowadays, the demand for digital images from different intelligent devices and sensors has dramatically increased in smart healthcare. Due to advanced low-cost and easily available tools and software, manipulation of these images is an easy task. Thus, ...

research-article

UEFPN: Unified and Enhanced Feature Pyramid Networks for Small Object Detection

Article No.: 95, Pages 1–21, https://doi.org/10.1145/3561824

Object detection models based on feature pyramid networks have made significant progress in general object detection. However, small object detection is still a challenge for the existing models. In this paper, we think that two factors in the existing ...

research-article

Open Access

Deep Learning-Based Intra Mode Derivation for Versatile Video Coding

Article No.: 96, Pages 1–20, https://doi.org/10.1145/3563699

In intra coding, Rate Distortion Optimization (RDO) is performed to achieve the optimal intra mode from a pre-defined candidate list. The optimal intra mode is also required to be encoded and transmitted to the decoder side besides the residual signal, ...

research-article

Learning Explicit and Implicit Dual Common Subspaces for Audio-visual Cross-modal Retrieval

Article No.: 97, Pages 1–23, https://doi.org/10.1145/3564608

Audio-visual tracks in video contain rich semantic information with potential in many applications and research. Since the audio-visual data have inconsistent distributions and because of the heterogeneous nature of representations, the heterogeneous gap ...

research-article

Real-time Image Enhancement with Attention Aggregation

Article No.: 98, Pages 1–19, https://doi.org/10.1145/3564607

Image enhancement has stimulated significant research works over the past years for its great application potential in video conferencing scenarios. Nevertheless, most existing image enhancement approaches are still struggling to find a good tradeoff that ...

research-article

Toward Visual Behavior and Attention Understanding for Augmented 360 Degree Videos

Article No.: 99, Pages 1–24, https://doi.org/10.1145/3565024

Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimations of user visual fixations and head movements can enhance the quality of experience by allocating more computational resources for analyzing, ...

research-article

Mirror Segmentation via Semantic-aware Contextual Contrasted Feature Learning

Article No.: 100, Pages 1–22, https://doi.org/10.1145/3566127

Mirrors are everywhere in our daily lives. Existing computer vision systems do not consider mirrors, and hence may get confused by the reflected content inside a mirror, resulting in a severe performance degradation. However, separating the real content ...

ACM Transactions on Multimedia Computing, Communications, and Applications
