TOMM: Vol 17, No 4

Volume 17, Issue 4, November 2021

Editor:

Alberto Del Bimbo
University of Firenze, Italy

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1551-6857

EISSN: 1551-6865

introduction

Free

Table of Contents: Online Supplement Volume 17, Number 2s-3s

Article No.: 117e, Pages 1–5, https://doi.org/10.1145/3507468

research-article

Dual-Stream Guided-Learning via a Priori Optimization for Person Re-identification

Article No.: 117, Pages 1–22, https://doi.org/10.1145/3447715

The task of person re-identification (re-ID) is to find the same pedestrian across non-overlapping camera views. Generally, the performance of person re-ID can be affected by background clutter. However, existing segmentation algorithms cannot obtain ...

research-article

Adaptive Compression for Online Computer Vision: An Edge Reinforcement Learning Approach

Article No.: 118, Pages 1–23, https://doi.org/10.1145/3447878

With the growth of computer vision-based applications, an explosive amount of images have been uploaded to cloud servers that host such online computer vision algorithms, usually in the form of deep learning models. JPEG has been used as the de facto ...

research-article

Smart Director: An Event-Driven Directing System for Live Broadcasting

Article No.: 119, Pages 1–18, https://doi.org/10.1145/3448981

Live video broadcasting normally requires a multitude of skills and expertise with domain knowledge to enable multi-camera productions. As the number of cameras keeps increasing, directing a live sports broadcast has now become more complicated and ...

research-article

Dual-Stream Structured Graph Convolution Network for Skeleton-Based Action Recognition

Article No.: 120, Pages 1–22, https://doi.org/10.1145/3450410

In this work, we propose a dual-stream structured graph convolution network (DS-SGCN) to solve the skeleton-based action recognition problem. The spatio-temporal coordinates and appearance contexts of the skeletal joints are jointly integrated into the ...

research-article

Unsupervised Domain Expansion for Visual Categorization

Article No.: 121, Pages 1–24, https://doi.org/10.1145/3448108

Expanding visual categorization into a novel domain without the need of extra annotation has been a long-term interest for multimedia intelligence. Previously, this challenge has been approached by unsupervised domain adaptation (UDA). Given labeled data ...

research-article

Task-independent Recognition of Communication Skills in Group Interaction Using Time-series Modeling

Article No.: 122, Pages 1–27, https://doi.org/10.1145/3450283

Case studies of group discussions are considered an effective way to assess communication skills (CS). This method can help researchers evaluate participants’ engagement with each other in a specific realistic context. In this article, multimodal analysis ...

research-article

Where Are They Going? Predicting Human Behaviors in Crowded Scenes

Article No.: 123, Pages 1–19, https://doi.org/10.1145/3449359

In this article, we propose a framework for crowd behavior prediction in complicated scenarios. The fundamental framework is designed using the standard encoder-decoder scheme, which is built upon the long short-term memory module to capture the temporal ...

research-article

Using Multisensory Content to Impact the Quality of Experience of Reading Digital Books

Article No.: 124, Pages 1–18, https://doi.org/10.1145/3458676

Multisensorial books enrich a story with either traditional multimedia content or sensorial effects. The main idea is to increase children’s interest in reading by enhancing their QoE while reading. Studies on enriched and/or augmented e-books also ...

research-article

Bi-Directional Co-Attention Network for Image Captioning

Article No.: 125, Pages 1–20, https://doi.org/10.1145/3460474

Image Captioning, which automatically describes an image with natural language, is regarded as a fundamental challenge in computer vision. In recent years, significant advance has been made in image captioning through improving attention mechanism. ...

research-article

Cross-Domain Object Representation via Robust Low-Rank Correlation Analysis

Article No.: 126, Pages 1–20, https://doi.org/10.1145/3458825

Cross-domain data has become very popular recently since various viewpoints and different sensors tend to facilitate better data representation. In this article, we propose a novel cross-domain object representation algorithm (RLRCA) which not only ...

research-article

Cross-Modal Hybrid Feature Fusion for Image-Sentence Matching

Article No.: 127, Pages 1–23, https://doi.org/10.1145/3458281

Image-sentence matching is a challenging task in the field of language and vision, which aims at measuring the similarities between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into ...

research-article

Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders

Article No.: 128, Pages 1–23, https://doi.org/10.1145/3451390

Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal matching remains a challenging task. In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region ...

research-article

Open Access

Health Status Prediction with Local-Global Heterogeneous Behavior Graph

Article No.: 129, Pages 1–21, https://doi.org/10.1145/3457893

Health management is getting increasing attention all over the world. However, existing health management mainly relies on hospital examination and treatment, which are complicated and untimely. The emergence of mobile devices provides the possibility to ...

research-article

Perceptual Quality Assessment of Low-light Image Enhancement

Article No.: 130, Pages 1–24, https://doi.org/10.1145/3457905

Low-light image enhancement algorithms (LIEA) can light up images captured in dark or back-lighting conditions. However, LIEA may introduce various distortions such as structure damage, color shift, and noise into the enhanced images. Despite various ...

research-article

Dissimilarity-Based Regularized Learning of Charts

Article No.: 131, Pages 1–23, https://doi.org/10.1145/3458884

Chart images exhibit significant variabilities that make each image different from others even though they belong to the same class or categories. Classification of charts is a major challenge because each chart class has variations in features, structure,...

research-article

A New Foreground-Background based Method for Behavior-Oriented Social Media Image Classification

Article No.: 132, Pages 1–25, https://doi.org/10.1145/3458051

Due to various applications, research on personal traits using information on social media has become an important area. In this paper, a new method for the classification of behavior-oriented social images uploaded on various social media platforms is ...

research-article

Open Access

An Adaptive Bitrate Switching Algorithm for Speech Applications in Context of WebRTC

Article No.: 133, Pages 1–21, https://doi.org/10.1145/3458751

Web Real-Time Communication (WebRTC) combines a set of standards and technologies to enable high-quality audio, video, and auxiliary data exchange in web browsers and mobile applications. It enables peer-to-peer multimedia sessions over IP networks ...

research-article

A Fast View Synthesis Implementation Method for Light Field Applications

Article No.: 134, Pages 1–20, https://doi.org/10.1145/3459098

View synthesis (VS) for light field images is a very time-consuming task due to the great quantity of involved pixels and intensive computations, which may prevent it from the practical three-dimensional real-time systems. In this article, we propose an ...

research-article

Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition

Article No.: 135, Pages 1–22, https://doi.org/10.1145/3460235

For the merits of high-order statistics and Riemannian geometry, covariance matrix has become a generic feature representation for action recognition. An independent action can be represented by an empirical statistics over all of its pose samples. Two ...

research-article

Pedestrian-Aware Panoramic Video Stitching Based on a Structured Camera Array

Article No.: 136, Pages 1–24, https://doi.org/10.1145/3460511

The panorama stitching system is an indispensable module in surveillance or space exploration. Such a system enables the viewer to understand the surroundings instantly by aligning the surrounding images on a plane and fusing them naturally. The ...

research-article

Y-Net: Dual-branch Joint Network for Semantic Segmentation

Article No.: 137, Pages 1–22, https://doi.org/10.1145/3460940

Most existing segmentation networks are built upon a “U-shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in ...

research-article

Detecting Non-Aligned Double JPEG Compression Based on Amplitude-Angle Feature

Article No.: 138, Pages 1–18, https://doi.org/10.1145/3464388

Due to the popularity of JPEG format images in recent years, JPEG images will inevitably involve image editing operation. Thus, some tampered images will leave tracks of Non-aligned double JPEG (NA-DJPEG) compression. By detecting the presence of NA-DJPEG ...

research-article

Residual-guided In-loop Filter Using Convolution Neural Network

Article No.: 139, Pages 1–19, https://doi.org/10.1145/3460820

The block-based coding structure in the hybrid video coding framework inevitably introduces compression artifacts such as blocking, ringing, and so on. To compensate for those artifacts, extensive filtering techniques were proposed in the loop of video ...

research-article

Trust Mechanism of Feedback Trust Weight in Multimedia Network

Article No.: 140, Pages 1–26, https://doi.org/10.1145/3391296

It is necessary to solve the inaccurate data arising from data reliability ignored by most data fusion algorithms drawing upon collaborative filtering and fuzzy network theory. Therefore, a model is constructed based on the collaborative filtering ...

ACM Transactions on Multimedia Computing, Communications, and Applications
