Dual-Stream Guided-Learning via a Priori Optimization for Person Re-identification
The task of person re-identification (re-ID) is to find the same pedestrian across non-overlapping camera views. Generally, the performance of person re-ID can be affected by background clutter. However, existing segmentation algorithms cannot obtain ...
Adaptive Compression for Online Computer Vision: An Edge Reinforcement Learning Approach
With the growth of computer vision-based applications, an explosive number of images have been uploaded to cloud servers that host such online computer vision algorithms, usually in the form of deep learning models. JPEG has been used as the de facto ...
Dual-Stream Structured Graph Convolution Network for Skeleton-Based Action Recognition
In this work, we propose a dual-stream structured graph convolution network (DS-SGCN) to solve the skeleton-based action recognition problem. The spatio-temporal coordinates and appearance contexts of the skeletal joints are jointly integrated into the ...
Unsupervised Domain Expansion for Visual Categorization
Expanding visual categorization into a novel domain without the need for extra annotation has been a long-term interest for multimedia intelligence. Previously, this challenge has been approached by unsupervised domain adaptation (UDA). Given labeled data ...
Task-independent Recognition of Communication Skills in Group Interaction Using Time-series Modeling
Case studies of group discussions are considered an effective way to assess communication skills (CS). This method can help researchers evaluate participants’ engagement with each other in a specific realistic context. In this article, multimodal analysis ...
Where Are They Going? Predicting Human Behaviors in Crowded Scenes
In this article, we propose a framework for crowd behavior prediction in complicated scenarios. The fundamental framework is designed using the standard encoder-decoder scheme, which is built upon the long short-term memory module to capture the temporal ...
Using Multisensory Content to Impact the Quality of Experience of Reading Digital Books
- Ellen P. Silva
- Natália Vieira
- Glauco Amorim
- Renata Mousinho
- Gustavo Guedes
- Gheorghita Ghinea
- Joel A. F. Dos Santos
Multisensorial books enrich a story with either traditional multimedia content or sensorial effects. The main idea is to increase children’s interest in reading by enhancing their QoE while reading. Studies on enriched and/or augmented e-books also ...
Bi-Directional Co-Attention Network for Image Captioning
Image captioning, which automatically describes an image with natural language, is regarded as a fundamental challenge in computer vision. In recent years, significant advances have been made in image captioning by improving attention mechanisms. ...
Cross-Domain Object Representation via Robust Low-Rank Correlation Analysis
Cross-domain data has become very popular recently since various viewpoints and different sensors tend to facilitate better data representation. In this article, we propose a novel cross-domain object representation algorithm (RLRCA) which not only ...
Cross-Modal Hybrid Feature Fusion for Image-Sentence Matching
Image-sentence matching is a challenging task in the field of language and vision, which aims at measuring the similarities between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into ...
Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders
- Nicola Messina
- Giuseppe Amato
- Andrea Esuli
- Fabrizio Falchi
- Claudio Gennaro
- Stéphane Marchand-Maillet
Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal matching remains a challenging task. In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region ...
Health Status Prediction with Local-Global Heterogeneous Behavior Graph
Health management is getting increasing attention all over the world. However, existing health management mainly relies on hospital examination and treatment, which are complicated and untimely. The emergence of mobile devices provides the possibility to ...
Perceptual Quality Assessment of Low-light Image Enhancement
Low-light image enhancement algorithms (LIEA) can light up images captured in dark or back-lighting conditions. However, LIEA may introduce various distortions such as structure damage, color shift, and noise into the enhanced images. Despite various ...
Dissimilarity-Based Regularized Learning of Charts
Chart images exhibit significant variability that makes each image different from the others even when they belong to the same class or category. Classification of charts is a major challenge because each chart class has variations in features, structure, ...
A New Foreground-Background based Method for Behavior-Oriented Social Media Image Classification
- Lokesh Nandanwar
- Palaiahnakote Shivakumara
- Divya Krishnani
- Raghavendra Ramachandra
- Tong Lu
- Umapada Pal
- Mohan Kankanhalli
Due to various applications, research on personal traits using information on social media has become an important area. In this paper, a new method for the classification of behavior-oriented social images uploaded on various social media platforms is ...
An Adaptive Bitrate Switching Algorithm for Speech Applications in Context of WebRTC
Web Real-Time Communication (WebRTC) combines a set of standards and technologies to enable high-quality audio, video, and auxiliary data exchange in web browsers and mobile applications. It enables peer-to-peer multimedia sessions over IP networks ...
A Fast View Synthesis Implementation Method for Light Field Applications
View synthesis (VS) for light field images is a very time-consuming task due to the large number of pixels involved and the intensive computations required, which may prevent its use in practical three-dimensional real-time systems. In this article, we propose an ...
Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition
Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition. An independent action can be represented by an empirical statistic over all of its pose samples. Two ...
Pedestrian-Aware Panoramic Video Stitching Based on a Structured Camera Array
The panorama stitching system is an indispensable module in surveillance or space exploration. Such a system enables the viewer to understand the surroundings instantly by aligning the surrounding images on a plane and fusing them naturally. The ...
Y-Net: Dual-branch Joint Network for Semantic Segmentation
Most existing segmentation networks are built upon a “U-shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in ...
Detecting Non-Aligned Double JPEG Compression Based on Amplitude-Angle Feature
Due to the popularity of JPEG format images in recent years, JPEG images will inevitably undergo image editing operations. Thus, some tampered images will leave traces of non-aligned double JPEG (NA-DJPEG) compression. By detecting the presence of NA-DJPEG ...
Residual-guided In-loop Filter Using Convolution Neural Network
The block-based coding structure in the hybrid video coding framework inevitably introduces compression artifacts such as blocking, ringing, and so on. To compensate for those artifacts, extensive filtering techniques were proposed in the loop of video ...
Trust Mechanism of Feedback Trust Weight in Multimedia Network
It is necessary to address the inaccurate data that arise because most data fusion algorithms ignore data reliability, drawing upon collaborative filtering and fuzzy network theory. Therefore, a model is constructed based on the collaborative filtering ...