Efficient Light Field Image Compression with Enhanced Random Access
In light field image compression, facilitating random access to individual views plays a significant role in decoding views quickly, reducing memory footprint, and decreasing the bandwidth requirement for transmission. Highly efficient light field image ...
Evaluation of an Intervention Program Based on Mobile Apps to Learn Sexism Prevention in Teenagers
The fight against sexism is nowadays one of the flagship social movements in western countries. Adolescence is a crucial period, and some empirical studies have focused on the socialization of teenagers, proving that the socialization with the surrounding ...
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition
Rapid progress and superior performance have been achieved for skeleton-based action recognition recently. In this article, we investigate this problem under a cross-dataset setting, which is a new, pragmatic, and challenging task in real-world scenarios. ...
An Effective Forest Fire Detection Framework Using Heterogeneous Wireless Multimedia Sensor Networks
With improvements in the area of Internet of Things (IoT), surveillance systems have recently become more accessible. At the same time, optimizing the energy requirements of smart sensors, especially for data transmission, has always been very ...
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
Vision-language pre-training has been an emerging and fast-developing research topic, which transfers multi-modal knowledge from rich-resource pre-training tasks to limited-resource downstream tasks. Unlike existing works that predominantly learn a single ...
Cascaded Structure-Learning Network with Using Adversarial Training for Robust Facial Landmark Detection
Recently, great progress has been achieved on facial landmark detection based on convolutional neural networks, but it remains challenging due to partial occlusion and extreme head poses. In this paper, we propose a Cascaded Structure-Learning Network (...
Machine Learning Based Content-Agnostic Viewport Prediction for 360-Degree Video
Accurate and fast estimations or predictions of the (near) future location of the users of head-mounted devices within the virtual omnidirectional environment open a plethora of opportunities in application domains such as interactive immersive gaming and ...
Generating Virtual Wire Sculptural Art from 3D Models
Wire sculptures are objects sculpted by the use of wires. In this article, we propose practical methods to create 3D virtual wire sculptural art from a given 3D model. In contrast, most of the previous 3D wire art results are reconstructed from input 2D ...
Response Generation by Jointly Modeling Personalized Linguistic Styles and Emotions
Natural language generation (NLG) has been an essential technique for various applications, like XiaoIce and Siri, and has attracted increasing attention recently. To improve the user experience, several emotion-aware NLG methods have been developed to generate ...
An l½ and Graph Regularized Subspace Clustering Method for Robust Image Segmentation
Segmenting meaningful visual structures from an image is a fundamental and most-addressed problem in image analysis algorithms. However, among factors such as diverse visual patterns, noise, complex backgrounds, and similar textures present in foreground ...
Will You Ever Become Popular? Learning to Predict Virality of Dance Clips
Dance challenges are going viral in video communities like TikTok nowadays. Once a challenge becomes popular, thousands of short-form videos will be uploaded within a couple of days. Therefore, virality prediction from dance challenges is of great ...
Deep Semantic and Attentive Network for Unsupervised Video Summarization
With the rapid growth of video data, video summarization is a promising approach to shorten a lengthy video into a compact version. Although supervised summarization approaches have achieved state-of-the-art performance, they require frame-level annotated ...
Moment is Important: Language-Based Video Moment Retrieval via Adversarial Learning
The newly emerging language-based video moment retrieval task aims at retrieving a target video moment from an untrimmed video given a natural language query. It is more applicable in reality since it is able to accurately localize a specific video ...
Learning Transferable Perturbations for Image Captioning
Present studies have discovered that state-of-the-art deep learning models can be attacked by small but well-designed perturbations. Existing attack algorithms for the image captioning task are time-consuming, and their generated adversarial examples ...
SADnet: Semi-supervised Single Image Dehazing Method Based on an Attention Mechanism
Many real-life tasks such as military reconnaissance and traffic monitoring require high-quality images. However, images acquired in foggy or hazy weather pose obstacles to the implementation of these real-life tasks; consequently, image dehazing is an ...
Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval
Composing Text and Image to Image Retrieval (CTI-IR) is an emerging task in computer vision, which allows retrieving images relevant to a query image with text describing desired modifications to the query image. Most conventional cross-modal retrieval ...
Structure-aware Meta-fusion for Image Super-resolution
There are two main categories of image super-resolution algorithms: distortion oriented and perception oriented. Recent evidence shows that reconstruction accuracy and perceptual quality are typically in disagreement with each other. In this article, we ...
Non-Acted Text and Keystrokes Database and Learning Methods to Recognize Emotions
Modern computing applications are adapting to the availability of huge and diverse data to make their pattern recognition methods smarter. Identification of dominant emotion solely based on the text data generated by humans is ...
Transform, Warp, and Dress: A New Transformation-guided Model for Virtual Try-on
Virtual try-on has recently emerged in computer vision and multimedia communities with the development of architectures that can generate realistic images of a target person wearing a custom garment. This research interest is motivated by the large role ...
Adversarial Multi-Grained Embedding Network for Cross-Modal Text-Video Retrieval
Cross-modal retrieval between texts and videos has received consistent research interest in the multimedia community. Existing studies follow a trend of learning a joint embedding space to measure the distance between text and video representations. In ...
Fully Unsupervised Person Re-Identification via Selective Contrastive Learning
Person re-identification (ReID) aims at searching for the same person among images captured by various cameras. Existing fully supervised person ReID methods usually suffer from poor generalization capability caused by domain gaps. Unsupervised ...
Music2Dance: DanceNet for Music-Driven Dance Generation
Synthesizing human motions from music (i.e., music to dance) is appealing and has attracted considerable research interest in recent years. It is challenging because of the requirement for realistic and complex human motions for dance, but more importantly, the ...
Understanding and Creating Art with AI: Review and Outlook
Technologies related to artificial intelligence (AI) have a strong impact on the changes of research and creative practices in visual arts. The growing number of research initiatives and creative applications that emerge in the intersection of AI and art ...