RefinerHash: a new hashing-based re-ranking technique for image retrieval
Re-ranking is the task of refining an initially ranked list of images, obtained from an image retrieval technique for a given query image, with the goal of efficiently enhancing retrieval performance. However, existing re-ranking methods ...
Exploring contactless techniques in multimodal emotion recognition: insights into diverse applications, challenges, solutions, and prospects
In recent years, emotion recognition has received significant attention, presenting a plethora of opportunities for application in diverse fields such as human–computer interaction, psychology, and neuroscience, to name a few. Although unimodal ...
An efficient heuristic-aided adaptive autoencoder-based dilated DNN with attention mechanism for enhancing the performance of the MIMO system in 5G communication
In modern society, wireless communication systems play a highly significant role. These systems have kept evolving and have been deployed as fifth-generation (5G) wireless systems. One of the significant factors of the 5G system has ...
Synchronous composition and semantic line detection based on cross-attention
Composition detection and semantic line detection are important research topics in computer vision and play an important auxiliary role in the analysis of image esthetics. However, at present, few researchers have considered the internal ...
Unbinding tensor product representations for image captioning with semantic alignment and complementation
Image captioning, which describes an image with natural language, is an important but challenging multi-modal task. Many state-of-the-art methods generally adopt the encoder–decoder framework to implement information conversion from image modality ...
Semantic-wise guidance for efficient multimodal emotion recognition with missing modalities
Emotions play an important role in human–computer interaction. Multimodal emotion recognition combines feature information from different modalities to recognize emotional states. However, in real application scenarios, data from all modalities ...
Students and teachers learning together: a robust training strategy for neural network pruning
Convolutional neural networks (CNNs) serve as the backbone for extracting image features in the majority of computer vision tasks. In an attempt to make them deployable on small devices, many academics have released small neural networks that they ...
CLDE-Net: crowd localization and density estimation based on CNN and transformer network
- Yaocong Hu,
- Yuanyuan Lin,
- Huicheng Yang,
- Bingyou Liu,
- Guoyang Wan,
- Jinwen Hong,
- Chao Xie,
- Wei Wang,
- Xiaobo Lu
Given a crowd image, humans can approximate the count in two ways: by exactly locating head points in each local region, or by directly estimating the total number of people from the whole image. By imitating human visual ...
Efficient brain tumour detection system by Cascaded Fully Convolutional Improved DenseNet with Attention-based Adaptive Swin Unet-derived segmentation strategy
Brain tumors are caused by the rapid and uncontrolled growth of cells. This motivates the design of a framework for brain tumor detection. Deep learning-assisted developments help to enhance the detection of ...
Attention U-Net based on multi-scale feature extraction and WSDAN data augmentation for video anomaly detection
The widespread adoption of video surveillance systems in public security and network security domains has underscored the importance of video anomaly detection as a pivotal research area. To enhance the precision and robustness of anomaly ...
CA-CLIP: category-aware adaptation of CLIP model for few-shot class-incremental learning
Few-shot class-incremental learning (FSCIL) learns from continuously arriving new categories, each with only a small number of training samples. As a challenging problem, FSCIL aims to mitigate the catastrophic forgetting of old knowledge while ...
Image retrieval based on deep Tamura feature descriptor
Different levels of visual features have different effects in image retrieval, and deep features can express higher-level features or semantic information. The Tamura texture feature is a handcrafted feature, and it can represent texture ...
Link prediction in social networks using hyper-motif representation on hypergraph
Link prediction, a critical pursuit in complex networks research, revolves around the predictive understanding of connections between nodes. Our novel approach introduces a hypergraph to model the network, diverging from the conventional “node–...
A review of deep learning algorithms for modeling drug interactions
The interactions between therapeutics and their targets are an important part of the drug development process. To address cost-, time-, and accuracy-related issues, novel and efficient deep learning (DL) algorithms are required. These approaches have proven ...
Modeling methods of cylindrical and axisymmetric waterbomb origami based on multi-objective optimization
The basic principle of origami is to use two-dimensional flat materials to obtain various three-dimensional target shapes by folding crease patterns. Among them, the study of waterbomb tessellations inspires the design and functionality ...
Linking unknown characters via oracle bone inscriptions retrieval
Retrieving useful information from existing collections of oracle bone rubbing images plays a pivotal role in the study of oracle bone inscription decipherment. However, current systems for processing oracle bone information rely on expert-curated ...
A novel multiagent system for cervical motor control evaluation and individualized therapy: integrating gamification and portable solutions
- André Filipe Sales Mendes,
- Héctor Sánchez San Blas,
- Fátima Pérez Robledo,
- Juan F. De Paz Santana,
- Gabriel Villarrubia González
The study focused on designing a portable, objective device for assessing and addressing Cervical Motor Control (CMC) impairments. This device is based on a proposed architecture that employs advanced technology to evaluate and enhance patients’ ...
Joint frame-level and CTU-level rate control based on constant perceptual quality
For a given network bandwidth, optimizing the rate control to achieve the best compression performance is essential in video communication and storage. Besides the enhancement of the overall coding efficiency, the rate control algorithm for ...
PathNet: a novel multi-pathway convolutional neural network for few-shot image classification from scratch
In recent years, advanced computer vision models have trended toward deeper and larger network architectures, and model depth is often considered an important feature for achieving superior performance. While deeper networks can help solve complex ...
A visual analysis approach for data transformation via domain knowledge and intelligent models
- Haiyang Zhu,
- Jun Yin,
- Chengcan Chu,
- Minfeng Zhu,
- Yating Wei,
- Jiacheng Pan,
- Dongming Han,
- Xuwei Tan,
- Wei Chen
Industry benchmarking involves comparing and analyzing a company’s performance with other top-performing enterprises. PDF documents contain valuable corporate information, but their non-editable nature makes data extraction complex. This study ...
IS-DGM: an improved steganography method based on a deep generative model and hyper logistic map encryption via social media networks
The exchange of information through social networking sites has become a major risk because millions of subscribers’ data can be obtained at any time without authorization. Multimedia security is a multifaceted field that involves various ...
MF-DAT: a stock trend prediction of the double-graph attention network based on multisource information fusion
Stock forecasting research, which aims to predict the future price movement of stocks, has been the focus of investors and scholars. This is important for practical applications related to human-centric computing and information sciences. Previous ...
A novel exponent–sine–cosine chaos map-based multiple-image encryption technique
This paper proposes a multiple-image encryption (MIE) approach that uses a novel exponent–sine–cosine (ESC) chaotic map along with the dynamic permutation and DNA-based diffusion. In the first phase of the proposed approach, the three components ...
PointCMC: cross-modal multi-scale correspondences learning for point cloud understanding
Existing cross-modal frameworks have achieved impressive performance in point cloud object representation learning, where a 2D image encoder is employed to transfer knowledge to a 3D point cloud encoder. However, the local structures between ...
Deep Learning-based forgery detection and localization for compressed images using a hybrid optimization model
Manipulation of digital images has become quite common in recent years because of the rise of various image editing tools. Distinguishing authentic images from tampered ones has become a challenging task, since tampered images are indistinguishable by ...
Pimo: memory-efficient privacy protection in video streaming and analytics
Video streaming from cameras to backend cloud or edge servers for neural-based analytics has gained significant popularity. However, the transmission of data from cameras to a backend raises substantial privacy concerns, particularly regarding ...
Improving collaborative filtering with SNE–GCN: a second-order neighbor enhanced graph convolutional network
Graph collaborative filtering uses user-item interactions to capture user preferences for items. While this approach proves highly effective, its performance may suffer from the sparse user-item interactions. On one hand, existing methods lack ...
Visual transductive learning via iterative label correction
Unsupervised domain adaptation (UDA) aims to transfer knowledge across domains when no labeled data are available in the target domain. To this end, UDA methods attempt to utilize pseudo-labeled target samples to align distributions across the ...
Dual-path temporal map optimization for make-up temporal video grounding
Make-up temporal video grounding (MTVG) aims to localize the target video segment that is semantically related to a sentence describing a make-up activity in a make-up video. Compared with general video grounding, MTVG focuses on meticulous ...