Volume 247, Issue C, Oct 2024
Regular papers
research-article
Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains
Abstract

This paper proposes an adaptive graph-based approach for multi-label image classification. Graph-based methods have been largely exploited in the field of multi-label classification, given their ability to model label correlations. Specifically, ...

Highlights

  • A novel graph-based approach for multi-label image classification is proposed.
  • It adaptively learns a graph describing label dependencies.
  • It is extended to cross-domain settings with an adversarial domain adaptation schema.
  • ...

research-article
EnsCLR: Unsupervised skeleton-based action recognition via ensemble contrastive learning of representation
Abstract

Skeleton-based action recognition is a key research area in video understanding, benefiting from its compact and efficient motion information. To relieve the burden of expensive and laborious data annotation, unsupervised approaches, ...

Highlights

  • EnsCLR brings the motivation of ensemble learning to contrastive learning.
  • QE method utilizes the ensemble information from multiple pipelines.
  • ENNM method excavates potential positive samples from unlabeled data.

research-article
Low-light image enhancement based on cell vibration energy model and lightness difference
Abstract

Low-light image enhancement algorithms play a crucial role in revealing details obscured by darkness in images and substantially improving overall image quality. However, existing methods often suffer from issues like color or lightness ...

Highlights

  • Fast and Effective Low Light Image Enhancement Algorithm.
  • Dark image enhancement using cell vibration energy model and lightness difference.
  • Adjustment strategy that combines the Weibull distribution with linear mapping.
  • Output ...

research-article
Pseudo initialization based Few-Shot Class Incremental Learning
Abstract

Few-Shot Class Incremental Learning (FSCIL) aims to recognize sequentially arriving new classes without catastrophically forgetting old classes. The incremental new classes only contain very few labeled examples for updating the model, which causes ...

Highlights

  • We propose a novel preserving feature space and pseudo initialization based FSCIL method.
  • We adopt the cosine classifier to avoid catastrophic forgetting and overfitting of the classifier.
  • Regularization is utilized to limit the ...

research-article
Implicit and explicit commonsense for multi-sentence video captioning
Abstract

Existing dense or paragraph video captioning approaches rely on holistic representations of videos, possibly coupled with learned object/action representations, to condition hierarchical language decoders. However, they fundamentally lack the ...

Highlights

  • A new task of video-based instruction generation that requires commonsense knowledge.
  • A new model using implicit and explicit commonsense to enhance sentence prediction.
  • We analyze the contributions of knowledge made toward ...

research-article
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis
Abstract

Breast cancer remains a prevalent malignancy impacting a substantial number of individuals globally. In recent times, there has been a growing trend of combining deep learning methods with breast cancer diagnosis. Nevertheless, this integration ...

Highlights

  • The LLM SAM is introduced for effective fine-grained mask generation.
  • The cell separation and merging strategy enriches and balances the data distribution.
  • The dual contrast representation learning enhances the feature ...

research-article
Advancing Image Generation with Denoising Diffusion Probabilistic Model and ConvNeXt-V2: A novel approach for enhanced diversity and quality
Abstract

In the rapidly evolving domain of image generation, the availability of sufficient data is crucial for effective model training. However, obtaining a large dataset is often challenging. Medical imaging, industrial monitoring, and self-driving ...

Highlights

  • Novel Approach with DDPM and ConvNeXt-V2: Combines DDPM and ConvNeXt-V2 to enhance image diversity and quality from a single input.
  • High Performance and Robustness: The model excels with a Pixel Diversity score of 0.87, LPIPS ...

research-article
Object discriminability re-extraction for distractor-aware visual object tracking
Abstract

The similar distractor problem is one of the most difficult challenges for Siamese-based trackers. Because they formulate the visual tracking task as a similarity matching problem, these trackers are inherently sensitive to ...

Highlights

  • ODR-Net re-extracts features to distinguish targets from distractors.
  • Integrates seamlessly into existing Siamese trackers without retraining.
  • Simple yet effective, ODR-Net activates when similar distractors appear.

research-article
Subtle signals: Video-based detection of infant non-nutritive sucking as a neurodevelopmental cue
Abstract

Non-nutritive sucking (NNS), which refers to the act of sucking on a pacifier, finger, or similar object without nutrient intake, plays a crucial role in assessing healthy early development. In the case of preterm infants, NNS behavior is a key ...

Highlights

  • First manually annotated infant NNS video datasets.
  • NNS system using convolutional LSTM network.
  • Comparative analysis with spatiotemporal models.
  • NNS segmentation with sliding windows and deep learning.

research-article
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Abstract

Vision-Language Pretraining (VLP) and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks. However, leveraging these powerful techniques for more complex vision-language tasks, such as cooking ...

Highlights

  • Existing general foundation models underperform on computational cooking tasks.
  • Domain specific applications need more adapted pretraining approaches.
  • Adapting existing general datasets of image-text pairs to be closer to food ...

research-article
Artifact feature purification for cross-domain detection of AI-generated images
Abstract

In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, brings potential security risks to our society. Existing generated image detection methods suffer from performance drops when faced with ...

Highlights

  • Performance drops across generators and scenes in detecting AI-generated images.
  • Artifact Purification Network with explicit purification and implicit purification.
  • At least 1.7% improvement across both domains on two open ...

research-article
Image-to-image translation based face photo de-meshing using GANs
Abstract

Most existing face photo de-meshing methods achieve promising results, but they still have certain quality problems: the inpainted regions can appear blurry, and unpleasant boundaries become visible. Such ...

Special issue on Advances in Deep Learning for Human-Centric Visual Understanding
research-article
Multi-domain awareness for compressed deepfake videos detection over social networks guided by common mechanisms between artifacts
Abstract

The viral spread of massive deepfake videos over social networks has caused serious security problems. Despite the remarkable advancements achieved by existing deepfake detection algorithms, deepfake videos over social networks are inevitably ...

Highlights

  • We analyzed the common mechanisms of compression artifacts and deepfake artifacts.
  • Based on common mechanisms between artifacts, we designed an anti-compression model.
  • We designed an adaptive notch filter to remove the interference of ...

research-article
Modality adaptation via feature difference learning for depth human parsing
Abstract

In the field of human parsing, depth data offers unique advantages over RGB data due to its illumination invariance and geometric detail, which motivates us to explore human parsing with only depth input. However, depth data is challenging to ...

Highlights

  • An MAFDL pipeline leveraging RGB semantic knowledge to enhance depth human parsing.
  • DGDA to bridge the RGB-depth modality gap by learning inter-modal feature difference.
  • FAC as explicit supervision at pixel and batch levels for ...

research-article
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video
Abstract

In this paper, we tackle the problem of detailed hand-object 3D reconstruction from monocular video with unknown objects, for applications where the required accuracy and level of detail is important, e.g. object hand-over in human–robot ...

Highlights

  • Object-agnostic hand-object 3D reconstruction from monocular hand-object motion video
  • Robust rigid-transformation estimation network that leverages large pre-trained model
  • Two-stage pipeline for 3D hand-object reconstruction
  • New ...

research-article
Classroom teacher action recognition based on spatio-temporal dual-branch feature fusion
Abstract

The classroom teaching action recognition task refers to recognizing and understanding teacher action through video temporal and spatial information. Due to complex backgrounds and significant occlusions, recognizing teacher action in the ...

Highlights

  • We propose a teacher action recognition method based on a two-branch architecture.
  • We constructed a classroom teacher action dataset in a real-world setting.
  • Through experimental validation, our proposed method outperforms other ...

research-article
Enhanced local distribution learning for real image super-resolution
Abstract

Previous work has shown that CNN-based local distribution learning can efficiently reconstruct high-resolution images, but with limited performance improvement against complex degraded images. In this paper, we propose an enhanced local ...

Highlights

  • CNN-based enhanced local distribution learning method is proposed.
  • Parallel attention module is proposed to extract effective feature.
  • Dilated neighborhood sampling strategy is proposed.

research-article
UAHOI: Uncertainty-aware robust interaction learning for HOI detection
Abstract

This paper focuses on Human–Object Interaction (HOI) detection, addressing the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame. Spearheaded by Detection Transformer (DETR), ...

Highlights

  • We introduce an uncertainty-aware framework in HOI Detection.
  • We refine both detection and interaction predictions through prediction variance.
  • The proposed method outperforms existing approaches, enhancing both accuracy and ...

Special issue on Eyes on People: Recent Trends on Human Analysis, Perception and Generation
research-article
Lightning fast video anomaly detection via multi-scale adversarial distillation
Abstract

We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the ...

Highlights

  • We introduce a novel teacher-student framework for anomaly detection in video.
  • We learn to detect anomalies by distilling from multiple highly accurate object-level teachers.
  • We propose adversarial knowledge distillation in the ...
