Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains
This paper proposes an adaptive graph-based approach for multi-label image classification. Graph-based methods have been widely exploited in the field of multi-label classification, given their ability to model label correlations. Specifically, ...
Highlights
- A novel graph-based approach for multi-label image classification is proposed.
- It adaptively learns a graph describing label dependencies.
- It is extended to cross-domain settings with an adversarial domain adaptation schema.
- ...
EnsCLR: Unsupervised skeleton-based action recognition via ensemble contrastive learning of representation
Skeleton-based action recognition is a key research area in video understanding, benefiting from its compact and efficient motion information. To relieve the burden of expensive and laborious data annotation, unsupervised approaches, ...
Highlights
- EnsCLR brings the motivation of ensemble learning to contrastive learning.
- QE method utilizes the ensemble information from multiple pipelines.
- ENNM method excavates potential positive samples from unlabeled data.
Low-light image enhancement based on cell vibration energy model and lightness difference
Low-light image enhancement algorithms play a crucial role in revealing details obscured by darkness in images and substantially improving overall image quality. However, existing methods often suffer from issues like color or lightness ...
Highlights
- Fast and Effective Low Light Image Enhancement Algorithm.
- Dark image enhancement using cell vibration energy model and lightness difference.
- Adjustment strategy that combines the Weibull distribution with linear mapping.
- Output ...
Pseudo initialization based Few-Shot Class Incremental Learning
Few-Shot Class Incremental Learning (FSCIL) aims to recognize sequentially arriving new classes without catastrophically forgetting old classes. The incremental new classes contain only very few labeled examples for updating the model, which causes ...
Highlights
- We propose a novel FSCIL method based on feature space preservation and pseudo initialization.
- We adopt the cosine classifier to avoid catastrophic forgetting and overfitting of the classifier.
- Regularization is utilized to limit the ...
Implicit and explicit commonsense for multi-sentence video captioning
Existing dense or paragraph video captioning approaches rely on holistic representations of videos, possibly coupled with learned object/action representations, to condition hierarchical language decoders. However, they fundamentally lack the ...
Highlights
- A new task of video-based instruction generation that requires commonsense knowledge.
- A new model using implicit and explicit commonsense to enhance sentence prediction.
- We analyze the contributions of knowledge made toward ...
Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis
Breast cancer remains a prevalent malignancy impacting a substantial number of individuals globally. In recent times, there has been a growing trend of combining deep learning methods with breast cancer diagnosis. Nevertheless, this integration ...
Highlights
- The LLM SAM is introduced for effective fine-grained mask generation.
- The cell separation and merging strategy enriches and balances the data distribution.
- The dual contrast representation learning enhances the feature ...
Advancing Image Generation with Denoising Diffusion Probabilistic Model and ConvNeXt-V2: A novel approach for enhanced diversity and quality
In the rapidly evolving domain of image generation, the availability of sufficient data is crucial for effective model training. However, obtaining a large dataset is often challenging. Medical imaging, industrial monitoring, and self-driving ...
Highlights
- Novel Approach with DDPM and ConvNeXt-V2: Combines DDPM and ConvNeXt-V2 to enhance image diversity and quality from a single input.
- High Performance and Robustness: The model excels with a Pixel Diversity score of 0.87, LPIPS ...
Object discriminability re-extraction for distractor-aware visual object tracking
The similar distractor problem is one of the most difficult challenges for Siamese-based trackers. Since they formulate the visual tracking task as a similarity matching problem, these trackers have an inherent weakness: they are sensitive to ...
Highlights
- ODR-Net re-extracts features to distinguish targets from distractors.
- Integrates seamlessly into existing Siamese trackers without retraining.
- Simple yet effective, ODR-Net activates when similar distractors appear.
Subtle signals: Video-based detection of infant non-nutritive sucking as a neurodevelopmental cue
Shaotong Zhu, Michael Wan, Sai Kumar Reddy Manne, Elaheh Hatamimajoumerd, Marie J. Hayes, Emily Zimmerman, Sarah Ostadabbas
Non-nutritive sucking (NNS), which refers to the act of sucking on a pacifier, finger, or similar object without nutrient intake, plays a crucial role in assessing healthy early development. In the case of preterm infants, NNS behavior is a key ...
Highlights
- First manually annotated infant NNS video datasets.
- NNS system using convolutional LSTM network.
- Comparative analysis with spatiotemporal models.
- NNS segmentation with sliding windows and deep learning.
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Vision-Language Pretraining (VLP) and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks. However, leveraging these powerful techniques for more complex vision-language tasks, such as cooking ...
Highlights
- Existing general foundation models underperform on computational cooking tasks.
- Domain specific applications need more adapted pretraining approaches.
- Adapting existing general datasets of image-text pairs to be closer to food ...
Artifact feature purification for cross-domain detection of AI-generated images
In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, brings potential security risks to our society. Existing generated image detection methods suffer from performance drops when faced with ...
Highlights
- Performance drops across generators and scenes in detecting AI-generated images.
- Artifact Purification Network with explicit purification and implicit purification.
- At least 1.7% improvement across both domains on two open ...
Image-to-image translation based face photo de-meshing using GANs
Abdul Jabbar, Muhammad Assam, Muhammad Arslan, Madiha Bukhsh, Muhammad Shoib Amin, Yazeed Yasin Ghadi, Nisreen Innab, Masoud Alajmi, Mamyrbayev Orken, Salgozha Indira, Hend Khalid Alkahtan
Most existing face photo de-meshing methods have achieved promising results; however, they still suffer from certain quality problems: the inpainted regions appear blurry and unpleasant boundaries become visible. Such ...
Multi-domain awareness for compressed deepfake videos detection over social networks guided by common mechanisms between artifacts
The viral spread of massive deepfake videos over social networks has caused serious security problems. Despite the remarkable advancements achieved by existing deepfake detection algorithms, deepfake videos over social networks are inevitably ...
Highlights
- We analyzed the common mechanisms of compression artifacts and deepfake artifacts.
- Based on common mechanisms between artifacts, we designed an anti-compression model.
- We designed an adaptive notch filter to remove the interference of ...
Modality adaptation via feature difference learning for depth human parsing
Shaofei Huang, Tianrui Hui, Yue Gong, Fengguang Peng, Yuqiang Fang, Jingwei Wang, Bin Ma, Xiaoming Wei, Jizhong Han
In the field of human parsing, depth data offers unique advantages over RGB data due to its illumination invariance and geometric detail, which motivates us to explore human parsing with only depth input. However, depth data is challenging to ...
Highlights
- An MAFDL pipeline leveraging RGB semantic knowledge to enhance depth human parsing.
- DGDA to bridge the RGB-depth modality gap by learning inter-modal feature difference.
- FAC as explicit supervision at pixel and batch levels for ...
SHOWMe: Robust object-agnostic hand-object 3D reconstruction from RGB video
Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Brégier, Matthieu Armando, Jean-Sebastien Franco, Grégory Rogez
In this paper, we tackle the problem of detailed hand-object 3D reconstruction from monocular video with unknown objects, for applications where the required accuracy and level of detail are important, e.g. object hand-over in human–robot ...
Highlights
- Object-agnostic hand-object 3D reconstruction from monocular hand-object motion video
- Robust rigid-transformation estimation network that leverages large pre-trained model
- Two-stage pipeline for 3D hand-object reconstruction
- New ...
Classroom teacher action recognition based on spatio-temporal dual-branch feature fusion
The classroom teaching action recognition task refers to recognizing and understanding teacher action through video temporal and spatial information. Due to complex backgrounds and significant occlusions, recognizing teacher action in the ...
Highlights
- We propose a teacher action recognition method based on a two-branch architecture.
- We constructed a classroom teacher action dataset in a real-world setting.
- Through experimental validation, our proposed method outperforms other ...
Enhanced local distribution learning for real image super-resolution
Previous work has shown that CNN-based local distribution learning can efficiently reconstruct high-resolution images, but with limited performance improvement against complex degraded images. In this paper, we propose an enhanced local ...
Highlights
- CNN-based enhanced local distribution learning method is proposed.
- Parallel attention module is proposed to extract effective features.
- Dilated neighborhood sampling strategy is proposed.
UAHOI: Uncertainty-aware robust interaction learning for HOI detection
This paper focuses on Human–Object Interaction (HOI) detection, addressing the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame. Spearheaded by Detection Transformer (DETR), ...
Highlights
- We introduce an uncertainty-aware framework in HOI Detection.
- We refine both detection and interaction predictions through prediction variance.
- The proposed method outperforms existing approaches, enhancing both accuracy and ...
Lightning fast video anomaly detection via multi-scale adversarial distillation
Florinel-Alin Croitoru, Nicolae-Cătălin Ristea, Dana Dăscălescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah
We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the ...
Highlights
- We introduce a novel teacher-student framework for anomaly detection in video.
- We learn to detect anomalies by distilling from multiple highly accurate object-level teachers.
- We propose adversarial knowledge distillation in the ...