- Research article, February 2025
GraphMLP: A graph MLP-like architecture for 3D human pose estimation
Abstract: Modern multi-layer perceptron (MLP) models have shown competitive results in learning visual representations without self-attention. However, existing MLP models are not good at capturing local details and lack prior knowledge of human body ...
Highlights:
- We present, to the best of our knowledge, the first MLP-like architecture called GraphMLP for 3D human pose estimation. It combines the advantages of modern MLPs and GCNs, including globality, locality, and connectivity.
- The novel SG-...
- Research article, January 2025 (Just Accepted)
Wakeup-Darkness: When Multimodal Meets Unsupervised Low-light Image Enhancement
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted. https://doi.org/10.1145/3711929
Low-light image enhancement is a crucial visual task, and many unsupervised methods overlook the degradation of visible information in low-light scenes, adversely affecting the fusion of complementary information and hindering the generation of ...
- Research article, January 2025
A pure MLP-Mixer-based GAN framework for guided image translation
Abstract: Traditional guided image translation methods, based on encoder–decoder or U-Net structures, often struggle with complex or contrasting images. To address this, we introduce a novel dual-stage strategy. First, we use a cascaded cross-gating MLP-...
Highlights:
- A novel two-phase MLP-Mixer-based framework for guided image translation.
- A unique module for uncovering latent mapping cues between source images and target semantic guidance.
- A refined pixel-level loss function to address ...
- Article, December 2024
MS-UMLP: Medical Image Segmentation via Multi-Scale U-shape MLP-Mixer
Abstract: With the emergence and rapid development of Transformers, medical image segmentation has also been revolutionized by Transformers due to their ability to encode long-range dependencies. Despite their advantages, Transformers also come with some ...
- Article, December 2024
Audio-Visual Navigation with Anti-Backtracking
Abstract: Embodied navigation, which involves robotic agents exploring an unknown environment to reach target locations with egocentric observation, is a complex problem in the field of embodied AI. Audio-visual navigation extends this concept by equipping ...
- Research article, December 2024
Physical Adversarial Attack Meets Computer Vision: A Decade Survey
- Hui Wei,
- Hao Tang,
- Xuemei Jia,
- Zhixiang Wang,
- Hanxun Yu,
- Zhubo Li,
- Shin’ichi Satoh,
- Luc Van Gool,
- Zheng Wang
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 12, Pages 9797–9817. https://doi.org/10.1109/TPAMI.2024.3430860
Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision, their vulnerability to adversarial attacks remains a critical concern. Extensive research has demonstrated that incorporating sophisticated perturbations into input ...
- Research article, December 2024
Optimizing graph neural network architectures for schizophrenia spectrum disorder prediction using evolutionary algorithms
- Shurun Wang,
- Hao Tang,
- Ryutaro Himeno,
- Jordi Solé-Casals,
- Cesar F. Caiafa,
- Shuning Han,
- Shigeki Aoki,
- Zhe Sun
Computer Methods and Programs in Biomedicine (CBIO), Volume 257, Issue C. https://doi.org/10.1016/j.cmpb.2024.108419
Abstract: Background and Objective: The accurate diagnosis of schizophrenia spectrum disorder plays an important role in improving patient outcomes, enabling timely interventions, and optimizing treatment plans. Functional connectivity analysis, utilizing ...
Highlights:
- We propose a GNAS framework to build a GNN model for disorder prediction.
- We compare our model with other popular ML and DL models on multi-site datasets.
- We use the GNNExplainer method to provide the explainability of the model.
- Research article, November 2024
Global Meets Local: Dual Activation Hashing Network for Large-Scale Fine-Grained Image Retrieval
IEEE Transactions on Knowledge and Data Engineering (IEEECS_TKDE), Volume 36, Issue 11, Pages 6266–6279. https://doi.org/10.1109/TKDE.2024.3393512
In the Internet era, the exponential growth of fine-grained image databases poses a considerable challenge for efficient information retrieval. Hashing-based approaches gained traction for their computational and storage efficiency, yet fine-grained ...
- Research article, October 2024
R4D-planes: Remapping Planes For Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6569–6577. https://doi.org/10.1145/3664647.3681281
The tasks of view synthesis and decoupling dynamic objects from the static environment for monocular scenes are both long-standing challenges in CV and CG. Most of the previous NeRF-based methods rely on implicit representation, which require additional ...
- Research article, October 2024
CoIn: A Lightweight and Effective Framework for Story Visualization and Continuation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 10659–10668. https://doi.org/10.1145/3664647.3680873
Story visualization aims to generate realistic and coherent images based on multi-sentence stories. However, current methods face challenges in achieving high-quality image generation while maintaining lightweight models and a fast generation speed. The ...
- Research article, October 2024
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 2379–2388. https://doi.org/10.1145/3664647.3680763
Fine-grained image retrieval (FGIR) is to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the ...
- Research article, October 2024
ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3964–3973. https://doi.org/10.1145/3664647.3680619
Diffusion models have shown impressive potential on talking head generation. While plausible appearance and talking effect are achieved, these methods still suffer from temporal, 3D or expression inconsistency due to the error accumulation and inherent ...
- Poster, October 2024
Enhancing Virtual Mobility for Individuals Who Are Blind or Have Low Vision: A Stationary Exploration Method
ISS Companion '24: Companion Proceedings of the 2024 Conference on Interactive Surfaces and Spaces, Pages 83–87. https://doi.org/10.1145/3696762.3698058
Designing accessible locomotion methods for individuals who are blind or have low vision (BLV) is a complex challenge, particularly in mobile VR environments with limited interface options. In this paper, we propose a novel locomotion technique on ...
- Article, December 2024
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
Abstract: In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach that a 3D model predicts dense-embedding for each point which is co-embedded with both the ...
- Article, November 2024
SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis
Abstract: Semantic image synthesis (SIS) shows good promises for sensor simulation. However, current best practices in this field, based on GANs, have not yet reached the desired level of quality. As latent diffusion models make significant strides in image ...
- Article, November 2024
InstructGIE: Towards Generalizable Image Editing
Abstract: Recent advances in image editing have been driven by the development of denoising diffusion models, marking a significant leap forward in this field. Despite these advances, the generalization capabilities of recent image editing approaches remain ...
- Article, November 2024
GiT: Towards Generalist Vision Transformer Through Universal Language Interface
- Haiyang Wang,
- Hao Tang,
- Li Jiang,
- Shaoshuai Shi,
- Muhammad Ferjad Naeem,
- Hongsheng Li,
- Bernt Schiele,
- Liwei Wang
Abstract: This paper proposes a simple, yet effective framework, called GiT, simultaneously applicable for various vision tasks only with a vanilla ViT. Motivated by the universality of the Multi-layer Transformer architecture (e.g., GPT) widely used in ...
- Article, October 2024
ADen: Adaptive Density Representations for Sparse-View Camera Pose Estimation
Abstract: Recovering camera poses from a set of images is a foundational task in 3D computer vision, which powers key applications such as 3D scene/object reconstructions. Classic methods often depend on feature correspondence, such as keypoints, which ...
- Article, October 2024
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
Abstract: Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although ...
- Article, October 2024
Dataset Growth
- Ziheng Qin,
- Zhaopan Xu,
- Yukun Zhou,
- Zangwei Zheng,
- Zebang Cheng,
- Hao Tang,
- Lei Shang,
- Baigui Sun,
- Xiaojiang Peng,
- Radu Timofte,
- Hongxun Yao,
- Kai Wang,
- Yang You
Abstract: Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is ...