Author: Zhai, Guangtao : Search

research-article

Content adaptive JND profile by leveraging HVS inspired channel modeling and perception oriented energy allocation optimization

Signal Processing (SIGN), Volume 227, Issue C, https://doi.org/10.1016/j.sigpro.2024.109734

Abstract

The existing just noticeable difference (JND) models consider the effects of various covariates; however, they rarely account for the fusion relationship between the covariates, i.e., they lack a holistic understanding of the mechanisms of visual ...

Article

$E^{2}$DAS: An Efficient Equivariant Dynamic Aggregation Saliency Model for Omnidirectional Images

Pattern Recognition, Pages 407–423, https://doi.org/10.1007/978-3-031-78122-3_26

Abstract

Recent years have witnessed rapid progress of convolutional neural networks (CNNs) and their successful application in the task of saliency prediction for omnidirectional images (ODIs). Albeit achieving tremendous performance improvements, these ...

Article

TDiffSal: Text-Guided Diffusion Saliency Prediction Model for Images

Pattern Recognition, Pages 15–31, https://doi.org/10.1007/978-3-031-78186-5_2

Abstract

Existing visual saliency prediction methods mainly focus on single-modal visual saliency prediction, while ignoring the significant impact of text on visual saliency. To more comprehensively explore the influence of text on human attention in ...

research-article

Duration-aware and mode-aware micro-expression spotting for long video sequences

Image Communication (IMAG), Volume 129, Issue C, https://doi.org/10.1016/j.image.2024.117192

Abstract

Micro-expressions (MEs) are unconscious, instant and slight facial movements, revealing people’s true emotions. Locating MEs is a prerequisite of classifying them, while only a few researches focus on this task. Among them, sliding window based ...

Highlights

We utilize multiple sliding windows of different scales and modes to generate multiple weak detectors, each accommodating MEs (Micro-Expressions) of certain durations and transition modes.
We design a majority voting based aggregation ...

research-article

Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7803–7812, https://doi.org/10.1145/3664647.3681471

Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNN) or Transformer to learn the quality-aware feature representation, achieving commendable performance on natural scene images. ...

research-article

MMHead: Towards Fine-grained Multi-modal 3D Facial Animation

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7966–7975, https://doi.org/10.1145/3664647.3681366

3D facial animation has attracted considerable attention due to its extensive applications in the multimedia field. Audio-driven 3D facial animation has been widely explored with promising results. However, multi-modal 3D facial animation, especially ...

research-article

G-Refine: A General Quality Refiner for Text-to-Image Generation

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7375–7384, https://doi.org/10.1145/3664647.3681152

With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality ...

research-article

Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6033–6042, https://doi.org/10.1145/3664647.3680964

In recent years, immersive communication has emerged as a compelling alternative to traditional video communication methods. One prospective avenue for immersive communication involves augmenting the user's immersive experience through the transmission ...

research-article

LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7783–7792, https://doi.org/10.1145/3664647.3680946

Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and ...

research-article

Open Access

T2I-Scorer: Quantitative Evaluation on Text-to-Image Generation via Fine-Tuned Large Multi-Modal Models

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3676–3685, https://doi.org/10.1145/3664647.3680939

Text-to-image (T2I) generation is a pivotal and core interest within the realm of AI content generation. Amid the swift advancements of both open-source (such as Stable Diffusion) and proprietary (for example, DALLE, MidJourney) T2I models, there is a ...

research-article

Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7793–7802, https://doi.org/10.1145/3664647.3680868

With the rapid development of generative models, AI-Generated Content (AIGC) has exponentially increased in daily lives. Among them, Text-to-Video (T2V) generation has received widespread attention. Though many T2V models have been released for ...

research-article

Hidden Barcode in Sub-Images with Invisible Locating Marker

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 10, Article No.: 302, Pages 1–24, https://doi.org/10.1145/3674976

The prevalence of the Internet of Things has led to the widespread adoption of 2D barcodes as a means of offline-to-online communication. Whereas, 2D barcodes are not ideal for publicity materials due to their space-consuming nature. Recent works have ...

research-article

Free

JUST ACCEPTED

Audio-visual Saliency Prediction Model with Implicit Neural Representation

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted, https://doi.org/10.1145/3698881

With the remarkable advancement of deep learning techniques and the wide availability of large-scale datasets, the performance of audio-visual saliency prediction has been drastically improved. Actually, audio-visual saliency prediction is still at an ...

Article

GLARE: Low Light Image Enhancement via Generative Latent Feature Based Codebook Retrieval

Computer Vision – ECCV 2024, Pages 36–54, https://doi.org/10.1007/978-3-031-73195-2_3

Abstract

Most existing Low-light Image Enhancement (LLIE) methods either directly map Low-Light (LL) to Normal-Light (NL) images or use semantic or illumination maps as guides. However, the ill-posed nature of LLIE and the difficulty of semantic retrieval ...

Article

UniProcessor: A Text-Induced Unified Low-Level Image Processor

Computer Vision – ECCV 2024, Pages 180–199, https://doi.org/10.1007/978-3-031-72855-6_11

Abstract

Image processing, including image restoration, image enhancement, etc., involves generating a high-quality clean image from a degraded input. Deep learning-based methods have shown superior performance for various image processing tasks in terms ...

Article

Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression

Computer Vision – ECCV 2024, Pages 163–183, https://doi.org/10.1007/978-3-031-72967-6_10

Abstract

Unsupervised video semantic compression (UVSC), i.e., compressing videos to better support various analysis tasks, has recently garnered attention. However, the semantic richness of previous methods remains limited, due to the single semantic ...

Article

Towards Open-Ended Visual Quality Comparison

Computer Vision – ECCV 2024, Pages 360–377, https://doi.org/10.1007/978-3-031-72646-0_21

Abstract

Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as it inherently standardizes the evaluation criteria across different observers and offer ...

research-article

A cross-temporal multimodal fusion system based on deep learning for orthodontic monitoring

Computers in Biology and Medicine (CBIM), Volume 180, Issue C, https://doi.org/10.1016/j.compbiomed.2024.109025

Abstract Introduction

In the treatment of malocclusion, continuous monitoring of the three-dimensional relationship between dental roots and the surrounding alveolar bone is essential for preventing complications from orthodontic procedures. Cone-beam ...

Highlights

The first deep learning based orthodontic system for continuous risk monitoring.
Cross-temporal fusion framework for multimodal medical imaging registration.
Novel registration method based on segmentation for internal structure ...

research-article

Q-Bench$^+$: A Benchmark for Multi-Modal Foundation Models on Low-Level Vision From Single Images to Pairs

IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 12, Pages 10404–10418, https://doi.org/10.1109/TPAMI.2024.3445770

The rapid development of Multi-modality Large Language Models (MLLMs) has navigated a paradigm shift in computer vision, moving towards versatile foundational models. However, evaluating MLLMs in low-level visual perception and understanding ...

research-article

DiffStega: towards universal training-free coverless image steganography with diffusion models

IJCAI '24: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Article No.: 175, Pages 1579–1587, https://doi.org/10.24963/ijcai.2024/175

Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized ...
