- research-article, January 2025
Content adaptive JND profile by leveraging HVS inspired channel modeling and perception oriented energy allocation optimization
Abstract: The existing just noticeable difference (JND) models consider the effects of various covariates, however, they rarely account for the fusion relationship between the covariates, i.e., they lack a holistic understanding of the mechanisms of visual ...
- Article, December 2024
TDiffSal: Text-Guided Diffusion Saliency Prediction Model for Images
Abstract: Existing visual saliency prediction methods mainly focus on single-modal visual saliency prediction, while ignoring the significant impact of text on visual saliency. To more comprehensively explore the influence of text on human attention in ...
- research-article, January 2025
Duration-aware and mode-aware micro-expression spotting for long video sequences
Abstract: Micro-expressions (MEs) are unconscious, instant and slight facial movements, revealing people’s true emotions. Locating MEs is a prerequisite of classifying them, while only a few researches focus on this task. Among them, sliding window based ...
Highlights:
  - We utilize multiple sliding windows of different scales and modes to generate multiple weak detectors, each accommodating MEs (Micro-Expressions) of certain durations and transition modes.
  - We design a majority voting based aggregation ...
- research-article, October 2024
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7803–7812, https://doi.org/10.1145/3664647.3681471
Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNN) or Transformer to learn the quality-aware feature representation, achieving commendable performance on natural scene images. ...
- research-article, October 2024
MMHead: Towards Fine-grained Multi-modal 3D Facial Animation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7966–7975, https://doi.org/10.1145/3664647.3681366
3D facial animation has attracted considerable attention due to its extensive applications in the multimedia field. Audio-driven 3D facial animation has been widely explored with promising results. However, multi-modal 3D facial animation, especially ...
- research-article, October 2024
G-Refine: A General Quality Refiner for Text-to-Image Generation
Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchuan Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7375–7384, https://doi.org/10.1145/3664647.3681152
With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality ...
- research-article, October 2024
Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6033–6042, https://doi.org/10.1145/3664647.3680964
In recent years, immersive communication has emerged as a compelling alternative to traditional video communication methods. One prospective avenue for immersive communication involves augmenting the user's immersive experience through the transmission ...
- research-article, October 2024
LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM
Authors: Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7783–7792, https://doi.org/10.1145/3664647.3680946
Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and ...
- research-article, October 2024
T2I-Scorer: Quantitative Evaluation on Text-to-Image Generation via Fine-Tuned Large Multi-Modal Models
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3676–3685, https://doi.org/10.1145/3664647.3680939
Text-to-image (T2I) generation is a pivotal and core interest within the realm of AI content generation. Amid the swift advancements of both open-source (such as Stable Diffusion) and proprietary (for example, DALLE, MidJourney) T2I models, there is a ...
- research-article, October 2024
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7793–7802, https://doi.org/10.1145/3664647.3680868
With the rapid development of generative models, AI-Generated Content (AIGC) has exponentially increased in daily lives. Among them, Text-to-Video (T2V) generation has received widespread attention. Though many T2V models have been released for ...
- research-article, October 2024
Hidden Barcode in Sub-Images with Invisible Locating Marker
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 10, Article No.: 302, Pages 1–24, https://doi.org/10.1145/3674976
The prevalence of the Internet of Things has led to the widespread adoption of 2D barcodes as a means of offline-to-online communication. Whereas, 2D barcodes are not ideal for publicity materials due to their space-consuming nature. Recent works have ...
- research-article, October 2024, Just Accepted
Audio-visual Saliency Prediction Model with Implicit Neural Representation
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Just Accepted, https://doi.org/10.1145/3698881
With the remarkable advancement of deep learning techniques and the wide availability of large-scale datasets, the performance of audio-visual saliency prediction has been drastically improved. Actually, audio-visual saliency prediction is still at an ...
- Article, November 2024
GLARE: Low Light Image Enhancement via Generative Latent Feature Based Codebook Retrieval
Abstract: Most existing Low-light Image Enhancement (LLIE) methods either directly map Low-Light (LL) to Normal-Light (NL) images or use semantic or illumination maps as guides. However, the ill-posed nature of LLIE and the difficulty of semantic retrieval ...
- Article, November 2024
UniProcessor: A Text-Induced Unified Low-Level Image Processor
Abstract: Image processing, including image restoration, image enhancement, etc., involves generating a high-quality clean image from a degraded input. Deep learning-based methods have shown superior performance for various image processing tasks in terms ...
- Article, November 2024
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
Abstract: Unsupervised video semantic compression (UVSC), i.e., compressing videos to better support various analysis tasks, has recently garnered attention. However, the semantic richness of previous methods remains limited, due to the single semantic ...
- Article, October 2024
Towards Open-Ended Visual Quality Comparison
Authors: Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin
Abstract: Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as it inherently standardizes the evaluation criteria across different observers and offer ...
- research-article, November 2024
A cross-temporal multimodal fusion system based on deep learning for orthodontic monitoring
Authors: Haiwen Chen, Zhiyuan Qu, Yuan Tian, Ning Jiang, Yuan Qin, Jie Gao, Ruoyan Zhang, Yanning Ma, Zuolin Jin, Guangtao Zhai
Computers in Biology and Medicine (CBIM), Volume 180, Issue C, https://doi.org/10.1016/j.compbiomed.2024.109025
Abstract (Introduction): In the treatment of malocclusion, continuous monitoring of the three-dimensional relationship between dental roots and the surrounding alveolar bone is essential for preventing complications from orthodontic procedures. Cone-beam ...
Highlights:
  - The first deep learning based orthodontic system to continuous risk monitoring.
  - Cross-temporal fusion framework for multimodal medical imaging registration.
  - Novel registration method based on segmentation for internal structure ...
- research-article, August 2024
Q-Bench+: A Benchmark for Multi-Modal Foundation Models on Low-Level Vision From Single Images to Pairs
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 12, Pages 10404–10418, https://doi.org/10.1109/TPAMI.2024.3445770
The rapid development of Multi-modality Large Language Models (MLLMs) has navigated a paradigm shift in computer vision, moving towards versatile foundational models. However, evaluating MLLMs in low-level visual perception and understanding ...
- research-article, January 2025
DiffStega: towards universal training-free coverless image steganography with diffusion models
IJCAI '24: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Article No.: 175, Pages 1579–1587, https://doi.org/10.24963/ijcai.2024/175
Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized ...