research-article

Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment

Authors:

Leida Li

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 2399 - 2408

https://doi.org/10.1145/3664647.3681175

Published: 28 October 2024

Abstract

Image Aesthetic Quality Assessment (IAQA) aims to simulate users' visual perception to judge the aesthetic quality of images. In social media, users' aesthetic experiences are often reflected in their textual comments regarding the aesthetic attributes of images. To fully explore the attribute information perceived by users for evaluating image aesthetic quality, this paper proposes an image aesthetic quality assessment method based on attribute-driven multimodal hierarchical prompts. Unlike existing IAQA methods that utilize multimodal pre-training or straightforward prompts for model learning, the proposed method leverages attribute comments and quality-level text templates to hierarchically learn the aesthetic attributes and quality of images. Specifically, we first leverage users' aesthetic attribute comments to perform prompt learning on images. The learned attribute-driven multimodal features can comprehensively capture the semantic information of image aesthetic attributes perceived by users. Then, we construct text templates for different aesthetic quality levels to further facilitate prompt learning through semantic information related to the aesthetic quality of images. The proposed method can explicitly simulate users' aesthetic judgment of images to obtain more precise aesthetic quality. Experimental results demonstrate that the proposed IAQA method based on hierarchical prompts outperforms existing methods significantly on multiple IAQA databases. Our source code is public at https://github.com/GitHub-Ju/AMHP.
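The second stage described above, matching an image against text templates for different aesthetic quality levels, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes a CLIP-style setup in which an image embedding is compared with one text embedding per quality level, and the similarities are turned into an expected score. The level names, the template wording, the temperature value, and the function names `build_prompts` and `expected_quality` are all hypothetical, and the toy one-hot vectors stand in for real encoder outputs.

```python
import math

# Hypothetical quality levels and template wording; the abstract does not
# specify the actual templates used in the paper.
QUALITY_LEVELS = ["bad", "poor", "fair", "good", "perfect"]
TEMPLATE = "a photo with {} aesthetic quality"


def build_prompts(levels=QUALITY_LEVELS, template=TEMPLATE):
    """Fill the template once per quality level."""
    return [template.format(level) for level in levels]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over temperature-scaled values."""
    scaled = [x / temperature for x in xs]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_quality(image_emb, text_embs, temperature=0.07):
    """Return (score, probs): probs is a distribution over quality levels,
    score is the expected level index on a 1..N scale."""
    sims = [cosine(image_emb, t) for t in text_embs]
    probs = softmax(sims, temperature)
    score = sum(p * (i + 1) for i, p in enumerate(probs))
    return score, probs


# Toy stand-ins for encoder outputs (assumption: a real system would obtain
# these from pretrained image and text encoders).
toy_text_embs = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
toy_image_emb = toy_text_embs[3]  # image closest to the "good" template

score, probs = expected_quality(toy_image_emb, toy_text_embs)
print(build_prompts()[3], round(score, 3))
```

Taking the expectation over levels, rather than the single best-matching level, gives a continuous score, which is a common choice when the target is a mean opinion score. Whether the paper does the same is not stated in the abstract.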


Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision representations → Image representations


Published In


MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia)
Mohan Kankanhalli (NUS, Singapore)
Balakrishnan Prabhakaran (UT Dallas, USA)
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia)
Liang Zheng (Australian National University, Australia)
Vivek K. Singh (Rutgers University, USA)
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands)
Lexing Xie (Australian National University, Australia)
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States




Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
