DOI: 10.1145/3664647.3681175
research-article

Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment

Published: 28 October 2024

Abstract

Image Aesthetic Quality Assessment (IAQA) aims to simulate users' visual perception in order to judge the aesthetic quality of images. On social media, users' aesthetic experiences are often reflected in their textual comments on the aesthetic attributes of images. To fully exploit the attribute information that users perceive when evaluating image aesthetic quality, this paper proposes an IAQA method based on attribute-driven multimodal hierarchical prompts. Unlike existing IAQA methods, which rely on multimodal pre-training or straightforward prompts for model learning, the proposed method leverages attribute comments and quality-level text templates to hierarchically learn the aesthetic attributes and quality of images. Specifically, we first use users' aesthetic attribute comments to perform prompt learning on images. The learned attribute-driven multimodal features comprehensively capture the semantic information of the image aesthetic attributes perceived by users. Then, we construct text templates for different aesthetic quality levels to further facilitate prompt learning through semantic information related to image aesthetic quality. In this way, the proposed method explicitly simulates users' aesthetic judgment of images to obtain more precise aesthetic quality predictions. Experimental results demonstrate that the proposed hierarchical-prompt IAQA method significantly outperforms existing methods on multiple IAQA databases. Our source code is publicly available at https://github.com/GitHub-Ju/AMHP.
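
To make the idea of quality-level text templates more concrete, the following is a minimal sketch of how an image feature might be compared against one text feature per quality level and turned into a scalar score. It is not the authors' implementation (see the repository linked above for that). The template wording, the five-level scale, the temperature of 0.07, the softmax-weighted expected score, and every function name are assumptions made for illustration, and toy 2-D vectors stand in for the outputs of real image and text encoders.

```python
import math

# Hypothetical quality-level templates; the abstract does not give the actual wording.
QUALITY_TEMPLATES = {
    1: "a photo with bad aesthetic quality",
    2: "a photo with poor aesthetic quality",
    3: "a photo with fair aesthetic quality",
    4: "a photo with good aesthetic quality",
    5: "a photo with perfect aesthetic quality",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_quality(image_feat, level_feats, temperature=0.07):
    """Return (score, probs): a probability per quality level and its expected level.

    The temperature value is an assumed, CLIP-style default, not taken from the paper.
    """
    levels = sorted(level_feats)
    logits = [cosine(image_feat, level_feats[k]) / temperature for k in levels]
    probs = softmax(logits)
    score = sum(k * p for k, p in zip(levels, probs))
    return score, probs


def toy_level_features():
    """Stand-in 2-D text embeddings, one per template.

    A real system would obtain these by encoding QUALITY_TEMPLATES with a text encoder.
    """
    return {k: [math.cos(k * math.pi / 10), math.sin(k * math.pi / 10)]
            for k in QUALITY_TEMPLATES}


if __name__ == "__main__":
    feats = toy_level_features()
    # Use the level-5 text feature as a stand-in image feature.
    score, probs = expected_quality(feats[5], feats)
    print(round(score, 2), [round(p, 3) for p in probs])
```

In this sketch, an image whose feature lies closest to the highest-level template receives most of the probability mass at that level, so its expected score approaches the top of the scale. Under the same assumptions, the attribute stage described in the abstract would change only how `image_feat` is produced, not how the score is read out.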

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. aesthetics-driven
    2. hierarchical prompts
    3. image aesthetic quality assessment
    4. multimodal learning

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
