DOI: 10.1145/3664647.3681011
research-article

AesMamba: Universal Image Aesthetic Assessment with State Space Models

Published: 28 October 2024

Abstract

Image Aesthetic Assessment (IAA) aims to objectively predict generic or personalized evaluations of overall aesthetics or of fine-grained attributes, based on visual or multimodal inputs. Previously, researchers have designed diverse, specialized methods for specific IAA tasks, each tailored to a particular input-output setting. Is it possible to design a universal IAA framework applicable to the whole IAA task taxonomy? In this paper, we explore this question and propose a modular IAA framework, dubbed AesMamba. Specifically, we use the Visual State Space Model (VMamba), instead of CNNs or ViTs, to learn comprehensive representations of aesthetic-related attributes, because VMamba can efficiently achieve both global and local effective receptive fields. A modal-adaptive module then automatically produces integrated representations conditioned on the type of input. In the prediction module, we propose a Multitask Balanced Adaptation (MBA) module to boost task-specific features, with emphasis on tail instances. Finally, we formulate the personalized IAA task as a multimodal learning problem by converting a user's anonymous subject characteristics into a text prompt. This prompting strategy effectively exploits the semantics of flexibly selected characteristics to infer individual preferences. AesMamba can be applied to diverse IAA tasks through flexible combinations of these modules. Extensive experiments on numerous datasets demonstrate that AesMamba consistently achieves superior or competitive performance on all IAA tasks in comparison with previous SOTA methods. The code has been released at https://github.com/AiArt-Gao/AesMamba.
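
The prompting step for personalized IAA can be illustrated with a minimal sketch. It assumes the user's traits arrive as a simple name-to-value mapping and that any subset of them may be selected. The function name `traits_to_prompt`, the trait names, and the sentence template are hypothetical and are not taken from the released code. The abstract states only that anonymous user traits are converted to a text prompt.

```python
from typing import Mapping


def traits_to_prompt(traits: Mapping[str, str]) -> str:
    """Turn whichever user traits are available into one text prompt.

    Assumption: the trait names and the sentence template below are
    illustrative only; they are not the paper's actual prompt format.
    """
    # Keep only traits that have a non-empty value, in a stable (sorted) order.
    parts = [f"{name}: {value}" for name, value in sorted(traits.items()) if value]
    if not parts:
        # No traits selected: fall back to a generic, non-personalized prompt.
        return "A user rates this image."
    return "A user with " + ", ".join(parts) + " rates this image."


if __name__ == "__main__":
    print(traits_to_prompt({"age group": "18-25", "photography experience": "beginner"}))
```

Because the prompt is built from whichever traits are present, the same text encoder input path could serve users with different amounts of available information, which is one plausible reading of "flexibly selected" in the abstract.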

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. image aesthetic assessment
    2. imbalanced learning
    3. multimodal learning
    4. multitask learning
    5. state space model

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
