research-article

Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment

Authors:

Bao-Gang Hu

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 879 - 886

https://doi.org/10.1145/3240508.3240554

Published: 15 October 2018

Abstract

Aggregation structures that exploit explicit information, such as image attributes and scene semantics, are effective and popular in intelligent systems that assess the aesthetics of visual data. However, such information may be unavailable because manual annotation and expert design are costly. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Unlike state-of-the-art methods, which augment an MP aggregation network with various visual attributes, we train the model end-to-end with aesthetic labels only (i.e., aesthetically positive or negative). We achieve this with an attention-based mechanism that adaptively adjusts the weight of each patch during training to improve learning efficiency. In addition, we propose a set of objectives with three typical attention mechanisms (i.e., average, minimum, and adaptive) and evaluate their effectiveness on the Aesthetic Visual Analysis (AVA) benchmark. Numerical results show that our approach outperforms existing methods by a large margin. We further verify the effectiveness of the proposed attention-based objectives via ablation studies, which shed light on the design of aesthetic assessment systems.
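The three attention variants named in the abstract (average, minimum, and adaptive) can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: each patch is taken to output a probability for the true aesthetic label, the per-patch loss is the negative log-likelihood, and the adaptive weighting rule (1 - p)^gamma with its `gamma` parameter is hypothetical rather than the paper's actual objective. All function names are likewise invented for this example.

```python
import math


def patch_losses(probs):
    """Negative log-likelihood of the true label for each patch.

    `probs` holds, per patch, the predicted probability of the
    ground-truth label (an assumed input format, not the paper's).
    """
    return [-math.log(p) for p in probs]


def average_objective(probs):
    """Every patch contributes equally to the image-level loss."""
    losses = patch_losses(probs)
    return sum(losses) / len(losses)


def minimum_objective(probs):
    """Only the patch with the lowest true-label probability counts."""
    return -math.log(min(probs))


def adaptive_weights(probs, gamma=1.0):
    """Hypothetical attention weights: less confident patches weigh more.

    Raw weights are (1 - p) ** gamma, normalised to sum to 1.
    If every patch is already certain (all p == 1), fall back to uniform.
    """
    raw = [(1.0 - p) ** gamma for p in probs]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(probs)] * len(probs)
    return [r / total for r in raw]


def adaptive_objective(probs, gamma=1.0):
    """Weighted sum of per-patch losses using the adaptive weights."""
    weights = adaptive_weights(probs, gamma)
    return sum(w * l for w, l in zip(weights, patch_losses(probs)))


if __name__ == "__main__":
    # Three patches from one image, with decreasing confidence.
    probs = [0.9, 0.6, 0.3]
    print("average :", average_objective(probs))
    print("minimum :", minimum_objective(probs))
    print("adaptive:", adaptive_objective(probs))
```

Under these assumptions the adaptive loss lies between the other two: with `gamma=0` it reduces to the average objective, and as `gamma` grows it concentrates on the least confident patch and approaches the minimum objective.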

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
        Computational photography
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN:9781450356657

DOI:10.1145/3240508

General Chairs:
Susanne Boll, University of Oldenburg, Germany
Kyoung Mu Lee, Seoul National University, Korea
Jiebo Luo, University of Rochester, USA
Wenwu Zhu, Tsinghua University, China

Program Chairs:
Hyeran Byun, Yonsei University, Korea
Chang Wen Chen, State Univ. of New York at Buffalo, USA
Rainer Lienhart, University of Augsburg, Germany
Tao Mei, JD AI, China

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
Beijing Natural Science Foundation

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 995 of 4,171 submissions, 24%
