DOI: 10.1145/3240508.3240554
research-article

Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment

Published: 15 October 2018

Abstract

Aggregation structures that exploit explicit information, such as image attributes and scene semantics, are effective and widely used in intelligent systems that assess the aesthetics of visual data. However, such information is often unavailable because of the high cost of manual annotation and expert design. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Unlike state-of-the-art methods, which augment an MP aggregation network with various visual attributes, we train the model end to end with aesthetic labels only (i.e., aesthetically positive or negative). We achieve this with an attention-based mechanism that adaptively adjusts the weight of each patch during training to improve learning efficiency. In addition, we propose a set of objectives built on three typical attention mechanisms (i.e., average, minimum, and adaptive) and evaluate their effectiveness on the Aesthetic Visual Analysis (AVA) benchmark. Numerical results show that our approach outperforms existing methods by a large margin. Ablation studies further verify the effectiveness of the proposed attention-based objectives and shed light on the design of aesthetic assessment systems.
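The abstract names three ways of weighting per-patch training signals (average, minimum, adaptive) but does not state their formulas. The Python below is a minimal sketch under our own assumptions, not the authors' objectives: each patch i of an image receives a predicted probability of being aesthetically positive; per-patch losses are binary cross-entropy against the image-level label; "average" weights all patches equally; "minimum" is read here as putting all weight on the patch whose probability for the true label is lowest; and "adaptive" is a hypothetical softmax over per-patch losses. All function names (`patch_losses`, `average_weights`, `minimum_weights`, `adaptive_weights`, `aggregated_loss`) are illustrative.

```python
import math
from typing import Callable, List, Sequence


def patch_losses(probs: Sequence[float], label: int) -> List[float]:
    """Binary cross-entropy of each patch against the image-level label.

    probs[i] is the predicted probability that patch i is aesthetically
    positive; label is 1 (positive) or 0 (negative).
    """
    eps = 1e-12  # clamp to avoid log(0)
    losses = []
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)
        q = p if label == 1 else 1.0 - p  # probability assigned to the true label
        losses.append(-math.log(q))
    return losses


def average_weights(losses: Sequence[float]) -> List[float]:
    """'Average' attention: every patch contributes equally."""
    n = len(losses)
    return [1.0 / n] * n


def minimum_weights(losses: Sequence[float]) -> List[float]:
    """'Minimum' attention (our reading): all weight on the patch with the
    lowest true-label probability, i.e. the largest loss."""
    worst = max(range(len(losses)), key=lambda i: losses[i])
    return [1.0 if i == worst else 0.0 for i in range(len(losses))]


def adaptive_weights(losses: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Hypothetical 'adaptive' attention: softmax over per-patch losses, so
    harder patches get larger weights. With temperature=1 the weights are
    proportional to 1 / (true-label probability)."""
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((l - m) / temperature) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]


def aggregated_loss(probs: Sequence[float], label: int,
                    weighting: Callable[[Sequence[float]], List[float]]) -> float:
    """Image-level objective: attention-weighted sum of per-patch losses."""
    losses = patch_losses(probs, label)
    weights = weighting(losses)
    return sum(w * l for w, l in zip(weights, losses))


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.2]  # three patches of one aesthetically positive image
    for name, fn in [("average", average_weights),
                     ("minimum", minimum_weights),
                     ("adaptive", adaptive_weights)]:
        print(name, round(aggregated_loss(probs, 1, fn), 4))
```

For the three example patches, the average objective is about 0.742, the minimum objective is about 1.609 (the loss of the weakest patch alone), and the adaptive objective falls between them at about 1.159. Under this reading, the adaptive scheme interpolates between spreading the gradient over all patches and concentrating it on the hardest one, and the temperature controls where it sits on that spectrum.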


Cited By

  • (2024) A Mobile Image Aesthetics Processing System with Intelligent Scene Perception. Applied Sciences 14:2 (822). DOI: 10.3390/app14020822. Online publication date: 18-Jan-2024.
  • (2024) Towards Data-Driving Multi-View Evaluation Framework for Scratch. Tsinghua Science and Technology 29:2 (517-528). DOI: 10.26599/TST.2023.9010016. Online publication date: Apr-2024.
  • (2024) Towards Robust Evaluation of Aesthetic and Photographic Quality Metrics: Insights from a Comprehensive Dataset. Complexity 2024:1. DOI: 10.1155/2024/8223586. Online publication date: 26-Sep-2024.

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN:9781450356657
DOI:10.1145/3240508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. attention mechanism
  2. convolutional neural network
  3. image aesthetic assessment
  4. multi-patch aggregation

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Beijing Natural Science Foundation

Conference

MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
Overall Acceptance Rate 995 of 4,171 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 57
  • Downloads (last 6 weeks): 7
Reflects downloads up to 16 Oct 2024

Citations

  • (2024) Improving Image Aesthetic Assessment via Multiple Image Joint Learning. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3687128. Online publication date: 21-Aug-2024.
  • (2024) Research on Product Advertising Design Combining Feature Extraction Technology and Web3D Technology. ACM Transactions on Asian and Low-Resource Language Information Processing 23:6 (1-13). DOI: 10.1145/3608948. Online publication date: 22-Jun-2024.
  • (2024) Modeling Multiple Aesthetic Views for Series Photo Selection. IEEE Transactions on Multimedia 26 (1983-1995). DOI: 10.1109/TMM.2023.3290751. Online publication date: 1-Jan-2024.
  • (2024) Image Aesthetics Assessment With Emotion-Aware Multibranch Network. IEEE Transactions on Instrumentation and Measurement 73 (1-15). DOI: 10.1109/TIM.2024.3365174. Online publication date: 2024.
  • (2024) Synergetic Assessment of Quality and Aesthetic: Approach and Comprehensive Benchmark Dataset. IEEE Transactions on Circuits and Systems for Video Technology (1-1). DOI: 10.1109/TCSVT.2023.3303933. Online publication date: 2024.
  • (2024) Top-Down Guidance Based ViT-CNN Network Considering Theme Information for Image Aesthetic Assessment. 2024 IEEE International Conference on Multimedia and Expo (ICME) (1-6). DOI: 10.1109/ICME57554.2024.10687765. Online publication date: 15-Jul-2024.
  • (2024) Learning Triangular Distribution in Visual World. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (11019-11029). DOI: 10.1109/CVPR52733.2024.01048. Online publication date: 16-Jun-2024.