Abstract
Images are powerful tools for conveying human emotions, with different images stimulating diverse emotions. Numerous factors affect the emotions an image evokes, and much prior research has focused on low-level features such as color and texture. Inspired by the success of deep convolutional neural networks (CNNs) in visual recognition, we apply a data augmentation method for small datasets to obtain a sufficient number of training samples. In this paper, we use low-level features of the image (color and texture) to assist the extraction of high-level features (image object categories and deep emotion features), which are learned automatically by deep networks, yielding more effective image sentiment features. We then use a stacked sparse autoencoder (SSAE) network to recognize the emotions evoked by the image. Finally, high-level semantic descriptive phrases covering both image emotions and objects are output. Our experiments are carried out on the IAPS and GAPED datasets in the dimensional emotion space and the ArtPhoto dataset in the discrete emotion space. Compared with traditional hand-crafted feature extraction methods and other existing models, our method achieves superior test performance.
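The core classifier described above, a stacked sparse autoencoder trained greedily layer by layer on fused feature vectors, can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual network: the layer sizes, sparsity target `rho`, penalty weight `beta`, and learning rate are assumed hyperparameters, and the random input stands in for the fused low-level and deep features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SparseAutoencoder:
    """One layer of a stacked sparse autoencoder (SSAE), trained by
    gradient descent on squared reconstruction error plus a
    KL-divergence sparsity penalty on the mean hidden activations."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=0.1, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.rho, self.beta, self.lr = rho, beta, lr

    def transform(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def fit(self, X, epochs=50):
        n = X.shape[0]
        for _ in range(epochs):
            H = self.transform(X)               # hidden activations
            R = sigmoid(H @ self.W2 + self.b2)  # reconstruction of X
            rho_hat = H.mean(axis=0)            # average activation per unit
            # output-layer error for squared reconstruction loss
            d2 = (R - X) * R * (1 - R)
            # gradient of the KL sparsity penalty w.r.t. hidden activations
            sparsity = self.beta * (-self.rho / rho_hat
                                    + (1 - self.rho) / (1 - rho_hat))
            d1 = (d2 @ self.W2.T + sparsity) * H * (1 - H)
            self.W2 -= self.lr * H.T @ d2 / n
            self.b2 -= self.lr * d2.mean(axis=0)
            self.W1 -= self.lr * X.T @ d1 / n
            self.b1 -= self.lr * d1.mean(axis=0)
        return self

# Toy fused feature vectors standing in for the combined
# low-level (color/texture) and deep (object/emotion) features.
rng = np.random.default_rng(1)
X = rng.random((32, 20))

layer1 = SparseAutoencoder(20, 12).fit(X)
H1 = layer1.transform(X)
layer2 = SparseAutoencoder(12, 6).fit(H1)  # greedy layer-wise stacking
codes = layer2.transform(H1)
print(codes.shape)  # (32, 6)
```

In the full pipeline the final hidden codes would feed a classifier over the emotion categories; greedy layer-wise pretraining of each autoencoder before fine-tuning is the standard way such stacks are built on small datasets.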
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China (Nos. 61976150, 61873178 and 61876124), the Natural Science Foundation of Shanxi Province (No. 201801D121135), the Key Research and Development (R&D) Projects of Shanxi Province (No. 201803D31038), and the Key Research and Development (R&D) Projects of Jinzhong City (No. Y192006).
Ethics declarations
Conflict of Interest
The authors have no conflict of interest in submitting the paper to the Journal of Medical Systems.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection on Image & Signal Processing
About this article
Cite this article
Yang, X., Wang, Z., Deng, H. et al. Recognizing Image Semantic Information Through Multi-Feature Fusion and SSAE-Based Deep Network. J Med Syst 44, 46 (2020). https://doi.org/10.1007/s10916-019-1498-8