Abstract
With the proliferation of digital cameras and mobile devices, people are taking many more photos than ever before. However, these photos are often redundant in content and vary in quality, so there is a growing need for tools to manage photo collections. One efficient approach is photo collection summarization, which segments a photo collection into events and then selects a set of representative, high-quality photos (key photos) from those events. However, existing photo collection summarization methods mainly represent photos with low-level features only, such as color and texture, while ignoring other useful features such as high-level semantic features and location. Moreover, they often return fixed summarization results, which provide little flexibility. In this paper, we propose a multi-modal and multi-scale photo collection summarization method that leverages multi-modal features, including time, location, and high-level semantic features. We first use a Gaussian mixture model to segment the photo collection into events. With images represented by these multi-modal features, our event segmentation algorithm achieves better performance, since the multi-modal features better capture the inhomogeneous structure of events. Next, we propose a novel key photo ranking and selection algorithm to select representative, high-quality photos from the events for summarization. Our key photo ranking algorithm takes the importance of both events and photos into consideration. Furthermore, our method allows users to control the scale of event segmentation and the number of key photos selected. We evaluate our method through extensive experiments on four photo collections. Experimental results demonstrate that our method achieves better performance than previous photo collection summarization methods.
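The event segmentation step can be illustrated with a small sketch. This is not the paper's implementation: it assumes each photo is reduced to a hypothetical feature vector of (hours since the first photo, latitude, longitude), omits the high-level semantic features, fits a diagonal-covariance Gaussian mixture with plain EM, and treats the number of components `k` as the user-chosen segmentation scale. The function name `segment_events` and all parameter values are illustrative assumptions.

```python
import math

def segment_events(features, k, iters=50, var_floor=1e-3):
    """Cluster time-ordered feature vectors into k events with a diagonal GMM (EM)."""
    n, d = len(features), len(features[0])
    # Deterministic start: means spread evenly over the time-ordered input.
    means = [list(features[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    centre = [sum(x[i] for x in features) / n for i in range(d)]
    spread = [max(sum((x[i] - centre[i]) ** 2 for x in features) / n, var_floor)
              for i in range(d)]
    variances = [spread[:] for _ in range(k)]
    weights = [1.0 / k] * k

    def log_density(x, j):
        # Log of weight_j * N(x | mean_j, diag(variance_j)).
        quad = sum(math.log(2 * math.pi * variances[j][i])
                   + (x[i] - means[j][i]) ** 2 / variances[j][i] for i in range(d))
        return math.log(weights[j]) - 0.5 * quad

    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each photo.
        resp = []
        for x in features:
            logs = [log_density(x, j) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and variances (with small floors).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, features)) / nj
                        for i in range(d)]
            variances[j] = [max(sum(r[j] * (x[i] - means[j][i]) ** 2
                                    for r, x in zip(resp, features)) / nj, var_floor)
                            for i in range(d)]
    # Hard assignment: each photo goes to its most responsible component.
    return [max(range(k), key=lambda j: r[j]) for r in resp]

# Made-up example: two bursts of photos, about 30 hours and one city apart.
photos = [
    (0.0, 31.82, 117.28), (0.2, 31.82, 117.28), (0.5, 31.83, 117.28),
    (30.0, 39.90, 116.40), (30.3, 39.90, 116.40), (30.9, 39.91, 116.41),
]
labels = segment_events(photos, k=2)
print(labels)
```

Raising `k` in this sketch yields a finer segmentation and lowering it a coarser one, which is one simple way to expose a scale control. In practice the features would likely need rescaling so that no single modality dominates the likelihood.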
Acknowledgments
This work is supported by the NSFC under contract Nos. 61201413 and 61390514, the Specialized Research Fund for the Doctoral Program of Higher Education No. WJ2100060003, and the Fundamental Research Funds for the Central Universities Nos. WK2100060011 and WK2100100021.
Cite this article
Shen, X., Tian, X. Multi-modal and multi-scale photo collection summarization. Multimed Tools Appl 75, 2527–2541 (2016). https://doi.org/10.1007/s11042-015-2658-6