Abstract
Deep learning methods typically need large annotated datasets to reach human-level performance. This requirement limits their impact and motivates the creation of smaller, richer, representative datasets. In this paper, we propose task-specific image corpus summarization using semantic information and self-supervision. Our method uses a GAN to generate features and leverages rotational invariance for self-supervision. All of these objectives operate on features extracted from ResNet34. A summary is obtained efficiently by running k-means clustering on the semantic embedding space and then selecting the examples nearest to the centroids. Unlike end-to-end trained models, the proposed model does not require retraining to produce summaries of different lengths. We evaluate the model through extensive qualitative and quantitative experiments.
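The summary-selection step described in the abstract (k-means on an embedding space, then picking the examples nearest to the centroids) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmeans`, `summarize`, and `toy_embeddings` are hypothetical, the 2-D toy points stand in for the semantic embeddings, and skipping already-chosen examples to avoid duplicates is an assumption of this sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


def summarize(embeddings, summary_length, seed=0):
    """Return sorted indices of the examples nearest to each k-means centroid."""
    centroids = kmeans(embeddings, summary_length, seed=seed)
    chosen = []
    for centroid in centroids:
        # Assumption of this sketch: skip examples already chosen so the
        # summary never contains the same image twice.
        candidates = [i for i in range(len(embeddings)) if i not in chosen]
        best = min(candidates, key=lambda i: squared_distance(embeddings[i], centroid))
        chosen.append(best)
    return sorted(chosen)


def toy_embeddings(seed=1):
    """Hypothetical stand-in for semantic embeddings: three 2-D blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
    return [
        [cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)]
        for cx, cy in centers
        for _ in range(20)
    ]


embeddings = toy_embeddings()
# The embeddings are computed once; only the clustering is rerun per length.
short_summary = summarize(embeddings, 3)
long_summary = summarize(embeddings, 5)
print(short_summary, long_summary)
```

Because the embeddings are fixed, changing the summary length only changes the number of clusters, which is consistent with the abstract's point that no retraining is needed for summaries of different lengths.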
Data availability
Data available on request from the authors.
Code availability
Code available on request from the authors.
Funding
This article received no funding from external sources.
Author information
Contributions
DKS: Supervision, Conceptualization, Methodology, Software, Data curation, Validation, Investigation, Visualization, Writing—original draft. AS: Conceptualization, Methodology, Data curation, Validation, Investigation, Writing—review & editing. SKS: Supervision, Methodology, Validation, Writing—review & editing. GS: Methodology, Validation, Writing—review & editing. JC-WL: Methodology, Writing—review & editing.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare for this manuscript.
Additional information
Communicated by Irfan Uddin.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sharma, D.K., Singh, A., Sharma, S.K. et al. Task-specific image summaries using semantic information and self-supervision. Soft Comput 26, 7581–7594 (2022). https://doi.org/10.1007/s00500-021-06603-6
DOI: https://doi.org/10.1007/s00500-021-06603-6