Task-specific image summaries using semantic information and self-supervision

Sharma, Deepak Kumar; Singh, Anurag; Sharma, Sudhir Kumar; Srivastava, Gautam; Lin, Jerry Chun-Wei

doi:10.1007/s00500-021-06603-6

Focus
Published: 21 January 2022

Volume 26, pages 7581–7594, (2022)

Soft Computing


Abstract

Deep learning methods need large annotated datasets to reach human-level performance. This requirement limits the impact of deep learning and motivates the creation of smaller, richer representative datasets. In this paper, we propose task-specific image corpus summarization using semantic information and self-supervision. Our method uses a generative adversarial network (GAN) to generate features and leverages rotational invariance for self-supervision. All of these objectives operate on features from ResNet34. A summary is obtained efficiently by running k-means clustering on the semantic embedding space and then selecting the examples nearest to the centroids. Unlike end-to-end trained models, the proposed model does not require retraining to produce summaries of different lengths. We evaluate the model through extensive qualitative and quantitative experiments.
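The selection step described in the abstract (k-means on the embedding space, then picking the examples nearest to the centroids) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available only on request. The function names `kmeans` and `summarize` are hypothetical, the embeddings are assumed to be plain lists of floats already produced by some feature extractor, and a basic Lloyd-style k-means in standard-library Python stands in for whatever clustering routine the paper used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd-style k-means; returns k centroids (illustrative only)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct positions in the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def summarize(embeddings, summary_length, seed=0):
    """Return sorted indices of the examples nearest to each k-means centroid."""
    if not 1 <= summary_length <= len(embeddings):
        raise ValueError("summary_length must be between 1 and len(embeddings)")
    centroids = kmeans(embeddings, summary_length, seed=seed)
    chosen = []
    for centroid in centroids:
        ranked = sorted(range(len(embeddings)),
                        key=lambda i: squared_distance(embeddings[i], centroid))
        # Take the nearest example that has not been selected already.
        for i in ranked:
            if i not in chosen:
                chosen.append(i)
                break
    return sorted(chosen)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for learned semantic embeddings (assumption).
    toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                      [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    # The same embeddings are reused for every summary length; only the
    # clustering is rerun, so no model retraining is involved.
    for length in (1, 2, 3):
        print(length, summarize(toy_embeddings, length))
```

Because the embeddings are computed once and only the clustering depends on the requested length, a sketch like this reflects why summaries of different lengths need no retraining.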

Data availability

Data available on request from the authors.

Code availability

Code available on request from the authors.

Funding

This article received no funding from external sources.

Author information

Authors and Affiliations

Department of Information Technology, Netaji Subhas University of Technology, New Delhi, India
Deepak Kumar Sharma
Department of Computer Science, Technical University of Munich, Munich, Germany
Anurag Singh
Department of Computer Science, Institute of Information Technology and Management, New Delhi, India
Sudhir Kumar Sharma
Department of Math and Computer Science, Brandon University, Brandon, MB, Canada
Gautam Srivastava
Research Centre for Interneural Computing, China Medical University, Taichung, Taiwan
Gautam Srivastava
Western Norway University of Applied Sciences, Bergen, Norway
Jerry Chun-Wei Lin

Contributions

DKS: Supervision, Conceptualization, Methodology, Software, Data curation, Validation, Investigation, Visualization, Writing—original draft. AS: Conceptualization, Methodology, Data curation, Validation, Investigation, Writing—review & editing. SKS: Supervision, Methodology, Validation, Writing—review & editing. GS: Methodology, Validation, Writing—review & editing. JC-WL: Methodology, Writing—review & editing.

Corresponding author

Correspondence to Gautam Srivastava.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare for this manuscript.

Additional information

Communicated by Irfan Uddin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sharma, D.K., Singh, A., Sharma, S.K. et al. Task-specific image summaries using semantic information and self-supervision. Soft Comput 26, 7581–7594 (2022). https://doi.org/10.1007/s00500-021-06603-6

Accepted: 16 November 2021
Published: 21 January 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s00500-021-06603-6
