Data augmentation by morphological mixup for solving Raven’s progressive matrices

He, Wentao; Ren, Jianfeng; Bai, Ruibin

doi:10.1007/s00371-023-02930-x

Data augmentation by morphological mixup for solving Raven’s progressive matrices

Original article
Published: 08 July 2023

Volume 40, pages 2457–2470, (2024)
Cite this article

The Visual Computer Aims and scope Submit manuscript

171 Accesses
Explore all metrics

Abstract

Raven’s progressive matrix (RPM) is one kind of visual abstract reasoning tasks, which tests the ability of extracting reasoning rules from limited samples and applying them to an unknown setting. It is frequently used in evaluating human intelligence. Recent advances of RPM-like datasets and solution models partially address the challenges of visually understanding the RPM questions and logically reasoning the missing answers. This paper tackles the challenges of the poor generalization performance due to insufficient samples in RPM datasets. To address the problem of insufficient data for precisely conducting relational reasoning in RPMs, we propose an effective scheme, namely candidate answer morphological mixup (CAM-Mix). CAM-Mix serves as a data augmentation strategy by gray-scale image morphological mixup, which regularizes various solution methods and overcomes the model overfitting problem. Compared with existing methods, a more accurate decision boundary could be defined by creating new negative candidate answers semantically similar to the correct answers. Experimental results show that the proposed data augmentation method on state-of-the-art RPM solution models can provide significant and consistent performance improvements on various RPM-like datasets compared with state-of-the-art solution models and other data augmentation strategies.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An Interpretable Neuro-symbolic Model for Raven’s Progressive Matrices Reasoning

Article 25 May 2023

TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering

Evaluation of Multiple Approaches for Visual Question Reasoning

Data availability statement

The datasets generated during and analyzed during the current study are available in the GitHub repositories, https://github.com/WellyZhang/RAVEN and https://github.com/husheng12345/SRAN.

References

Ametefe, D.S., Sarnin, S.S., Ali, D.M., Muhammad, Z.Z.: Fingerprint pattern classification using deep transfer learning and data augmentation. Vis. Comput. 39, 1–14 (2022)
Google Scholar
Amizadeh, S., Palangi, H., Polozov, A., Huang, Y., Koishida, K.: Neuro-symbolic visual reasoning: Disentangling “Visual” from “Reasoning”. In: Proc. 37th Int. Conf. Mach. Learn. vol. 119, pp. 279–290 (2020)
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., Parikh, D.: VQA: Visual question answering. In: Proc. IEEE Int. Conf. Comput. Vis. pp. 2425–2433 (2015)
Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)
Article Google Scholar
Ding, R., Ren, J., Yu, H., Li, J.: Dynamic texture recognition using PDV hashing and dictionary learning on multi-scale volume local binary pattern. In: Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. pp. 1840–1844 (2022)
Dvornik, N., Mairal, J., Schmid, C.: On the importance of visual context for data augmentation in scene understanding. IEEE Trans. Pattern Anal. Mach. Intell. 43(6), 2014–2028 (2021)
Article Google Scholar
Ebadi, M., Ebrahimi, A.: Video data compression by progressive iterative approximation. Int. J. Interact. Multimed. Artif. Intell. 6(6), 189–195 (2021)
Google Scholar
Guo, H., Mao, Y., Zhang, R.: Mixup as locally linear out-of-manifold regularization. In: Proc. 33rd AAAI Conf. Artif. Intell. vol. 33, pp. 3714–3722 (2019)
Hashemi Hosseinabad, S., Safayani, M., Mirzaei, A.: Multiple answers to a question: a new approach for visual question answering. Vis. Comput. 37(1), 119–131 (2021)
Article Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. pp. 770–778 (2016)
He, W., Ren, J., Bai, R., Jiang, X.: Two-stage rule-induction visual reasoning on RPMs with an application to video prediction. arXiv preprint arXiv:2111.12301 (2021)
He, W., Zhang, J., Ren, J., Bai, R., Jiang, X.: Hierarchical Con–ViT with attention-based relational reasoner for visual analogical reasoning. In: Proc. 37th AAAI Conf. Artif. Intell. vol. 37, pp. 22–30 (2023)
Hu, S., Ma, Y., Liu, X., Wei, Y., Bai, S.: Stratified rule-aware network for abstract visual reasoning. In: Proc. 35th AAAI Conf. Artif. Intell. 35(2), 1567–1574 (2021)
Inoue, H.: Data augmentation by pairing samples for images classification. arXiv preprint arXiv:1801.02929 (2018)
Khan, M.J., Khan, M.J., Siddiqui, A.M., Khurshid, K.: An automated and efficient convolutional architecture for disguise-invariant face recognition using noise-based data augmentation and deep transfer learning. Vis. Comput. 38(2), 509–523 (2022)
Article Google Scholar
Kong, W., Ye, S., Yao, C., Ren, J.: Confidence-based event-centric online video question answering on a newly constructed ATBS dataset. In: Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (2023)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Proc. NeurIPS 25, 1097–1105 (2012)
Google Scholar
Liang, D., Yang, F., Zhang, T., Yang, P.: Understanding mixup training methods. IEEE. Access 6, 58774–58783 (2018)
Article Google Scholar
Liu, S., Guo, H., Hu, J.G., Zhao, X., Zhao, C., Wang, T., Zhu, Y., Wang, J., Tang, M.: A novel data augmentation scheme for pedestrian detection with attribute preserving GAN. Neurocomputing 401, 123–132 (2020)
Article Google Scholar
Liu, X., Xu, Q., Wang, N.: A survey on deep neural network-based image captioning. Vis. Comput. 35(3), 445–470 (2019)
Article Google Scholar
Mai, Z., Hu, G., Chen, D., Shen, F., Shen, H.T.: Metamixup: Learning adaptive interpolation policy of mixup with metalearning. IEEE Trans. Neural Netw. Learn. Syst. 33(7), 3050–3064 (2022)
Article MathSciNet Google Scholar
Maragos, P.: A representation theory for morphological image and signal processing. IEEE Trans. Pattern Anal. Mach. Intell. 11(6), 586–599 (1989)
Article Google Scholar
Nazari, K., Ebadi, M.J., Berahmand, K.: Diagnosis of alternaria disease and leafminer pest on tomato leaves using image processing techniques. J. Sci. Food Agric. 102(15), 6907–6920 (2022)
Article Google Scholar
Ren, J., Jiang, X.: A three-step classification framework to handle complex data distribution for radar UAV detection. Pattern Recognit. 111, 107709 (2021)
Article Google Scholar
Santoro, A., Hill, F., Barrett, D., Morcos, A., Lillicrap, T.: Measuring abstract reasoning in neural networks. In: Proc. 35th Int. Conf. Mach. Learn. pp. 4477–4486 (2018)
Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 60 (2019)
Article Google Scholar
Song, X., Jin, J., Yao, C., Wang, S., Ren, J., Bai, R.: Siamese-discriminant deep reinforcement learning for solving jigsaw puzzles with large eroded gaps. In: Proc. 37th AAAI Conf. Artif. Intell. vol. 37, 2303–2311 (2023)
Song, X., Yang, X., Ren, J., Bai, R., Jiang, X.: Solving jigsaw puzzle of large eroded gaps using puzzlet discriminant network. In: Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (2023)
Summers, C., Dinneen, M.J.: Improved mixed-example data augmentation. In: Proc. IEEE Winter Conf. Appl. Comput. Vis. pp. 1262–1270 (2019)
Takahashi, R., Matsubara, T., Uehara, K.: Data augmentation using random image cropping and patching for deep CNNs. IEEE Trans. Circuits Syst. Video Technol. 30(9), 2917–2931 (2019)
Article Google Scholar
Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., Bengio, Y.: Manifold mixup: Better representations by interpolating hidden states. In: Proc. 36th Int. Conf. Mach. Learn. pp. 6438–6447 (2019)
Wang, S., Ren, J., Bai, R.: A semi-supervised adaptive discriminative discretization method improving discrimination power of regularized naive Bayes. Expert Syst. Appl. 225, 120094 (2023)
Article Google Scholar
Wang, X., Jiang, X., Ren, J.: Blood vessel segmentation from fundus image by a cascade classification framework. Pattern Recognit. 88, 331–341 (2019)
Article Google Scholar
Yan, F., Silamu, W., Li, Y., Chai, Y.: SPCA-Net: a based on spatial position relationship co-attention network for visual question answering. Vis. Comput. 38, 1–12 (2022)
Article Google Scholar
Zhang, C., Gao, F., Jia, B., Zhu, Y., Zhu, S.C.: RAVEN: A dataset for relational and analogical visual reasoning. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. pp. 5317–5327 (2019)
Zhang, C., Jia, B., Gao, F., Zhu, Y., Lu, H., Zhu, S.C.: Learning perceptual inference by contrasting. In: Proc. NeurIPS. pp. 1075–1087 (2019)
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: Mixup: beyond empirical risk minimization. In: Proc. 6th Int. Conf. Learn. Represent. (2018)
Zhang, J., Ren, J., Zhang, Q., Liu, J., Jiang, X.: Spatial context-aware object-attentional network for multi-label image classification. IEEE Trans. Image Process. 32, 3000–3012 (2023)
Article Google Scholar
Zhang, J., Zhang, Q., Ren, J., Zhao, Y., Liu, J.: Spatial-context-aware deep neural network for multi-class image classification. In: Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. pp. 1960–1964 (2022)
Zheng, K., Zha, Z.J., Wei, W.: Abstract reasoning with distracting features. In: Proc. NeurIPS. pp. 5842–5853 (2019)
Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proc. 34th AAAI Conf. Artif. Intell. pp. 13001–13008 (2020)
Zhou, F., Hu, Y., Shen, X.: MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition. Vis. Comput. 35(11), 1583–1594 (2019)
Article Google Scholar
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proc. IEEE Int. Conf. Comput. Vis. pp. 2223–2232 (2017)
Zhuo, T., Huang, Q., Kankanhalli, M.: Unsupervised abstract reasoning for Raven’s problem matrices. IEEE Trans. Image Process. 30, 8332–8341 (2021)
Article Google Scholar

Download references

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant No. 72071116 and in part by the Ningbo Municipal Science and Technology Bureau under Grant Nos. 2019B10026 and 2022Z173.

Author information

Authors and Affiliations

The School of Computer Science, University of Nottingham Ningbo China, 199 Taikang East Road, Ningbo, 315100, China
Wentao He, Jianfeng Ren & Ruibin Bai
Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China, 199 Taikang East Road, Ningbo, 315100, China
Jianfeng Ren & Ruibin Bai

Authors

Wentao He
View author publications
You can also search for this author in PubMed Google Scholar
Jianfeng Ren
View author publications
You can also search for this author in PubMed Google Scholar
Ruibin Bai
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jianfeng Ren.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

He, W., Ren, J. & Bai, R. Data augmentation by morphological mixup for solving Raven’s progressive matrices. Vis Comput 40, 2457–2470 (2024). https://doi.org/10.1007/s00371-023-02930-x

Download citation

Accepted: 28 May 2023
Published: 08 July 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00371-023-02930-x

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Data augmentation by morphological mixup for solving Raven’s progressive matrices

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

An Interpretable Neuro-symbolic Model for Raven’s Progressive Matrices Reasoning

TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering

Evaluation of Multiple Approaches for Visual Question Reasoning

Data availability statement

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Data augmentation by morphological mixup for solving Raven’s progressive matrices

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

An Interpretable Neuro-symbolic Model for Raven’s Progressive Matrices Reasoning

TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering

Evaluation of Multiple Approaches for Visual Question Reasoning

Data availability statement

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation