Abstract
Raven’s progressive matrix (RPM) is a visual abstract reasoning task that tests the ability to extract reasoning rules from limited samples and apply them to an unknown setting. It is frequently used to evaluate human intelligence. Recent advances in RPM-like datasets and solution models partially address the challenges of visually understanding RPM questions and logically inferring the missing answers. This paper tackles the poor generalization performance caused by insufficient samples in RPM datasets, which leaves too little data for precise relational reasoning. We propose an effective scheme, candidate answer morphological mixup (CAM-Mix). CAM-Mix is a data augmentation strategy based on gray-scale image morphological mixup; it regularizes various solution methods and overcomes the model overfitting problem. Compared with existing methods, it can define a more accurate decision boundary by creating new negative candidate answers that are semantically similar to the correct answers. Experimental results show that applying the proposed data augmentation method to state-of-the-art RPM solution models yields significant and consistent performance improvements on various RPM-like datasets over the baseline models and other data augmentation strategies.
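To make the idea of building hard negatives concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the morphological operators, so the sketch assumes that the mixup is a pixel-wise minimum or maximum of two gray-scale panels (dark shapes on a light background) and that the mixed panel replaces one existing negative candidate. The function names `morphological_mixup` and `augment_candidates` and the `mode` parameter are hypothetical.

```python
import numpy as np


def morphological_mixup(correct, other, mode="union"):
    """Blend two gray-scale panels with a pixel-wise lattice operation.

    Assumption: shapes are dark on a light background, so the pixel-wise
    minimum keeps the shapes of both panels ("union") and the pixel-wise
    maximum keeps only the region that is dark in both ("intersection").
    """
    if correct.shape != other.shape:
        raise ValueError("panels must have the same shape")
    if mode == "union":
        return np.minimum(correct, other)
    if mode == "intersection":
        return np.maximum(correct, other)
    raise ValueError(f"unknown mode: {mode}")


def augment_candidates(candidates, answer_idx, rng, mode="union"):
    """Replace one negative candidate with a mix of it and the correct answer.

    candidates: array of shape (num_candidates, height, width).
    The correct answer and the number of candidates are left unchanged.
    """
    negatives = [i for i in range(len(candidates)) if i != answer_idx]
    partner = rng.choice(negatives)
    mixed = morphological_mixup(candidates[answer_idx], candidates[partner], mode)

    out = candidates.copy()
    # Skip the swap if the mix collapses onto the correct answer; such a
    # panel would be a mislabeled negative.
    if not np.array_equal(mixed, candidates[answer_idx]):
        out[partner] = mixed
    return out
```

Because the mixed panel inherits structure from the correct answer, it sits closer to the positive sample than a typical distractor does, which is the property the abstract credits for the sharper decision boundary.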
Data availability statement
The datasets generated and analyzed during the current study are available in the GitHub repositories https://github.com/WellyZhang/RAVEN and https://github.com/husheng12345/SRAN.
Funding
This research was supported in part by the National Natural Science Foundation of China under Grant No. 72071116 and in part by the Ningbo Municipal Science and Technology Bureau under Grant Nos. 2019B10026 and 2022Z173.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest related to this work.
About this article
Cite this article
He, W., Ren, J. & Bai, R. Data augmentation by morphological mixup for solving Raven’s progressive matrices. Vis Comput 40, 2457–2470 (2024). https://doi.org/10.1007/s00371-023-02930-x
DOI: https://doi.org/10.1007/s00371-023-02930-x