
Runge-Kutta Guided Feature Augmentation for Few-Sample Learning

Published: 15 February 2024

Abstract

Deep Neural Networks (DNNs) have primarily been demonstrated to be successful when large-scale labeled data are available. However, DNNs usually fail in few-sample learning scenarios, and the results are much worse when the limited data show large intra-class variation and inter-class similarity (i.e., fine-grained classification). To address this challenging task, the idea of feature augmentation is revisited and better realized by exploiting the merit of the forward Euler method for solving ordinary differential equations (ODEs), and a novel high-order feature augmentation (HFA) model built on ResNet is proposed. Specifically, the proposed method leverages the stacked residual structure to model the direction of feature change from the initial state, and uses the triplet loss as a constraint to model the step size of the change in an adaptive manner. The initial features can then be augmented by a residual structure in forward-Euler form to generate features of the same subcategory with a representation similar to that of the input image. Furthermore, the proposed augmentation mechanism enjoys two additional benefits: a) it helps avoid over-fitting when learning with insufficient training data; and b) it can be used seamlessly with any residual-structure-based classification network, and the ResNet used in this paper remains unchanged during testing. Extensive experiments are carried out on fine-grained visual categorization benchmarks, and the results demonstrate that our approach significantly improves categorization performance when the training data are highly insufficient.
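The forward-Euler view of feature augmentation described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `residual_branch`, the weight matrices, and the fixed `step` value are hypothetical stand-ins (the paper models the direction with a stacked residual structure and learns the step size adaptively under a triplet-loss constraint).

```python
import numpy as np

def residual_branch(x, W1, b1, W2, b2):
    # Hypothetical two-layer branch f(x) standing in for the stacked
    # residual structure that models the direction of feature change.
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU
    return h @ W2 + b2

def euler_augment(x, step, W1, b1, W2, b2):
    # Forward-Euler update: x' = x + step * f(x).
    # With step = 0 the initial feature is returned unchanged.
    return x + step * residual_branch(x, W1, b1, W2, b2)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet loss; in the paper it constrains the step size
    # so the augmented feature stays close to same-subcategory features
    # and away from other subcategories.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
dim = 8
W1, b1 = rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim)
W2, b2 = rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim)

x = rng.normal(size=dim)                   # initial feature of an image
x_aug = euler_augment(x, 0.5, W1, b1, W2, b2)
pos = x + 0.05 * rng.normal(size=dim)      # same-subcategory feature
neg = -x                                   # different-subcategory feature
loss = triplet_loss(x_aug, pos, neg)
```

Because the augmentation is a residual update on features, it can in principle be applied after any residual stage of a ResNet-style backbone at training time and dropped at test time, which matches the paper's claim that the backbone remains unchanged during testing.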


Published In

IEEE Transactions on Multimedia, Volume 26, 2024, 9891 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
