
Runge-Kutta Guided Feature Augmentation for Few-Sample Learning

Published: 15 February 2024

Abstract

Deep Neural Networks (DNNs) have primarily been demonstrated to be successful when large-scale labeled data are available. However, DNNs usually fail in few-sample learning scenarios, and the results are much worse when the limited data show large intra-class variation and inter-class similarity (i.e., fine-grained classification). To address this challenging task, the idea of feature augmentation is revisited and better realized by exploiting the merit of the forward Euler method for solving ordinary differential equations (ODEs), and a novel high-order feature augmentation (HFA) model built on ResNet is proposed. Specifically, the proposed method leverages the stacked residual structure to model the direction of feature change from the initial state, and uses the triplet loss as a constraint to model the step size of the change in an adaptive manner. The initial features can then be augmented by a residual structure in forward-Euler form to generate features of the same subcategory with a representation similar to that of the input image. Furthermore, the proposed augmentation mechanism enjoys two additional benefits: a) it helps avoid over-fitting when learning with insufficient training data; and b) it can be used seamlessly with any residual-structure-based classification network, and the ResNet used in this paper remains unchanged during testing. Extensive experiments are carried out on fine-grained visual categorization benchmarks, and the results demonstrate that our approach significantly improves categorization performance when the training data are highly insufficient.
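The forward-Euler view of feature augmentation described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `residual_branch`, the weight matrices, and the fixed `step` value are hypothetical stand-ins (the paper models the direction with a stacked residual structure and learns the step size adaptively under a triplet-loss constraint).

```python
import numpy as np

def residual_branch(x, W1, b1, W2, b2):
    # Hypothetical two-layer branch f(x) standing in for the stacked
    # residual structure that models the direction of feature change.
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU
    return h @ W2 + b2

def euler_augment(x, step, W1, b1, W2, b2):
    # Forward-Euler update: x' = x + step * f(x).
    # With step = 0 the initial feature is returned unchanged.
    return x + step * residual_branch(x, W1, b1, W2, b2)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet loss; in the paper it constrains the step size
    # so the augmented feature stays close to same-subcategory features
    # and away from other subcategories.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
dim = 8
W1, b1 = rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim)
W2, b2 = rng.normal(size=(dim, dim)) * 0.1, np.zeros(dim)

x = rng.normal(size=dim)                   # initial feature of an image
x_aug = euler_augment(x, 0.5, W1, b1, W2, b2)
pos = x + 0.05 * rng.normal(size=dim)      # same-subcategory feature
neg = -x                                   # different-subcategory feature
loss = triplet_loss(x_aug, pos, neg)
```

Because the augmentation is a residual update on features, it can in principle be applied after any residual stage of a ResNet-style backbone at training time and dropped at test time, which matches the paper's claim that the backbone remains unchanged during testing.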


Published In

IEEE Transactions on Multimedia, Volume 26, 2024, 9891 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
