DOI: 10.1145/3394486.3403083

Recurrent Networks for Guided Multi-Attention Classification

Published: 20 August 2020

Abstract

Attention-based image classification has gained increasing popularity in recent years. State-of-the-art methods for attention-based classification typically require a large training set and assume that the label of an image depends solely on a single object (i.e., region of interest) in the image. However, in many real-world applications (e.g., medical imaging), collecting a large training set is very expensive. Moreover, the label of each image is usually determined jointly by multiple regions of interest (ROIs). Fortunately, for such applications, it is often possible to collect the locations of the ROIs in each training image. In this paper, we study the problem of guided multi-attention classification, the goal of which is to achieve high accuracy under the dual constraints of (1) small sample size and (2) multiple ROIs for each image. We propose a model, called Guided Attention Recurrent Network (GARN), for multi-attention classification. Unlike existing attention-based methods, GARN utilizes guidance information regarding multiple ROIs, allowing it to work well even when the sample size is small. Empirical studies on three different visual tasks show that our guided attention approach can effectively boost model performance for multi-attention image classification.
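
To make the idea of guidance concrete, the following is a minimal sketch of how annotated ROI locations might enter a training objective. It is not the GARN formulation, which the abstract does not specify. The function names (extract_glimpse, guidance_loss, guided_objective), the squared-distance penalty, and the weight value are all assumptions chosen for illustration.

```python
import math

def extract_glimpse(image, center, size):
    """Crop a size x size patch around center=(row, col), zero-padding at borders."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    r0, c0 = center[0] - half, center[1] - half
    patch = []
    for r in range(r0, r0 + size):
        row = []
        for c in range(c0, c0 + size):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(image[r][c] if inside else 0.0)
        patch.append(row)
    return patch

def guidance_loss(predicted, annotated):
    """Assumed penalty: mean squared distance between the locations the model
    attended to and the annotated ROI centers, paired in order."""
    total = 0.0
    for (pr, pc), (ar, ac) in zip(predicted, annotated):
        total += (pr - ar) ** 2 + (pc - ac) ** 2
    return total / len(annotated)

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under the predicted probabilities."""
    return -math.log(probs[label])

def guided_objective(probs, label, predicted, annotated, weight=0.1):
    """Classification loss plus a weighted guidance term (weight is hypothetical)."""
    return cross_entropy(probs, label) + weight * guidance_loss(predicted, annotated)

# Toy 4x4 image with two annotated ROI centers.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
annotated_rois = [(1, 1), (2, 2)]
attended = [(1, 1), (3, 2)]  # locations a hypothetical attention policy chose
glimpses = [extract_glimpse(image, loc, 3) for loc in attended]
loss = guided_objective([0.5, 0.5], 0, attended, annotated_rois)
```

In a sketch like this, the guidance term supplies a training signal for where to look even when few labeled images are available, which is the intuition the abstract gives for using ROI locations.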

Cited By

  • (2023) BL-GAN: Semi-Supervised Bug Localization via Generative Adversarial Network. IEEE Transactions on Knowledge and Data Engineering 35:11 (11112-11125). DOI: 10.1109/TKDE.2022.3225329. Online publication date: 1-Nov-2023
  • (2021) Retina-like Imaging and Its Applications: A Brief Review. Applied Sciences 11:15 (7058). DOI: 10.3390/app11157058. Online publication date: 30-Jul-2021
  • (2021) Re-Attention Is All You Need: Memory-Efficient Scene Text Detection via Re-Attention on Uncertain Regions. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (452-459). DOI: 10.1109/IROS51168.2021.9636510. Online publication date: 27-Sep-2021

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. brain network classification
  2. recurrent attention model
  3. visual attention network

Qualifiers

  • Research-article

Conference

KDD '20

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 79
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 15 Oct 2024.
