research-article

Public Access

Recurrent Networks for Guided Multi-Attention Classification

Authors:

Constance Moore

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 412 - 420

https://doi.org/10.1145/3394486.3403083

Published: 20 August 2020

Abstract

Attention-based image classification has gained increasing popularity in recent years. State-of-the-art methods for attention-based classification typically require a large training set and operate under the assumption that the label of an image depends solely on a single object (i.e., a region of interest) in the image. However, in many real-world applications (e.g., medical imaging), it is very expensive to collect a large training set. Moreover, the label of each image is usually determined jointly by multiple regions of interest (ROIs). Fortunately, for such applications, it is often possible to collect the locations of the ROIs in each training image. In this paper, we study the problem of guided multi-attention classification, the goal of which is to achieve high accuracy under the dual constraints of (1) a small sample size and (2) multiple ROIs in each image. We propose a model, called Guided Attention Recurrent Network (GARN), for multi-attention classification. Unlike existing attention-based methods, GARN utilizes guidance information regarding multiple ROIs, which allows it to work well even when the sample size is small. Empirical studies on three different visual tasks show that our guided attention approach can effectively boost model performance for multi-attention image classification.


Cited By

Zhu Z, Tong H, Wang Y, Li Y. (2023). BL-GAN: Semi-Supervised Bug Localization via Generative Adversarial Network. IEEE Transactions on Knowledge and Data Engineering, 35:11 (11112-11125). DOI: 10.1109/TKDE.2022.3225329. Online publication date: 1-Nov-2023.
https://doi.org/10.1109/TKDE.2022.3225329

Hao Q, Tao Y, Cao J, Tang M, Cheng Y, Zhou D, Ning Y, Bao C, Cui H. (2021). Retina-like Imaging and Its Applications: A Brief Review. Applied Sciences, 11:15 (7058). DOI: 10.3390/app11157058. Online publication date: 30-Jul-2021.
https://doi.org/10.3390/app11157058

Chang H, Chen H, Shen Y, Shuai H, Cheng W. (2021). Re-Attention Is All You Need: Memory-Efficient Scene Text Detection via Re-Attention on Uncertain Regions. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 452-459. DOI: 10.1109/IROS51168.2021.9636510. Online publication date: 27-Sep-2021.
https://dl.acm.org/doi/10.1109/IROS51168.2021.9636510

Information

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

August 2020

3664 pages

ISBN:9781450379984

DOI:10.1145/3394486

General Chairs:
Rajesh Gupta, UC San Diego, USA
Yan Liu, USC, USA

Program Chairs:
Mohak Shah, LG Electronics, USA
Suju Rajan, LinkedIn, USA

Publications Chairs:
Jiliang Tang, Michigan State, USA
B. Aditya Prakash, Georgia Tech, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation

Conference

KDD '20

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

July 6 - 10, 2020

CA, Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Article Metrics

Total Citations: 3

Total Downloads: 575

Downloads (Last 12 months): 79

Downloads (Last 6 weeks): 9

Reflects downloads up to 15 Oct 2024
