CARU: A Content-Adaptive Recurrent Unit for the Transition of Hidden State in NLP

Chan, Ka-Hou; Ke, Wei; Im, Sio-Kei

doi:10.1007/978-3-030-63830-6_58

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12532)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

This article introduces a novel RNN unit inspired by the GRU, namely the Content-Adaptive Recurrent Unit (CARU). The design of CARU contains all the features of the GRU but requires fewer training parameters. We use the concept of weights in our design to analyze the transition of hidden states. We also describe how the content-adaptive gate handles the received words and alleviates the long-term dependence problem. In our experiments, CARU not only achieves better accuracy than the GRU but also trains faster. Moreover, the proposed unit is general and can be applied to all RNN-related neural network models.

Notes

1.
For the complete hidden state, Eq. (1) should include the bias parameters, giving $h^{(t+1)}_{n\times 1} = \boldsymbol{w}_{n\times n} h^{(t)}_{n\times 1} + \boldsymbol{b}_{n\times 1} + \boldsymbol{w}_{n\times m} v^{(t)}_{m\times 1} + \boldsymbol{b}_{n\times 1}$, followed by a non-linear activation function $\tanh\left(h^{(t+1)}\right)$, which prevents divergence during training. To simplify the derivation, we omit both in this section.
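
The transition described in this note can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names `rnn_step`, `W_hh`, `W_hv`, `b_h` and `b_v` are assumptions made here for illustration (the note writes both bias terms simply as b), and the dimensions n = 4 and m = 3 are arbitrary.

```python
import numpy as np

def rnn_step(h, v, W_hh, W_hv, b_h, b_v):
    """One hidden-state transition with both bias terms, followed by tanh.

    h: (n, 1) current hidden state, v: (m, 1) current input vector.
    W_hh: (n, n), W_hv: (n, m), b_h and b_v: (n, 1).
    All parameter names are hypothetical labels for the terms in the note.
    """
    pre_activation = W_hh @ h + b_h + W_hv @ v + b_v
    # tanh bounds every component to [-1, 1], which keeps the state from diverging
    return np.tanh(pre_activation)

n, m = 4, 3  # illustrative sizes only
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((n, n))
W_hv = rng.standard_normal((n, m))
b_h = np.zeros((n, 1))
b_v = np.zeros((n, 1))

h = np.zeros((n, 1))              # initial hidden state
v = rng.standard_normal((m, 1))   # one input vector
h_next = rnn_step(h, v, W_hh, W_hv, b_h, b_v)
```

Dropping the two bias terms and the `np.tanh` call leaves the simplified linear form that the note says is used for the derivation.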

Acknowledgment

The article is part of the research project funded by The Science and Technology Development Fund, Macau SAR (File no. 0001/2018/AFJ).

Author information

Authors and Affiliations

School of Applied Sciences, Macao Polytechnic Institute, Macao, China
Ka-Hou Chan & Wei Ke
Macao Polytechnic Institute, Macao, China
Ka-Hou Chan, Wei Ke & Sio-Kei Im

Corresponding author

Correspondence to Ka-Hou Chan.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

About this paper

Cite this paper

Chan, KH., Ke, W., Im, SK. (2020). CARU: A Content-Adaptive Recurrent Unit for the Transition of Hidden State in NLP. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12532. Springer, Cham. https://doi.org/10.1007/978-3-030-63830-6_58

DOI: https://doi.org/10.1007/978-3-030-63830-6_58
Published: 19 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63829-0
Online ISBN: 978-3-030-63830-6
eBook Packages: Computer Science, Computer Science (R0)
