DOI: 10.1145/3579654.3579728

Entangled Representation Learning: A Bidirectional Encoder Decoder Model

Published: 14 March 2023

Abstract

The encoder-decoder model encodes input sentences into hidden representations and decodes those representations into the output in a unidirectional way. We introduce a bidirectional encoder-decoder model that adds a reverse decoder-encoder to provide feedback from the output to the input. We implement the bidirectional encoder-decoder model on top of the transformer. The decoder-encoder attention mechanism enhances the representations through the interaction between the encoder and the decoder. Experimental results demonstrate the benefit of the bidirectional model, and visualization of the bidirectional representations reveals that our model learns an entangled representation of both the encoder and the decoder.
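The abstract describes the architecture only at a high level. As an illustration, the following is a minimal, hypothetical PyTorch sketch of one way a bidirectional encoder-decoder with a reverse decoder-encoder pass could be wired; the class name, hyperparameters, and the exact feedback path are assumptions made for this sketch, not the authors' implementation.

import torch
import torch.nn as nn

class BidirectionalEncoderDecoder(nn.Module):
    # Hypothetical sketch: a forward encoder-decoder (source -> target) paired
    # with a reverse decoder-encoder (target -> source) for output-to-input
    # feedback. Causal/padding masks are omitted for brevity.
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.forward_model = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.reverse_model = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        src, tgt = self.embed(src_tokens), self.embed(tgt_tokens)
        fwd = self.forward_model(src, tgt)   # source -> target direction
        rev = self.reverse_model(tgt, src)   # target -> source feedback direction
        return self.out_proj(fwd), self.out_proj(rev)

# Illustrative usage: the two directions could be trained jointly with a
# cross-entropy loss on each set of logits.
model = BidirectionalEncoderDecoder(vocab_size=10000)
src = torch.randint(0, 10000, (2, 7))   # batch of 2 source sequences
tgt = torch.randint(0, 10000, (2, 9))   # batch of 2 target sequences
fwd_logits, rev_logits = model(src, tgt)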

References

[1]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1409.0473
[2]
Jen-Tzung Chien and Wei-Hsiang Chang. 2021. Dualformer: A Unified Bidirectional Sequence-to-Sequence Learning. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021. IEEE, 7718–7722. https://doi.org/10.1109/ICASSP39728.2021.9413402
[3]
Stéphane Clinchant, Kweon Woo Jung, and Vassilina Nikoulina. 2019. On the use of BERT for Neural Machine Translation. CoRR abs/1909.12744(2019). arXiv:1909.12744http://arxiv.org/abs/1909.12744
[4]
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. CoRR abs/1705.02364(2017). arXiv:1705.02364http://arxiv.org/abs/1705.02364
[5]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. CoRR abs/2010.11929(2020). arXiv:2010.11929https://arxiv.org/abs/2010.11929
[6]
Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding Back-Translation at Scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). Association for Computational Linguistics, 489–500. https://doi.org/10.18653/v1/d18-1045
[7]
Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, and Xuanjing Huang. 2021. Mask Attention Neorks: Rethinking and Strengthen Transformer. CoRR abs/2103.13597(2021). arXiv:2103.13597https://arxiv.org/abs/2103.13597
[8]
Miguel Graça, Yunsu Kim, Julian Schamper, Shahram Khadivi, and Hermann Ney. 2019. Generalizing Back-Translation in Neural Machine Translation. CoRR abs/1906.07286(2019). arXiv:1906.07286http://arxiv.org/abs/1906.07286
[9]
Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. 2016. Dual Learning for Machine Translation. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett (Eds.). 820–828. https://proceedings.neurips.cc/paper/2016/hash/5b69b9cb83065d403869739ae7f0995e-Abstract.html
[10]
Cong Duy Vu Hoang, Philipp Koehn, Gholamreza Haffari, and Trevor Cohn. 2018. Iterative Back-Translation for Neural Machine Translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, NMT@ACL 2018, Melbourne, Australia, July 20, 2018, Alexandra Birch, Andrew M. Finch, Minh-Thang Luong, Graham Neubig, and Yusuke Oda (Eds.). Association for Computational Linguistics, 18–24. https://doi.org/10.18653/v1/w18-2703
[11]
Andrej Karpathy and Li Fei-Fei. 2014. Deep Visual-Semantic Alignments for Generating Image Descriptions. CoRR abs/1412.2306(2014). arXiv:1412.2306http://arxiv.org/abs/1412.2306
[12]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980
[13]
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. CoRR abs/1910.13461(2019). arXiv:1910.13461http://arxiv.org/abs/1910.13461
[14]
Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. CoRR abs/1405.0312(2014). arXiv:1405.0312http://arxiv.org/abs/1405.0312
[15]
Ron Mokady, Amir Hertz, and Amit H. Bermano. 2021. ClipCap: CLIP Prefix for Image Captioning. CoRR abs/2111.09734(2021). arXiv:2111.09734https://arxiv.org/abs/2111.09734
[16]
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. CoRR abs/1904.01038(2019). arXiv:1904.01038http://arxiv.org/abs/1904.01038
[17]
Myle Ott, Sergey Edunov, David Grangier, and Michael Auli. 2018. Scaling Neural Machine Translation. CoRR abs/1806.00187(2018). arXiv:1806.00187http://arxiv.org/abs/1806.00187
[18]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (Philadelphia, Pennsylvania) (ACL ’02). Association for Computational Linguistics, USA, 311–318. https://doi.org/10.3115/1073083.1073135
[19]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748–8763. https://proceedings.mlr.press/v139/radford21a.html
[20]
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners.
[21]
Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Edinburgh Neural Machine Translation Systems for WMT 16. In Proceedings of the First Conference on Machine Translation, WMT 2016, colocated with ACL 2016, August 11-12, Berlin, Germany. The Association for Computer Linguistics, 371–376. https://doi.org/10.18653/v1/w16-2323
[22]
Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics. https://doi.org/10.18653/v1/p16-1009
[23]
Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics. https://doi.org/10.18653/v1/p16-1162
[24]
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked Sequence to Sequence Pre-training for Language Generation. CoRR abs/1905.02450(2019). arXiv:1905.02450http://arxiv.org/abs/1905.02450
[25]
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. CoRR abs/1409.3215(2014). arXiv:1409.3215http://arxiv.org/abs/1409.3215
[26]
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the Inception Architecture for Computer Vision. CoRR abs/1512.00567(2015). arXiv:1512.00567http://arxiv.org/abs/1512.00567
[27]
Sho Takase and Shun Kiyono. 2021. Lessons on Parameter Sharing across Layers in Transformers. CoRR abs/2104.06022(2021). arXiv:2104.06022https://arxiv.org/abs/2104.06022
[28]
Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579–2605. http://jmlr.org/papers/v9/vandermaaten08a.html
[29]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998–6008. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[30]
Jesse Vig and Yonatan Belinkov. 2019. Analyzing the Structure of Attention in a Transformer Language Model. CoRR abs/1906.04284(2019). arXiv:1906.04284http://arxiv.org/abs/1906.04284
[31]
Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, and Steve J. Young. 2016. A Network-based End-to-End Trainable Task-oriented Dialogue System. CoRR abs/1604.04562(2016). arXiv:1604.04562http://arxiv.org/abs/1604.04562
[32]
Yingce Xia, Tianyu He, Xu Tan, Fei Tian, Di He, and Tao Qin. 2019. Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019. AAAI Press, 5466–5473. https://doi.org/10.1609/aaai.v33i01.33015466
[33]
Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, and Tie-Yan Liu. 2017. Dual Supervised Learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017(Proceedings of Machine Learning Research, Vol. 70), Doina Precupand Yee Whye Teh (Eds.). PMLR, 3789–3798. http://proceedings.mlr.press/v70/xia17a.html
[34]
Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2018. Model-Level Dual Learning. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018(Proceedings of Machine Learning Research, Vol. 80), Jennifer G. Dy and Andreas Krause (Eds.). PMLR, 5379–5388. http://proceedings.mlr.press/v80/xia18a.html
[35]
Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2019. DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation. CoRR abs/1911.00536(2019). arXiv:1911.00536http://arxiv.org/abs/1911.00536
[36]
Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, and Enhong Chen. 2018. Joint Training for Neural Machine Translation Models with Monolingual Data. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 555–562. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16336
[37]
Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, and Tie-Yan Liu. 2020. Incorporating BERT into Neural Machine Translation. CoRR abs/2002.06823(2020). arXiv:2002.06823https://arxiv.org/abs/2002.06823

      Published In

      ACAI '22: Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence
      December 2022
      770 pages
ISBN: 9781450398336
DOI: 10.1145/3579654
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 14 March 2023

      Author Tags

      1. image caption
      2. machine translation
      3. representation learning
      4. transformer

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • National Natural Science Foundation of China

      Conference

      ACAI 2022

      Acceptance Rates

      Overall Acceptance Rate 173 of 395 submissions, 44%

