EP-Transformer: Efficient Context Propagation for Long Document

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2023)

Abstract

Transformers are widely used in NLP applications. However, because model complexity grows quadratically with sequence length, applying transformers to very long documents is intractable. In this paper, we propose EP-Transformer, a hierarchical Transformer focused on efficient context propagation across entire long documents. Specifically, two components are designed: a similarity-sensitive fusion block, which performs multi-level aggregation of representations across segments, and an unsupervised contrastive learning strategy, which aims to generate more unbiased global features. EP-Transformer not only captures the local and global contextual information of long sequences effectively, but also reduces computational complexity. To verify the effectiveness of EP-Transformer on long sequences, we evaluate it on three commonly used classification datasets (IMDB, Hyperpartisan News, and arXiv) and two QA datasets (WikiHop and TriviaQA). Experimental results demonstrate that EP-Transformer significantly outperforms other baseline models.
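
The abstract describes the approach only at a high level, so the following is a minimal, hypothetical sketch (PyTorch) of the general idea of hierarchical segment encoding: a long document is split into fixed-length segments, each segment is encoded with full self-attention (quadratic only in the segment length), and segment summaries are then pooled with a similarity-weighted fusion step into a global document representation. The module name HierarchicalEncoder, all dimensions, and the cosine-similarity fusion rule are illustrative assumptions, not the authors' implementation of the similarity-sensitive fusion block or the contrastive objective.

```python
# Hypothetical sketch of hierarchical segment encoding with a
# similarity-weighted fusion step. All names, dimensions, and the
# fusion rule are illustrative assumptions, not the released
# EP-Transformer implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=2, seg_len=128):
        super().__init__()
        self.seg_len = seg_len
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Segment-level encoder: full self-attention within each segment,
        # so cost grows with seg_len**2 per segment instead of doc_len**2.
        self.segment_encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        # x: (batch, doc_len, d_model) token embeddings of a long document.
        b, n, d = x.shape
        pad = (-n) % self.seg_len
        x = F.pad(x, (0, 0, 0, pad))                  # pad to a segment multiple
        segs = x.reshape(b, -1, self.seg_len, d)      # (b, n_seg, seg_len, d)
        n_seg = segs.size(1)

        # Encode every segment independently (local context).
        enc = self.segment_encoder(segs.reshape(b * n_seg, self.seg_len, d))
        enc = enc.reshape(b, n_seg, self.seg_len, d)
        seg_repr = enc.mean(dim=2)                    # (b, n_seg, d) segment summaries

        # Similarity-weighted fusion (assumption): weight each segment by its
        # cosine similarity to the mean segment, then pool into a global vector.
        doc_mean = seg_repr.mean(dim=1, keepdim=True)          # (b, 1, d)
        sim = F.cosine_similarity(seg_repr, doc_mean, dim=-1)  # (b, n_seg)
        weights = sim.softmax(dim=-1).unsqueeze(-1)            # (b, n_seg, 1)
        doc_repr = (weights * seg_repr).sum(dim=1)             # (b, d) global representation
        return doc_repr, seg_repr


if __name__ == "__main__":
    model = HierarchicalEncoder()
    tokens = torch.randn(2, 1000, 256)     # a batch of two "long documents"
    doc_repr, seg_repr = model(tokens)
    print(doc_repr.shape, seg_repr.shape)  # torch.Size([2, 256]) torch.Size([2, 8, 256])
```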


Notes

  1. https://zenodo.org/record/1489920#.YmFnzBBBzep.
  2. https://github.com/LiqunW/Long-document-dataset.


Author information


Corresponding author

Correspondence to Hongbin Wang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, C., et al. (2023). EP-Transformer: Efficient Context Propagation for Long Document. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_45

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-44696-2_45

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44695-5

  • Online ISBN: 978-3-031-44696-2

  • eBook Packages: Computer Science, Computer Science (R0)
