EP-Transformer: Efficient Context Propagation for Long Document

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2023)

Abstract

Transformers are widely used in NLP applications. However, because model complexity grows quadratically with sequence length, applying transformers to very long documents is intractable. In this paper, we propose EP-Transformer, a hierarchical Transformer focused on efficient context propagation across entire long documents. Specifically, two components are designed: a similarity-sensitive fusion block, which performs multi-level aggregation of representations across segments, and an unsupervised contrastive learning strategy, which aims to generate more unbiased global features. EP-Transformer not only captures the local and global contextual information of long sequences effectively, but also reduces computational complexity. To verify the effectiveness of EP-Transformer on long sequences, we evaluate it on three commonly used classification datasets (IMDB, Hyperpartisan News, and arXiv) and two QA datasets (WikiHop and TriviaQA). Experimental results demonstrate that EP-Transformer significantly outperforms other baseline models.
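
The abstract describes the approach only at a high level, so the following is a minimal, hypothetical sketch (PyTorch) of the general idea of hierarchical segment encoding: a long document is split into fixed-length segments, each segment is encoded with full self-attention (quadratic only in the segment length), and segment summaries are then pooled with a similarity-weighted fusion step into a global document representation. The module name HierarchicalEncoder, all dimensions, and the cosine-similarity fusion rule are illustrative assumptions, not the authors' implementation of the similarity-sensitive fusion block or the contrastive objective.

```python
# Hypothetical sketch of hierarchical segment encoding with a
# similarity-weighted fusion step. All names, dimensions, and the
# fusion rule are illustrative assumptions, not the released
# EP-Transformer implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=2, seg_len=128):
        super().__init__()
        self.seg_len = seg_len
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Segment-level encoder: full self-attention within each segment,
        # so cost grows with seg_len**2 per segment instead of doc_len**2.
        self.segment_encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        # x: (batch, doc_len, d_model) token embeddings of a long document.
        b, n, d = x.shape
        pad = (-n) % self.seg_len
        x = F.pad(x, (0, 0, 0, pad))                  # pad to a segment multiple
        segs = x.reshape(b, -1, self.seg_len, d)      # (b, n_seg, seg_len, d)
        n_seg = segs.size(1)

        # Encode every segment independently (local context).
        enc = self.segment_encoder(segs.reshape(b * n_seg, self.seg_len, d))
        enc = enc.reshape(b, n_seg, self.seg_len, d)
        seg_repr = enc.mean(dim=2)                    # (b, n_seg, d) segment summaries

        # Similarity-weighted fusion (assumption): weight each segment by its
        # cosine similarity to the mean segment, then pool into a global vector.
        doc_mean = seg_repr.mean(dim=1, keepdim=True)          # (b, 1, d)
        sim = F.cosine_similarity(seg_repr, doc_mean, dim=-1)  # (b, n_seg)
        weights = sim.softmax(dim=-1).unsqueeze(-1)            # (b, n_seg, 1)
        doc_repr = (weights * seg_repr).sum(dim=1)             # (b, d) global representation
        return doc_repr, seg_repr


if __name__ == "__main__":
    model = HierarchicalEncoder()
    tokens = torch.randn(2, 1000, 256)     # a batch of two "long documents"
    doc_repr, seg_repr = model(tokens)
    print(doc_repr.shape, seg_repr.shape)  # torch.Size([2, 256]) torch.Size([2, 8, 256])
```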


Notes

  1. https://zenodo.org/record/1489920#.YmFnzBBBzep.
  2. https://github.com/LiqunW/Long-document-dataset.


Author information


Corresponding author

Correspondence to Hongbin Wang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, C., et al. (2023). EP-Transformer: Efficient Context Propagation for Long Document. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_45

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-44696-2_45

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44695-5

  • Online ISBN: 978-3-031-44696-2

  • eBook Packages: Computer Science, Computer Science (R0)
