Research article · DOI: 10.1145/3677182.3677282

A Concise Review of Long Context in Large Language Models

Published: 03 August 2024

Abstract

Owing in part to the rise of high-performance computing systems and transformer models, natural language processing has advanced rapidly, and a multitude of applications built on large language models continue to augment people's cognitive abilities. Nevertheless, large language models still face difficulties when handling long context input. Many studies have proposed specific strategies to address the challenge of extended context; however, no thorough summary of these studies yet exists. In this paper, we discuss the issues raised and the developments made in the long-context application of large language models, and we suggest directions for future research and development.

    Published In

ASENS '24: Proceedings of the International Conference on Algorithms, Software Engineering, and Network Security
April 2024, 759 pages
ISBN: 9798400709784
DOI: 10.1145/3677182

Publisher

Association for Computing Machinery, New York, NY, United States
