Research article · DOI: 10.1145/3677182.3677282

A Concise Review of Long Context in Large Language Models

Published: 03 August 2024

Abstract

Owing in part to the rise of high-performance computing systems and transformer models, natural language processing has advanced rapidly, and a multitude of applications built on large language models continue to augment people's cognitive abilities. Nevertheless, large language models still face difficulties when handling long context input. Many studies have proposed specific strategies to address the challenge of extended context; however, no thorough summary of these studies yet exists. In this paper, we discuss the issues raised and the developments made in the long-context application of large language models, and we suggest directions for future research and development.

    Published In

ASENS '24: Proceedings of the International Conference on Algorithms, Software Engineering, and Network Security
April 2024, 759 pages
ISBN: 9798400709784
DOI: 10.1145/3677182

Publisher

Association for Computing Machinery, New York, NY, United States
