
Knowledge Editing for Large Language Models: A Survey

Published: 11 November 2024

Abstract

Large Language Models (LLMs) have recently transformed both the academic and industrial landscapes due to their remarkable capacity to understand, analyze, and generate texts based on their vast knowledge and reasoning ability. Nevertheless, one major drawback of LLMs is their substantial computational cost for pre-training due to their unprecedented number of parameters. This disadvantage is exacerbated when new knowledge frequently needs to be introduced into the pre-trained model. Therefore, it is imperative to develop effective and efficient techniques to update pre-trained LLMs. Traditional methods encode new knowledge in pre-trained LLMs through direct fine-tuning. However, naively re-training LLMs can be computationally intensive and risks degrading valuable pre-trained knowledge that is irrelevant to the update. Recently, Knowledge-based Model Editing (KME), also known as Knowledge Editing or Model Editing, has attracted increasing attention; it aims to precisely modify LLMs to incorporate specific knowledge, without negatively influencing other irrelevant knowledge. In this survey, we aim to provide a comprehensive and in-depth overview of recent advances in the field of KME. We first introduce a general formulation of KME that encompasses different KME strategies. Afterward, we provide an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs, and investigate existing KME strategies while analyzing key insights, advantages, and limitations of methods from each category. Moreover, representative metrics, datasets, and applications of KME are introduced accordingly. Finally, we provide an in-depth analysis regarding the practicality and remaining challenges of KME and suggest promising research directions for further advancement in this field.

1 Introduction

Recently, large language models (LLMs) have become a heavily researched topic that is revolutionizing both academia and industry [10, 109, 144, 173]. With the substantial factual knowledge and reasoning ability gained from pre-training on large corpora, LLMs have exhibited an unprecedented understanding of textual information and are able to analyze and generate texts akin to human experts [84, 87, 135, 138, 176]. Nevertheless, one main drawback of LLMs is the extremely high computational overhead of the training process due to the large number of parameters [59, 64, 179]. This is exacerbated by the continual evolution of the world, which constantly creates the need to update pre-trained LLMs to rectify obsolete information or incorporate new knowledge so that they maintain their relevancy [85, 92, 128, 134]. For example, as in Figure 1, the outdated LLM, GPT-3.5, cannot precisely describe the latest achievements of the famous soccer player Lionel Messi, which requires an explicit injection of new knowledge to generate the correct answers.
Fig. 1. An example of KME for efficient update of knowledge in LLMs.
One feasible and straightforward strategy for updating pre-trained LLMs is naive fine-tuning [20, 31, 141, 161], where parameters of pre-trained LLMs are directly optimized to encode new knowledge from new data [6, 99, 111, 173]. For example, various instruction-tuning methods are proposed to fine-tune pre-trained LLMs on newly collected data in a supervised learning manner [100, 112, 157, 159]. Although such fine-tuning techniques are widely used and capable of injecting new knowledge into LLMs, they suffer from the following disadvantages: (1) Even with parameter-efficient strategies to improve efficiency [89, 158, 170], fine-tuning LLMs may still require intensive computational resources [97, 102, 174]. (2) Fine-tuning alters the pre-trained parameters without constraints, which can lead to overfitting, where LLMs face the risk of losing valuable existing knowledge [172].
To address the drawbacks of updating LLMs with naive fine-tuning, more attention has been devoted to Knowledge-based Model Editing (KME). In general, KME aims to precisely modify the behavior of pre-trained LLMs to update specific knowledge, without negatively influencing other pre-trained knowledge irrelevant to the updates [116, 152, 167]. In KME, the update of a specific piece of knowledge in LLMs is typically formulated as an edit, such as rectifying the answer to “Who is the president of the USA?” from “Trump” to “Biden”. Regarding a specific edit, KME strategies typically modify the model output by either introducing an auxiliary network (or set of parameters) into the pre-trained model [52, 79, 175] or updating the (partial) parameters to store the new knowledge [22, 49, 51, 83]. Through these strategies, KME techniques can store new knowledge in new parameters, or locate the relevant knowledge within existing model parameters and update it, thereby precisely injecting the knowledge into the model. In addition, certain methods further introduce optimization constraints to ensure that the edited model maintains consistent behaviors on unmodified knowledge [13, 106, 177]. With these advantages, KME techniques can provide an efficient and effective way to constantly update LLMs with novel knowledge without explicit model re-training [172].
While sharing certain similarities with fine-tuning strategies, KME offers unique advantages in updating LLMs, which merit deeper investigation. Particularly, both KME and model fine-tuning seek to update pre-trained LLMs with new knowledge. However, aside from this shared objective, KME focuses more on two crucial properties that cannot be easily addressed by fine-tuning. (1) Locality requires that KME does not unintentionally influence the outputs for irrelevant inputs with distinct semantics. For example, when the edit regarding the president of the USA is applied, KME should not alter the model's knowledge about the prime minister of the UK. The practicality of KME methods largely relies on their ability to maintain the outputs for unrelated inputs, which serves as a major difference between KME and fine-tuning [117]. (2) Generality represents whether the edited model can generalize to a broader range of relevant inputs regarding the edited knowledge. Specifically, it indicates the model's capability to present consistent behavior on inputs that share semantic similarities. For example, when the model is edited regarding the president, the answer to a query about the leader or the head of government should also change accordingly. In practice, it is important for KME methods to ensure that the edited model can adapt well to such related input texts. In summary, due to these two unique objectives, KME remains a challenging task that requires specific strategies for satisfactory effectiveness.
Differences between this survey and existing ones. Several surveys have been conducted to examine various aspects of (large) language models [12, 34, 71, 73, 142, 173]. Nevertheless, there is still a dearth of thorough investigations of the existing literature and the continuous progress in editing LLMs. For example, recent works [100, 159] have discussed fine-tuning strategies that inject new knowledge into pre-trained LLMs with more data samples. However, the distinctiveness of KME, i.e., locality and generality, is not adequately discussed there; it will be thoroughly analyzed in this survey. Two other surveys [35, 63] review knowledge-enhanced language models. However, they mainly focus on leveraging external knowledge to enhance the performance of pre-trained LLMs, without addressing the editing task based on specific knowledge. To the best of our knowledge, the work most related to our survey [167] provides a brief overview of KME and concisely discusses the advantages of KME methods and their challenges. Nevertheless, that investigation lacks a thorough examination of further details of KME, e.g., categorizations, datasets, and applications. A follow-up work [172] additionally includes experiments with classic KME methods. Another recent work [152] proposes a framework for KME that unifies several representative methods. This work focuses on the implementation of KME techniques, with less emphasis on the technical details of different strategies. A more recent study [116] discusses the limitations of KME methods regarding the faithfulness of edited models, while it is relatively short and lacks a more comprehensive introduction to all existing methods. Considering the rapid advancement of KME techniques, we believe it is imperative to review the details of all representative KME methods, summarize the commonalities while discussing the uniqueness of each method, and discuss open challenges and prospective directions in the domain of KME to facilitate further advancement.
Contributions of this survey. This survey provides a comprehensive and in-depth analysis of the techniques, challenges, and opportunities associated with the editing of pre-trained LLMs. We first provide an overview of KME tasks along with an innovative formulation. Particularly, we formulate the general KME task as a constrained optimization problem, which simultaneously incorporates the goals of accuracy, locality, and generality. We then classify the existing KME strategies into three main categories, i.e., external memorization, global optimization, and local modification. More importantly, we demonstrate that methods in each category can be formulated as a specialized constrained optimization problem, whose characteristics are theoretically summarized based on the general formulation. In addition, we provide valuable insights into the effectiveness and feasibility of methods in each category, which can assist practitioners in selecting the most suitable KME method tailored to a specific task. Our analysis regarding the strengths and weaknesses of KME methods also serves as a catalyst for ongoing progress within the KME research community. Concretely, our key contributions can be summarized as follows:
Novel Categorization. We introduce a comprehensive and structured categorization framework to systematically summarize the existing works for LLM editing. Specifically, based on how the new knowledge is introduced into pre-trained LLMs, our categorization encompasses three distinct categories: external memorization, global optimization, and local modification, where their commonality and differences are thoroughly discussed in this survey.
In-Depth Analysis. We formulate the task of KME as a constrained optimization problem, where methods from each category can be viewed as a special case with refined constraints. Furthermore, we emphasize the primary insights, advantages, and limitations of each category. Within this context, we delve deep into representative methods from each category and systematically analyze their interconnections.
Future Directions. We analyze the practicality of existing KME techniques regarding a variety of datasets and applications. We also comprehensively discuss the challenges of the existing KME techniques and suggest promising research directions for future exploration.
The remainder of this article is organized as follows. Section 2 introduces the background knowledge for KME. Section 3 provides a general formulation of the KME task, which can fit into various application scenarios. Section 4 provides a comprehensive summary of evaluation metrics for KME strategies, which is crucial for a fair comparison across various methods. Before delving into the specific methods, we provide a comprehensive categorization of existing methods into three classes in Section 5.1, where their relationship and differences are thoroughly discussed. Then we introduce the methods from the three categories in detail, where the advantages and limitations of each category are summarized. Section 6 introduces the prevalently used public datasets. Section 7 provides a thorough introduction to various realistic tasks that can benefit from KME techniques. Section 8 discusses the potential challenges of KME that have not been addressed by existing techniques. This section also provides several potential directions that can inspire future research. Lastly, we conclude this survey in Section 9.

2 Background

In this section, we provide an overview of the editing strategies for machine learning models and the basics of LLMs as background knowledge to facilitate the understanding of technical details in KME. In this survey, we use bold uppercase letters (e.g., \(\mathbf {K}\) and \(\mathbf {V}\)) to represent matrices, use lowercase bold letters (e.g., \(\mathbf {k}\) and \(\mathbf {v}\)) to represent vectors, and use calligraphic uppercase letters (e.g., \(\mathcal {X}\) and \(\mathcal {Y}\)) to represent sets. We summarize the primary notations used in this survey in Table 1 for the convenience of understanding.
Table 1. Important Notations Used in This Survey

\(x\): Input (prompt) to LLMs
\(y\): Output of LLMs
\((x,y)\): Input-output pair
\(t=(s,r,o)\): Original knowledge triple (before editing)
\(s\)/\(r\)/\(o\): Subject/Relation/Object in a knowledge triple
\(t^*=(s,r,o^*)\): Target knowledge triple (after editing)
\(e=(s,r,o\rightarrow o^*)\): Edit descriptor
\(\mathcal {X}_e\): In-scope input space
\(\mathcal {Y}_e\): Original output space (before editing)
\(\mathcal {Y}_e^*\): Target output space (after editing)
\(\mathcal {E}=\lbrace e_i\rbrace\): Set of edits
\(\mathcal {O}_e\): Out-scope input space
\(\mathbf {q}^{(l)}_i\)/\(\mathbf {k}^{(l)}_{i}\)/\(\mathbf {v}^{(l)}_{i}\): Query/Key/Value vectors for the i-th head of the l-th attention module in Transformer
\(\mathbf {W}^{(l)}_1\), \(\mathbf {W}^{(l)}_2\): Weights of the fully connected layers of the l-th attention module in Transformer
\(\mathbf {h}^{(l)}\): Output from the l-th self-attention module in Transformer
\(\Vert\): Vector concatenation

2.1 Editing of Machine Learning Models

Machine learning models [41, 54, 74] pre-trained on large datasets frequently serve as foundation models for various tasks in the real world [26, 126]. In practical scenarios, there is often a need to modify these pre-trained models to enhance the performance for specific downstream tasks [18, 20, 103, 164, 178], reduce biases or undesirable behaviors [39, 104, 113, 123], tailor models to align more closely with human preferences [44, 72, 88], or incorporate novel information [101, 167, 177].
Model editing is a special type of model modification strategy where the modification should be as precise as possible. That is, it should accurately modify the pre-trained model to encode specific knowledge while maximally preserving the existing knowledge, without affecting the model's behavior on unrelated inputs [68]. First explored in the computer vision field, Bau et al. [8] investigate the potential of editing generative adversarial networks (GANs) [45] by viewing an intermediate layer as a linear memory, which can be manipulated to incorporate novel content. Afterward, Editable Training [133] is proposed to encourage fast editing of the trained model in a model-agnostic manner. The goal is to change the model predictions on a subset of inputs corresponding to misclassified objects, without altering the results for other inputs. In [125], the authors propose a method that allows for the modification of a classifier’s behavior by editing its decision rules, which can be used to correct errors or reduce biases in model predictions. In the field of natural language processing, several works [22, 102] have been proposed to perform editing regarding textual information. Specifically, Zhu et al. [177] propose a constrained fine-tuning loss to explicitly modify specific factual knowledge in transformer-based models [146]. More recent works [42, 43] discover that the MLP layers in transformers actually act as key-value memories, thereby enabling the editing of specific knowledge within the corresponding layers.

2.2 Language Models

2.2.1 Transformers.

Transformers lie at the core of LLMs [27, 121, 146]. The fully fledged transformer possesses an encoder-decoder architecture initially designed for the neural machine translation (NMT) task [137]. Nowadays, transformers have found wide applications in most fields of the NLP community, beyond their original purpose. Generally, a transformer network is constructed from multiple stacks of the self-attention module with residual connections, which is pivotal for capturing contextual information from textual sequences. Each self-attention module is composed of a self-attention layer (SelfAtt) and a point-wise feed-forward neural network layer (FFN), formulated as follows:
\begin{equation} \begin{aligned}& \mathbf {h}^{A, (l-1)}_{i} = \operatorname{SelfAtt}_i\left(\mathbf {h}^{(l-1)}_{i}\right) =\operatorname{Softmax}\left(\mathbf {q}^{(l)}_{i} \left(\mathbf {k}^{(l)}_i\right)^\top \right) \mathbf {v}_{i}^{(l)}, \\ & \mathbf {h}^{F, (l-1)} = \operatorname{FFN}\left(\mathbf {h}^{(l-1)}\right) =\operatorname{GELU}\left(\mathbf {h}^{(l-1)} \mathbf {W}^{(l)}_1\right) \mathbf {W}^{(l)}_2, \mathbf {h}^{(0)}=\mathbf {x}, \\ & \mathbf {h}^{(l)} = \mathbf {h}^{A, (l-1)} + \mathbf {h}^{F, (l-1)} = \big \Vert _{i} \operatorname{SelfAtt}_i \left(\mathbf {h}^{(l-1)}_{i} \right) + \operatorname{FFN} \left(\mathbf {h}^{(l-1)} \right), \end{aligned} \end{equation}
(1)
where \(\mathbf {q}^{(l)}_i\), \(\mathbf {k}^{(l)}_{i}\), and \(\mathbf {v}^{(l)}_{i}\) represent the sequences of query, key, and value vectors for the ith attention head of the lth attention module, respectively; they are calculated from \(\mathbf {h}^{(l-1)}_{i}\), the ith slice of the outputs of the \((l-1)\)-th self-attention module (i.e., \(\mathbf {h}^{(l-1)}\)). GELU is an activation function, \(\mathbf {x}\) denotes the input sequence of token embeddings, and \(\Vert\) represents vector concatenation. Normalizing factors in the self-attention layer are omitted for simplicity.
Generally, multi-head self-attention directs the model to attend to different parts of the sequence to predict the next token. Specifically, the prediction is based on different types of relationships and dependencies within the textual data, where the output \(\mathbf {h}^{A, (l-1)}_{i}\) is a weighted sum of the value vectors of other tokens. In contrast, FFN adds new information \(\mathbf {h}^{F, (l-1)}\) to the weighted sum of the embeddings of the attended tokens, based on the information stored in the weights of the fully connected layers, i.e., \(\mathbf {W}^{(l)}_1\) and \(\mathbf {W}^{(l)}_2\). The final layer outputs of the transformer, i.e., \(\mathbf {h}^{(L)}\), can be used in various downstream NLP tasks. For token-level tasks (e.g., part-of-speech tagging [19]), the entire hidden representation sequence \(\mathbf {h}^{(L)}\) can be utilized to predict the target sequence. For sequence-level tasks (e.g., sentiment analysis [160]), the hidden representation of the last token, i.e., \(\mathbf {h}^{(L)}_{-1}\), can be considered as a summary of the sequence and thus used for the predictions.
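To make this structure concrete, the following is a minimal PyTorch sketch of a single transformer block following the simplified formulation in Equation (1); as in the equation, attention scaling, masking, and layer normalization are omitted, and all dimension and module names are our own illustrative choices.

```python
import torch
import torch.nn.functional as F

class SimplifiedTransformerBlock(torch.nn.Module):
    """One block of Equation (1): multi-head self-attention plus a
    position-wise FFN, combined through a residual-style sum."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.W_q = torch.nn.Linear(d_model, d_model, bias=False)
        self.W_k = torch.nn.Linear(d_model, d_model, bias=False)
        self.W_v = torch.nn.Linear(d_model, d_model, bias=False)
        self.W_1 = torch.nn.Linear(d_model, d_ff, bias=False)   # W^(l)_1
        self.W_2 = torch.nn.Linear(d_ff, d_model, bias=False)   # W^(l)_2

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        T, d = h.shape  # (sequence length, model dim); batch omitted
        # Per-head query/key/value sequences: shape (n_heads, T, d_head).
        q = self.W_q(h).view(T, self.n_heads, self.d_head).transpose(0, 1)
        k = self.W_k(h).view(T, self.n_heads, self.d_head).transpose(0, 1)
        v = self.W_v(h).view(T, self.n_heads, self.d_head).transpose(0, 1)
        # SelfAtt_i: softmax(q k^T) v for each head (scaling omitted).
        attn = torch.softmax(q @ k.transpose(-2, -1), dim=-1) @ v
        h_attn = attn.transpose(0, 1).reshape(T, d)  # ||_i concatenates heads
        # FFN: GELU(h W_1) W_2 -- the part often viewed as key-value memory.
        h_ffn = self.W_2(F.gelu(self.W_1(h)))
        return h_attn + h_ffn  # h^(l)
```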

2.2.2 Large Language Models (LLMs).

Transformers with billions of parameters trained on large corpora have demonstrated emergent abilities, showcasing an unprecedented understanding of factual and commonsense knowledge [173]. Consequently, these models are referred to as LLMs to indicate their drastic distinction from traditional small-scale language models [34, 142]. Generally, based on the specific parts of the transformer utilized for language modeling, existing LLMs can be categorized into three classes: encoder-only LLMs, such as BERT [74]; encoder-decoder-based LLMs, such as T5 [119]; and decoder-only models (also the most common structure in LLMs), such as different versions of GPT [118] and LLaMA [144].

2.3 Relevant Topics

KME intersects with several extensively researched topics, yet these techniques cannot effectively address KME-specific challenges [141, 161]. The most relevant approach is model fine-tuning [6, 20, 99], including parameter-efficient fine-tuning [89, 158, 170], which requires fewer parameter updates. However, fine-tuning remains computationally intensive and is often impractical for black-box LLMs [172, 173]. Another related area is machine unlearning [105], which aims at removing the influence of individual samples from models. Unlike KME, which focuses on abstract and generalized knowledge updates, machine unlearning targets the elimination of specific training data, making it unsuitable for KME. On the other hand, external memorization KME methods share similarities with retrieval-augmented generation (RAG) [40], where a large repository of documents is stored and retrieved as needed to provide contextually relevant information for generating responses. While RAG can introduce new knowledge into LLMs by retrieving recently added documents, it does not effectively update the inherent knowledge within LLMs. Thus, RAG is not suitable for the fundamental knowledge updates that KME seeks to achieve.

3 Problem Formulation

In this section, we provide a formal definition of the knowledge-based model editing (KME) task for pre-trained LLMs, where a general formulation of the KME objective is presented to encompass specific KME strategies. The task of KME for LLMs can be broadly defined as the process of precisely modifying the behavior of pre-trained LLMs, such that new knowledge can be incorporated to maintain the currency and relevancy of the LLMs, without negatively influencing other pre-trained knowledge irrelevant to the edits. To provide a clear formulation, we present the definitions of different terms used in KME, where the overall process is illustrated in Figure 2.
Fig. 2. The formulation of the KME objective.
Editing Target. In this survey, we represent the knowledge required to be injected into LLMs as a knowledge triple \(t = (s,r,o)\), where s is the subject (e.g., president of the USA), r is the relation (e.g., is), and o is the object (e.g., Biden). From the perspective of knowledge triples, the objective of KME for LLMs is to modify the original knowledge triple \(t=(s, r, o)\) encoded in the pre-trained weights of the model into the target knowledge triple \(t^*=(s,r,o^*)\), where \(o^*\) is the target object different from o. In this manner, we can define an edit as a tuple \(e=(t,t^*)=(s,r,o\rightarrow o^*)\), which denotes the update of the obsolete old knowledge t into the new knowledge \(t^{*}\).
Input and Output Space. Given a pair of subject s and relation r, in order to query LLMs to obtain the object o, \((s,r)\) needs to be transformed into natural language, which we denote as x; x is also referred to as the prompt in this survey. The LLM output y is also textual and can be converted back to an object o as the query result. In this way, \((x,y)\) can be considered as the natural language input-output pair associated with the knowledge triple \(t=(s,r,o)\). For example, the prompt x transformed from s and r can be “The president of the USA is”, and y is the model output “Joe Biden”. Note that due to the diversity of natural language, multiple \((x,y)\) pairs can be associated with the same knowledge triple t. We denote the set of textual inputs associated with subject s and relation r in an edit e as \(\mathcal {X}_e=I(s,r)\), referred to as the in-scope input space. Similarly, we define the set of textual outputs that can be associated with the object \(o^*\) in the same edit e as \(\mathcal {Y}_e^*=O^*(s,r,o^*)\) (i.e., the target output space), and the original textual output space as \(\mathcal {Y}_e=O(s,r,o)\) (i.e., the original output space). Given an edit e, the aim of KME is to modify the behavior of language models from \(\mathcal {Y}_e\) to \(\mathcal {Y}_e^*\), regarding the inputs in \(\mathcal {X}_e\). To accommodate the scenarios where multiple edits are performed, we can define the union of \(\mathcal {X}_e\) over a set of edits \(\mathcal {E}=\lbrace e_1,e_2,\ldots\,\rbrace\) as \(\mathcal {X}_{\mathcal {E}}=\bigcup _{e\in \mathcal {E}}\mathcal {X}_e\). Similarly, we can define \(\mathcal {Y}_{\mathcal {E}}=\bigcup _{e\in \mathcal {E}}\mathcal {Y}_e\) and \(\mathcal {Y}^*_{\mathcal {E}}=\bigcup _{e\in \mathcal {E}}\mathcal {Y}^*_e\).
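To illustrate these definitions, the following sketch shows one possible in-memory representation of knowledge triples, edits, and their associated input/output spaces; all class and field names are our own illustrative choices rather than part of any KME library.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Triple:
    """A knowledge triple t = (s, r, o)."""
    subject: str   # s, e.g., "president of the USA"
    relation: str  # r, e.g., "is"
    obj: str       # o, e.g., "Trump"

@dataclass
class Edit:
    """An edit e = (s, r, o -> o*) with its associated language spaces."""
    old: Triple                      # t  = (s, r, o)
    new: Triple                      # t* = (s, r, o*)
    in_scope: list[str] = field(default_factory=list)   # X_e: prompts I(s, r)
    targets: list[str] = field(default_factory=list)    # Y*_e: outputs O*(s, r, o*)
    out_scope: list[str] = field(default_factory=list)  # O_e: hard unrelated prompts

# Example: rectify the answer about the US president.
t = Triple("president of the USA", "is", "Trump")
t_star = Triple("president of the USA", "is", "Biden")
e = Edit(
    old=t, new=t_star,
    in_scope=["The president of the USA is", "Who is the US president?"],
    targets=["Biden", "Joe Biden"],
    out_scope=["The prime minister of the UK is"],
)
edit_set = [e]  # E = {e_1, e_2, ...}
```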
Formulation. We denote the pre-trained LLM with parameter \(\phi\) as \(f:\mathcal {X}\rightarrow \mathcal {Y}\) and the edited model with updated parameter \(\phi ^*\) as \(f^*:\mathcal {X}\rightarrow \mathcal {Y}^*\). The objective of knowledge-based model editing is to precisely update the pre-trained LLM f into \(f^{*}\) according to the edits in the edit set \(\mathcal {E}\), such that for each edit e, the edited model maps every in-scope input \(x\in \mathcal {X}_e\) to the target output space \(\mathcal {Y}^*_e\), while the changes to the input-output pairs irrelevant to the edits are minimized. The problem of KME can be formulated as follows:
Definition 1.
The objective for KME on a series of edits \(\mathcal {E}\) is represented as follows:
\begin{equation} \begin{aligned}& \min \mathbb {E}_{e \in \mathcal {E}} \mathbb {E}_{x, y^{*} \in \mathcal {X}_e, \mathcal {Y}^*_e} \mathcal {L} (f^*(x), y^{*}), \text{where}\ \ f^*=M(f; \mathcal {E}),\\ &\;\text{s.t.}\;f^*(x)=f(x),\ \ \forall x\in \mathcal {X}\setminus \mathcal {X}_\mathcal {E}, \end{aligned} \end{equation}
(2)
where \(\mathcal {L}\) is a specific loss function that measures the discrepancy between the model output \(f^*(x)\) and \(y^*\) from the desirable response set \(\mathcal {Y}^*_e\). \(M(f;\mathcal {E})\) denotes the modification applied to f based on the desirable edits \(\mathcal {E}\).
From the above definition, we can summarize two crucial perspectives regarding the objective of KME: (1) Generality, which requires that the correct answers in the target output space \(\mathcal {Y}^*_e\) be achieved for any prompt in the in-scope input space \(\mathcal {X}_e\), such that the target knowledge triple \(t^{*}\) of each edit e is genuinely incorporated into the pre-trained model; (2) Locality, which requires the consistency of model output regarding unrelated inputs, i.e., \(\mathcal {X}\setminus \mathcal {X}_\mathcal {E}\), such that valuable pre-trained knowledge is maximally preserved after the editing. Here, we note that locality is especially important for editing LLMs, as the knowledge that needs to be updated often occupies only a small fraction of all knowledge encompassed by the pre-trained model. In other words, the output of an edited model regarding most input prompts should remain consistent with the output before editing.

4 Evaluation Metrics

Before introducing the taxonomy of KME and the exemplar methods in detail, in this section, we first discuss various metrics commonly used to evaluate the effectiveness of different KME strategies from varied perspectives. We summarize these metrics to facilitate understanding of the properties and advantages of different methods.

4.1 Accuracy

Accuracy is a straightforward metric for evaluating the effectiveness of KME techniques [17, 29, 79, 101, 106, 174, 175], defined as the success rate of editing in terms of a specific set of pre-defined input-output pairs \((x_e,y^*_e)\) associated with all the edited knowledge. Accuracy can be easily defined to evaluate the performance of KME on classification tasks, e.g., fact checking [102, 114], where the answers y are categorical. Defining the prompt and ground truth related to an edit e as \(x_e\) and \(y^*_e\), respectively, the metric of the accuracy of an edited model \(f^*\) is formulated as follows:
\begin{equation} {\bf Acc}(f^*;\mathcal {E})=\mathbb {E}_{e\in \mathcal {E}}\mathbb {1}\lbrace f^*(x_e)= y^*_e\rbrace . \end{equation}
(3)
Since accuracy is defined on a deterministic set of prompt-answer pairs, it provides a fair comparison between KME methods [22, 97, 98]. Nevertheless, it is non-trivial to evaluate the practicality of KME methods with accuracy, as there is no consensus on how to design the evaluation pairs for \(\mathcal {E}\), especially when the task requires outputting a long sequence, such as question answering or text generation [29, 97, 98].
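Assuming the edited model is a callable that maps a prompt string to an output string, Equation (3) can be computed with a sketch like the following; the exact-string-match criterion is our simplifying choice for the indicator function.

```python
def accuracy(edited_model, edit_pairs):
    """Editing accuracy, Equation (3): exact-match success rate over the
    pre-defined prompt-answer pairs (x_e, y*_e) of the edit set E.

    edited_model: callable mapping a prompt string to an output string (f*).
    edit_pairs:   list of (x_e, y_star_e) tuples, one per edit e in E.
    """
    hits = sum(edited_model(x_e) == y_star_e for x_e, y_star_e in edit_pairs)
    return hits / len(edit_pairs)
```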

4.2 Locality

One crucial metric for KME strategies is locality [17, 25, 83, 101], which reflects the capability of the edited model \(f^{*}\) to preserve the pre-trained knowledge in f that is irrelevant to the edits in \(\mathcal {E}\). Note that in most KME applications, the required edits account for an extremely small fraction of the entire knowledge learned and preserved in the pre-trained LLMs [167, 172]. Consequently, the locality measurement is of great importance in assessing the capability of edited models to preserve unrelated knowledge [49, 95, 104]. Given an edit e, the edited model \(f^{*}\), and the original pre-trained model f, the locality of \(f^{*}\) can be defined as the expected agreement between the edited model and the unedited model on out-scope inputs, as follows:
\begin{equation} {\bf Loc}(f^{*}, f; e)=\mathbb {E}_{x \notin \mathcal {X}_{e}} \mathbb {1}\lbrace f^*(x)= f(x)\rbrace . \end{equation}
(4)
We can also consider the locality regarding the entire edit set \(\mathcal {E}\), which can be defined as follows:
\begin{equation} {\bf Loc}(f^{*}, f; \mathcal {E})=\mathbb {E}_{x \notin \mathcal {X}_{\mathcal {E}}} \mathbb {1}\lbrace f^*(x)= f(x)\rbrace , \ \ \text{where}\ \ \mathcal {X}_{\mathcal {E}}=\bigcup _{e\in \mathcal {E}}\mathcal {X}_e. \end{equation}
(5)
Although the above metric measures the overall locality of \(f^{*}\) based on all inputs that are not in \(\mathcal {X}_{\mathcal {E}}\), it is difficult to compute in realistic scenarios, as the entire input space can be excessively large or even infinite [167]. Therefore, existing methods generally resort to alternative solutions that pre-define the specific range of out-scope inputs to calculate the locality metric [15, 22, 25, 82, 97]. For example, in SERAC [102], the authors generate hard out-scope examples from the dataset zsRE [78] by selectively sampling from training inputs with high semantic similarity to the edit input, based on embeddings obtained from a pre-trained semantic embedding model. Denoting the out-scope input space related to the in-scope inputs \(\mathcal {X}_{e}\) as \(\mathcal {O}_{e}\), we can similarly define the feasible out-scope input space for multiple edits as \(\mathcal {O}_{\mathcal {E}}=\bigcup _{e\in \mathcal {E}}\mathcal {O}_e\). In this manner, we define a specific metric of locality of \(f^{*}\) regarding \(\mathcal {E}\) as follows:
\begin{equation} {\bf Loc}(f^{*}, f; \mathcal {O}_{e}) = \mathbb {E}_{x \in \mathcal {O}_{e} } \mathbb {1}\lbrace f^*(x)= f(x)\rbrace , \end{equation}
(6)
\begin{equation} {\bf Loc}(f^{*}, f; \mathcal {O}_{\mathcal {E}})=\mathbb {E}_{x \in \mathcal {O}_{\mathcal {E}}} \mathbb {1}\lbrace f^*(x)= f(x)\rbrace , \ \ \text{where}\ \ \mathcal {O}_{\mathcal {E}}=\bigcup _{e\in \mathcal {E}}\mathcal {O}_e. \end{equation}
(7)
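Under the same assumption of models as prompt-to-string callables, the pre-defined-range locality of Equations (6) and (7) can be sketched as follows.

```python
def locality(edited_model, original_model, out_scope_prompts):
    """Locality, Equations (6)-(7): fraction of pre-defined out-scope prompts
    (O_e or O_E) on which the edited model f* agrees with the original f.

    Both models are callables mapping a prompt string to an output string.
    """
    agree = sum(edited_model(x) == original_model(x) for x in out_scope_prompts)
    return agree / len(out_scope_prompts)
```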

4.3 Generality

Aside from locality, another crucial metric is generality, which indicates the capability of the edited model \(f^{*}\) to correctly respond to semantically similar prompts [13, 101, 106, 130, 177]. This requires the generalization of the updated knowledge to other in-scope inputs that do not appear in the training set while conveying similar or related meanings [50, 163]. As such, ensuring generality prevents the edited model from overfitting to a particular input [172]. Specifically, in the scenarios of knowledge-based model editing, the inherent diversity of natural language means that various in-scope inputs x can correspond to a specific knowledge triple t [152]. These semantically equivalent inputs can involve differences in aspects such as syntax, morphology, genre, or even language. Existing works mostly pre-define a specific in-scope input space for each edit via different strategies [61, 86, 136, 166, 168]. For example, in the CounterFact dataset proposed in ROME [97], the authors utilize prompts that involve distinct yet semantically related subjects as the in-scope input. In general, the generality of an edited model \(f^{*}\) is defined as the expectation of exact-match agreement between the output of the edited model and true labels for in-scope inputs, which can be defined on either an edit e or the edit set \(\mathcal {E}\) as
\begin{equation} {\bf Gen}(f^*; e)=\mathbb {E}_{x \in \mathcal {X}_{{e}}} \mathbb {1}\lbrace f^*(x)\in \mathcal {Y}_e^*\rbrace , \end{equation}
(8)
\begin{equation} {\bf Gen}(f^*; \mathcal {E})=\mathbb {E}_{e\in \mathcal {E}}\,\mathbb {E}_{x \in \mathcal {X}_{e}} \mathbb {1}\lbrace f^*(x)\in \mathcal {Y}_e^*\rbrace . \end{equation}
(9)
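A corresponding sketch for Equations (8) and (9), assuming each edit provides its paraphrase prompts and the set of acceptable target outputs, could look as follows.

```python
def generality(edited_model, edits):
    """Generality, Equations (8)-(9): fraction of in-scope prompts X_e whose
    output lands in the target output space Y*_e of the corresponding edit.

    edits: list of (in_scope_prompts, target_outputs) pairs, one per edit e.
    """
    hits, total = 0, 0
    for in_scope_prompts, target_outputs in edits:
        for x in in_scope_prompts:
            hits += edited_model(x) in set(target_outputs)
            total += 1
    return hits / total
```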

4.4 Portability

In addition to generality, another vital metric is portability, which measures the effectiveness of the edited model \(f^{*}\) in transferring a conducted edit to other logically related edits that can be interpreted via reasoning [172]. For example, if an edit is conducted toward the President of the USA, the edit regarding the query “Which political party does the current President of the USA belong to?” should also be achieved. This ensures that the edited model is not limited to responding to specific input formats. Concretely, such transfer of knowledge is crucial for robust generalization of the edited model. In practice, portability can be assessed with logically related edits obtained in different ways [21, 167]. Denoting an edit as \(e=(s,r,o\rightarrow o^*)\), here we introduce two common types of logically related edits \(\tilde{e}\): (1) Reversed Relation: \(\tilde{e}=(o\rightarrow o^*, \tilde{r},s)\), where \(\tilde{r}\) is the reversed relation of r; and (2) Neighboring Relation: \(\tilde{e}=(s, r\oplus r_\epsilon , \epsilon \rightarrow \epsilon ^*)\), where both \((o, r_\epsilon , \epsilon)\) and \((o^*, r_\epsilon , \epsilon ^*)\) exist in the pre-trained knowledge, and \(r\oplus r_\epsilon\) is a combined relation from r and \(r_\epsilon\). In this manner, we define portability as the edited model's performance on one or multiple logically related edits as follows:
\begin{equation} {\bf Por}(f^*; \tilde{e})=\mathbb {E}_{x \in \mathcal {X}_{\tilde{e}}} \mathbb {1}\lbrace f^*(x)\in \mathcal {Y}_{\tilde{e}}^*\rbrace , \end{equation}
(10)
\begin{equation} {\bf Por}(f^*; \widetilde{\mathcal {E}})=\mathbb {E}_{\tilde{e}\in \widetilde{\mathcal {E}}}\,\mathbb {E}_{x \in \mathcal {X}_{\tilde{e}}} \mathbb {1}\lbrace f^*(x)\in \mathcal {Y}_{\tilde{e}}^*\rbrace . \end{equation}
(11)
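Portability shares the computational form of generality but is evaluated on probes derived from logically related edits; a sketch of Equations (10) and (11) follows, with an illustrative reversed-relation probe described in the docstring.

```python
def portability(edited_model, related_probes):
    """Portability, Equations (10)-(11): accuracy of the edited model on
    prompts of logically related edits e~ (e.g., reversed-relation or
    neighboring-relation queries) rather than paraphrases of e itself.

    related_probes: list of (prompts X_e~, acceptable outputs Y*_e~) pairs;
    e.g., for e = (president of the USA, is, Trump -> Biden), one probe
    could be ("Joe Biden is the president of", {"the USA"}).
    """
    hits, total = 0, 0
    for prompts, targets in related_probes:
        for x in prompts:
            hits += edited_model(x) in set(targets)
            total += 1
    return hits / total
```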

4.5 Retainability

Retainability characterizes the ability of KME techniques to preserve the desired properties of edited models after multiple consecutive edits [47, 69, 169]. In the presence of ever-evolving information, practitioners may need to frequently update a conversational model (i.e., sequential editing). Such a KME setting requires that the model does not forget previous edits after each new modification [81]. It is essential to distinguish retainability from scalability, which evaluates the model’s ability to handle a vast number of edits [15]. In contrast, retainability assesses the consistent performance of the model after each individual edit, presenting a more challenging objective to achieve. Recently, T-Patcher [66] first explores the sequential setting of KME and observes that many existing approaches significantly fall short in terms of retainability. In SLAG [53], the authors also discover a significant drop in editing performance when multiple beliefs are updated continuously. To assess the retainability of an edited language model \(f^{*}\), we define it as follows:
\begin{equation} \begin{aligned}\mathbf {Ret}(M;\mathcal {E})=\frac{1}{|\mathcal {E}|-1}\sum \limits _{i=1}^{|\mathcal {E}|-1}\left[\mathbf {Acc}(M(f;\lbrace e_1,e_2,\ldots , e_{i+1}\rbrace)) - \mathbf {Acc}(M(f;\lbrace e_1,e_2,\ldots , e_{i}\rbrace))\right], \end{aligned} \end{equation}
(12)
where \(\mathbf {Acc}\) is the accuracy measurement, \(|\mathcal {E}|\) is the number of edits in the edit set, and M denotes the editing strategy that modifies the pre-trained model f into \(f^{*}\) with the first i (or \(i+1\)) consecutive edits \(\lbrace e_1,e_2,\ldots , e_{i} (, e_{i+1})\rbrace\). The retainability metric quantifies the effect of applying consecutive edits with the editing strategy M: a higher (less negative) value indicates that each additional edit changes the overall performance of the edited model \(f^{*}\) less.
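The following sketch operationalizes Equation (12). We make two simplifying assumptions of our own: the accuracy after the first i edits is measured on those i edits, and the editor re-applies the edit sequence from the pre-trained model at each step.

```python
def retainability(editor, base_model, edits, eval_accuracy):
    """Retainability, Equation (12): average change in editing accuracy as
    edits are applied consecutively (a value near zero means little
    degradation from each additional edit).

    editor:        callable M(f, edit_list) returning an edited model.
    base_model:    the pre-trained model f.
    edits:         the ordered edit set E = [e_1, ..., e_n].
    eval_accuracy: callable Acc(model, edit_list) in [0, 1].
    """
    deltas = []
    for i in range(1, len(edits)):
        f_prev = editor(base_model, edits[:i])      # edited with e_1..e_i
        f_next = editor(base_model, edits[:i + 1])  # edited with e_1..e_{i+1}
        deltas.append(eval_accuracy(f_next, edits[:i + 1])
                      - eval_accuracy(f_prev, edits[:i]))
    return sum(deltas) / len(deltas)
```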

4.6 Scalability

The scalability of an editing strategy refers to its capability to incorporate a large number of edits simultaneously [15]. Recently, several works have emerged that can inject multiple pieces of new knowledge into specific parameters of pre-trained LLMs [168, 172]. For instance, SERAC [102] can perform a maximum of 75 edits. In addition, MEMIT [98] is proposed to enable thousands of edits without significant influence on editing accuracy. When there is a need to edit a model with a vast number of edits concurrently, simply employing current knowledge-based model editing techniques in a sequential manner has proven ineffective in achieving such scalability [167]. To effectively evaluate the scalability of edited language models, we define the scalability of an edited model as follows:
\begin{equation} \mathbf {Sca}(M;\mathcal {E})=\mathbb {E}_{e\in \mathcal {E}}\mathbf {Acc}(M(f;e)) -\mathbf {Acc}(M(f;\mathcal {E})), \end{equation}
(13)
where \(\mathbf {Acc}(M(f;\mathcal {E}))\) denotes the accuracy of the edited model after conducting all edits in \(\mathcal {E}\), whereas \(\mathbf {Acc}(M(f;e))\) is the accuracy after performing only the edit e. \(\mathbf {Sca}\) demonstrates the model performance and practicality in the presence of multiple edits. Nevertheless, we note that the baseline value \(\mathbf {Acc}(M(f;e))\) is also important in evaluating the scalability of various models. This is because, with higher accuracy for each e, retaining such performance after multiple edits is more difficult. Therefore, we further define the relative version of Equation (13) as follows:
\begin{equation} \mathbf {Sca}_{rel}(M;\mathcal {E})=\left(\mathbb {E}_{e\in \mathcal {E}}\mathbf {Acc}(M(f;\lbrace e\rbrace)) -\mathbf {Acc}(M(f;\mathcal {E}))\right)/\mathbb {E}_{e\in \mathcal {E}}\mathbf {Acc}(M(f;\lbrace e\rbrace)). \end{equation}
(14)
The introduced scalability measurement further considers the magnitude of the original accuracy to provide a fairer evaluation.
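Equations (13) and (14) can be sketched similarly, comparing the average single-edit accuracy against the accuracy after one batched application of the full edit set; the callable interfaces are the same illustrative abstractions used above.

```python
def scalability(editor, base_model, edits, eval_accuracy, relative=False):
    """Scalability, Equations (13)-(14): gap between the average single-edit
    accuracy and the accuracy after applying all edits at once; the relative
    variant normalizes by the single-edit baseline.

    editor:        callable M(f, edit_list) returning an edited model.
    eval_accuracy: callable Acc(model, edit_list) in [0, 1].
    """
    single = [eval_accuracy(editor(base_model, [e]), [e]) for e in edits]
    baseline = sum(single) / len(single)                       # E_e Acc(M(f; {e}))
    batched = eval_accuracy(editor(base_model, edits), edits)  # Acc(M(f; E))
    gap = baseline - batched
    return gap / baseline if relative else gap
```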

5 Methodologies

In this section, we introduce existing KME strategies in detail. We first provide an innovative taxonomy of existing KME strategies based on how and where the new knowledge is injected into the pre-trained LLMs, where the advantages and drawbacks are thoroughly discussed. We then introduce various methods from each category, with an emphasis on analyzing the technical details, insights, shortcomings, and their relationships.

5.1 Categorization of KME Methods

Faced with the rapid deprecation of old information and the emergence of new knowledge, various KME methodologies have been proposed to update pre-trained LLMs to maintain their currency and relevancy. KME ensures that new knowledge can be efficiently incorporated into pre-trained LLMs without negatively influencing the pre-trained knowledge irrelevant to the edit. In this survey, we categorize existing KME methods into three main classes as follows:
External Memorization-based methods leverage an external memory to store the new knowledge for editing without modifying the pre-trained weights, where the pre-trained knowledge can be fully preserved in the LLM weights. By storing new knowledge with external parameters, the memory-based strategies enable precise representation of new knowledge with good scalability, as the memory is easily extensible to incorporate new knowledge.
Global Optimization-based methods seek to achieve generalizable incorporation of the new knowledge into pre-trained LLMs via optimization with the guidance of new knowledge, where tailored strategies are introduced to limit the influence of other pre-trained knowledge, distinguishing it from naive fine-tuning. Nevertheless, these methods may fall short in editing efficiency when applied to LLMs due to the large number of parameters to be optimized.
Local Modification-based methods aim at locating the parameters related to specific knowledge in LLMs and updating them accordingly to incorporate the new knowledge relevant to the edit. The main advantage of local modification is the possibility of only updating a small fraction of model parameters, thereby providing considerable memory efficiency compared to memorization-based methods and computational efficiency compared to global optimization.
The above categorization is achieved based on where (e.g., external parameters or internal weights) and how (e.g., via optimization or direct incorporation) new knowledge is introduced into the LLM during editing. Methods in each category exhibit different strengths and weaknesses regarding the four crucial evaluation metrics introduced in Section 4. For example, external memorization prevails in scenarios that require massive editing while the computational resources are limited, as the size of the memory is controllable to fit into different requirements. On the other hand, global optimization is advantageous when practitioners focus more on the generality of edited knowledge, as the optimization can promote the learning of relevant knowledge [2]. The taxonomy is visually illustrated in Figure 3, and a more detailed demonstration of each category is presented in Figure 4.
Fig. 3. The categorization of KME techniques for LLMs and the corresponding works.
Fig. 4. The illustration of three categories of KME methods: External Memorization, Global Optimization, and Local Modification.

5.2 External Memorization

5.2.1 Overview.

The editing approaches via external memorization aim at modifying the current model \(f_\phi\) (with parameter \(\phi\)) by introducing an external memory, represented by additional trainable parameters \(\omega\), that encodes the new knowledge, resulting in an edited LLM \(f^*_{\phi , \omega }\). The rationale behind the external memorization strategy is that storing new knowledge in additional parameters is an intuitive and straightforward way to edit pre-trained LLMs with good scalability, as the parameter size can be expanded to store more knowledge. In addition, the influence on the pre-trained knowledge can be minimized, as this strategy does not alter the original parameters \(\phi\). Based on the general formulation of KME in Equation (2), the objective of external memorization approaches can be formulated as follows:
\begin{equation} \begin{aligned}& \min \mathbb {E}_{e \in \mathcal {E}} \mathbb {E}_{x, y^{*} \in \mathcal {X}_e, \mathcal {Y}^*_e} \mathcal {L} (f^*_{\phi , \omega }(x), y^{*}), \text{where}\ \ f^*_{\phi , \omega }=M(f_\phi , \omega ; \mathcal {E}),\\ &\;\text{s.t.}\;f^*_{\phi , \omega }(x)=f_\phi (x),\ \ \forall x\in \mathcal {X}\setminus \mathcal {X}_\mathcal {E}, \end{aligned} \end{equation}
(15)
where \(f_\phi\) denotes the LLM before editing with the pre-trained parameter \(\phi\), and \(f^*_{\phi , \omega }\) denotes the edited LLM with \(\phi\) and additional parameter \(\omega\) as the external memorization. Moreover, based on whether or not the introduced parameters are directly incorporated into the model's forward computation, external memorization strategies can be divided into two categories, i.e., memory-based methods and extension-based methods.

5.2.2 Memory-based Strategies.

In memory-based strategies, the external memory, outside the intrinsic architecture of the pre-trained LLM, functions as a repository to store edited knowledge. Here the edits are generally converted to text via pre-defined templates [154, 174, 175]. The LLM can access and update this memory as required during inference.
One exemplar work is SERAC [102], which stores the edited samples \(x, y^{*} \in \mathcal {X}_{e}, \mathcal {Y}^{*}_{e}\) in a cache without performing modifications on the original model. When presented with a new prompt \(x^{\prime }\), SERAC uses a scope classifier to determine whether the prompt falls within the scope of any cached instances. If so, the desirable output \(y^{\prime }\) associated with the new prompt \(x^{\prime }\) is predicted via a counterfactual model \(f_c\), which utilizes the most relevant edit example as follows:
\begin{equation} f^*_{\phi ,\omega }(x) =\left\lbrace \begin{array}{ll} f_{\phi }(x), & \text{if}\ x\ \text{is not in scope of any edit},\\ f_c(x,\mathcal {E}), & \text{otherwise}.\\ \end{array}\right. \end{equation}
(16)
SERAC is a gradient-free approach to KME without relying on gradients of the target label \(y^{*}\) w.r.t. the pre-trained model parameters. In addition to using memory as an external repository, the desirable edits can also be stored in the form of human feedback. For example, Language Patch [104] performs editing by integrating patches in natural language, and MemPrompt [95] involves human feedback prompts to address the issue of lacking commonsense knowledge regarding a particular task. An integral feature of the Language Patch [104] framework is its ability to empower practitioners with the capability to create, edit, or remove patches without necessitating frequent model re-training. This trait not only streamlines the development process but also enhances the adaptability and versatility of the edited model. To enable the automatic correction in memory, MemPrompt [95] equips the language model with a memory bank containing corrective feedback to rectify misunderstandings. Specifically, MemPrompt leverages question-specific historical feedback to refine responses on novel and unencountered instances through prompt adjustments.
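As a concrete picture of the routing rule in Equation (16), the following is a minimal sketch of SERAC-style inference; the component interfaces (scope classifier, counterfactual model, edit memory) are our illustrative abstractions rather than SERAC's actual API.

```python
def serac_style_edited_model(base_model, counterfactual_model,
                             scope_classifier, edit_memory):
    """Equation (16): cached edits are kept outside the frozen base model,
    and a scope classifier decides per prompt whether to defer to the
    counterfactual model.

    scope_classifier(x, edit) -> in-scope probability in [0, 1].
    counterfactual_model(x, edit) -> answer conditioned on a cached edit.
    """
    def edited_model(x, threshold=0.5):
        # Find the cached edit most relevant to the prompt x.
        best_edit = max(edit_memory, key=lambda e: scope_classifier(x, e))
        if scope_classifier(x, best_edit) < threshold:
            return base_model(x)                   # out of scope: f_phi(x)
        return counterfactual_model(x, best_edit)  # in scope: f_c(x, E)
    return edited_model
```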
In KAFT [79], controllability is achieved through the utilization of counterfactual data augmentations. In this approach, the entity representing the answer within the context is substituted with an alternative but still plausible entity. This substitution is intentionally designed to introduce a conflict with the genuine ground truth, thereby enhancing the controllability and robustness of LLMs with respect to their working memory. The aim is to ensure that LLMs remain responsive to pertinent contextual information while filtering out noisy or irrelevant data.
In addition to relying on parameter-based memory, recent works also leverage prompting techniques of LLMs, e.g., in-context learning [30] and chain-of-thought prompting [162], to improve the editing performance of external memorization. Specifically, IKE [174] introduces novel factual information into a pre-trained LLM via in-context learning, where a set of k demonstrations, i.e., \(\omega =\lbrace x_{i}, y^{*}_{i}\rbrace _{i=1}^{k}\), is selected as the reference points. These demonstrations will alter the prediction of a target factual detail when the input is influenced by an edit. Particularly, IKE maintains a balance between generality and locality by storing factual knowledge as prompts. The process can be formulated as follows:
\begin{equation} f^*_{\phi , \omega }(x)=f_\phi (\omega \Vert x),\ \text{where}\ \omega =\lbrace x_{i}, y^{*}_{i}\rbrace _{i=1}^{k}. \end{equation}
(17)
Here \(\Vert\) denotes the concatenation of the reference points in \(\omega\) and the input x, which follows an in-context learning manner. Note that in this process, the framework first transforms all new facts into natural language to input them into LLMs. Similar methods of knowledge editing based on prompts [15, 131, 136, 154] can also update and modify knowledge within LLMs. These approaches allow users to guide the model to generate desired outputs by providing specific prompts, and effectively and dynamically adjusting the model’s knowledge base. By leveraging the flexibility of prompts and the contextual understanding of LLMs, users can correct or update information in real-time. These methods offer immediacy, flexibility, and cost-efficiency, making them powerful tools for maintaining the accuracy and relevance of language models in rapidly evolving knowledge domains. Although the prompt approaches effectively edit factual knowledge via in-context learning, they cannot solve more complex questions that involve multiple relations. To deal with this, MeLLo [175] first explores the evaluation of the editing effectiveness in language models regarding multi-hop knowledge. For example, when editing knowledge about the president of the USA, the query regarding the president’s children should change accordingly. MeLLo proposes to enable multi-hop editing by breaking down each query into subquestions, such that the model generates a provisional answer. Subsequently, each subquestion is used to retrieve the most pertinent fact from the memory to assist the model in answering the query.
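A minimal sketch of the in-context editing scheme in Equation (17) follows; the prompt template is our own illustrative choice, not IKE's actual format.

```python
def in_context_edit(base_model, demonstrations, new_fact):
    """Equation (17): the model weights stay frozen, and the edit is
    injected by prepending k demonstrations omega = {(x_i, y*_i)} plus the
    new fact, rendered as natural language, to the input prompt."""
    def edited_model(x):
        demo_text = "\n".join(f"Q: {xi}\nA: {yi}" for xi, yi in demonstrations)
        prompt = f"{demo_text}\nNew fact: {new_fact}\nQ: {x}\nA:"  # omega || x
        return base_model(prompt)  # f_phi(omega || x)
    return edited_model
```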

5.2.3 Extension-based Strategies.

Extension-based strategies utilize supplementary parameters to assimilate modified or additional information into the original language model. These supplementary parameters are designed to represent the newly introduced knowledge or necessary adjustments tailored for specific tasks or domains. Different from memory-based methods, by incorporating new parameters into the language model, extension-based approaches can effectively leverage and expand the model’s functionality.
Extension-based methods can be implemented through various means, and one representative way is to modify the feed-forward network (FFN) output. For example, CALINET [29] uses the output from sub-models fine-tuned specifically on factual texts to refine the original FFN output produced by the base model. Another technique, T-Patcher [66], introduces a limited number of trainable neurons, referred to as “patches”, in the final FFN layer to alter the model's behavior while retaining all original parameters to avoid reducing the model's overall performance. Generally, these methods that refine the structure of the FFN can be formulated as follows:
\begin{equation} \operatorname{FFN}({\bf h}) =\operatorname{GELU}\left({\bf h} \mathbf {W}_1\right) \mathbf {W}_2+ \operatorname{GELU}\left(\mathbf {h}\cdot \mathbf {k}_p +b_p\right)\cdot \mathbf {v}_p, \end{equation}
(18)
where \(\mathbf {k}_p\) is the patch key, \(\mathbf {v}_p\) is the patch value, and \(b_p\) is the patch bias scalar. The introduced patches are flexible in size and can be accurately activated to edit specific knowledge without affecting other model parameters.
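The patched FFN in Equation (18) can be sketched as follows in PyTorch; the initialization and freezing details are our illustrative choices rather than T-Patcher's exact implementation.

```python
import torch
import torch.nn.functional as F

class PatchedFFN(torch.nn.Module):
    """Equation (18): the pre-trained weights W_1, W_2 are frozen, and a few
    trainable patch neurons (k_p, b_p, v_p) are appended to the FFN layer
    to store the edited knowledge."""

    def __init__(self, W_1: torch.Tensor, W_2: torch.Tensor, n_patches: int):
        super().__init__()
        d_model = W_1.shape[0]
        self.W_1 = torch.nn.Parameter(W_1, requires_grad=False)  # frozen
        self.W_2 = torch.nn.Parameter(W_2, requires_grad=False)  # frozen
        # Small random init so the patch neurons receive gradients.
        self.k_p = torch.nn.Parameter(0.01 * torch.randn(d_model, n_patches))
        self.b_p = torch.nn.Parameter(torch.zeros(n_patches))
        self.v_p = torch.nn.Parameter(0.01 * torch.randn(n_patches, d_model))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        original = F.gelu(h @ self.W_1) @ self.W_2          # GELU(h W_1) W_2
        patch = F.gelu(h @ self.k_p + self.b_p) @ self.v_p  # GELU(h k_p + b_p) v_p
        return original + patch
```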
Alternatively, a different technique involves integrating an adapter into a specific layer of a pre-trained model. This adapter consists of a discrete dictionary comprising keys and values, where each key represents a cached activation generated by the preceding layer and each corresponding value decodes into the desired model output. This dictionary is systematically updated over time. In line with this concept, GRACE [52] introduces an adapter that enables judicious decisions regarding the utilization of the dictionary for a given input, accomplished via the implementation of a deferral mechanism. It is crucial to achieve a balance between the advantages of preserving the original model’s integrity and the practical considerations associated with storage space when implementing this approach. COMEBA-HK [81] incorporates hook layers within the neural network architecture. These layers allow for the sequential editing of the model by enabling updates to be applied in batches. This approach facilitates the integration of new knowledge without requiring extensive retraining of the entire model, making it a scalable solution for continuous learning and adaptation. SWEA [82] focuses on altering the embeddings of specific subject words within the model. By directly updating these embeddings, the method can inject new factual knowledge into the LLMs. This approach ensures that the updates are precise and relevant, thereby enhancing the model’s ability to reflect current information accurately.

5.2.4 Summary.

The external memorization methodology operates by preserving the parameters within the original model while modifying specific output results through external interventions via memory or additional model parameters. One notable advantage of this approach is its minimal perturbation of the original model, thereby ensuring the consistency of unedited knowledge. It allows for precise adjustments without necessitating a complete overhaul of the model's architecture. However, it is imperative to acknowledge a tradeoff inherent in this methodology: its efficacy is contingent upon the storage and invocation of the edited knowledge, which raises concerns regarding storage capacity. Depending on the scale of knowledge to be edited, this approach may entail substantial storage requisites. Therefore, cautiously seeking a balance between the advantages of preserving the original model's integrity and the practical considerations of storage capacity becomes a pivotal concern when employing this particular approach.

5.3 Global Optimization

5.3.1 Overview.

Different from external memorization methods that introduce new parameters to assist the editing of pre-trained LLMs, there also exist branches of work that do not rely on external parameters or memory. Concretely, global optimization strategies aim at injecting new knowledge into LLMs by updating all parameters, i.e., \(\phi\) in Equation (15). By fine-tuning model parameters with specific designs that ensure the preservation of knowledge irrelevant to the target knowledge \(t^*\), the LLMs are endowed with the ability to absorb new information without altering unedited knowledge. Generally, the goal of global optimization methods can be formulated as follows:
\begin{equation} \begin{aligned}& \min \mathbb {E}_{e \in \mathcal {E}} \mathbb {E}_{x, y^{*} \in \mathcal {X}_e, \mathcal {Y}^*_e} \mathcal {L} (f_{\phi ^*}(x), y^{*}),\ \text{where}\ \ f_{\phi ^*}=M(f_\phi ; \mathcal {E}),\\ &\;\text{s.t.}\;f_{\phi ^*}(x)=f_{\phi }(x),\ \ \forall x\in \mathcal {X}\setminus \mathcal {X}_\mathcal {E}, \end{aligned} \end{equation}
(19)
where \(f_\phi\) denotes the LLM before editing with the pre-trained parameter \(\phi\), and \(f_{\phi ^*}\) denotes the edited LLM with updated parameter \(\phi ^*\). Generally, these methods focus more on the precision and generality of the desirable knowledge, as the fine-tuning process ensures that the LLMs achieve satisfactory results regarding the edits and relevant knowledge. Nevertheless, as fine-tuning affects all parameters, these methods cannot easily preserve the locality of edited models, i.e., maintaining consistent output for unedited knowledge [167]. In practice, directly applying fine-tuning strategies typically exhibits suboptimal performance on KME due to overfitting concerns [98, 152]. Furthermore, fine-tuning large language models is also time-consuming and lacks scalability for multiple edits. Therefore, motivated by these two challenges in fine-tuning, several global optimization works have recently been proposed, which can be categorized as constrained fine-tuning methods and intermediate fine-tuning methods. Note that this section primarily focuses on methods from the model training perspective. Additionally, certain studies [38, 69] address the overfitting challenge by constructing a more comprehensive \(\mathcal {X_{E}^{\prime }}\) with the following fine-tuning goal:
\begin{equation} \begin{aligned}& \min \mathbb {E}_{e \in \mathcal {E}} \mathbb {E}_{x, y^{*} \in \mathcal {X}_{e}^{\prime }, {\mathcal {Y}^*_e}^{\prime } } \mathcal {L} (f_{\phi ^*}(x), y^{*}),\ \text{where}\ \ f_{\phi ^*}=M(f_\phi ; \mathcal {E}),\\ &\;\text{s.t.}\;\mathcal {X_E}\subset \mathcal {X_{E}}^{\prime }, \mathcal {X_{E}}^{\prime }\subseteq \mathcal {X}. \end{aligned} \end{equation}
(20)

5.3.2 Constrained Fine-tuning.

Constrained fine-tuning strategies generally apply specific constraints to prevent updates to non-target knowledge in \(\lbrace \mathcal {X}\setminus \mathcal {X}_\mathcal {E},\mathcal {Y}\setminus \mathcal {Y}_\mathcal {E}\rbrace\). In this manner, the objective in Equation (19) is transformed into a constrained optimization problem:
\begin{equation} \begin{aligned}& \min \mathbb {E}_{e \in \mathcal {E}} \mathbb {E}_{x, y^{*} \in \mathcal {X}_e, \mathcal {Y}^*_e} \mathcal {L} (f_{\phi ^*}(x), y^{*}),\ \text{where}\ \ f_{\phi ^*}=M(f_\phi ; \mathcal {E}),\\ & \;\text{s.t.}\;\ \Vert \mathcal {L}(f_{\phi ^*}(x), y)-\mathcal {L}(f_{\phi }(x), y)\Vert \le \delta , \forall x,y\in \mathcal {X}\setminus \mathcal {X}_\mathcal {E},\mathcal {Y}\setminus \mathcal {Y}_\mathcal {E}, \end{aligned} \end{equation}
(21)
where \(\phi\), \(\phi ^*\) are the parameters before and after updating, respectively. \(\delta\) is a scalar hyper-parameter to restrict the difference between losses of \(f_{\phi ^*}\) and \(f_\phi\). The constraint in Equation (21) restricts the change of the edited model on unmodified knowledge. Zhu et al. [177] first propose an approximate optimization constraint that is easier for implementation and computation:
\begin{equation} \begin{aligned}& \min \mathbb {E}_{e \in \mathcal {E}} \mathbb {E}_{x, y^{*} \in \mathcal {X}_e, \mathcal {Y}^*_e} \mathcal {L} (f_{\phi ^*}(x), y^{*}),\ \text{where}\ \ f_{\phi ^*}=M(f_\phi ; \mathcal {E}),\\ & \;\text{s.t.}\;\ \Vert \phi ^*-\phi \Vert \le \delta . \end{aligned} \end{equation}
(22)
The updates are regularized by restricting the norm of parameters before and after updating. RECT [48] adopts a similar yet simpler approach, specifically modifying only the top-k% of parameters with the largest numerical updates during fine-tuning. Although restricting the norm is helpful in preventing the forgetting of original knowledge, the fine-tuning process can be less effective. To deal with this, RecAdam [13], in addition to the norm constraint, applies an annealing technique to control the ratio between the parameter norm and the fine-tuning loss as follows:
\begin{equation} \mathcal {L}_{total}=\lambda (t)\mathcal {L}_{FT}+(1-\lambda (t))\Vert \phi ^*-\phi \Vert ,\ \ \text{where}\ \ \lambda (t)=\frac{1}{1+\exp (-k\cdot (t-t_0))}. \end{equation}
(23)
Here k and \(t_0\) are hyper-parameters. t is the number of fine-tuning steps. Such a design enables a gradual fine-tuning process that prevents massive parameter updates at the beginning. Motivated by the intuition of regularization to preserve original knowledge, PPA [77] employs LoRA [62] in the feed-forward (FFN) layers of the transformer decoder. LoRA is proposed to train the expansion/reduction matrix, instead of the model parameter \(\phi\), to improve training speed by only updating parameters with a low intrinsic rank via dimensionality reduction. PPA leverages plug-in modules trained with constraints via LoRA to keep original knowledge intact. Moreover, the authors assess whether the content of the inputs falls within the scope of \(\mathcal {X}_\mathcal {E}\) using the K-adapter module [153], and redirect such inputs to the new plug-in modules. This information is then used to determine whether to employ LoRA within the FFN layers. Furthermore, MELO [169] clusters the edits and employs multiple non-overlapping LoRA blocks for fine-tuning each cluster separately, thereby mitigating the issue of catastrophic forgetting. F-Learning (Forgetting before Learning) [106] proposes another approach to preserve original knowledge, which learns knowledge parameters \(\Delta \phi\) that indicates old knowledge to be forgotten, defined as follows:
\begin{equation} \phi ^*=\phi -\lambda \Delta \phi ,\ \ \text{where}\ \ \Delta \phi =\text{FT}(\phi ; \mathcal {K}_{old})-\phi . \end{equation}
(24)
Here \(\mathcal {K}_{old}\) denotes the dataset composed of old knowledge that we desire to forget, and \(\text{FT}(\phi ;\mathcal {K}_{old})\) is the supervised fine-tuning process of parameters \(\phi\) on dataset \(\mathcal {K}_{old}\). \(\lambda\) is a hyper-parameter used to control the rate of forgetting. Based on the assumption that subtracting the parameters \(\Delta \phi\) from \(\phi\) can help the model forget this part of old knowledge [68], F-Learning defines the forgetting process as a subtraction operation to obtain the updated model parameter \(\phi ^*\).
On the other hand, other works also resort to meta-learning [36, 145] to apply more flexible constraints. Meta-learning addresses the issue of overfitting by training a model that can quickly adapt to new tasks [60]. By exposing the model to a variety of tasks during training, meta-learning improves the model’s ability to generalize from limited data and reduces the risk of overfitting individual tasks [67]. In the scenario of KME, the optimal model parameters \(\phi ^*\) should minimize the expected loss over a variety of meta-tasks [120]:
\begin{equation} \phi ^* = \text{argmin}_\phi \mathbb {E}_{D\sim \mathcal {D}}[\mathcal {L}_\phi ({D})], \end{equation}
(25)
where \(\mathcal {D}\) corresponds to the sample set for each meta-task D. Moreover, each meta task \({D}\) contains multiple \((x^*, y^*)\) pairs for editing. In practice, such methods often introduce additional objective functions or networks to regulate parameter updates. As a typical meta-learning method for KME, Editable Training [133] focuses on effectively rectifying errors within models while preserving their performance on other irrelevant data instances. Following a model-agnostic training manner, the authors introduce additional constraints to restrict parameter updates in a different way. Specifically, the loss function is separated into \(\mathcal {L}_{base}\) (task-specific objective function), \(\mathcal {L}_{edit}\) (computed on the edit set \(\mathcal {X}_\mathcal {E}\)), and \(\mathcal {L}_{local}\) (computed on samples in \(\mathcal {X}\setminus \mathcal {X}_\mathcal {E}\)). Moreover, the models are updated in a meta-learning manner, where k steps of gradient descent would be applied for parameters before computing the objective function.

5.3.3 Intermediate Fine-tuning Strategies.

While constrained fine-tuning techniques have demonstrated remarkable efficacy in a variety of NLP tasks [7, 164, 179], they still exhibit instability and high computational cost when applied to KME, primarily due to the necessity of altering all parameters [167]. A potential solution to address this challenge is to utilize an intermediate model to obtain the updated parameters in an efficient manner. Such an intermediate model is required to maintain significantly fewer parameters to ensure efficiency [17]. In general, recent works have widely adopted the Hyper-Network [51] as the intermediate model. Specifically, the Hyper-Network is a small network that generates the weights for a larger network, referred to as the main network. Specifically, the Hyper-Network takes inputs that contain information about the structure of the weights and generates the weights for layers in the main network. With the generated weights, the main network is updated to map input data to desired output targets. The updating process for the main network, denoted as \(\phi\), can be defined as follows:
\begin{equation} \begin{aligned}\phi ^*&=\phi + \Delta \phi , \ \ \text{where}\ \ \Delta \phi = \text{H}(\nabla _\phi \mathcal {L} (f_{\phi }(x), y^{*})) \ \ \text{and} \ \ x, y^* \in \mathcal {X}_\mathcal {E}, \mathcal {Y}^*_\mathcal {E}, \end{aligned} \end{equation}
(26)
where \(\text{H}(\cdot)\) denotes the hyper-network. \(\Delta \phi\) is the weight deviation calculated by the hyper-network. According to a recent study [147], task-specific Hyper-Networks (i.e., networks that generate target model weights based on task attributes) are effective in mitigating catastrophic forgetting issues. Therefore, such methods are suitable for the setting of KME, which requires the preservation of unedited knowledge.
Recently, researchers have proposed to adopt hyper-networks in various ways for parameter updates in KME. As a classic example, KE [25] first proposes to edit knowledge and rectify erroneous or unexpected predictions without expensive fine-tuning. Specifically, it trains a hyper-network via constrained optimization to modify facts without affecting pre-trained knowledge irrelevant to the edit. The trained hypernetwork is then used to predict the weight update at the inference time. Based on KE, SLAG [53] further appends metrics for two types of input texts: (1) Inputs that are not in the desired edit set \(\mathcal {X}_\mathcal {E}\) but logically related to \(\mathcal {E}\); (2) Inputs that share a formal resemblance to edited knowledge, but do not lead to changes in the prediction outcomes.
However, hyper-networks are generally not capable of updating large language models due to the massive parameter size. To tackle this challenge, MEND [101] adopts a mechanism referred to as gradient decomposition. In particular, it leverages small auxiliary editing networks to transform the gradients obtained by standard fine-tuning into edits of weights in a pre-trained model. As gradients are generally high-dimensional objects, a low-rank decomposition of the gradients is utilized to achieve the transformation. Particularly, MEND parameterizes the gradient mapping functions as MLPs with a single hidden layer, such that a significantly small number of parameters are required, compared with the edited models. In this manner, MEND enables fast model editing that can operate on considerably large pre-trained language models. Moreover, KGEditor [17] proposes to combine the benefits of memory-based methods and hyper-networks to ensure flexibility and further reduce computation costs. Particularly, KGEditor introduces an additional layer with the same architecture of FFN layers for storing knowledge. Then it constructs a hyper-network based on a bi-directional LSTM [58] that encodes embeddings of triples. In this manner, KGEditor becomes an efficient way to edit knowledge graph embeddings.

5.3.4 Summary.

Global optimization methods typically apply specific fine-tuning restrictions to regularize parameter updates, namely constrained fine-tuning strategies. This is to prevent overfitting and ensure the model’s performance on the unedited knowledge. One crucial advantage of such strategies is its generality regarding the relevant knowledge, i.e., in-scope inputs \(\mathcal {X}_e\) of edit e. As the global optimization affects all parameters in a language model, the relevant knowledge in it will also be edited, thereby generalizing to such knowledge. On the other hand, the high computation costs of fine-tuning all parameters also motivate researchers to propose intermediate fine-tuning strategies that leverage hyper-networks. Furthermore, global optimization methods are mostly model-agnostic, which means they can be applied to other editing methods. Nevertheless, such possibilities are less explored in the context of KME. In terms of the drawbacks, global optimization methods are suboptimal in maintaining the locality of edited models, as the optimization can easily influence unedited knowledge. Hence, it is crucial to achieve a balance between generality and locality when optimizing language models with specific constraints or intermediate designs.

5.4 Local Modification

5.4.1 Overview.

To tackle the challenge of fine-tuning methods with respect to locality, extensive research has been conducted on the local modification strategy for KME tasks [102, 167]. These techniques originate from the concept of identifying and modifying specific relevant weights in a pre-trained model to achieve desirable outputs. The primary objective is to first locate the weights \(\phi _{k}\) that store the knowledge in a pre-trained model \(f_{\phi }\) regarding the input x. Afterward, by adjusting these weights, it becomes possible to generate the correct output \(y^{*}\) from the same input x without re-training or fine-tuning the whole model. Recently, researchers have generalized the local modification strategy to LLMs, where the efficiency of information updates for pre-trained LLMs can be substantially improved. Generally, the goal of the local modification strategy of KME can be formulated as a constrained optimization problem with refined constraints as follows:
\begin{equation} \begin{aligned}& \min _{\phi ^{*}_{k}} \mathbb {E}_{e \in \mathcal {E}} \mathbb {E}_{x, y^* \in \mathcal {X}_e, \mathcal {Y}^*_e} \mathcal {L} (f^*_{\overline{\phi }_{k}, \phi _{k}^{*}}(x), y^*), \\ &\;\text{s.t.}\;f^*_{\overline{\phi }_{k}, \phi _{k}^{*}}(x)=f(x),\ \ \forall x\in \mathcal {X}\setminus \mathcal {X}_\mathcal {E},\\ & \text{where} \ \ \phi _k = L(f_{\phi }, \mathcal {E}),\ \overline{\phi }_k = \phi \setminus \phi _k, \ f^*_{\overline{\phi }_k, \phi ^{*}_k}=M(f_{\phi }, \mathcal {E}). \end{aligned} \end{equation}
(27)
Here \(\phi ^*\) denotes the edited weights related to the new knowledge, and \(\overline{\phi }_k\) denotes the unedited weights. Equation (27) breaks down the local modification strategy for KME into two steps: (1) The locating step, denoted by function L, locates the relevant weights \(\phi _k\) in pre-trained model \(f_{\phi }\) that store the obsolete information regarding the query x. (2) The editing step, denoted by function M, edits the located weights \(\phi _k\) into new weights \(\phi _k^{*}\) such that the correct answer \(y^{*}\) given the query x can be generated by the model with \(\phi _k^{*}\). By only updating a small fraction of model weights, the editing step avoids negatively influencing other irrelevant information, (i.e., \(x \in \mathcal {X} \setminus \mathcal {X}_\mathcal {E}\)).
In the following subsections, we first introduce the concept of knowledge neuron in LLMs, which are specific neurons that store factual knowledge and can be activated to generate the desirable answer based on a certain query x. Then we discuss two local modification strategies for KME: (1) the groundtruth-based strategies, which identify and edit knowledge neurons based on the supervision signal provided by the groundtruth; (2) the prompt-based strategies, which locate knowledge neurons based on the input prompts.
Knowledge Neurons. LLMs pre-trained on large corpora can be viewed as databases that store factual and common-sense knowledge in the pre-trained model weights [49]. To update such knowledge by locally modifying the weights in the pre-trained LLMs, it is imperative to identify which weights store such information, i.e., locating the knowledge neurons. This can be challenging due to the complex transformer architecture of LLMs [7].
As described in Section 2.2.1, the transformer structure of LLMs consists of two primary types of layers, i.e., (1) the self-attention layer and (2) the point-wise FFN layer, which is implemented as a two-layer multi-layer perceptron (MLP). Particularly, given a prompt x, the self-attention layers of the LLMs use the query vector of the last token and the key vectors of the previous tokens to calculate a weighted sum of their value vectors. Therefore, given the input x, these layers provide information about which previous tokens we should consider when generating the answer. Here we provide a simplified example for illustration. To answer the question “Who is the current president of the USA?”, the self-attention layer indicates that the model should attend to words “president” and “USA”, i.e., \({\bf v}_{president}\), \({\bf v}_{USA}\), to determine the answer. This provides us with a start-up embedding \({\bf h}^{start}\) to generate the answer token, which is the weighted sum of the values of the two attended words, i.e., \(w_{1}{\bf v}_{president} + w_{2}{\bf v}_{USA}\). However, the information regarding the current president of the USA is not provided. In contrast, recent works [42, 43, 97, 98] claim that the residual added to \({\bf h}^{start}\) by the outputs of FNN layers, i.e., \({\bf h}^{next} = {\bf h}^{start} + \operatorname{FFN}({\bf h}^{start})\), injects the information “Biden” to \({\bf h}^{start}\) and leads to the generation of correct answers. Therefore, neurons in the FFN can be viewed as the knowledge neurons that store the factual knowledge. The role of FFN in storing knowledge can be theoretically analyzed by revisiting their formulation in Equation (1), which we rewrite as follows:
\begin{equation} \begin{aligned}\text{SelfAtt}_i({\bf x})=\text{Softmax}\left({\bf q}_i {\bf k}_i^\top \right) {\bf v}_i, \quad \text{FFN}({\bf h})=\text{GELU}\left({\bf h} {\bf W}_1\right) {\bf W}_2. \end{aligned} \end{equation}
(28)
Specifically, comparing the above two equations, we observe that the input \({\bf h}\) to the FFN acts similarly to the query \({\bf q}\) to the SelfAtt. Moreover, the weights of the first layer \(\mathbf {W}_{1}\) can be viewed as the key \(\mathbf {v}\), where \(\operatorname{GELU}\left({\bf h} {\bf W}_1\right)\) can be viewed as calculating an unnormalized attention score over the row vectors of \({\bf W}_{2}\). Finally, the weights of the second layer \({\bf W}_{2}\) can be viewed as the value (or the memory) that stores the knowledge, which can be retrieved according to the unnormalized weights calculated by the first layer.

5.4.2 Groundtruth-based Strategies.

Based on the knowledge neuron view of the FFN layer weights in pre-trained LLMs, various groundtruth-based methods are proposed to locate and edit the pre-trained LLMs. Generally, they perform editing in a top-down manner, utilizing the supervision signal provided by the correct groundtruth \(y^*\). As an exemplar work, KD [22] proposes to change each weight \(w^{(l)}_{i}\) (i.e., the ith weight in the lth layer of FFN) from 0 to the pre-trained value \(\hat{w}^{(l)}_{i}\) and calculates the cumulative change in the probability of predicting the output \(y^{*}\) with input x, where the weights with a high cumulative probability are considered relevant for knowledge regarding \(y^{*}\). DEPN [165] proposes a similar cumulative probability-based strategy to detect knowledge neurons that store privacy knowledge. In contrast to locating and editing an individual weight \({w}^{(l)}_{i}\), ROME [97] proposes to update an entire FFN layer to encode the new knowledge of \(y^{*}\). Specifically, they view the second layer weights \({\bf W}_{2}\) in the FFN layer in Equation (28) as a linear associative memory [3, 75] in the form of \({\bf K}{\bf W}_{2} = {\bf V}\), where the keys \({\bf K}\) and values \({\bf V}\) associated with \({\bf W}_{2}\) can be directly calculated via pseudoinverse. With such a view of \({\bf W}_{2}\) in the FFN layer, the optimization objective of updating it into \(\hat{{\bf W}}_{2}\) to encode new knowledge in the edit \(e = (s,r,o\rightarrow o^{*})\) can be formulated as follows:
\begin{equation} \min \Vert {\bf K}\hat{{\bf W}}_{2} - {\bf V}\Vert \ \text{s.t.} \ \hat{{\bf W}} {\bf k}^*={\bf h}^*. \end{equation}
(29)
Here \({\bf k}^{*}\), which should encode the information of the subject s, is calculated by sampling multiple \(x \sim \mathcal {X}_{e}\) and taking the average of the outputs from the first dense layer of the FFN. The target activation \({\bf h}^{*}\) is calculated via optimizing the probability of outputting the correct answers \(y^{*} \in \mathcal {Y}_{e}\) of the pre-trained LLM via the subsequent layers. Then, an efficient rank-one update is conducted on the weights \({\bf W}_{2}\) according to Equation (29), such that after the update, the edited FFN layer can output the correct hidden representation \({\bf h}^{*}\) conducive to the generation of the right answer \(y^{*}\) from \({\bf k}^{*}\). The ROME framework has been shown to generalize to the large Mamba model [130]. Recently, MEMIT [98] proposes to further generalize the above editing strategy of the FFN layers of pre-trained LLMs to the mass editing of different knowledge. Particularly, with u new edits \(\lbrace e_{1}, e_{2},\ldots\,, e_{u}\rbrace\) that are required to be updated in the weights \({\bf W}_{2}\), the mass knowledge editing problem can be formulated as the following optimization problem:
\begin{equation} \min \left(\sum _{i=1}^n\left\Vert {\bf k}_i \hat{{\bf W}}_{2} -{\bf v}_i\right\Vert ^2+\sum _{i=n+1}^{n+u}\left\Vert {\bf k}^{*}_i \hat{{\bf W}}_ {2} -{\bf v}^{*}_i\right\Vert ^2\right), \end{equation}
(30)
where \({\bf k}_{i}\), \({\bf v}_{i}\) are the original key, value pairs associated with the weights \({\bf W}_{2}\) (i.e., row vectors in matrices \({\bf K}\), \({\bf V}\) in Equation (29)), whereas \({\bf k}_{i}^{*}\), \({\bf v}^{*}_{i}\) are the updated key, value pairs calculated from the i-th edit \(e_{i}\) as with Equation (29). In addition, since multiple edits are required, the update is shared among different MLP layers, which is conducted in a top-down manner to prevent the potential issue of editing layers that could affect the ones that have already been edited. The residual for each edit is spread evenly over the range of the critical FFN layer. The strategy of residual attribution has recently been improved by PMET [83], which adopts a square root strategy to spread residuals to bottom FFN layers such that more precise information can be conveyed to critical layers. Furthermore, EMMET [50] generalized ROME and MEMIT by formulating the mass knowledge editing problem as a preservation (of irrelevant knowledge)-memorization (of new knowledge) constrained optimization problem, where they derive closed form weight update formulae when the edit is exact, i.e., \({\bf k}^{*}_i \hat{{\bf W}}_{2} = {\bf v}^{*}_i\) instead of minimizing the MSE in Equation (30).
From the application’s perspective, to remove toxic knowledge of LLM, DINM [149] identifies layers that store toxic knowledge with the discrepancy of toxic/non-toxic sequence embeddings, and uses the non-toxic samples to locally modify the weights of identified layers.

5.4.3 Prompt-based Strategies.

Tailored to characteristics of LLMs that provide answer \(y^{*}\) based on the prompt x, the operation of locating and editing knowledge neurons can also be conducted in a bottom-up manner, which aims at changing the prompt to detect neurons to be edited. Specifically, by masking out the key information and observing the difference of activations in the intermediate layers of the LLM, the weights that store the information regarding the query x can be located and updated to store the new information \(y^{*}\). For example, ROME [97] proposes a corruption-and-restore based strategy to identify relevant layers (or their hidden output variables \({\bf h}\)) that store the information based on the prompt x. It first randomly masks the hidden representations of the key vectors \(\mathbf {k}\) (as described in Equation (1)) of the tokens in the prompts from a certain intermediate layer of the pre-trained LLM. Then it calculates the reduced probability of predicting y (i.e., the obsolete outputs) as the causal mediation effects of x on y mediated by \({\bf h}\). Consequently, the weights in layers with large mediated effects are viewed as knowledge neurons that store the information of y. MEMITCSK [49] extends the above corruption-based strategy to editing common sense knowledge. The authors argue that, different from the factual knowledge that can be directly retrieved by the subject s, the object o and relation r also matter for commonsense knowledge. Therefore, three types of corruption and edit locations, i.e., subject, verb, and object, are thoroughly analyzed, where the performance of editing commonsense knowledge can be improved. Moreover, BIRD [93] studies the novel problem of bidirectional KME, which requires the edited model to possess reversibility. For example, if the phrase “The capital of France is” is edited to a counterfactual “London” within a model, it should logically be able to retrieve the inverse fact. That is, when presented with “London is the capital of”, the model should respond with “France” rather than “England”. Based on the strategy of ROME, BIRD introduces a novel objective that involves the bidirectional relationships between subject and object in an edit. In this manner, the updated model weights can preserve reversibility by learning such information.

5.4.4 Summary.

In this part, we introduce the local modification strategy for pre-trained LLMs for efficient updates of new information without adding new weights or optimizing the whole network. We start by analyzing the pivotal role of the point-wise feedforward layers, i.e., the FFNs, to store the factual information in pre-trained LLMs, with the knowledge neurons associated with the FFN layer thoroughly analyzed. We then discuss the groundtruth-based strategies, which achieve the modification in a top-down manner, generally based on least squares objectives computed from the output y. We further discuss the prompt-based strategies, which conduct modifications in a bottom-up manner based on the input prompt x. Nevertheless, the scalability and retainability of local modification methods lack further improvements, as the performance might deteriorate with more edits performed [98].

6 Datasets

Recently, multiple datasets have been established to facilitate the evaluation of KME methods, and we summarize the commonly-used datasets in Table 2 to benefit future KME research. Specifically, these datasets can be divided into two groups: generation datasets (i.e., textual output) and classification datasets (i.e., categorical output). The datasets are obtained from a variety of sources, including knowledge graphs, Wikipedia pages, crowd-sourced responses, and so on., which are adapted by researchers to fit into the KME setting.
Table 2.
DatasetType# Train# TestInputOutputUsed in
zsRERelational244,173244,173Factual StatementObject[25, 38, 48, 50, 52, 66, 69, 77, 81, 97, 98, 101, 102, 106, 136, 151, 156, 169]
CounterFactRelationalN/A21,919Factual QuestionObject[15, 38, 50, 61, 81, 97, 98, 106, 130, 136, 156, 168, 174]
WikiGenGenerationN/A68kWiki PassageContinuation[101]
T-REx-100/-1000RelationalN/A100/1,000Factual StatementObject[29, 79]
ParaRelRelationalN/A253,448Factual QuestionObject[22]
NQ-SituatedQAQAN/A67.3kUser QueryAnswer[23, 77]
MQuAKE-CF/-TRelationalN/A9,218/1,825Multi-hop QuestionObject[47, 69, 82, 131, 155, 175]
HallucinationHallucinationN/A1,392(Fake) BiographyBiography[52, 151, 169]
MMEdit-E-VQAMultimodal6,3462,093Image & QuestionAnswer[16]
MMEdit-E-ICMultimodal2,8491,000ImageDescription[16]
ECBDRelationalN/A1000Reference to EntityCompletion[108]
Conflict EditRelationalN/A7,500Factual StatementObject[86]
Round EditRelationalN/A5,000Factual StatementObject[86]
UKERelationalN/A2,478Factual QuestionObject[166]
RippleEditsRelationalN/A5,000Factual QuestionObject[21, 69]
VLKEBMultimodal5,0003,174ImageDescription[65]
MLaKEMultilingualN/A9,432QuestionAnswer[163]
FEVERFact Checking104,96610,444Fact DescriptionBinary Label[15, 25, 66, 101]
ConvSentSentimental287,80215,989Topic OpinionSentiment[102]
Bias in BioBiographical5,0005,000Biographical SentenceOccupation[57]
VitaminC-FCFact Checking370,65355,197Fact DescriptionBinary Label[102]
SCOTUSCategorization7,400931Court DocumentsDispute Topic[52, 169]
Table 2. Statistics of Prevalent KME Datasets, Including Generation and Classification Datasets

6.1 Generation Datasets

For generation datasets, the target is in the form of textual content that is required to be generated by LLMs. Serving as pivotal resources to evaluate KME methods, most generation datasets are based on relational knowledge and used for assessing the ability of editing techniques to inject new factual knowledge. This is because relational datasets preserve more definitive answers for each input and thus are more convenient and precise for evaluation [167, 172]. Specifically, these datasets are generally curated from the corresponding relational datasets to encompass diverse relational contexts, ranging from question-answer pairs to intricate multi-hop queries. Therefore, the most prevalent output format is an object to be predicted.
In this subsection, we present the most representative generation datasets, shedding light on their unique attributes, the nature of their content, and the specific challenges they present for evaluating KME methods on factual knowledge as follows:
zsRE [78]: zsRE is one of the most prevalent Question Answering (QA) datasets extended and adopted by [25, 101] for KME evaluation. zsRE is suitable for evaluating KME due to its annotations of human-generated question paraphrases, which allow researchers to assess the model resilience to semantically equivalent inputs. In zsRE, each relation is associated with a set of crowd-sourced template questions, such as “What is Albert Einstein’s alma mater?”. Each entry cites a Wikipedia sentence, serving as the factual basis or provenance. The dataset also contains negative examples that are generated by pairing a valid question with a random sentence.
CounterFact [97]: CounterFact is established to distinguish superficial alterations in the word selections and significant, generalized modifications in its foundational factual knowledge. Proposed in ROME [97], each entry in CounterFact originates from a related record in ParaRel [32], containing a knowledge triple and meticulously crafted prompt templates. It is important to note that all subjects, relations, and objects in this tuple are recognized entities in Wikidata [148].
WikiGen [101]: Firstly proposed in MEND [101], WikiGen consists of approximately 68 k question-answer pairs, with a similar size to zsRE. Here, each question corresponds to a sentence randomly sampled from Wikitext-103, and each answer is a 10-token sample obtained from a pre-trained distilGPT-2 model [94]. It is noteworthy that greedy 10-token prediction of the base model only aligns with edit targets for less than 1% of samples.
T-REx-100 & T-REx-1000 [33]: First used in CALINET [29], the authors adopt the classic relational dataset T-REx [33] for evaluating model editors by extracting factual triplets of varying sizes (100 and 1,000). Particularly, for each triplet, the authors insert the head and tail entities into the template in LAMA [115] based on the relation they share, which results in two datasets with 100 and 1,000 facts, respectively, for the purpose of false knowledge detection. It should be noted that each fact in these datasets is represented by several paraphrased sentences.
ParaRel [32]: ParaRel is an expert-curated dataset that comprises diverse prompt templates for 38 relations, sourced from the T-REx dataset [33]. Firstly used in KN [22], the authors insert the head entity into each relational fact and set the tail entity as a blank for prediction. To ensure a rich variety in templates, relations with less than four prompt templates are excluded, resulting in 34 relations in total. Each of these relations, on average, preserves 8.63 distinct prompt templates, leading to a total of 253,448 knowledge-revealing prompts for 27,738 relational facts.
NQ-SituatedQA [76]: Natural Questions (NQ) is a comprehensive question-answering dataset originating from user searches. In PPA [77], the authors utilize NQ as the source knowledge while excluding any outdated information as identified by SituatedQA [171] to create a novel dataset NQ-SituatedQA. SituatedQA is a dataset containing questions within a subset of NQ that are dependent on specific time and location. The authors then incorporate the time-dependent QA pairs from this subset, annotated using the 2021 Wikipedia [148] dump.
MQuAKE [175]: MQuAKE is constructed from Wikidata [148] for evaluating the effectiveness of KME methods on multi-hop questions. In particular, it is designed to assess whether the edited models can correctly answer questions generated by chains of facts in plain text. MQuAKE consists of two datasets. (1) MQuAKE-CF is a diagnostic dataset, specifically crafted to evaluate KME methods in the context of counterfactual edits. (2) MQuAKE-T focuses on temporal-based knowledge updates and is aimed at assessing the effectiveness of KME techniques in updating outdated information with contemporary factual data.
Hallucination [52]: Firstly processed in GRACE [52], Hallucination is created from the dataset released in SelfCheckGPT [96], where the authors prompt GPT-3 to generate biographies based on concepts extracted from WikiBio. The sentences are annotated regarding the factual accuracy, and hallucinations in them are identified. Then in GRACE, the authors process this dataset by further extracting Wikipedia summaries from WikiBio and thereby acquire the correct entry of each sentence. In this manner, every edit consists of a potentially false biography generated by GPT-3 as the prompt, and a ground truth output, which is the correct next sentence extracted from Wikipedia. There exist 1,392 potential edits for test.
MMEdit [16]: This dataset is the first to explore the possibility of editing multimodal LLMs. Specifically, MMEdit consists of two prevalent multimodal tasks: Visual Question Answering (VQA) [4] and Image Captioning [56]. VQA involves developing algorithms that can analyze an image’s visual content, comprehend questions asked in natural language about the image, and accurately respond to those questions. Image Captioning aims at understanding an image and then generate a detailed and coherent natural language description of that image. To create dataset MMEdit, the authors utilize BLIP-2 OPT [80] and extract edit data from the evaluation datasets VQAv2 [46] and COCO Caption [14], specifically focusing on their suboptimal entries.
ECBD [108]: Based on the original dataset Entity Cloze By Date (ECBD) [107], the authors process this dataset for a novel task, namely Entity Knowledge Propagation (EKP). The task aimed at updating model parameters to incorporate knowledge about newly emerged entities that are not present in the pre-training data of the language models. For instance, BERT [27], trained in 2018, does not recognize “COVID-19” as it is a more recent entity. The processed dataset aims at providing evaluation for such a task with the help of definition sentences as input to update knowledge about new entities. The entities are taken from the date between 2020/01 and 2021/09 to ensure that they are not in training data. Each edit consists of a new entity, a description sentence, a probe sentence, and a ground truth completion.
VLKEB [65]: Large Vision-Language Model Knowledge Editing Benchmark (VLKEB) aims at addressing the unique challenges of editing large vision-language models, which face additional difficulties due to different data modalities and complex model components with limited data for LVLM editing. VLKEB collects data from the multi-modal knowledge graph MMKG [90] and extends the Portability metric for evaluation. With MMKG, VLKEB binds image data with knowledge entities, which can be used to extract entity-related knowledge for editing data.
MLaKE [163]: Multilingual Language Knowledge Editing (MLaKE) is proposed to evaluate the capability of KME methods in multilingual contexts and multi-hop reasoning across five languages: English, Chinese, Japanese, French, and German. MLaKE aggregates fact chains from Wikipedia in multiple languages and utilizes LLMs to generate questions in both free-form and multiple-choice formats. Notably, existing methods show relatively high generalization for languages within the same language family compared to those from different families. These findings underscore the need for advancements in multilingual knowledge editing.
UKE [166]: Unstructured Knowledge Editing (UKE) is proposed to evaluate the capability of KME methods in updating knowledge based on unstructured texts. Updating LLMs with texts appears to be a more realistic application, which is also more complex and difficult. The authors leverage subjects and objects in Wikidata [148] and retrieve the corresponding Wikipedia article summaries as unstructured texts. The authors also utilize LLMs to generate summaries for edits in two existing datasets, CounferFact [97] and MQuAKE-CF [175], to obtain unstructured texts.
RippleEdits [21]: This dataset proposes a novel evaluation criterion, which assesses the performance of KME methods on additional edits brought by an existing edit. In particular, injecting new knowledge (e.g., “Jack Depp is the son of Johnny Depp”) introduces a “ripple effect”, which necessitates the model to update related knowledge as well (e.g., “Jack Depp is the sibling of Lily-Rose Depp”). Based on this, the authors construct RippleEdits, consisting of 5,000 edits with various types of ripple effects.
Conflict/Round Edit [86]: This dataset pioneers in investigating the potential side effects of KME methods for LLMs. The proposed dataset and evaluation metrics underline two primary concerns: (1) Knowledge Conflict: Modifying sets of logically conflicting facts can amplify the existing inconsistencies within LLMs. (2) Knowledge Distortion: Altering model parameters to update factual knowledge can permanently disrupt the inherent knowledge framework of LLMs. The dataset is constructed from WikiData [148] with specific logical rules.

6.2 Classification Datasets

Classification datasets are also widely adopted to evaluate the effectiveness of KME. These datasets consist of prompt-target pairs, where the target is a discrete label instead of a textual sentence. In the context of KME, these labels help ascertain the alignment of model performance with desired edits. The advantages of classification datasets also involve their preciseness in evaluation without the need to define the specific output space. In this section, we summarize notable classification datasets that have been tailored and leveraged for assessing KME techniques as follows:
FEVER [143]: FEVER is a fact-checking dataset originally processed in KILT [114] for verifying factual knowledge in the form of binary classification. It necessitates the retrieval of sentence-level evidence to determine whether a claim is supported or refuted, and is widely used for evaluating the performance of KME. Specifically, FEVER excludes claims labeled as lacking sufficient information, as they typically do not provide any evidence to evaluate the claim.
ConvSent [102]: Firstly processed in SERAC [102], ConvSent is used to evaluate the capability of an editor to modify a dialog agent’s sentiment about a particular topic without influencing its responses to other topics. ConvSent is obtained from a list of 15,000 non-numeric entities from zsRE [25, 78], combined with 989 noun phrases from GPT-3 [10] for 15,989 topics. Particularly, for each entity, there are ten positive and ten negative sentiment completions, which can be noisy, from the BlenderBot model with 3B parameters [124]. The refined sentiment labels are achieved by a sentiment classifier [55] pre-trained on RoBERTa [91].
Bias in Bios [24]: Bias in Bios is a dataset originally proposed for fairness-related machine learning, containing approximately 397 k short professional biographies of online individuals, which are not relatively famous. Each biographical sentence is assigned an associated occupation label for the described person. To adopt this dataset for evaluating the performance of KME methods, the authors of REMEDI [57] extract a single sentence, modify it to display only the person’s first name, and then query the language model with the prompt that follows the structure: “Person has the occupation of...”. Then they evaluate the relative probabilities of the language model assigned to 28 potential occupations, where the language model is considered to be correct if the ground-truth occupation is ranked top-1.
VitaminC-FC [127]: Firstly processed in SERAC [102], VitaminC-FC is constructed based on a fact-checking dataset, VitaminC [127]. Particularly, VitaminC consists of more than 400,000 evidence-claim pairs, each of which is assigned a binary label to indicate whether the evidence entails the claim. The dataset was gathered from over 100,000 Wikipedia revisions that modify an underlying fact, along with additional synthetic ones. In SERAC, the authors convert VitaminC into a KME dataset by using the evidence as the edit descriptor and using claims from the same Wiki pages accordingly as in-scope samples.
SCOTUS [52]: Firstly proposed in GRACE [52], SCOTUS is processed with label shift based on the dataset with the same name from Fairlex [11]. This classification task is to categorize U.S. Supreme Court documents from various decades into one of 11 topics. The topics are clustered based on the specific matter of dispute, such as Criminal Procedure, Civil Rights, and First Amendment. Due to the evolution of categorization rules over time, the label distributions in this dataset also shift. Specifically, 7.4 k cases from 1946–1982 are used for training, and 931 cases from the 1991–2009 period are for test.

7 Applications

KME can benefit multiple downstream applications with the ability to precisely and efficiently inject knowledge into pre-trained LLMs. In the following, we introduce several key applications of KME techniques in realistic scenarios, where intuitive examples are provided in Table 3.
Table 3.
TaskEdit Descriptor eIn-scope Input \(x\sim \mathcal {X}_e\)Original Output \(y\sim \mathcal {Y}_e\)Target Output \(y\sim \mathcal {Y}_e^*\)
QA(Kazakhstan, Captital,What is the capital ofAstanaNur-Sultan
 Astana\(\rightarrow\)Nur-Sultan)Kazakhstan?  
FC(Marathon, Record,Kipchoge holds the men’sTrueFalse
 Kipchoge\(\rightarrow\)Kiptum)marathon world record.  
NLG(Jordan Poole, Play In,Provide a short introductionJordan Poole enteredIn 2023, Jordan Poole transitioned
Warriors\(\rightarrow\)Wizards)to Jordan Poole, describingthe Warriors’ rotationfrom the Warriors to the Wizards,
 his current position.recently.remarking a significant change.
Table 3. Examples of Different Downstream Applications of KME: QA, FC, and NLG

7.1 Question Answering

Background. Question Answering (QA) is a core NLP task that aims at comprehending queries posed by users in natural language and provide answers based on the encoded knowledge in the pre-trained language model [132]. Traditional models for QA are generally fixed in their knowledge, capturing only the information available at the training time of [70, 115]. However, in our dynamic world, new information is generated incessantly, which necessitates the constant update of QA models [139]. Fortunately, KME methods enable the modification of QA models to cater to specific questions without disrupting responses to other unrelated inputs. Therefore, with KME strategies, the QA model can be efficiently updated on the run, where the currentness of the model can be guaranteed. Consequently, language model editing techniques have found broad applications across a myriad of QA contexts with potentially distinct requirements [77].
Existing Works. The QA task encompasses various aspects, such as conversational QA, definition-based QA, and notably, relation-based QA [110]. Relation-based QA is primarily adopted as an evaluation benchmark as it necessitates the retrieval of precise real-world facts in response to queries. This particular emphasis on specific information retrieval renders relation-based QA especially conducive to the benefits of KME techniques. For example, PPA [77] introduces an innovative task of Continuously-updated QA (CuQA), which intentionally emphasizes recurrent, substantial edits for language models to constantly update them with new information. An important aspect of the CuQA task is to ensure that the existing pre-trained knowledge remains unaltered with the integration of new knowledge. Therefore, this property is one important evaluation to assess model editing in CuQA tasks. In MQuAKE [175], the authors innovatively propose a multi-hop QA task that involves answering questions generated by chains of facts in plain text. Specifically, the task requires edited models to infer implicit relations that can be several hops away from the objects in the edit. For example, when a language model is modified regarding the president of the USA, an ideal model should also authentically alter answers to “Who is the son of the president of the USA”, which is a two-hop relation. Such a task is significantly more challenging as it necessitates the model to alter its reasoning results in addition to the original edit. Nevertheless, the proposed method MeLLo in MQuAKE still exhibits outstanding performance on this difficult task, demonstrating the potential of KME in generalizing edited knowledge to multi-hop relations.

7.2 Fact Checking

Background. Fact-checking (FC) is a pivotal task in journalism, information verification, and combating misinformation that aims at scrutinizing and affirming the authenticity of claims, statements, or information in news articles, social media, and other media content [37, 127]. In a world overwhelmed with ever-emerging information, fact-checking facilitates the trustworthiness in the sharing of distributed information, promotes information transparency, and aids individuals in making well-informed decisions [143]. However, it is crucial to constantly update fact-checking models. For instance, during the COVID-19 pandemic, initial understandings and guidelines about the virus evolved as researchers gathered more data [129]. A fact-checking model that cannot adapt to these rapidly changing facts would quickly become outdated and potentially spread misinformation, thereby requiring the application of language model editing. By integrating KME techniques into fact-checking models to consistently update them with the latest information and facts, it becomes possible to ensure the currentness, trustworthiness, and accuracy of the model despite the persistent evolution of information.
Existing Works. Recently, several works have proposed to apply KME techniques in fact-checking models. In [177], the authors first explore the potential of modifying specific factual knowledge within the transformer backbone of the fact-checking model while ensuring that overall model performance remains intact on facts irrelevant to the editing purpose. Particularly, they identify the critical components within the transformer backbones conducive to effective knowledge modifications. In SERAC [102], the authors propose to use evidence gathered from Wikipedia as edit descriptors to update potentially outdated knowledge in the model. The proposed method exhibits significant performance improvements over baselines and can be generalized to other in-scope inputs collected from the same Wikipedia page.

7.3 Natural Language Generation

Background. KME techniques are also promising to ensure the relevancy of the Natural Language Generation (NLG) task, which aims at generating coherent and contextually relevant content based on provided instructions [122]. Considering the rapid evolution of the global information landscape, it is essential for NLG models to remain up-to-date and ensure the accuracy of generated text while avoiding potentially false statements that may mislead the users.
Existing Works. In practice, several works have been proposed to apply KME methods to promote model performance in natural language generation tasks. For instance, FRUIT [5] proposes to update outdated Wikipedia articles according to the collection of new information about the article’s subject. Based on the T5 model [119], the authors utilize a compressed output format to eliminate the necessity of generating the entire update from scratch and promote thoughtful content structuring, which effectively handles the challenge of incoherence. In MEND [101], the authors apply their proposed method in the Wikitext generation task, where the edited model is required to produce credible 10-token extensions based on a provided Wikitext prefix [94]. With modification on multi-layer token-wise activations and gradients, the edited model presents higher coherence on the NLG task, which demonstrates the effectiveness of KME in generating target texts with richer information than QA or FC.

8 Discussion

8.1 Challenges

Despite the continual progress of works on KME, several critical aspects have been inadequately addressed by existing studies. Delving deeper into these challenges could offer researchers fresh insights and pave the way for the further advancement of the field. Consequently, we hereby outline the pressing challenges that await solutions in KME.
Tradeoff between Locality and Generality. In KME, it is crucial to balance two objectives, locality and generality (as defined in Section 4), such that a higher edit success rate can be achieved with minimal negative influence on knowledge irrelevant to the edits. When editing a language model, a potential tradeoff might emerge between these two desirable properties. As demonstrated in [167], local modification methods, such as MEMIT [98] and ROME [97] generally preserve a higher level of locality, as they locate precise locations of target knowledge to conduct the edition, which does not largely affect the unrelated weights. In addition, T-Patcher [66] points out that increasing the size of memory increases locality while decreasing the generality. These observations underscore the intricate balance between locality and generality. However, it remains challenging to tackle the tradeoff problem and achieve a balance between these two desirable properties of KME methods.
Theoretical Analysis. While many current KME studies focus on developing effective methods to enhance the editing performance regarding various desirable properties, there exists a notable gap between the practical application and the comparatively less discovered theoretical analysis. Recently, in [140], the authors provide theoretical support for the justification of identifying harmful training examples and editing the model by erasing the information from a Bayesian view. LEACE [9] introduces an analytical framework that offers a theoretical perspective for the task of erasing target concept information from every layer in language models. In general, the benefits of incorporating theoretical analysis are multi-faceted. First, theoretical analysis provides a deeper understanding of the mechanics underlying KME, allowing for more principled approaches to editing. Second, a strong theoretical basis sets a solid foundation for future research, encouraging more rigorous and systematic exploration in the field of KME. However, to the best of our knowledge, there still does not exist any comprehensive theoretical analysis regarding the KME problem that involves novel knowledge. We hope that future research will enrich the theoretical discourse that can deliver profound insights into the substantial foundations of KME methods.
Editing at Scale. Another crucial property that hinders the practical application of KME is scalability – the ability of editing strategy to effectively perform a large number of edits simultaneously [101]. For example, conversational systems [174] are expected to be constantly updated to incorporate an enormous number of global events and the information originating from them. However, as the number of applied edits increases, the coherence of language models is severely jeopardized, as multiple edits might contradict a broader spectrum of pre-existing knowledge in the models [152]. This can lead to decreased editing performance in both locality and generality metrics [102]. Although external memorization methods can alleviate such problems with a larger size of memories of additional parameters, they are still vulnerable if thousands of edits are required [97]. Moreover, simply adapting single-edit techniques for a multi-edit environment by merely applying them sequentially has been demonstrated to be proven suboptimal [98]. Therefore, the unique and intricate challenge of coherence renders editing at scale a formidable task.
Unstructured Editing. KME faces significant challenges due to its evaluation strategies that focus on knowledge triples, e.g., \(t=(s,r,o)\), which are not reflective of how real-world knowledge updates occur [65, 172]. In reality, updates are often found in unstructured texts such as news articles and scientific papers. To address this gap, a recent benchmark [166], namely Unstructured Knowledge Editing (UKE), is proposed to evaluate editing performance using unstructured texts as knowledge updates. The experimental results demonstrate significant performance declines of state-of-the-art KME methods. Notably, such a decline persists even with knowledge triplets extracted from unstructured texts. As such, it is imperative to develop more robust and adaptable methods that use unstructured texts for editing.

8.2 Future Directions

Despite the recent achievements in the development of KME strategies for effective and efficient updating of new knowledge into LLMs, KME research is still in its emerging stage. Several promising directions could be pursued to further advance this field. Accordingly, we identify five inspiring and important open problems worthy of exploration in the future as follows:
Optimization-Free Editing. Recently, prompt engineering has become a prevalent solution for modifying the behaviors of pre-trained LLMs in a human-preferable manner without the requirement of parameter update [30]. For example, in-context learning provides task descriptions and/or demonstrations in the form of plain text to promote the model performance [10], which makes it a potentially more efficient and practical strategy for language models. We note that IKE [174] proposes a novel framework that relies on demonstration contexts for KME without parameter updating, which explicitly formats the demonstrations that can guide the language model to copy, update, and retain the prediction of different prompts. However, such a strategy is difficult to scale and usually has unsatisfactory retention. Therefore, it remains a crucial while challenging task to develop optimization-free KME methods.
Auto-Discovery of Editing Targets. Current KME methods mainly rely on human expertise to identify and incorporate desirable knowledge into pre-trained LLMs [166, 167, 172]. This approach is inherently labor-intensive and can incur significant costs, especially considering the vast and rapidly expanding new information needed to be integrated into language models. A promising future direction lies in the automation of the edits, which aims at identifying, evaluating, and prioritizing new knowledge that needs to be integrated from raw resources such as websites and social media. Through this strategy, the application of KME can be streamlined, rendering it more practical and adaptable in real-world scenarios. A straightforward solution would be crawling new knowledge and transforming it into a knowledge base, querying LLMs for each knowledge triple, and editing the wrong answer. However, such a strategy still lacks efficiency. Therefore, it remains a crucial task to discover editing knowledge from various resources without human effort.
Continual Editing. Current KME methods primarily consider one-step offline editing [5, 25]; however, such an approach is not aligned with real-world applications where models might continually encounter novel knowledge to be injected. For example, an online QA model may continually encounter reports of incorrect answers from end users, where the editing needs to be conducted on the run [66]. Therefore, an optimal KME technique should be capable of instantaneously and continuously rectifying emergent issues. We note that continual editing of pre-trained LLMs presents a unique challenge: preventing the edited models from forgetting or contradicting previous edits. Despite the inherent complexities, the persistent demand for continual editing in practice underscores the importance of solving this challenge.
Robust Editing. An important direction for the advancement of KME lies in enhancing its robustness. In an era where misinformation spreads rapidly, it is urgent that edited models not only retain their accuracy but also resist adversarial attacks and misinformation [39]. Here, we should note that the concept of robustness extends beyond just maintaining factual accuracy; it involves fortifying the model against potentially adversarial external perturbations [113]. For example, if KME is maliciously applied to inject harmful knowledge into language models, the edited models can be easily transformed into tools for misinformation [141]. Therefore, to prevent such cases, it is crucial for KME techniques to develop capabilities that can identify and counteract such unwanted inputs, thereby enhancing their resilience against adversarial actions. In practice, as the trend leans toward open-sourcing LLMs, it becomes ever more crucial to safeguard against potential manipulations that can turn these models harmful.
Editable Fairness. With the wide application of LLMs to support decisions, the emphasis on fairness has grown significantly [150], which requires LLMs to fairly treat people with diverse background [1]. However, LLMs trained on large datasets inevitably incorporate certain biases during this pre-training phase [28]. Fortunately, the precision and efficiency of KME techniques offer a promising solution to mitigate such biases and promote fairness in pre-trained LLMs. For instance, in a model designed to classify biographical sentences with occupation [24], KME can be used to inject nuanced knowledge about a particular profession, guiding the model toward a more equitable understanding of individuals associated with that profession [57]. However, this remains a complex challenge, as fairness often entails considering disparate groups of individuals rather than specific people. This broader focus makes knowledge injection via KME a non-trivial task. Despite these difficulties, the enhancement of fairness in language models is paramount, and KME techniques present a promising avenue to achieve this goal.

9 Conclusions

In this survey, we present a comprehensive and in-depth review of KME techniques for the precise and efficient updating of knowledge in pre-trained LLMs. We first formulate the KME problem as a constrained optimization objective that simultaneously ensures editing accuracy and knowledge retention, and that is general enough to encompass different KME strategies. We then provide an overview of the evaluation metrics for KME, which sheds light on the desirable attributes of edited models. Subsequently, we propose a structured taxonomy to systematically categorize existing KME techniques. Within each category, we outline the central challenges, elaborate on the representative methods, and discuss their strengths and weaknesses. Furthermore, we summarize the datasets widely used to assess KME techniques, highlighting that certain techniques demand specific dataset structures for training or evaluation. To inspire researchers to devise more practical implementations, we also spotlight the real-world applications of KME techniques. Finally, we identify several open challenges for future research and provide insightful directions that are conducive to further advancement of the field.

Footnote

1. The concept is also termed Knowledge Editing or Model Editing. For clarity, we refer to it as KME in this article.

References

[1]
Abubakar Abid, Maheen Farooqi, and James Zou. 2021. Persistent anti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society.
[2]
Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. 2021. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[3]
James A. Anderson. 1972. A simple neural network generating an interactive memory. Mathematical Biosciences 14 (1972), 197–220.
[4]
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. Vqa: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision.
[5]
Robert L. Logan IV, Alexandre Passos, Sameer Singh, and Ming-Wei Chang. 2022. FRUIT: Faithfully reflecting updated information in text. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[6]
Razvan Azamfirei, Sapna R. Kudchadkar, and James Fackler. 2023. Large language models and the perils of their hallucinations. Critical Care 27, 1 (2023), 120.
[7]
Michiel Bakker, Martin Chadwick, Hannah Sheahan, Michael Tessler, Lucy Campbell-Gillingham, Jan Balaguer, Nat McAleese, Amelia Glaese, John Aslanides, Matt Botvinick, and Christopher Summerfield. 2022. Fine-tuning language models to find agreement among humans with diverse preferences. In Proceedings of the Advances in Neural Information Processing Systems.
[8]
David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, and Antonio Torralba. 2020. Rewriting a deep generative model. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I.
[9]
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, and Stella Biderman. 2023. LEACE: Perfect linear concept erasure in closed form. In International Conference on Learning Representations.
[10]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems. 1877–1901.
[11]
Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Schwemer, and Anders Søgaard. 2022. FairLex: A multilingual benchmark for evaluating fairness in legal text processing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[12]
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie. 2024. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology 15, 3 (2024), 1–45.
[13]
Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, and Xiangzhan Yu. 2020. Recall and learn: Fine-tuning deep pretrained language models with less forgetting. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[14]
Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C. Lawrence Zitnick. 2015. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325 (2015).
[15]
Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, and Maosong Sun. 2024. Robust and scalable model editing for large language models. In The International Conference on Computational Linguistics.
[16]
Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, and Ningyu Zhang. 2023. Can we edit multimodal large language models? In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[17]
Siyuan Cheng, Ningyu Zhang, Bozhong Tian, Zelin Dai, Feiyu Xiong, Wei Guo, and Huajun Chen. 2024. Editing language model-based knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence.
[18]
Cheng-Han Chiang and Hung-yi Lee. 2023. Can large language models be an alternative to human evaluations? arXiv preprint arXiv:2305.01937 (2023).
[19]
Alebachew Chiche and Betselot Yitagesu. 2022. Part of speech tagging: A systematic review of deep learning and machine learning approaches. Journal of Big Data 9, 1 (2022), 1–25.
[20]
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei. 2024. Scaling instruction-finetuned language models. Journal of Machine Learning Research 25, 70 (2024), 1–53.
[21]
Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, and Mor Geva. 2024. Evaluating the ripple effects of knowledge editing in language models. Transactions of the Association for Computational Linguistics 12 (2024), 283–298.
[22]
Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, and Furu Wei. 2022. Knowledge neurons in pretrained transformers. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[23]
Damai Dai, Wenbin Jiang, Qingxiu Dong, Yajuan Lyu, and Zhifang Sui. 2023. Neural knowledge bank for pretrained transformers. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing.
[24]
Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Tauman Kalai. 2019. Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency.
[25]
Nicola De Cao, Wilker Aziz, and Ivan Titov. 2021. Editing factual knowledge in language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[26]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[27]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[28]
Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. Advances in Neural Information Processing Systems 32 (2019).
[29]
Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, and Lei Li. 2022. Calibrating factual knowledge in pretrained language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[30]
Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, and Zhifang Sui. 2022. A survey for in-context learning. arXiv preprint arXiv:2301.00234 (2022).
[31]
Yann Dubois, Chen Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy S. Liang, and Tatsunori B. Hashimoto. 2024. Alpacafarm: A simulation framework for methods that learn from human feedback. In Proceedings of the Advances in Neural Information Processing Systems.
[32]
Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, and Yoav Goldberg. 2021. Measuring and improving consistency in pretrained language models. Transactions of the Association for Computational Linguistics 9 (2021), 1012–1031.
[33]
Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Frederique Laforest, and Elena Simperl. 2018. T-rex: A large scale alignment of natural language with knowledge base triples. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation.
[34]
Wenqi Fan, Zihuai Zhao, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Jiliang Tang, and Qing Li. 2023. Recommender systems in the era of large language models (llms). arXiv preprint arXiv:2307.02046 (2023).
[35]
Hao Fei, Yafeng Ren, Yue Zhang, Donghong Ji, and Xiaohui Liang. 2021. Enriching contextualized language model from knowledge graph for biomedical information extraction. Briefings in Bioinformatics 22, 3 (2021), bbaa110.
[36]
Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning.
[37]
Boris A. Galitsky. 2023. Truth-O-Meter: Collaborating with LLM in fighting its hallucinations. Preprints (2023).
[38]
Govind Gangadhar and Karl Stratos. 2024. Model editing by pure fine-tuning. arXiv:2402.11078. Retrieved from https://arxiv.org/abs/2402.11078
[39]
Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, and Jack Clark. 2022. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858 (2022).
[40]
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023).
[41]
Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. 2017. Audio set: An ontology and human-labeled dataset for audio events. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing.
[42]
Mor Geva, Avi Caciularu, Kevin Wang, and Yoav Goldberg. 2022. Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[43]
Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. 2021. Transformer feed-forward layers are key-value memories. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[44]
Amelia Glaese, Nat McAleese, Maja Trȩbacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Soňa Mokrá, Nicholas Fernando, Boxi Wu, Rachel Foley, Susannah Young, Iason Gabriel, William Isaac, John Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, and Geoffrey Irving. 2022. Improving alignment of dialogue agents via targeted human judgements. arXiv preprint arXiv:2209.14375 (2022).
[45]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. Generative adversarial networks. Commun. ACM 63, 11 (2020), 139–144.
[46]
Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. 2017. Making the v in vqa matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[47]
Hengrui Gu, Kaixiong Zhou, Xiaotian Han, Ninghao Liu, Ruobing Wang, and Xin Wang. 2023. Pokemqa: Programmable knowledge editing for multi-hop question answering. arXiv preprint arXiv:2312.15194 (2023).
[48]
Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, and Nanyun Peng. 2024. Model editing harms general abilities of large language models: Regularization to the rescue. arXiv preprint arXiv:2401.04700 (2024).
[49]
Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, and Niket Tandon. 2023. Editing common sense in transformers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[50]
Akshat Gupta, Dev Sajnani, and Gopala Anumanchipalli. 2024. A unified framework for model editing. arXiv preprint arXiv:2403.14236 (2024).
[51]
David Ha, Andrew Dai, and Quoc V. Le. 2016. HyperNetworks. arXiv:1609.09106. Retrieved from https://arxiv.org/abs/1609.09106
[52]
Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, and Marzyeh Ghassemi. 2023. Aging with GRACE: Lifelong model editing with discrete key-value adaptors. In Proceedings of the Advances in Neural Information Processing Systems.
[53]
Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, and Srinivasan Iyer. 2023. Methods for measuring, updating, and visualizing factual beliefs in language models. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics.
[54]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[55]
Mark Heitmann. 2020. More than a feeling: Benchmarks for sentiment analysis accuracy. (2020).
[56]
Simao Herdade, Armin Kappeler, Kofi Boakye, and Joao Soares. 2019. Image captioning: Transforming objects into words. Advances in Neural Information Processing Systems 32 (2019).
[57]
Evan Hernandez, Belinda Z. Li, and Jacob Andreas. 2023. Inspecting and editing knowledge representations in language models. arXiv preprint arXiv:2304.00740 (2023).
[58]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[59]
Or Honovich, Thomas Scialom, Omer Levy, and Timo Schick. 2022. Unnatural instructions: Tuning language models with (Almost) no human labor. In The 61st Annual Meeting Of The Association For Computational Linguistics.
[60]
Timothy Hospedales, Antreas Antoniou, Paul Micaelli, and Amos Storkey. 2021. Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2021), 5149–5169.
[61]
Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, and Jun Zhao. 2024. Wilke: Wise-layer knowledge editor for lifelong knowledge editing. arXiv preprint arXiv:2402.10987 (2024).
[62]
Edward J. Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. LoRA: Low-Rank adaptation of large language models. In International Conference on Learning Representations.
[63]
Linmei Hu, Zeyi Liu, Ziwang Zhao, Lei Hou, Liqiang Nie, and Juanzi Li. 2023. A survey of knowledge enhanced pre-trained language models. IEEE Transactions on Knowledge and Data Engineering (2023).
[64]
Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, and Roy Lee. 2023. LLM-Adapters: An adapter family for parameter-efficient fine-tuning of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
[65]
Han Huang, Haitian Zhong, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2024. KEBench: A benchmark on knowledge editing for large vision-language models. arXiv preprint arXiv:2403.07350 (2024).
[66]
Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, and Zhang Xiong. 2023. Transformer-patcher: One mistake worth one neuron. In International Conference on Learning Representations.
[67]
Mike Huisman, Jan N. Van Rijn, and Aske Plaat. 2021. A survey of deep meta-learning. Artificial Intelligence Review 54, 6 (2021), 4483–4541.
[68]
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. 2023. Editing models with task arithmetic. In International Conference on Learning Representations.
[69]
Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, and Wei Wang. 2024. Learning to edit: Aligning LLMs with knowledge editing. arXiv preprint arXiv:2402.11905 (2024).
[70]
Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig. 2020. How can we know what language models know? Transactions of the Association for Computational Linguistics 8 (2020), 423–438.
[71]
Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, and Sivanesan Sangeetha. 2022. AMMU: A survey of transformer-based biomedical pretrained language models. Journal of Biomedical Informatics 126 (2022), 103982.
[72]
Atoosa Kasirzadeh and Iason Gabriel. 2023. In conversation with artificial intelligence: Aligning language models with human values. Philosophy & Technology 36, 2 (2023), 27.
[73]
Enkelejda Kasneci, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences 103 (2023), 102274.
[74]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[75]
Teuvo Kohonen. 1972. Correlation matrix memories. IEEE Transactions on Computers 100, 4 (1972), 353–359.
[76]
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics 7 (2019), 453–466.
[77]
Kyungjae Lee, Wookje Han, Seung won Hwang, Hwaran Lee, Joonsuk Park, and Sang-Woo Lee. 2022. Plug-and-play adaptation for continuously-updated QA. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[78]
Omer Levy, Minjoon Seo, Eunsol Choi, and Luke Zettlemoyer. 2017. Zero-shot relation extraction via reading comprehension. In Proceedings of the Conference on Computational Natural Language Learning 2017.
[79]
Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix X. Yu, and Sanjiv Kumar. 2023. Large language models with controllable working memory. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[80]
Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning.
[81]
Shuaiyi Li, Yang Deng, Deng Cai, Hongyuan Lu, Liang Chen, and Wai Lam. 2024. Consecutive model editing with batch alongside HooK layers. arXiv preprint arXiv:2403.05330 (2024).
[82]
Xiaopeng Li, Shasha Li, Bin Ji, Shezheng Song, Xi Wang, Jun Ma, Jie Yu, Xiaodong Liu, Jing Wang, and Weimin Zhang. 2024. SWEA: Changing factual knowledge in large language models via subject word embedding altering. arXiv preprint arXiv:2401.17809 (2024).
[83]
Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, and Jie Yu. 2024. PMET: Precise model editing in a transformer. In Proceedings of the AAAI Conference on Artificial Intelligence.
[84]
Xiaonan Li and Xipeng Qiu. 2023. Finding supporting examples for in-context learning. arXiv preprint arXiv:2302.13539 (2023).
[85]
Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, and Junjie Bai. 2022. Parameter-efficient sparsity for large language models fine-tuning. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence.
[86]
Zhoubo Li, Ningyu Zhang, Yunzhi Yao, Mengru Wang, Xi Chen, and Huajun Chen. 2024. Unveiling the pitfalls of knowledge editing for large language models. In International Conference on Learning Representations.
[87]
Q. Vera Liao and Jennifer Wortman Vaughan. 2023. AI transparency in the age of LLMs: A human-centered research roadmap. arXiv preprint arXiv:2306.01941 (2023).
[88]
Hao Liu, Carmelo Sferrazza, and Pieter Abbeel. 2023. Chain of hindsight aligns language models with feedback. In International Conference on Learning Representations.
[89]
Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin A. Raffel. 2022. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. In Proceedings of the Advances in Neural Information Processing Systems.
[90]
Ye Liu, Hui Li, Alberto Garcia-Duran, Mathias Niepert, Daniel Onoro-Rubio, and David S. Rosenblum. 2019. MMKG: Multi-modal knowledge graphs. In The Semantic Web: 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2–6, 2019, Proceedings.
[91]
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[92]
Yun Luo, Zhen Yang, Fandong Meng, Yafu Li, Jie Zhou, and Yue Zhang. 2023. An empirical study of catastrophic forgetting in large language models during continual fine-tuning. arXiv preprint arXiv:2308.08747 (2023).
[93]
Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, and Cong Liu. 2023. Untying the reversal curse via bidirectional language model editing. arXiv preprint arXiv:2310.10322 (2023).
[94]
Yuxuan Ma. 2021. distilgpt2-finetuned-wikitext2. Retrieved November 2, 2023 from https://huggingface.co/MYX4567/distilgpt2-finetuned-wikitext2
[95]
Aman Madaan, Niket Tandon, Peter Clark, and Yiming Yang. 2022. Memory-assisted prompt editing to improve GPT-3 after deployment. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.
[96]
Potsawee Manakul, Adian Liusie, and Mark Gales. 2023. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
[97]
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and editing factual associations in GPT. In Proceedings of the Advances in Neural Information Processing Systems.
[98]
Kevin Meng, Arnab Sen Sharma, Alex J. Andonian, Yonatan Belinkov, and David Bau. 2023. Mass-editing memory in a transformer. In International Conference on Learning Representations.
[99]
Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, and Nat McAleese. 2022. Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147 (2022).
[100]
Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, and Dan Roth. 2023. Recent advances in natural language processing via large pre-trained language models: A survey. Computing Surveys 56, 2 (2023), 1–40.
[101]
Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. 2022. Fast model editing at scale. In Proceedings of the International Conference on Machine Learning.
[102]
Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, and Chelsea Finn. 2022. Memory-based model editing at scale. In Proceedings of the International Conference on Machine Learning.
[103]
Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M. Saiful Bari, Sheng Shen, Zheng-Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. 2023. Crosslingual generalization through multitask finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[104]
Shikhar Murty, Christopher D. Manning, Scott M. Lundberg, and Marco Túlio Ribeiro. 2022. Fixing model bugs with natural language patches. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[105]
Thanh Tam Nguyen, Thanh Trung Huynh, Phi Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, and Quoc Viet Hung Nguyen. 2022. A survey of machine unlearning. arXiv preprint arXiv:2209.02299 (2022).
[106]
Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, and Min Yang. 2023. Forgetting before Learning: Utilizing parametric arithmetic for knowledge updating in large language models. arXiv preprint arXiv:2311.08011 (2023).
[107]
Yasumasa Onoe, Michael Zhang, Eunsol Choi, and Greg Durrett. 2022. Entity cloze by date: What LMs know about unseen entities. In Findings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[108]
Yasumasa Onoe, Michael J. Q. Zhang, Shankar Padmanabhan, Greg Durrett, and Eunsol Choi. 2023. Can LMs learn new entities from descriptions? Challenges in propagating injected knowledge. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[109]
OpenAI. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023).
[110]
Hariom A. Pandya and Brijesh S. Bhatt. 2021. Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices. arXiv preprint arXiv:2112.03572 (2021).
[111]
Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, and Jianfeng Gao. 2023. Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813 (2023).
[112]
Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, and Jianfeng Gao. 2023. Instruction tuning with GPT-4. arXiv preprint arXiv:2304.03277 (2023).
[113]
Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. 2022. Red teaming language models with language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[114]
Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel, and Sebastian Riedel. 2021. KILT: A benchmark for knowledge intensive language tasks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[115]
Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. 2019. Language models as knowledge bases? In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[116]
Yuval Pinter and Michael Elhadad. 2023. Emptying the Ocean with a Spoon: Should we edit models? In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[117]
Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Jing Yi, Weize Chen, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, and Jie Zhou. 2022. Exploring Universal Intrinsic Task Subspace via Prompt Tuning. arXiv preprint arXiv:2110.07867 (2022).
[118]
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI (2018).
[119]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 140 (2020), 1–67.
[120]
Sachin Ravi and Hugo Larochelle. 2016. Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations.
[121]
Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[122]
Ehud Reiter and Robert Dale. 1997. Building applied natural language generation systems. Natural Language Engineering 3, 1 (1997), 57–87.
[123]
Marco Tulio Ribeiro and Scott Lundberg. 2022. Adaptive testing and debugging of NLP models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.
[124]
Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, and Jason Weston. 2021. Recipes for building an open-domain chatbot. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics.
[125]
Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, David Bau, Antonio Torralba, and Aleksander Madry. 2021. Editing a classifier by rewriting its prediction rules. In Proceedings of the Advances in Neural Information Processing Systems.
[126]
Christoph Schuhmann, Robert Kaczmarczyk, Aran Komatsuzaki, Aarush Katta, Richard Vencu, Romain Beaumont, Jenia Jitsev, Theo Coombes, and Clayton Mullis. 2021. LAION-400M: Open dataset of CLIP-Filtered 400 Million image-text pairs. In Proceedings of the Advances in Neural Information Processing Systems Workshop Datacentric AI.
[127]
Tal Schuster, Adam Fisch, and Regina Barzilay. 2021. Get your Vitamin C! Robust fact verification with contrastive evidence. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[128]
Thomas Scialom, Tuhin Chakrabarty, and Smaranda Muresan. 2022. Fine-tuned language models are continual learners. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[129]
Gautam Kishore Shahi, Anne Dirkson, and Tim A. Majchrzak. 2021. An exploratory study of COVID-19 misinformation on Twitter. Online Social Networks and Media 22 (2021), 100104.
[130]
Arnab Sen Sharma, David Atkinson, and David Bau. 2024. Locating and editing factual associations in mamba. arXiv:2404.03646. Retrieved from https://arxiv.org/abs/2404.03646
[131]
Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, Shaochen Zhong, Kaixiong Zhou, and Ninghao Liu. 2024. Retrieval-enhanced knowledge editing for multi-hop question answering in language models. arXiv:2403.19631. Retrieved from https://arxiv.org/abs/2403.19631
[132]
Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. 2020. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[133]
Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, and Artem Babenko. 2020. Editable neural networks. In International Conference on Learning Representations.
[134]
Chenyang Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, and Tao Yang. 2023. ConPET: Continual parameter-efficient tuning for large language models. arXiv:2309.14763. Retrieved from https://arxiv.org/abs/2309.14763
[135]
Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, and Houfeng Wang. 2023. Preference ranking optimization for human alignment. arXiv:2306.17492. Retrieved from https://arxiv.org/abs/2306.17492
[136]
Xiaoshuai Song, Zhengyang Wang, Keqing He, Guanting Dong, Jinxu Zhao, and Weiran Xu. 2024. Knowledge editing on black-box large language models. arXiv:2402.08631. Retrieved from https://arxiv.org/abs/2402.08631
[137]
Felix Stahlberg. 2020. Neural machine translation: A review. Journal of Artificial Intelligence Research 69 (2020), 343–418.
[138]
Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, and Tao Yu. 2022. Selective annotation makes language models better few-shot learners. arXiv preprint arXiv:2209.01975 (2022).
[139]
Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2018. Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv:1811.00937. Retrieved from https://arxiv.org/abs/1811.00937
[140]
Ryutaro Tanno, Melanie F. Pradier, Aditya Nori, and Yingzhen Li. 2022. Repairing neural networks by leaving the right past behind. In Proceedings of the Advances in Neural Information Processing Systems.
[141]
Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An Instruction-following LLaMA Model. Retrieved November 15, 2023 from https://github.com/tatsu-lab/stanford_alpaca
[142]
Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. 2023. Large language models in medicine. Nature Medicine 29, 8 (2023), 1930–1940.
[143]
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: A large-scale dataset for fact extraction and VERification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[144]
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
[145]
Joaquin Vanschoren. 2018. Meta-learning: A survey. arXiv:1810.03548. Retrieved from https://arxiv.org/abs/1810.03548
[146]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems (2017).
[147]
Johannes von Oswald, Christian Henning, Benjamin F. Grewe, and João Sacramento. 2022. Continual Learning with Hypernetworks. arXiv:1906.00695. Retrieved from https://arxiv.org/abs/1906.00695
[148]
Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A free collaborative knowledgebase. Commun. ACM 57, 10 (2014), 78–85.
[149]
Mengru Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, and Huajun Chen. 2024. Detoxifying large language models via knowledge editing. arXiv:2403.14472. Retrieved from https://arxiv.org/abs/2403.14472
[150]
Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, and Zhifang Sui. 2023. Large language models are not fair evaluators. arXiv:2305.17926. Retrieved from https://arxiv.org/abs/2305.17926
[151]
Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen. 2024. WISE: Rethinking the knowledge memory for lifelong model editing of large language models. arXiv:2405.14768. Retrieved from https://arxiv.org/abs/2405.14768
[152]
Peng Wang, Ningyu Zhang, Xin Xie, Yunzhi Yao, Bozhong Tian, Mengru Wang, Zekun Xi, Siyuan Cheng, Kangwei Liu, Guozhou Zheng, and Huajun Chen. 2023. EasyEdit: An easy-to-use knowledge editing framework for large language models. arXiv preprint arXiv:2308.07269 (2023).
[153]
Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuan-Jing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, and Ming Zhou. 2021. K-Adapter: Infusing knowledge into pre-trained models with adapters. In Findings of the Association for Computational Linguistics. 1405–1418.
[154]
Weixuan Wang, Barry Haddow, and Alexandra Birch. 2023. Retrieval-augmented multilingual knowledge editing. arXiv:2312.13040. Retrieved from https://arxiv.org/abs/2312.13040
[155]
Yiwei Wang, Muhao Chen, Nanyun Peng, and Kai-Wei Chang. 2024. Deepedit: Knowledge editing as decoding with constraints. arXiv:2401.10471. Retrieved from https://arxiv.org/abs/2401.10471
[156]
Yu Wang, Xiusi Chen, Jingbo Shang, and Julian McAuley. 2024. MemoryLLM: Towards self-updatable large language models. arXiv:2402.04624. Retrieved from https://arxiv.org/abs/2402.04624
[157]
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, and Hannaneh Hajishirzi. 2022. Self-Instruct: Aligning language model with self generated instructions. arXiv:2212.10560. Retrieved from https://arxiv.org/abs/2212.10560
[158]
Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, and Jianfeng Gao. 2022. Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models. arXiv:2205.12410. Retrieved from https://arxiv.org/abs/2205.12410
[159]
Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, and Qun Liu. 2023. Aligning large language models with human: A survey. arXiv:2307.12966. Retrieved from https://arxiv.org/abs/2307.12966
[160]
Mayur Wankhade, Annavarapu Chandra Sekhara Rao, and Chaitanya Kulkarni. 2022. A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review 55, 7 (2022), 5731–5780.
[161]
Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. 2021. Finetuned Language Models are Zero-Shot Learners. In International Conference on Learning Representations.
[162]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the Advances in Neural Information Processing Systems.
[163]
Zihao Wei, Jingcheng Deng, Liang Pang, Hanxing Ding, Huawei Shen, and Xueqi Cheng. 2024. Mlake: Multilingual knowledge editing benchmark for large language models. arXiv:2404.04990. Retrieved from https://arxiv.org/abs/2404.04990
[164]
Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, and Ludwig Schmidt. 2022. Robust fine-tuning of zero-shot models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[165]
Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao Bian, and Deyi Xiong. 2023. DEPN: Detecting and editing privacy neurons in pretrained language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[166]
Xiaobao Wu, Liangming Pan, William Yang Wang, and Anh Tuan Luu. 2024. Updating language models with unstructured facts: Towards practical knowledge editing. arXiv:2402.18909. Retrieved from https://arxiv.org/abs/2402.18909
[167]
Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, and Ningyu Zhang. 2023. Editing large language models: Problems, methods, and opportunities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[168]
Junsang Yoon, Akshat Gupta, and Gopala Anumanchipalli. 2024. Is bigger edit batch size always better?–An empirical study on model editing with Llama-3. arXiv:2405.00664. Retrieved from https://arxiv.org/abs/2405.00664
[169]
Lang Yu, Qin Chen, Jie Zhou, and Liang He. 2024. Melo: Enhancing model editing with neuron-indexed dynamic lora. In Proceedings of the AAAI Conference on Artificial Intelligence.
[170]
Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. 2022. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[171]
Michael Zhang and Eunsol Choi. 2021. SituatedQA: Incorporating extra-linguistic contexts into QA. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[172]
Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, and Huajun Chen. 2024. A comprehensive study of knowledge editing for large language models. arXiv preprint arXiv:2401.01286 (2024).
[173]
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
[174]
Ce Zheng, Lei Li, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, and Baobao Chang. 2023. Can We Edit Factual Knowledge by In-Context Learning? arXiv:2305.12740. Retrieved from https://arxiv.org/abs/2305.12740
[175]
Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, and Danqi Chen. 2023. MQuAKE: Assessing knowledge editing in language models via multi-hop questions. arXiv:2305.14795. Retrieved from https://arxiv.org/abs/2305.14795
[176]
Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G. Parker, and Munmun De Choudhury. 2023. Synthetic lies: Understanding ai-generated misinformation and evaluating algorithmic and human solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–20.
[177]
Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, and Sanjiv Kumar. 2020. Modifying Memories in Transformer Models. arXiv:2012.00363. Retrieved from https://arxiv.org/abs/2012.00363
[178]
Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. 2020. A comprehensive survey on transfer learning. Proc. IEEE 109, 1 (2020), 43–76.
[179]
Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv:1909.08593. Retrieved from https://arxiv.org/abs/1909.08593
