Towards the Dynamics of a DNN Learning Symbolic Interactions

Qihan Ren1,*, Junpeng Zhang1,*, Yang Xu2, Yue Xin1, Dongrui Liu1,3, Quanshi Zhang1,†
1Shanghai Jiao Tong University
2Zhejiang University  3Shanghai Artificial Intelligence Laboratory
{renqihan, zhangjp63, zqs1022}@sjtu.edu.cn
* Equal contribution. † Quanshi Zhang is the corresponding author. He is with the Department of Computer Science and Engineering, the John Hopcroft Center, at the Shanghai Jiao Tong University, China. zqs1022@sjtu.edu.cn.
Abstract

This study proves the two-phase dynamics of a deep neural network (DNN) learning interactions. Despite the long-standing disappointing view of the faithfulness of post-hoc explanations of a DNN, a series of theorems have been proven [27] in recent years to show that, for a given input sample, a small set of interactions between input variables can be considered as primitive inference patterns that faithfully represent a DNN’s detailed inference logic on that sample. In particular, Zhang et al. [41] have observed that various DNNs all learn interactions of different complexities in two distinct phases, and that these two-phase dynamics explain well how a DNN changes from under-fitting to over-fitting. Therefore, in this study, we mathematically prove the two-phase dynamics of interactions, providing a theoretical mechanism for how the generalization power of a DNN changes during the training process. Experiments show that our theory predicts well the real dynamics of interactions on different DNNs trained for various tasks.

1 Introduction

Background: mathematically guaranteeing that the inference score of a DNN can be faithfully explained as symbolic interactions. Explaining the detailed inference logic hidden behind the output score of a DNN is considered one of the core issues for the post-hoc explanation of a DNN. However, after a comprehensive survey of various explanation methods, many studies [28, 1, 12] have unanimously and empirically arrived at a disappointing view of the faithfulness of almost all post-hoc explanation methods. Fortunately, the recent progress [27] has mathematically proven that given a specific input sample $\bm{x}=[x_1,\cdots,x_n]^{\top}$, a DNN¹ for a classification task usually only encodes a small set of interactions between input variables in the sample. It is proven that these interactions act like primitive inference patterns and can accurately predict all network outputs, no matter how we randomly mask the input sample². An interaction refers to a non-linear relationship encoded by the DNN between a set of input variables in $S$. For example, as Figure 1 shows, a DNN may encode a non-linear relationship between the three image patches in $S=\{x_1,x_2,x_3\}$ to form a dog-snout pattern, which makes a numerical effect $I(S)$ on the network output. The complexity (or order) of an interaction is defined as the number of input variables in the set $S$, i.e., ${\rm order}(S)\overset{\text{def}}{=}|S|$.

¹ The proof in [27] requires the DNN to generate relatively stable inference outputs on masked samples, which is formulated by three mathematical conditions (see Appendix B). It is found that DNNs for image classification, 3D point cloud classification, tabular data classification, and text generation
² It is proven that no matter how we randomly mask variables of the input sample, we can always use numerical effects of a few interactions to accurately regress the network outputs on all masked samples.

Our task. Since Zhou et al. [44] found that high-order (complex) interactions usually have a much higher risk of over-fitting than low-order (simple) interactions, in this study, we hope to further track the change in the complexity of interactions during training, so as to explain the change of the DNN’s generalization power during training. In particular, the time when the DNN starts to learn high-order (complex) interactions indicates the starting point of over-fitting.

Figure 1: (a) It is proven that the DNN’s inference on a certain sample is equivalent to a logical model that uses a small number of AND-OR interactions for inference. Each interaction corresponds to a non-linear (AND or OR) relationship between a set $S$ of input variables (e.g., image patches). (b) Sparsity of interactions. We show the strength $|I(S|\bm{x})|$ of all $2^n$ interactions sorted in descending order. (c) Illustration of the two-phase dynamics of a DNN learning interactions of different orders.

Specifically, we focus on the two-phase dynamics of interaction complexity, which were empirically observed by [41], and we aim to mathematically prove these dynamics. First, before training, a DNN with randomly initialized parameters mainly encodes interactions of medium complexities. As Figure 2 shows, the distribution of interactions appears spindle-shaped. Then, in the first phase, the DNN eliminates interactions of medium and high complexities, thereby mainly encoding interactions of low complexity. In the second phase, the DNN gradually learns interactions of increasing complexities. We have conducted experiments to train DNNs with various architectures for different tasks. The results show that our theory can predict well the learning dynamics of interactions in real DNNs.

The proven two-phase dynamics explain the hidden factors that push the DNN from under-fitting to over-fitting. (1) In the first phase, the DNN mainly removes noise interactions. (2) In the second phase, the DNN gradually learns more complex and non-generalizable interactions, moving toward over-fitting.

2 Related work

Long-standing disappointment on the faithfulness of existing post-hoc explanation of DNNs. Many studies [30, 40, 29, 2, 15] have explained the inference score of a DNN, but how to mathematically formulate and guarantee the faithfulness of the explanation is still an open problem. For example, using an interpretable surrogate model to approximate the output of a DNN [3, 11, 35, 34] is a classic explanation technique. However, the good matching between the DNN’s output and the surrogate model’s output cannot fully guarantee that the two models use exactly the same inference patterns and/or use the same attention. Therefore, many studies [28, 12, 1] have unanimously and empirically arrived at a disappointing view of the faithfulness of current explanation methods. Rudin [28] pointed out that inaccurate post-hoc explanations of DNNs would be harmful to high-stakes applications. Ghassemi et al. [12] showed various failure cases of current explanation methods in the healthcare field and argued that using these methods to aid medical decisions was a false hope.

New progress towards proving the faithfulness of symbolic explanation of a DNN. Despite the disappointing view of post-hoc explanation methods, we have established a theory system of interactions within three years, which includes more than 30 papers, to quantify the symbolic concepts encoded by a DNN and explain the hidden factors that determine the generalization power and robustness of a DNN. We revisit this theory system as follows.

• Proving that interactions act as faithful primitive inference patterns encoded by the DNN. Recent achievements in the theory system of interactions have provided a new perspective to formulate primitive inference patterns encoded by a DNN. We discovered [23] and proved [27] that a DNN’s inference logic on a certain sample can be explained by only a small number of interactions. Furthermore, we discovered that salient interactions usually represented common inference patterns shared by different samples (sample-wise transferability of interactions) [21], and proposed a method to extract generalizable interactions shared by different DNNs (model-wise transferability of interactions) [4]. The above studies indicated that salient interactions could be considered primitive inference patterns encoded by a DNN, which served as the theoretical foundation of this study. Based on interactions, we also defined and learned the optimal baseline value for the Shapley value [25], and explained the encoding of different types of visual patterns in DNNs for image classification [5, 6].

• Using interactions to explain the representation power of DNNs. Our recent studies showed that interactions explained well the hidden factors that determine the adversarial robustness [24], adversarial transferability [37], and generalization power [44] of a DNN. We also discovered and proved the representation bottleneck of a DNN in encoding middle-complexity interactions [7]. In addition, we proved that compared to a standard DNN, a Bayesian neural network (BNN) tended to avoid encoding complex interactions [26], thus explaining the good adversarial robustness of BNNs. We discovered and explained the phenomenon that DNNs tended to learn simple interactions more easily than complex interactions [22]. We found that complex interactions were less generalizable than simple interactions [44], and further discovered the two-phase dynamics of a DNN learning interactions of different complexities [41]. To this end, this study aims to theoretically prove the discovery in [41] to better understand the two-phase dynamics of interactions.

• Using interactions to unify the common mechanism of various empirical deep learning methods. We proved that fourteen attribution methods could all be explained as a re-allocation of interaction effects [8]. We proved that twelve existing methods to improve adversarial transferability all shared the common utility of suppressing the interactions between adversarial perturbation units [42].

3 Dynamics of interactions

3.1 Preliminary: interactions

Let us consider a DNN $v$ and an input sample $\bm{x}=[x_1,\cdots,x_n]^{\top}$ with $n$ input variables indexed by $N=\{1,\cdots,n\}$. In different tasks, one can define different input variables, e.g., each input variable may represent an image patch for image classification or a word/token for text classification. Let us consider a scalar output³ of a DNN, denoted by $v(\bm{x})\in\mathbb{R}$. Previous studies [4, 43] show that the output score $v(\bm{x})$ can be decomposed into the sum of AND interactions and OR interactions.

$$v(\bm{x})=v(\bm{x}_{\emptyset})+\sum\nolimits_{\emptyset\neq S\subseteq N}I_{\text{and}}(S|\bm{x})+\sum\nolimits_{\emptyset\neq S\subseteq N}I_{\text{or}}(S|\bm{x}),$$ (1)

where the computation of $I_{\text{and}}(S|\bm{x})$ and $I_{\text{or}}(S|\bm{x})$ will be introduced later in Eq. (2).

³ For example, one may set $v(\bm{x})$ as the loss value on sample $\bm{x}$. For a multi-category classification task, one usually either sets $v(\bm{x})$ to be the output score for the ground-truth category before the softmax operation, or follows [7] to set $v(\bm{x})=\log\frac{p(y^{\text{truth}}|\bm{x})}{1-p(y^{\text{truth}}|\bm{x})}$. See Table 1 for a summary of mathematical settings for interactions.

How to understand the physical meaning of AND-OR interactions. Suppose that we are given an input sample $\bm{x}$. According to Theorem 2, a non-zero interaction effect $I_{\text{and}}(S|\bm{x})$ indicates that the entire function of the DNN must equivalently encode an AND relationship between input variables in the set $S\subseteq N$, although the DNN does not use an explicit neuron to model such an AND relationship. As Figure 1 shows, when the image patches in the set $S_2=\{x_1=\textit{nose},\,x_2=\textit{tongue},\,x_3=\textit{cheek}\}$ are all present (i.e., not masked), the three regions form a dog-snout pattern, and make a numerical effect $I_{\text{and}}(S_2|\bm{x})$ to push the output score $v(\bm{x})$ towards the dog category. Masking any image patch in $S_2$ will deactivate the AND interaction and remove $I_{\text{and}}(S_2|\bm{x})$ from $v(\bm{x})$. This will be shown by Theorem 2.
Likewise, $I_{\text{or}}(S|\bm{x})$ can be considered as the numerical effect of the OR relationship encoded by the DNN between input variables in the set $S$. As Figure 1 shows, when one of the patches in $S_1=\{x_4=\textit{spotty region1},\,x_5=\textit{spotty region2}\}$ is present, a speckles pattern is used by the DNN to make a numerical effect $I_{\text{or}}(S_1|\bm{x})$ on the network output $v(\bm{x})$.

Definition and computation. Given a DNN and an input $\bm{x}$, the AND-OR interactions between each specific set of input variables $S\subseteq N$ ($S\neq\emptyset$) are computed as follows [4, 43].

$$I_{\text{and}}(S|\bm{x})=\sum\nolimits_{T\subseteq S}(-1)^{|S|-|T|}\,v_{\text{and}}(\bm{x}_{T}),\quad I_{\text{or}}(S|\bm{x})=-\sum\nolimits_{T\subseteq S}(-1)^{|S|-|T|}\,v_{\text{or}}(\bm{x}_{N\setminus T}),$$ (2)

where $\bm{x}_T$ denotes the sample in which input variables in $N\setminus T$ are masked⁴, while input variables in $T$ are unchanged. The network output on each masked sample $v(\bm{x}_T)$, $T\subseteq N$, is decomposed into two components: (1) the component $v_{\text{and}}(\bm{x}_T)=0.5\,v(\bm{x}_T)+\gamma_T$ that exclusively contains AND interactions, and (2) the component $v_{\text{or}}(\bm{x}_T)=0.5\,v(\bm{x}_T)-\gamma_T$ that exclusively contains OR interactions, subject to $v(\bm{x}_T)=v_{\text{and}}(\bm{x}_T)+v_{\text{or}}(\bm{x}_T)$. Appendix F.1 shows that $v_{\text{and}}(\bm{x}_T)=v(\bm{x}_{\emptyset})+\sum_{\emptyset\neq S'\subseteq T}I_{\text{and}}(S'|\bm{x})$ and $v_{\text{or}}(\bm{x}_T)=\sum_{S'\subseteq N:\,S'\cap T\neq\emptyset}I_{\text{or}}(S'|\bm{x})$.
The sparsest AND-OR interactions are extracted by minimizing the following objective [20]: $\min_{\{\gamma_T\}}\sum_{S\subseteq N}|I_{\text{and}}(S|\bm{x})|+|I_{\text{or}}(S|\bm{x})|$. Please see Appendix C for details about the computation and Appendix D for mathematical support of the coefficient in Eq. (2).

⁴ The masked states of input variables are represented by specific baseline values $\bm{b}=[b_1,\cdots,b_n]^{\top}$ by following [41]. See Appendix G.3 for the detailed setting of baseline values.
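As a rough illustration of Eq. (2), the sketch below evaluates both sums by brute force on a tiny toy model. It is not the authors' implementation: the names `subsets`, `and_or_interactions`, and `toy_v` are hypothetical, the "network" is a hand-written function of the set of unmasked variables, and every $\gamma_T$ is simply fixed to 0 instead of being optimized for sparsity.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` (including the empty set) as a frozenset."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def and_or_interactions(v, n, gamma=None):
    """Brute-force Eq. (2) for every non-empty S within N = {0, ..., n-1}.

    v     -- callable: frozenset T of unmasked variables -> v(x_T)
    gamma -- optional dict {T: gamma_T}; missing entries count as 0 (assumption)
    """
    gamma = gamma or {}
    N = frozenset(range(n))
    v_and = {T: 0.5 * v(T) + gamma.get(T, 0.0) for T in subsets(N)}
    v_or = {T: 0.5 * v(T) - gamma.get(T, 0.0) for T in subsets(N)}
    I_and, I_or = {}, {}
    for S in subsets(N):
        if not S:
            continue
        I_and[S] = sum((-1) ** (len(S) - len(T)) * v_and[T] for T in subsets(S))
        I_or[S] = -sum((-1) ** (len(S) - len(T)) * v_or[N - T] for T in subsets(S))
    return I_and, I_or


def toy_v(T):
    """Toy 'network': outputs 2 only when variables 0 and 1 are both unmasked."""
    return 2.0 if {0, 1} <= T else 0.0


I_and, I_or = and_or_interactions(toy_v, n=3)
N = frozenset(range(3))
# Eq. (1): the output on the unmasked sample equals v(x_empty) plus all effects.
reconstructed = toy_v(frozenset()) + sum(I_and.values()) + sum(I_or.values())
print(reconstructed, toy_v(N))
```

The enumeration visits all $2^n$ masked samples, so it is only practical for small $n$. With $\gamma_T=0$, the single AND pattern of the toy model is split into one AND effect on $\{0,1\}$ plus three non-zero OR effects, which is the kind of non-sparse split that the minimization over $\{\gamma_T\}$ is meant to reduce.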

Salient interactions and noisy patterns. Let us enumerate all $2^n$ combinations of variables $S\subseteq N$, and compute the interaction effects $I_{\text{and}}(S|\bm{x})$ and $I_{\text{or}}(S|\bm{x})$. We can identify a few salient interactions from all these interactions, i.e., interactions whose absolute value exceeds a threshold ($|I_{\text{and}}(S|\bm{x})|\geq\tau$ or $|I_{\text{or}}(S|\bm{x})|\geq\tau$). Other interactions have small effects and are termed noisy patterns.
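The thresholding rule above can be sketched in a few lines. The helper name `split_salient`, the effect values, and the threshold below are all made up for illustration; they are not taken from the paper's experiments.

```python
def split_salient(effects, tau):
    """Split {S: I(S|x)} into salient (|I| >= tau) and noisy (|I| < tau) parts."""
    salient = {S: val for S, val in effects.items() if abs(val) >= tau}
    noisy = {S: val for S, val in effects.items() if abs(val) < tau}
    return salient, noisy


# Hypothetical interaction effects, keyed by the variable set S.
effects = {
    frozenset({0}): 0.02,
    frozenset({1}): -0.90,
    frozenset({0, 1}): 1.40,
    frozenset({0, 2}): -0.01,
    frozenset({0, 1, 2}): 0.05,
}
salient, noisy = split_salient(effects, tau=0.1)
# Sort by strength in descending order, as in the sparsity plot of Figure 1(b).
ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
```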

Theorem 1 (Sparsity property, proven by [27], and discussed in Appendix B).

Given a DNN $v$ and an input sample $\bm{x}$ with $n$ input variables, let $\Omega\overset{\text{def}}{=}\{S\subseteq N:|I_{\text{and}}(S|\bm{x})|\geq\tau\}$ denote the set of salient AND interactions whose absolute value exceeds a threshold $\tau$. If the DNN can generate relatively stable inference outputs $v(\bm{x}_S)$ on masked samples⁵, then the size of the set $|\Omega|$ has an upper bound of $\mathcal{O}(n^{\xi}/\tau)$, where $\xi$ is an intrinsic parameter for the smoothness of the network function $v(\cdot)$. Empirically, $\xi$ is usually within the range of $[1.9, 2.2]$.

⁵ This is formulated by three mathematical conditions. (1) The DNN does not encode highly complex interactions. (2) Let us compute the average classification confidence when we mask different random sets of $k$ input variables (generating $\{\bm{x}_T:|T|=n-k\}$). Then, the average confidence monotonically decreases when more input variables are masked. (3) The decreasing speed of the average confidence is polynomial. See Appendix B for the detailed mathematical formulation.

Theorem 2 (Universal matching property, proven in [4] and Appendix F.1).

Given an input sample $\hat{\bm{x}}$, let us construct the following surrogate logical model $f(\cdot)$ to use AND-OR interactions for inference, which are extracted from the DNN $v(\cdot)$ on the sample $\hat{\bm{x}}$. Then, the output of the surrogate logical model $f(\cdot)$ can always match the output of the DNN $v(\cdot)$, no matter how the input sample is masked.

$$\begin{aligned}
\forall S\subseteq N,\quad & f(\hat{\bm{x}}_S)=v(\hat{\bm{x}}_S),\\
f(\hat{\bm{x}}_S) ={}& \underbrace{v(\hat{\bm{x}}_{\emptyset})+\sum_{T\subseteq N}I_{\text{and}}(T|\hat{\bm{x}})\cdot\mathbb{1}\big(\hat{\bm{x}}_S\text{ triggers AND relation }T\big)}_{v_{\text{and}}(\bm{x}_S)}+\underbrace{\sum_{T\subseteq N}I_{\text{or}}(T|\hat{\bm{x}})\cdot\mathbb{1}\big(\hat{\bm{x}}_S\text{ triggers OR relation }T\big)}_{v_{\text{or}}(\bm{x}_S)} && (3)\\
={}& v(\bm{x}=\hat{\bm{x}}_{\emptyset})+\sum\nolimits_{\emptyset\neq T\subseteq S}I_{\text{and}}(T|\bm{x}=\hat{\bm{x}})+\sum\nolimits_{T\subseteq N:\,T\cap S\neq\emptyset}I_{\text{or}}(T|\bm{x}=\hat{\bm{x}}) && (4)\\
\approx{}& v(\bm{x}=\hat{\bm{x}}_{\emptyset})+\sum\nolimits_{T\in\Omega_{\text{and}}:\,\emptyset\neq T\subseteq S}I_{\text{and}}(T|\bm{x}=\hat{\bm{x}})+\sum\nolimits_{T\in\Omega_{\text{or}}:\,T\cap S\neq\emptyset}I_{\text{or}}(T|\bm{x}=\hat{\bm{x}}), && (5)
\end{aligned}$$

where $\Omega_{\text{and}}$ is the set of all salient AND interactions, and $\Omega_{\text{or}}$ is the set of all salient OR interactions.
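The exact matching in Eq. (4) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the sample `x_hat`, the zero `baseline`, and the toy function `v` are invented, all $\gamma_T$ are fixed to 0, and the interactions are obtained by brute-force evaluation of Eq. (2) rather than by the authors' code.

```python
import math
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` (including the empty set) as a frozenset."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


# Hypothetical sample and baseline; masked variables take their baseline value.
x_hat = [0.5, -1.0, 2.0, 1.5]
baseline = [0.0, 0.0, 0.0, 0.0]
N = frozenset(range(len(x_hat)))


def v(S):
    """Toy non-linear 'network' evaluated on the masked sample x_hat_S."""
    z = [x_hat[i] if i in S else baseline[i] for i in range(len(x_hat))]
    return math.tanh(z[0] + z[1] * z[2]) + 0.3 * z[1] * z[3] + 0.1


# Eq. (2) with every gamma_T fixed to 0 (a simplifying assumption).
v_and = {T: 0.5 * v(T) for T in subsets(N)}
v_or = {T: 0.5 * v(T) for T in subsets(N)}
I_and = {S: sum((-1) ** (len(S) - len(T)) * v_and[T] for T in subsets(S))
         for S in subsets(N) if S}
I_or = {S: -sum((-1) ** (len(S) - len(T)) * v_or[N - T] for T in subsets(S))
        for S in subsets(N) if S}


def surrogate(S):
    """Eq. (4): AND effects with T inside S, OR effects with T touching S."""
    and_part = sum(val for T, val in I_and.items() if T <= S)
    or_part = sum(val for T, val in I_or.items() if T & S)
    return v(frozenset()) + and_part + or_part


# Compare the surrogate with the toy network on all 2^4 masked samples.
max_gap = max(abs(surrogate(S) - v(S)) for S in subsets(N))
print(f"largest |f(x_S) - v(x_S)| over all masks: {max_gap:.2e}")
```

The gap should vanish up to floating-point error for any choice of `v`, because Eq. (4) keeps all interactions. The approximation in Eq. (5) would correspond to dropping the entries of `I_and` and `I_or` whose absolute value falls below $\tau$ before summing.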

What makes the interaction-based explanation faithful. The following four properties guarantee that the inference score of a DNN can be faithfully explained by symbolic interactions.

\bullet Sparsity property. The sparsity property means that a DNN for a classification task usually only encodes a small number of AND interactions with salient effects, i.e., for most of all 2nsuperscript2𝑛2^{n}2 start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT subsets of input variables SN𝑆𝑁S\subseteq Nitalic_S ⊆ italic_N, Iand(S|𝒙)subscript𝐼andconditional𝑆𝒙I_{\text{and}}(S|\bm{x})italic_I start_POSTSUBSCRIPT and end_POSTSUBSCRIPT ( italic_S | bold_italic_x ) has almost zero interaction effect. Specifically, the sparsity property has been widely observed on various DNNs for different tasks [21], and it is also theoretically proven (see Theorem 1). The number of AND interactions whose absolute value exceeds the threshold τ𝜏\tauitalic_τ (|Iand(S|𝒙)|τ|I_{\text{and}}(S|\bm{x})|\geq\tau| italic_I start_POSTSUBSCRIPT and end_POSTSUBSCRIPT ( italic_S | bold_italic_x ) | ≥ italic_τ), is 𝒪(nξ/τ)𝒪superscript𝑛𝜉𝜏\mathcal{O}(n^{\xi}/\tau)caligraphic_O ( italic_n start_POSTSUPERSCRIPT italic_ξ end_POSTSUPERSCRIPT / italic_τ ), where ξ𝜉\xiitalic_ξ is empirically within the range of [1.9,2.2]1.92.2[1.9,2.2][ 1.9 , 2.2 ]. This indicates that the number of salient interactions is much less than 2nsuperscript2𝑛2^{n}2 start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT. Furthermore, the sparsity property also holds for OR interactions, because an OR interaction can be viewed as a special kind of AND interaction666If we flip the masked state and the presence state of each input variable (i.e., taking bisubscript𝑏𝑖b_{i}italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT as the presence state of the i𝑖iitalic_i-th variable, while taking xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT as the masked state), then OR interactions can be viewed as a special kind of AND interactions. See Appendix E for details..

• Universal matching property. The universal matching property means that the output of the DNN on a masked sample $\bm{x}_S$ can be well matched by the sum of interaction effects, no matter how we randomly mask the sample to obtain $\bm{x}_S$. This property is proven in Theorem 2.

• Transferability property. The transferability property means that salient interactions extracted from one input sample can usually be extracted from other input samples as well. If so, these interactions are considered transferable across different samples. This property has been widely observed by [21] on various DNNs for different tasks.

• Discrimination property. This property means that the same interaction extracted from different samples consistently contributes to the classification of a certain category. This property has been observed on various DNNs [21], and it implies that interactions are discriminative for classification.
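The sparsity and universal matching properties can be illustrated on a toy example. The sketch below assumes that AND interactions take the standard Harsanyi (Möbius) form $I(S|\bm{x}) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(\bm{x}_T)$; the paper's own definition is given in an earlier equation that is not reproduced here. The function `toy_v` and all helper names are hypothetical and stand in for a real network.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, n):
    """Assumed Harsanyi form: I(S) = sum over T inside S of (-1)^(|S|-|T|) v(T).

    `v` maps a frozenset S of unmasked variable indices to the output on x_S.
    """
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


# Hypothetical toy "network" on n = 3 variables: a bias, one first-order
# effect on variable 1, and one AND pattern between variables 0 and 2.
def toy_v(S):
    return 0.5 + 1.0 * (1 in S) + 2.0 * (0 in S and 2 in S)


n = 3
I = and_interactions(toy_v, n)

# Universal matching: v(x_S) equals the sum of I(T) over all T inside S.
max_error = max(
    abs(toy_v(S) - sum(I[T] for T in subsets(S)))
    for S in subsets(range(n))
)

# Sparsity in this toy case: only three of the 2^3 = 8 effects are non-zero.
salient = {tuple(sorted(S)): val for S, val in I.items() if abs(val) > 1e-12}
```

Here `salient` keeps only the bias term and the two patterns built into `toy_v`, and `max_error` is zero for every mask, which is what the two properties state for this idealized case.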

Complexity/order of interactions. The complexity (or order) of an interaction is defined as the number of input variables in the set $S$, i.e., ${\rm order}(S) \overset{\text{def}}{=} |S|$. In this way, a high-order interaction represents a complex non-linear relationship among many input variables.

3.2 Two-phase dynamics of learning interactions

Zhang et al. [41] have discovered the following two-phase dynamics of interaction complexity during the training process. (1) As Figure 2 shows, before the training process, the DNN with randomly initialized parameters mainly encodes interactions of medium orders. (2) In the first phase, the DNN removes initial interactions of medium and high orders, and mainly encodes low-order interactions. (3) In the second phase, the DNN gradually learns interactions of increasing orders.

To better illustrate this phenomenon, we followed [41] to conduct experiments on different DNNs, including AlexNet [17], VGG [31], BERT [9], and DGCNN [38], and on various datasets, including image data (MNIST [19], CIFAR-10 [16], CUB-200-2011 [36], and Tiny-ImageNet [18]), natural language data (SST-2 [32]), and point cloud data (ShapeNet [39]). For image data, we followed [41] to select a random set of ten image patches as input variables. For natural language data, we set the entire embedding vector of each token as an input variable. For point cloud data, we took point clusters as input variables. Please see Appendix G.3 for the detailed settings. We set $v(\bm{x}) = \log\left(p(y^{\text{truth}}|\bm{x}) / [1 - p(y^{\text{truth}}|\bm{x})]\right)$ by following [7], where $p(y^{\text{truth}}|\bm{x})$ denotes the probability of classifying the input sample $\bm{x}$ to the ground-truth category. We followed [41] to define an interaction whose absolute value is greater than or equal to $\tau = 0.03\ \mathbb{E}_{\bm{x}}[|v(\bm{x}) - v(\bm{x}_{\emptyset})|]$ as a salient interaction.
For interactions of each order $k$, we normalized the strength of salient interactions as $I_{\text{real}}^{(k)} = \mathbb{E}_{\bm{x}}\big[\sum_{\text{type}\in\{\text{and},\text{or}\}} \sum_{S:|S|=k,\ |I_{\text{type}}(S|\bm{x})|\geq\tau} |I_{\text{type}}(S|\bm{x})|\big] / Z$ to enable fair comparison between different training epochs, where $Z = \mathbb{E}_{1\leq k'\leq n}\mathbb{E}_{\bm{x}}\big[\sum_{\text{type}\in\{\text{and},\text{or}\}} \sum_{S:|S|=k',\ |I_{\text{type}}(S|\bm{x})|\geq\tau} |I_{\text{type}}(S|\bm{x})|\big]$ denotes the normalizing constant. [Footnote 7: The normalization removes the effect of the explosion of output values during the training process and enables us to only analyze the relative distribution of interaction strength.]
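The normalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interaction effects and output gaps below are made-up numbers, the function name `normalized_strength` is hypothetical, and each sample is represented as a list of (order, effect) pairs pooled over AND and OR interactions.

```python
def normalized_strength(samples, n, tau):
    """Compute I_real^(k) for k = 1..n.

    `samples` holds one entry per input sample x; each entry is a list of
    (order, effect) pairs pooled over AND and OR interactions.
    """
    raw = []
    for k in range(1, n + 1):
        per_sample = [
            sum(abs(effect) for order, effect in interactions
                if order == k and abs(effect) >= tau)
            for interactions in samples
        ]
        raw.append(sum(per_sample) / len(per_sample))  # expectation over x
    Z = sum(raw) / n  # expectation over orders 1 <= k' <= n
    return [r / Z for r in raw]


# Hypothetical effects for two samples with n = 3 input variables.
samples = [
    [(1, 0.9), (1, -0.02), (2, 0.4), (3, 0.01)],
    [(1, 0.5), (2, -0.6), (2, 0.02), (3, 0.2)],
]
# Hypothetical values of |v(x) - v(x_empty)| for the same two samples.
output_gaps = [1.2, 0.8]
tau = 0.03 * sum(output_gaps) / len(output_gaps)

strength = normalized_strength(samples, n=3, tau=tau)
```

Because $Z$ is the average of the unnormalized strengths over the $n$ orders, the normalized values always sum to $n$, so only the relative distribution over orders remains. The sketch does not guard against the degenerate case in which no interaction is salient and $Z = 0$.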

Figure 2: The distribution of interaction strength $I_{\text{real}}^{(k)}$ over different orders $k$. Each row shows the change in the distribution during the training process. Experiments showed that the two-phase phenomenon widely existed on different DNNs trained on various datasets. It also verified the finding in [41] that the beginning of the 2nd phase was temporally aligned with the time point when the loss gap increased. Please see Appendix J.1 for results on the other six DNNs trained for 3D point cloud/image/sentiment classification.

Figure 2 shows how the distribution of interaction strength $I_{\text{real}}^{(k)}$ of different orders changed throughout the entire training process, and it demonstrates that the two-phase dynamics widely existed on different DNNs trained on various datasets. Before training, the interaction strength of medium orders dominated, and the distribution of interaction strength of different orders looked like a spindle. In the first phase (from the 2nd column to the 3rd column in the figure), the strength of medium-order and high-order interactions gradually shrank to zero, while the strength of low-order interactions increased. In the second phase (from the 3rd column to the 6th column in the figure), the DNN learned interactions of increasing orders (complexities).

How to understand the two-phase phenomenon. Previous studies [44, 26] have observed and partially proved that the complexity/order of an interaction can reflect the generalization ability of the interaction. [Footnote 8: Unlike the traditional definition of the over-fitting/generalization power on the entire model over the entire dataset, the interaction first enables us to explicitly identify detailed over-fitted/generalizable inference patterns (interactions) on a specific sample.] Let us consider an interaction that is frequently extracted by a DNN from training samples (see the transferability property in Section 3.2). If this interaction also frequently appears in testing samples, then this interaction is considered generalizable; otherwise, non-generalizable. To this end, Zhou et al. [44] have discovered that high-order (complex) interactions are less generalizable between training and testing samples than low-order (simple) interactions. Furthermore, Ren et al. [26] have proved that high-order (complex) interactions are more unstable than low-order (simple) interactions when input variables or network parameters are perturbed by random noises.

Therefore, the two-phase dynamics enable us to revisit the change of generalization power of a DNN:

  1. Before training, the interactions extracted from an initialized DNN exhibited a spindle-shaped distribution of interaction strength over different orders. These interactions could be considered random patterns irrelevant to the task, and such patterns were mostly of medium orders.

  2. In the first phase, the DNN mainly removed the irrelevant patterns caused by the randomly initialized parameters. At the same time, the DNN shifted its attention to low-order interactions between very few input variables. These low-order interactions usually represented relatively simple and generalizable inference patterns, without encoding complex inference patterns.

  3. In the second phase, the DNN gradually learned interactions of increasing orders (increasing complexities). Although there was no clear boundary between under-fitting and over-fitting in mathematics, the learning of very complex interactions had been widely considered as a typical sign of over-fitting [44].

3.3 Proving of the two-phase dynamics

3.3.1 Analytic solution to interaction effects

As the foundation of proving the dynamics of the two phases, let us first derive the analytic solution to interaction effects at a specific time point during the training process. Then, Sections 3.3.2 and 3.3.3 will use this analytic solution to further explain detailed dynamics in the second phase and the first phase, respectively. Later experiments show that our theory can well predict the true dynamics of all AND-OR interactions during the learning of real DNNs.

The proof in this subsection can be divided into three steps. (1) We first rewrite a DNN’s inference on an input sample as a weighted sum of triggering functions of different interactions. (2) Then, we can reformulate the learning of the DNN on an input sample as a linear regression problem. (3) Thus, the interactions at an intermediate time point during training can be obtained as the optimal solution to the linear regression problem under a certain level of parameter noises.

• Step 1: Rewriting a DNN’s inference on an input sample as a weighted sum of triggering functions of different interactions. For simplicity, let us only focus on the dynamics of AND interactions, because OR interactions can also be represented as a specific kind of AND interactions (see Appendix E for details). In this way, without loss of generality, let us just analyze the learning of AND interactions w.r.t. $v_{\text{and}}(\bm{x}) = v(\bm{x}_{\emptyset}) + \sum_{\emptyset \neq S \subseteq N} I_{\text{and}}(S|\bm{x})$, and simplify the notation as $v(\bm{x}) = v(\bm{x}_{\emptyset}) + \sum_{\emptyset \neq S \subseteq N} I(S|\bm{x})$ in the following proof. Our conclusions can also be extended to OR interactions, as mentioned above.

Given a DNN, we follow [26, 22] to rewrite the inference function of the network $v(\bm{x})$. This is inspired by the universal matching property of interactions in Theorem 2, i.e., given any arbitrarily masked input sample $\hat{\bm{x}}_S$ w.r.t. a random subset $S \subseteq N$, the network output can always be represented as a linear sum of different interaction effects $v(\bm{x} = \hat{\bm{x}}_S) = \sum_{T \subseteq S} I(T|\bm{x} = \hat{\bm{x}})$. In this way, the following equation rewrites the inference function of the DNN $v(\bm{x} = \hat{\bm{x}}_S)$ as the weighted sum of triggering functions of interactions (see Appendix F.2 for proof).

$$\forall\ S \subseteq N,\ v(\bm{x} = \hat{\bm{x}}_S) = f(\bm{x} = \hat{\bm{x}}_S),\ \text{ subject to } f(\bm{x}) \overset{\text{def}}{=} \sum\nolimits_{T \subseteq N} w_T\ J_T(\bm{x}),$$ (6)

where the interaction triggering function $J_T(\bm{x})$ is a real-valued approximation of the binary indicator function $\mathbbm{1}(\hat{\bm{x}}_S \text{ triggers the AND relation } T)$ in Eq. (3) and returns the triggering value of the interaction pattern $T$. In particular, we set $w_{\emptyset} = v(\bm{x} = \hat{\bm{x}}_{\emptyset})$ and $J_{\emptyset}(\bm{x}) = 1$. $J_T(\bm{x})$ is computed as a sum of compositional terms in the Taylor expansion of $v(\bm{x})$.

$$J_T(\bm{x}) = \sum_{\bm{\pi} \in Q_T} \frac{1}{\prod_{i=1}^{n} \pi_i!} \frac{\partial^{\pi_1 + \cdots + \pi_n} v}{\partial x_1^{\pi_1} \cdots \partial x_n^{\pi_n}} \Big|_{\bm{x} = \bm{x}_{\emptyset}} \prod_{i \in T} (x_i - b_i)^{\pi_i} / w_T,$$ (7)

where the scalar weight $w_T$ should be computed as $w_T = I(T|\bm{x} = \hat{\bm{x}})$ to satisfy the equality in Eq. (6), and $Q_T = \{[\pi_1, \dots, \pi_n]^{\top} : \forall i \in T, \pi_i \in \mathbb{N}^{+};\ \forall i \notin T, \pi_i = 0\}$. See Appendix F.2 for proof.

Understanding $J_T(\bm{x})$ and $w_T$. Let us consider a masked sample $\hat{\bm{x}}_S$ in which input variables in $N \setminus S$ are masked. If $T \subseteq S$, which means all input variables in $T$ are not masked in $\hat{\bm{x}}_S$, then $J_T(\hat{\bm{x}}_S) = 1$, indicating the interaction pattern is triggered; otherwise, $J_T(\hat{\bm{x}}_S) = 0$, indicating the interaction pattern is not triggered. $w_T$ is a scalar weight. Particularly, let $I_f(T|\bm{x})$ denote the interaction extracted from the function $f(\bm{x}) = \sum_{T \subseteq N} w_T J_T(\bm{x})$, then we have $I_f(T|\bm{x}) = w_T$.

• Step 2: Based on Eq. (6), the learning of the DNN on an input sample can be reformulated as learning the scalar weight $w_T$ for each interaction triggering function $J_T(\bm{x})$, under a linear regression setting. We can roughly consider the learning problem as a linear regression to a set of potentially true interactions, because it has been discovered by [21, 4] that different DNNs for the same task usually encode similar sets of interactions. Therefore, the learning of a DNN can be considered as training a model to fit a set of pre-defined interactions. In spite of the above simplifying settings, subsequent experiments in Figure 4 still verify that our theoretical results can well predict the learning dynamics of interactions in real DNNs.

Specifically, let the DNN be trained on a set of samples $\mathcal{D} = \{(\bm{x}, y)\}$. According to Theorem 2, given each training sample $\bm{x}$, output scores of the finally converged DNN on all $2^n$ randomly masked samples $\{\bm{x}_S : S \subseteq N\}$ can be written in the form of $y_S \overset{\text{def}}{=} y(\bm{x}_S) = v(\bm{x}_{\emptyset}) + \sum_{\emptyset \neq T \subseteq N} \mathbbm{1}(\bm{x}_S \text{ triggers interaction } T) \cdot w^*_T = v(\bm{x}_{\emptyset}) + \sum_{\emptyset \neq T \subseteq S} w^*_T$, which is determined by parameters $\{w^*_T : T \subseteq N\}$. [Footnote 9: Note that in the converged output $y_S$, the true interactions $\{w^*_T : T \subseteq N\}$ actually mean interactions extracted from the finally converged DNN, which probably contain over-fitted interaction patterns. I.e., $\{w^*_T : T \subseteq N\}$ is not the ideal representation for the task.] $\{w^*_T : T \subseteq N\}$ can be taken as a set of true interactions that the DNN needs to learn. Therefore, the learning of the converged interactions on the training sample $\bm{x}$ can be represented as the regression towards the converged function $y(\bm{x}_S)$ on all masked samples $\{(\bm{x}_S, y_S) : S \subseteq N\}$.

$$L(\bm{w}) = \mathbb{E}_{S \subseteq N}[(y_S - \bm{w}^{\top} \bm{J}(\bm{x}_S))^2],$$ (8)

where we simplify the notation as follows. $\bm{w} \overset{\text{def}}{=} {\rm vec}(\{w_T : T \subseteq N\}) \in \mathbb{R}^{2^n}$ denotes the weight vector of $2^n$ different interactions, and $\bm{J}(\bm{x}_S) \overset{\text{def}}{=} {\rm vec}(\{J_T(\bm{x}_S) : T \subseteq N\}) \in \mathbb{R}^{2^n}$ denotes the vector of triggering values of $2^n$ different interactions $\{T \subseteq N\}$ on the masked sample $\bm{x}_S$.
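The regression view in Eq. (8) can be made concrete with a small sketch. It assumes idealized binary triggering values ($J_T(\bm{x}_S) = 1$ if $T \subseteq S$ and $0$ otherwise, as in the explanation of $J_T$ above), a uniform expectation over the $2^n$ masks, and a hypothetical set of converged weights `w_star`; none of these numbers come from the paper.

```python
from itertools import combinations


def all_subsets(n):
    """List all 2^n subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def J(T, S):
    """Idealized binary triggering value J_T(x_S): 1 if T is inside S, else 0."""
    return 1.0 if T <= S else 0.0


def loss(w, w_star, n):
    """L(w) = E_S[(y_S - w^T J(x_S))^2], with S uniform over all 2^n masks."""
    subs = all_subsets(n)
    total = 0.0
    for S in subs:
        y_S = sum(w_star[T] * J(T, S) for T in subs)  # converged output
        f_S = sum(w[T] * J(T, S) for T in subs)       # current model output
        total += (y_S - f_S) ** 2
    return total / len(subs)


n = 3
subs = all_subsets(n)
# Hypothetical converged interactions w*_T: mostly zero, a few salient ones.
# The empty-set entry plays the role of v(x_empty), since J_empty = 1.
w_star = {T: 0.0 for T in subs}
w_star[frozenset()] = 0.5
w_star[frozenset({1})] = 1.0
w_star[frozenset({0, 2})] = 2.0

loss_at_optimum = loss(w_star, w_star, n)

# Perturb a single weight: the loss becomes strictly positive.
w_off = dict(w_star)
w_off[frozenset({0, 1, 2})] += 0.4
loss_perturbed = loss(w_off, w_star, n)
```

In this noise-free setting the loss vanishes at $\bm{w} = \bm{w}^*$. The perturbed third-order weight is triggered only by the unmasked sample, so it contributes an error on one of the eight masks.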

• Step 3: Directly optimizing Eq. (8) gives the interactions of the finally converged DNN $w_T \leftarrow w^*_T$, but how do we estimate the interactions at an intermediate time point during the training process? To this end, we assume that the training process of the DNN is subject to parameter noises (see Lemma 1). In fact, this assumption is common. Before training, randomly initialized parameters in the DNN are pure noises without clear meanings. In this way, the DNN’s training process can be viewed as a process of gradually reducing the noise on its parameters. This is also supported by the lottery ticket hypothesis [10], i.e., the learning process actually penalizes most noisy parameters and learns a very small number of meaningful parameters. Therefore, as training proceeds, the noise on the network parameters can be considered to gradually diminish.

Lemma 1 (Noisy triggering function, proven in Appendix F.3).

If the inference score of the DNN contains an unlearnable noise, i.e., $\forall S \subseteq N$, $\widetilde{v}(\bm{x}_S) = v(\bm{x}_S) + \Delta v_S$, $\Delta v_S \sim \mathcal{N}(0, \sigma^2)$, then the interaction between input variables w.r.t. $\emptyset \neq T \subseteq N$, extracted from inference scores $\{\widetilde{v}(\bm{x}_S)\}$, can be written as $\widetilde{I}(T|\bm{x}) = I(T|\bm{x}) + \Delta I_T$, where $\Delta I_T$ denotes the noise in the interaction caused by the noise in the output $\Delta v_S$, and we have $\mathbb{E}[\Delta I_T] = 0$ and ${\rm Var}[\Delta I_T] = 2^{|T|} \sigma^2$. In this way, given an input sample $\hat{\bm{x}}$, we can consider the scalar weight $w_T = I(T|\bm{x} = \hat{\bm{x}})$, and consider the interaction triggering function $\widetilde{J}_T(\bm{x}) = J_T(\bm{x}) + \epsilon_T$, where $J_T(\bm{x})$ is defined in Eq. (7). $\epsilon_T = \Delta I_T / w_T$ represents the noise term on the triggering function. We have $\mathbb{E}[\epsilon_T] = 0$ and ${\rm Var}[\epsilon_T] \propto 2^{|T|} \sigma^2$ w.r.t. noises.
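The variance claim in Lemma 1 can be checked numerically. The sketch below is a Monte Carlo illustration, not the proof in Appendix F.3. It assumes that the interaction noise is the Harsanyi-style signed sum $\Delta I_T = \sum_{S \subseteq T} (-1)^{|T|-|S|} \Delta v_S$ and that the output noises $\Delta v_S$ are independent across masks; the function names and the values of `sigma` and `trials` are arbitrary choices for this example.

```python
import random
from itertools import combinations


def subsets_of(T):
    """List all subsets of the set T as frozensets."""
    T = sorted(T)
    return [frozenset(c) for r in range(len(T) + 1) for c in combinations(T, r)]


def interaction_noise_variance(T, sigma, trials, rng):
    """Monte Carlo estimate of Var[Delta I_T] under i.i.d. N(0, sigma^2) output noise."""
    subs = subsets_of(T)
    signs = [(-1) ** (len(T) - len(S)) for S in subs]
    draws = []
    for _ in range(trials):
        # One fresh Gaussian noise term per subset S of T, combined with its sign.
        delta_I = sum(sign * rng.gauss(0.0, sigma) for sign in signs)
        draws.append(delta_I)
    mean = sum(draws) / trials
    return sum((d - mean) ** 2 for d in draws) / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 0.1
estimates = {
    k: interaction_noise_variance(set(range(k)), sigma, trials=20000, rng=rng)
    for k in (1, 2, 3, 4)
}
predicted = {k: 2 ** k * sigma ** 2 for k in (1, 2, 3, 4)}
```

Each estimate should land close to the predicted value $2^{|T|} \sigma^2$, because $\Delta I_T$ sums $2^{|T|}$ independent terms of variance $\sigma^2$ with signs $\pm 1$. The doubling per added variable is what makes high-order interactions much noisier than low-order ones.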

Therefore, the learned interactions under unavoidable parameter noises can be represented as minimizing the following loss, where we vectorize the noise $\bm{\epsilon} = {\rm vec}(\{\epsilon_T : T \subseteq N\}) \in \mathbb{R}^{2^n}$ for simplicity.

$$\widetilde{L}(\bm{w}) = \mathbb{E}_{\bm{\epsilon}}\mathbb{E}_{S\subseteq N}[(y_S - \bm{w}^\top \widetilde{\bm{J}}(\bm{x}_S))^2] = \mathbb{E}_{\bm{\epsilon}}\mathbb{E}_{S\subseteq N}[(y_S - \bm{w}^\top (\bm{J}(\bm{x}_S) + \bm{\epsilon}))^2]. \tag{9}$$

Remark. The minimizer of Eq. (9) does not represent the end of training; it represents the intermediate state of interactions after a certain epoch in the training process. We formulate the training process as a process of gradually reducing the noise on the DNN’s parameters, and the minimizer $\hat{\bm{w}}$ of Eq. (9) represents the optimal interaction state when training is subject to a given level of parameter noise. We will show later that the minimizer $\hat{\bm{w}}$ computed under different noise levels can accurately predict the dynamics of interactions during the training process (see Figures 4 and 8).

Assumption 1.

To simplify the proof, we assume that different noise terms $\epsilon_T$ on the triggering function are independent, and uniformly set the variance as $\forall\ T \subseteq N$, ${\rm Var}[\epsilon_T] = 2^{|T|}\sigma^2$.

Assumption 1 is made according to two findings in Lemma 1: (1) the interaction triggering function $\widetilde{J}_T(\bm{x})$ is real-valued subject to the noise on the DNN’s parameters, (2) the variance of the interaction triggering function $\widetilde{J}_T(\bm{x})$ increases exponentially along with the order $|T|$. More importantly, the assumed exponential increase of the variance in the above finding (2) has been widely observed in various DNNs trained for different tasks in previous experiments [26, 22].

Theorem 3 (Proven in Appendix F.4).

Let $\hat{\bm{w}} = \arg\min_{\bm{w}} \widetilde{L}(\bm{w})$ denote the optimal solution to the minimization of the loss function $\widetilde{L}(\bm{w})$. Then, we have

$$\hat{\bm{w}} = (\bm{J}^\top \bm{J} + 2^n {\rm diag}(\bm{c}))^{-1} \bm{J}^\top \bm{y} = (\bm{J}^\top \bm{J} + 2^n {\rm diag}(\bm{c}))^{-1} \bm{J}^\top \bm{J} \bm{w}^* = \hat{\bm{M}} \bm{w}^*, \tag{10}$$

where $\bm{J} \overset{\text{def}}{=} [\bm{J}(\bm{x}_{S_1}), \bm{J}(\bm{x}_{S_2}), \cdots, \bm{J}(\bm{x}_{S_{2^n}})]^\top \in \mathbb{R}^{2^n \times 2^n}$ is a matrix to represent the triggering values of $2^n$ interactions (w.r.t. $2^n$ columns) on $2^n$ masked samples (w.r.t. $2^n$ rows). $\bm{x}_{S_1}, \bm{x}_{S_2}, \cdots, \bm{x}_{S_{2^n}}$ enumerate all masked samples.
$\bm{y} \overset{\text{def}}{=} [y(\bm{x}_{S_1}), y(\bm{x}_{S_2}), \cdots, y(\bm{x}_{S_{2^n}})]^\top \in \mathbb{R}^{2^n}$ enumerates the finally-converged outputs on $2^n$ masked samples. $\bm{c} \overset{\text{def}}{=} {\rm vec}(\{{\rm Var}[\epsilon_T] : T \subseteq N\}) = {\rm vec}(\{2^{|T|}\sigma^2 : T \subseteq N\}) \in \mathbb{R}^{2^n}$ denotes the vector of variances of the triggering values of $2^n$ interactions.
The matrix $\hat{\bm{M}}$ is defined as $\hat{\bm{M}} \overset{\text{def}}{=} (\bm{J}^\top \bm{J} + 2^n {\rm diag}(\bm{c}))^{-1} \bm{J}^\top \bm{J}$, and $\bm{w}^* \overset{\text{def}}{=} {\rm vec}(\{w^*_T : T \subseteq N\})$.

In this way, Theorem 3 provides an analytic solution to the minimization of $\widetilde{L}(\bm{w})$ under parameter noises. Experiments in Figure 4 will show that the learning dynamics of interactions derived from our simplifying assumption can still predict the real distribution of interactions over different orders.
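As a reading aid, the step from Eq. (9) to Eq. (10) can be sketched as a ridge-type regression. The sketch below is our own outline, not the proof in Appendix F.4, and it assumes that $\mathbb{E}_{S\subseteq N}$ is a uniform average over the $2^n$ subsets and that $\bm{\epsilon}$ is independent of the choice of $S$. It uses only $\mathbb{E}[\epsilon_T]=0$ and the independence and variance stated in Assumption 1.

```latex
% Sketch under the stated assumptions (uniform average over S, zero-mean independent noise).
\begin{align*}
\widetilde{L}(\bm{w})
 &= \mathbb{E}_{S\subseteq N}\big[(y_S - \bm{w}^\top \bm{J}(\bm{x}_S))^2\big]
    - 2\,\mathbb{E}_{S\subseteq N}\big[y_S - \bm{w}^\top \bm{J}(\bm{x}_S)\big]\,
      \bm{w}^\top \mathbb{E}_{\bm{\epsilon}}[\bm{\epsilon}]
    + \bm{w}^\top \mathbb{E}_{\bm{\epsilon}}[\bm{\epsilon}\bm{\epsilon}^\top]\,\bm{w} \\
 &= \frac{1}{2^n}\,\|\bm{y} - \bm{J}\bm{w}\|_2^2 + \bm{w}^\top {\rm diag}(\bm{c})\,\bm{w}, \\
\nabla_{\bm{w}}\widetilde{L}(\bm{w}) = \bm{0}
 &\;\Longrightarrow\;
 (\bm{J}^\top\bm{J} + 2^n\,{\rm diag}(\bm{c}))\,\hat{\bm{w}} = \bm{J}^\top \bm{y}.
\end{align*}
```

Read this way, the noise variances $\bm{c}$ act as a per-interaction penalty that grows as $2^{|T|}\sigma^2$, so interactions of higher orders are penalized more heavily for any fixed $\sigma^2 > 0$.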

3.3.2 Explaining the dynamics in the second phase

Based on the above analytic solution, this subsection aims to prove that in the second phase, the DNN first encodes interactions of low orders and then gradually encodes interactions of higher orders.

• The second phase can be viewed as a process of gradually reducing the noise level $\sigma^2$. The analytic solution $\hat{\bm{w}}$ in Theorem 3 under different noise levels $\sigma^2$ enables us to analyze the dynamics of interactions during the second phase. This is because the noise on the network parameters can be considered to gradually diminish during the training process, as we assume in Section 3.3.1. Accordingly, the noise level $\sigma^2$ of the noise term $\epsilon_T$ on the interaction triggering function also gradually diminishes during training. At the start of the second phase, the noise level $\sigma^2$ is large, and the interaction triggering function $\widetilde{J}_T(\bm{x})$ is dominated by the noise term $\epsilon_T$. Later, as training proceeds in the second phase, the noise level $\sigma^2$ gradually decreases and has less effect on the interaction triggering function.

• The change of the analytic solution $\hat{\bm{w}}$ along with the decreasing noise level $\sigma^2$ explains the dynamics in the second phase. We prove that as $\sigma^2$ decreases, the ratio of low-order interaction strength to high-order interaction strength in the analytic solution $\hat{\bm{w}}$ decreases. This means that the DNN gradually learns higher-order interactions in the second phase, which can be verified by our observation in Figure 2. The detailed results are derived as follows.

Lemma 2 (Proven in Appendix F.5).

The compositional term $J_T(\bm{x})$ in the Taylor expansion in Eq. (7) always has fixed values on $2^n$ masked samples $\{\bm{x}_S : S \subseteq N\}$, i.e., $\forall S \subseteq N,\ J_T(\bm{x}_S) = \mathbbm{1}(T \subseteq S)$. It means that the matrix $\bm{J} = [\bm{J}(\bm{x}_{S_1}), \bm{J}(\bm{x}_{S_2}), \cdots, \bm{J}(\bm{x}_{S_{2^n}})]^\top \in \{0,1\}^{2^n \times 2^n}$ in Eq. (10) is a fixed binary matrix, no matter how we change the DNN $v(\cdot)$ or the input sample $\bm{x}$.
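Because Lemma 2 makes the matrix in Eq. (10) explicit, the analytic solution can be evaluated numerically for small $n$. The following NumPy sketch is illustrative only and is not the authors' code: the function names (`trigger_matrix`, `subset_orders`, `shrinkage_matrix`, `noisy_minimizer`) and the bitmask encoding of subsets are our own assumptions, and the $2^n \times 2^n$ matrices limit it to small $n$.

```python
import numpy as np

def trigger_matrix(n):
    """Binary matrix with entry [S, T] = 1 if T is a subset of S (Lemma 2).

    Assumption: subsets of N = {0, ..., n-1} are encoded as bitmasks 0 .. 2**n - 1.
    """
    masks = np.arange(2 ** n)
    S = masks[:, None]   # rows: masked samples x_S
    T = masks[None, :]   # columns: interactions T
    return ((S & T) == T).astype(float)

def subset_orders(n):
    """Order |T| of every bitmask-encoded subset T."""
    return np.array([bin(m).count("1") for m in range(2 ** n)])

def shrinkage_matrix(n, sigma2):
    """M_hat = (J^T J + 2^n diag(c))^{-1} J^T J with c_T = 2^{|T|} sigma^2 (Eq. 10)."""
    J = trigger_matrix(n)
    c = (2.0 ** subset_orders(n)) * sigma2
    gram = J.T @ J
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(gram + (2 ** n) * np.diag(c), gram)

def noisy_minimizer(w_star, sigma2):
    """w_hat = M_hat w_star for a vector w_star of length 2**n."""
    w_star = np.asarray(w_star, dtype=float)
    n = len(w_star).bit_length() - 1
    return shrinkage_matrix(n, sigma2) @ w_star
```

Calling `noisy_minimizer(w_star, sigma2)` over a decreasing sequence of `sigma2` values gives one way to trace how the minimizer moves toward `w_star` as the noise level drops.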

Theorem 4 (Proven in Appendix F.6).

According to Theorem 3, we can write the analytic solution of the interaction effect $\hat{w}_T$ w.r.t. a subset $T$ as $\hat{w}_T = \hat{\bm{m}}_T^\top \bm{w}^*$, where $\hat{\bm{m}}_T^\top \in \mathbb{R}^{1 \times 2^n}$ denotes a row vector of the matrix $\hat{\bm{M}} = [\hat{\bm{m}}_{T_1}, \hat{\bm{m}}_{T_2}, \cdots, \hat{\bm{m}}_{T_{2^n}}]^\top$, indexed by $T$.
Combining with Lemma 2, for any two subsets $T, T' \subseteq N$ of the same order, i.e., $|T| = |T'|$, we have $\|\hat{\bm{m}}_T\|_2 = \|\hat{\bm{m}}_{T'}\|_2$.

Proposition 1.

For any two subsets $T, T' \subseteq N$ with $|T| < |T'|$, $\|\hat{\bm{m}}_T\|_2 / \|\hat{\bm{m}}_{T'}\|_2$ is greater than 1 and decreases monotonically as $\sigma^2$ decreases throughout training. The norm $\|\hat{\bm{m}}_T\|_2$ is only determined by $n$, $\sigma^2$, and the order $|T|$, but is agnostic to finally-converged interactions $\{w^*_T : T \subseteq N\}$.

Figure 3: Monotonic increase of $r^{(k)}$ along with $\sigma^2$ mentioned in Proposition 1. We show the curves of $r^{(k)}$ when we set different numbers of input variables $n$ and different orders $k = 1, \cdots, n-1$.

Proposition 1 shows a monotonic decrease of $\|\hat{\bm{m}}_T\|_2 / \|\hat{\bm{m}}_{T'}\|_2$ along with the decrease of $\sigma^2$. The physical meaning of $\|\hat{\bm{m}}_T\|_2 / \|\hat{\bm{m}}_{T'}\|_2$ can be understood as follows. According to $\hat{w}_T = \hat{\bm{m}}_T^\top \bm{w}^*$, $\|\hat{\bm{m}}_T\|_2$ reflects the strength of the DNN encoding the interaction $T$.
In this way, $\|\hat{\bm{m}}_T\|_2 / \|\hat{\bm{m}}_{T'}\|_2$ measures the relative strength of encoding a low-order interaction $T$ w.r.t. that of encoding a high-order interaction $T'$.

Conclusions from Theorem 4 and Proposition 1: Because the second phase is viewed as a process of gradually reducing the noise level $\sigma^2$, Theorem 4 and Proposition 1 explain why the DNN mainly encodes low-order interactions and suppresses high-order interactions at the start of the second phase (when $\sigma^2$ is large). They also explain why the DNN learns interactions of increasing orders during the second phase (when $\sigma^2$ gradually decreases).

Experimental verification of Proposition 1: We measured the relative strength $r^{(k)} \overset{\text{def}}{=} \|\hat{\bm{m}}_T\|_2 / \|\hat{\bm{m}}_{T'}\|_2$ subject to $|T| = k$ and $|T'| = k+1$, for $k = 1, \cdots, n-1$, under different values of $\sigma^2$. Figure 3 shows that when $\sigma^2$ decreased, $r^{(k)}$ monotonically decreased for all orders $k = 1, \cdots, n-1$, which verified the proposition. The experiment was conducted using different numbers of input variables $n$.
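The measurement described above can be reproduced at small scale with a short NumPy sketch. This is an illustrative reconstruction under our own assumptions (bitmask-encoded subsets, the binary matrix from Lemma 2, and the hypothetical helper names `row_norms_by_order` and `relative_strength`); it is not the authors' experimental code.

```python
import numpy as np

def row_norms_by_order(n, sigma2):
    """Return (order, norms): the order |T| and the row norm ||m_hat_T||_2 for every T."""
    masks = np.arange(2 ** n)
    # Binary matrix with entry [S, T] = 1 if T is a subset of S (Lemma 2).
    J = ((masks[:, None] & masks[None, :]) == masks[None, :]).astype(float)
    order = np.array([bin(int(m)).count("1") for m in masks])
    gram = J.T @ J
    c = (2.0 ** order) * sigma2              # Var[eps_T] = 2^{|T|} sigma^2 (Assumption 1)
    M_hat = np.linalg.solve(gram + (2 ** n) * np.diag(c), gram)
    return order, np.linalg.norm(M_hat, axis=1)

def relative_strength(n, sigma2):
    """r^(k) = ||m_hat_T|| / ||m_hat_T'|| with |T| = k, |T'| = k + 1, for k = 1..n-1."""
    order, norms = row_norms_by_order(n, sigma2)
    # By Theorem 4 the norm depends only on the order, so one representative per order suffices.
    per_order = np.array([norms[order == k][0] for k in range(n + 1)])
    return per_order[1:n] / per_order[2:n + 1]
```

Evaluating `relative_strength(n, s)` on a decreasing grid of `s` yields curves that can be compared against Figure 3. At `s = 0` every ratio equals 1, because $\hat{\bm{M}}$ reduces to the identity matrix.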

Theorem 5 (Proven in Appendix F.7).

When $\sigma = 0$, $\hat{\bm{w}}$ satisfies $\forall\ \emptyset \neq T \subseteq N,\ \hat{w}_T = w^*_T$.

Theorem 5 shows a special case when there is no noise on the network parameters. Then, the DNN learns the finally converged interactions $\{w^*_T : T \subseteq N\}$. Note that the finally converged DNN probably encodes some interactions of high orders, which correspond to over-fitted patterns.

Figure 4: Comparison between the theoretical distribution of interaction strength $I_{\text{theo}}^{(k)}$ and the real distribution of interaction strength $I_{\text{real}}^{(k)}$ in the second phase. Please see Appendix J.3 for the comparison on the other six DNNs trained for 3D point cloud/image/sentiment classification.

• Experiments on real datasets. We conducted experiments to examine whether our theory could predict the real dynamics of interaction strength of different orders when we trained DNNs in practice. We trained AlexNet and VGG on the MNIST dataset, the CIFAR-10 dataset, the CUB-200-2011 dataset, and the Tiny-ImageNet dataset, trained BERT-Tiny and BERT-Medium on the SST-2 dataset, and trained DGCNN on the ShapeNet dataset. Then, we computed the real distribution of interaction strength over different orders on each DNN, and tracked the change of the distribution throughout the training process. As mentioned in Section 3.2, the real interaction strength of each $k$-th order was quantified as $I_{\text{real}}^{(k)} = \mathbb{E}_{\bm{x}}[\sum_{S:|S|=k,\,|I(S|\bm{x})|\geq\tau} |I(S|\bm{x})|]\ /\ Z$. (Footnote 10: In experiments, the real distribution of interaction strength $I_{\text{real}}^{(k)}$ was computed using both AND and OR interactions. Because the OR interaction was a special AND interaction and had similar dynamics, this experiment actually tested the fidelity of our theory to explain the dynamics of all interactions. Nevertheless, Appendix J.4 also reports the fit between the theoretical distribution $I_{\text{theo}}^{(k)}$ and the real distribution of AND interactions.)
Accordingly, we defined the metric $I_{\text{theo}}^{(k)} = \mathbb{E}_{\bm{x}}[\sum_{S:|S|=k,\,|\hat{w}_S|\geq\tau_{\text{theo}}} |\hat{w}_S|]\ /\ Z_{\text{theo}}$ in the same way as $I_{\text{real}}^{(k)}$ to measure the theoretical distribution of the interaction strength, where $Z_{\text{theo}} = \mathbb{E}_{1\leq k'\leq n}\mathbb{E}_{\bm{x}}[\sum_{S:|S|=k',\,|\hat{w}_S|\geq\tau_{\text{theo}}} |\hat{w}_S|]$, $\tau_{\text{theo}} = 0.03\cdot|v_{\text{theo}}(\bm{x}) - \hat{w}_{\emptyset}|$, and $v_{\text{theo}}(\bm{x}) \overset{\text{def}}{=} \sum_{S\subseteq N}\hat{w}_S$. To compute the theoretical solution $\hat{\bm{w}} = \hat{\bm{M}}\bm{w}^*$ in Eq. (10), given an input sample $\bm{x}$, we used the set of salient interactions $\Omega = \{S\subseteq N : |I(S|\bm{x})|\geq\tau\}$ extracted from the finally converged DNN to construct the set of true interactions $\bm{w}^*$.
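The metric $I_{\text{theo}}^{(k)}$ defined above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions: the function name `theoretical_strength_by_order`, the array layout (one row per input sample, one column per bitmask-encoded subset, with column 0 holding $\hat{w}_\emptyset$), and the zero-normalizer guard are hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def theoretical_strength_by_order(w_hat, n, ratio=0.03):
    """Normalized strength of salient interactions for each order k = 1..n.

    w_hat: array of shape (num_samples, 2**n) or (2**n,); column index is the bitmask of S.
    """
    w_hat = np.atleast_2d(np.asarray(w_hat, dtype=float))
    order = np.array([bin(m).count("1") for m in range(2 ** n)])
    v_theo = w_hat.sum(axis=1)                      # v_theo(x) = sum of all w_hat_S
    tau = ratio * np.abs(v_theo - w_hat[:, 0])      # per-sample threshold tau_theo
    strength = np.abs(w_hat)
    salient = strength * (strength >= tau[:, None])  # zero out non-salient interactions
    # Sum within each order, then average over samples (the expectation over x).
    per_order = np.array(
        [salient[:, order == k].sum(axis=1).mean() for k in range(1, n + 1)]
    )
    z = per_order.mean()                            # Z_theo: mean over orders 1..n
    if z == 0:
        return np.zeros(n)                          # guard added for this sketch only
    return per_order / z
```

With this normalization the returned values average to 1 over the orders $1, \cdots, n$ whenever the normalizer is nonzero, so distributions computed for different noise levels remain comparable in shape.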

Figure 4 shows that the theoretical distribution $I_{\text{theo}}^{(k)}$ could well match the real distribution $I_{\text{real}}^{(k)}$ at different training epochs. Particularly, we used a sequence of theoretical distributions of $I_{\text{theo}}^{(k)}$ with decreasing $\sigma^2$ values to match the real distribution of $I_{\text{real}}^{(k)}$ at different epochs. The $\sigma^2$ value was determined to achieve the best match between $I_{\text{theo}}^{(k)}$ and $I_{\text{real}}^{(k)}$.

3.3.3 Explaining the dynamics in the first phase

Because the spindle-shaped distribution of interaction strength in a randomly initialized DNN has already been proven by [41], in this subsection, let us further explain the DNN’s dynamics in the first phase based on Eq. (9). As previously shown in Figure 2, in the first phase, the DNN removes initial interactions of medium and high orders, and mainly encodes low-order interactions.

Therefore, the first phase is explained as the process of removing chaotic initial interactions and converging to the optimal solution to Eq. (9) under large parameter noise (i.e., large $\sigma^{2}$). In sum, the first phase is a process of pushing initial random interactions to the optimal solution, while the second phase corresponds to the change of the optimal solution as $\sigma^{2}$ gradually decreases.

4 Conclusion and discussion

In this study, we have proven the two-phase dynamics of a DNN learning interactions of different orders. Specifically, we have followed [26, 22] to reformulate the learning of interactions as a linear regression problem on a set of interaction triggering functions. In this way, we have derived an analytic solution to interaction effects when the DNN is trained under unavoidable parameter noise. This analytic solution has successfully predicted a DNN’s two-phase dynamics of learning interactions in real experiments. Considering a series of recent theoretical guarantees for taking interactions as faithful primitive inference patterns encoded by the DNN [44, 27], our study is the first to mathematically explain why and how the learning process gradually shifts attention from generalizable (low-order) inference patterns to probably over-fitted (high-order) inference patterns.

Practical implications. A theoretical understanding of the two-phase dynamics of interactions offers a new perspective to monitor the overfitting level of the DNN on different training samples throughout training. The two-phase dynamics enables us to evaluate the overfitting level of each specific sample, making overfitting no longer a problem w.r.t. the entire dataset. We can track the change of the interaction complexity for each training sample, and take the time point when high-order interactions increase as a sign of overfitting. In this way, the two-phase dynamics of interactions may help people remove overfitted samples from training and guide the early stopping on a few "hard samples."
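The per-sample monitoring described above can be sketched as follows. This is only an illustrative sketch, not the procedure used in this study: the function names, the order cutoff `min_high_order`, the relative margin `rel_increase`, and the toy numbers are all assumptions introduced for the example. For one training sample, it tracks the share of interaction strength carried by high-order interactions at each epoch and reports the first epoch after the minimum at which this share rises again.

```python
def high_order_share(strength_by_order, min_high_order):
    """Fraction of total interaction strength carried by orders >= min_high_order."""
    total = sum(strength_by_order.values())
    high = sum(s for k, s in strength_by_order.items() if k >= min_high_order)
    return high / total if total > 0 else 0.0


def overfitting_onset(history, min_high_order, rel_increase=0.1):
    """Return the first epoch after the minimum of the high-order share at which
    the share exceeds that minimum by the relative margin, or None if it never does.

    `history[e]` maps interaction order k -> interaction strength at epoch e.
    Both the cutoff and the margin are hypothetical tuning knobs.
    """
    shares = [high_order_share(h, min_high_order) for h in history]
    low = min(range(len(shares)), key=shares.__getitem__)
    for epoch in range(low + 1, len(shares)):
        if shares[epoch] > shares[low] * (1 + rel_increase):
            return epoch
    return None


# Made-up strengths for one sample (orders 1..4); not measured data.
toy_history = [
    {1: 1.0, 2: 1.5, 3: 1.5, 4: 1.0},  # random initialization: many high-order terms
    {1: 2.0, 2: 1.0, 3: 0.3, 4: 0.1},  # first phase: high-order terms are removed
    {1: 3.0, 2: 1.0, 3: 0.1, 4: 0.0},
    {1: 3.0, 2: 1.5, 3: 0.5, 4: 0.1},  # second phase: high-order terms grow again
    {1: 3.0, 2: 2.0, 3: 1.0, 4: 0.5},
]
onset = overfitting_onset(toy_history, min_high_order=3)
```

In this toy history the high-order share bottoms out at epoch 2 and rises afterwards, so the sketch flags epoch 3; in practice the cutoff and margin would need to be chosen per task.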

Acknowledgements. This work is partially supported by the National Science and Technology Major Project (2021ZD0111602) and the National Natural Science Foundation of China (92370115, 62276165). This work is also partially supported by Huawei Technologies Inc.

References

  • [1] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
  • [2] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [3] Zhengping Che, Sanjay Purushotham, Robinder Khemani, and Yan Liu. Interpretable deep models for icu outcome prediction. In AMIA annual symposium proceedings, volume 2016, page 371. American Medical Informatics Association, 2016.
  • [4] Lu Chen, Siyu Lou, Benhao Huang, and Quanshi Zhang. Defining and extracting generalizable interaction primitives from DNNs. In The Twelfth International Conference on Learning Representations, 2024.
  • [5] Xu Cheng, Chuntung Chu, Yi Zheng, Jie Ren, and Quanshi Zhang. A Game-Theoretic Taxonomy of Visual Concepts in DNNs. arXiv preprint arXiv:2106.10938, 2021.
  • [6] Xu Cheng, Xin Wang, Haotian Xue, Zhengyang Liang, and Quanshi Zhang. A Hypothesis for the Aesthetic Appreciation in Neural Networks. arXiv preprint arXiv:2108.02646, 2021.
  • [7] Huiqi Deng, Qihan Ren, Xu Chen, Hao Zhang, Jie Ren, and Quanshi Zhang. Discovering and Explaining the Representation Bottleneck of DNNs. ICLR, 2021.
  • [8] Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Ziwei Yang, Zheyang Li, and Quanshi Zhang. Unifying Fourteen Post-hoc Attribution Methods with Taylor Interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
  • [9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
  • [10] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, 2019.
  • [11] Nicholas Frosst and Geoffrey Hinton. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784, 2017.
  • [12] Marzyeh Ghassemi, Luke Oakden-Rayner, and Andrew L Beam. The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health, 3(11):e745–e750, 2021.
  • [13] Michel Grabisch and Marc Roubens. An axiomatic approach to the concept of interaction among players in cooperative games. International Journal of game theory, 28(4):547–565, 1999.
  • [14] John C. Harsanyi. A simplified bargaining model for the n-person cooperative game. International Economic Review, 4(2):194–220, 1963.
  • [15] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2668–2677. PMLR, 10–15 Jul 2018.
  • [16] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • [17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, volume 25, pages 1097–1105, 2012.
  • [18] Ya Le and Xuan Yang. Tiny imagenet visual recognition challenge. CS 231N, 7(7):3, 2015.
  • [19] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [20] Mingjie Li and Quanshi Zhang. Defining and Quantifying AND-OR Interactions for Faithful and Concise Explanation of DNNs. arXiv preprint arXiv:2304.13312, 2023.
  • [21] Mingjie Li and Quanshi Zhang. Does a Neural Network Really Encode Symbolic Concepts? International Conference on Machine Learning, 2023.
  • [22] Dongrui Liu, Huiqi Deng, Xu Cheng, Qihan Ren, Kangrui Wang, and Quanshi Zhang. Towards the Difficulty for a Deep Neural Network to Learn Concepts of Different Complexities. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [23] Jie Ren, Mingjie Li, Qirui Chen, Huiqi Deng, and Quanshi Zhang. Defining and Quantifying the Emergence of Sparse Concepts in DNNs. In The IEEE/CVF Computer Vision and Pattern Recognition Conference, 2023.
  • [24] Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, and Quanshi Zhang. Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 3797–3810. Curran Associates, Inc., 2021.
  • [25] Jie Ren, Zhanpeng Zhou, Qirui Chen, and Quanshi Zhang. Can We Faithfully Represent Absence States to Compute Shapley Values on a DNN? In International Conference on Learning Representations, 2023.
  • [26] Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, and Quanshi Zhang. Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts. International Conference on Machine Learning, 2023.
  • [27] Qihan Ren, Jiayang Gao, Wen Shen, and Quanshi Zhang. Where We Have Arrived in Proving the Emergence of Sparse Interaction Primitives in DNNs. In The Twelfth International Conference on Learning Representations, 2024.
  • [28] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5):206–215, 2019.
  • [29] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [30] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In International Conference on Learning Representations, 2014.
  • [31] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2014.
  • [32] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing, pages 1631–1642, 2013.
  • [33] Mukund Sundararajan, Kedar Dhamdhere, and Ashish Agarwal. The shapley taylor interaction index. In International Conference on Machine Learning, pages 9259–9268. PMLR, 2020.
  • [34] Sarah Tan, Giles Hooker, Paul Koch, Albert Gordo, and Rich Caruana. Considerations when learning additive explanations for black-box models. arXiv preprint arXiv:1801.08640, 2018.
  • [35] Joel Vaughan, Agus Sudjianto, Erind Brahimi, Jie Chen, and Vijayan N Nair. Explainable neural networks based on additive index models. arXiv preprint arXiv:1806.01933, 2018.
  • [36] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 Dataset. 2011.
  • [37] Xin Wang, Jie Ren, Shuyun Lin, Xiangming Zhu, Yisen Wang, and Quanshi Zhang. A Unified Approach to Interpreting and Boosting Adversarial Transferability. In International Conference on Learning Representations, 2021.
  • [38] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph., 38(5), oct 2019.
  • [39] Li Yi, Vladimir G Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. A scalable active framework for region annotation in 3d shape collections. ACM Transactions on Graphics (ToG), 35(6):1–12, 2016.
  • [40] Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015.
  • [41] Junpeng Zhang, Qing Li, Liang Lin, and Quanshi Zhang. Two-phase dynamics of interactions explains the starting point of a dnn learning over-fitted features. arXiv preprint arXiv:2405.10262v1, 2024.
  • [42] Quanshi Zhang, Xin Wang, Jie Ren, Xu Cheng, Shuyun Lin, Yisen Wang, and Xiangming Zhu. Proving Common Mechanisms Shared by Twelve Methods of Boosting Adversarial Transferability. arXiv preprint arXiv:2207.11694, 2022.
  • [43] Huilin Zhou, Huijie Tang, Mingjie Li, Hao Zhang, Zhenyu Liu, and Quanshi Zhang. Explaining how a neural network play the go game and let people learn. arXiv preprint arXiv:2310.09838, 2023.
  • [44] Huilin Zhou, Hao Zhang, Huiqi Deng, Dongrui Liu, Wen Shen, Shih-Han Chan, and Quanshi Zhang. Explaining Generalization Power of a DNN using Interactive Concepts. In Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024. AAAI Press, 2024.

Appendix A Properties of the AND interaction

The Harsanyi interaction [14] (i.e., the AND interaction in this paper) is a standard metric to measure the AND relationship between input variables encoded by the network. In this section, we present several desirable properties/axioms that the Harsanyi AND interaction $I_{\text{and}}(S|\bm{x})$ satisfies. These properties further demonstrate the faithfulness of using the Harsanyi AND interaction to explain the inference score of a DNN.

(1) Efficiency axiom (proven by [14]). The output score of a model can be decomposed into interaction effects of different patterns, i.e., $v(\bm{x})=\sum_{S\subseteq N}I_{\text{and}}(S|\bm{x})$.

(2) Linearity axiom. If we merge the output scores of two models $v_{1}$ and $v_{2}$ as the output of model $v$, i.e., $\forall S\subseteq N,~v(\bm{x}_{S})=v_{1}(\bm{x}_{S})+v_{2}(\bm{x}_{S})$, then their interaction effects $I^{v_{1}}_{\text{and}}(S|\bm{x})$ and $I^{v_{2}}_{\text{and}}(S|\bm{x})$ can also be merged as $\forall S\subseteq N,~I^{v}_{\text{and}}(S|\bm{x})=I^{v_{1}}_{\text{and}}(S|\bm{x})+I^{v_{2}}_{\text{and}}(S|\bm{x})$.

(3) Dummy axiom. If a variable $i\in N$ is a dummy variable, i.e., $\forall S\subseteq N\setminus\{i\},~v(\bm{x}_{S\cup\{i\}})=v(\bm{x}_{S})+v(\bm{x}_{\{i\}})$, then it has no interaction with other variables: $\forall\ \emptyset\neq S\subseteq N\setminus\{i\}$, $I_{\text{and}}(S\cup\{i\}|\bm{x})=0$.

(4) Symmetry axiom. If input variables $i,j\in N$ cooperate with other variables in the same way, i.e., $\forall S\subseteq N\setminus\{i,j\},~v(\bm{x}_{S\cup\{i\}})=v(\bm{x}_{S\cup\{j\}})$, then they have the same interaction effects with other variables: $\forall S\subseteq N\setminus\{i,j\},~I_{\text{and}}(S\cup\{i\}|\bm{x})=I_{\text{and}}(S\cup\{j\}|\bm{x})$.

(5) Anonymity axiom. For any permutation $\pi$ on $N$, we have $\forall S\subseteq N,~I^{v}_{\text{and}}(S|\bm{x})=I^{\pi v}_{\text{and}}(\pi S|\bm{x})$, where $\pi S\overset{\text{def}}{=}\{\pi(i)|i\in S\}$, and the new model $\pi v$ is defined by $(\pi v)(\bm{x}_{\pi S})=v(\bm{x}_{S})$. This indicates that interaction effects are not changed by permutation.

(6) Recursive axiom. The interaction effects can be computed recursively. For $i\in N$ and $S\subseteq N\setminus\{i\}$, the interaction effect of the pattern $S\cup\{i\}$ is equal to the interaction effect of $S$ with the presence of $i$ minus the interaction effect of $S$ with the absence of $i$, i.e., $\forall S\subseteq N\setminus\{i\},~I_{\text{and}}(S\cup\{i\}|\bm{x})=I_{\text{and}}(S|\bm{x},i\text{ is always present})-I_{\text{and}}(S|\bm{x})$. Here, $I_{\text{and}}(S|\bm{x},i\text{ is always present})$ denotes the interaction effect when the variable $i$ is always present as a constant context, i.e., $I_{\text{and}}(S|\bm{x},i\text{ is always present})=\sum_{L\subseteq S}(-1)^{|S|-|L|}\cdot v(\bm{x}_{L\cup\{i\}})$.

(7) Interaction distribution axiom. This axiom characterizes how interactions are distributed for “interaction functions” [33]. An interaction function $v_{T}$ parameterized by a subset of variables $T$ is defined as follows: $\forall S\subseteq N$, if $T\subseteq S$, then $v_{T}(\bm{x}_{S})=c$; otherwise, $v_{T}(\bm{x}_{S})=0$. The function $v_{T}$ models pure interaction among the variables in $T$, because the output value is increased by $c$ only if all variables in $T$ are present. The interactions encoded in the function $v_{T}$ satisfy $I_{\text{and}}(T|\bm{x})=c$, and $\forall S\neq T$, $I_{\text{and}}(S|\bm{x})=0$.
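Several of the axioms above can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical toy model `toy_v` that maps the set of unmasked variables to a scalar score (a stand-in for $v(\bm{x}_{S})$, not a real DNN); all function and variable names are introduced here for illustration only. It checks the efficiency axiom (1), the recursive axiom (6), and the interaction distribution axiom (7) by brute-force enumeration over $n=3$ variables.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_and(v, S, context=frozenset()):
    """I_and(S|x) = sum over L subset of S of (-1)^(|S|-|L|) * v(x_{L u context}).

    With an empty context this is the plain Harsanyi interaction; with
    context={i} it is the interaction of S when variable i is always present.
    """
    return sum((-1) ** (len(S) - len(L)) * v(L | context) for L in subsets(S))


N = frozenset({0, 1, 2})


def toy_v(T):
    """Hypothetical toy model: score as a function of the unmasked set T."""
    return 0.5 + 1.0 * (0 in T) + 2.0 * ({0, 1} <= T) - 0.7 * (N <= T)


# (1) Efficiency: all interaction effects sum to the output on the full sample.
interactions = {S: harsanyi_and(toy_v, S) for S in subsets(N)}
efficiency_gap = abs(sum(interactions.values()) - toy_v(N))

# (6) Recursive axiom: I(S u {i}) = I(S | i always present) - I(S).
recursive_gap = max(
    abs(harsanyi_and(toy_v, S | {i})
        - (harsanyi_and(toy_v, S, context=frozenset({i})) - harsanyi_and(toy_v, S)))
    for i in N
    for S in subsets(N - {i})
)

# (7) Interaction distribution: v_T(x_S) = c if T is a subset of S, else 0.
T_star, c = frozenset({0, 2}), 2.5


def v_T(S):
    return c if T_star <= S else 0.0


all_effects = {S: harsanyi_and(v_T, S) for S in subsets(N)}
nonzero = {S: val for S, val in all_effects.items() if abs(val) > 1e-12}
```

Both gaps vanish up to floating-point error, and the only non-zero interaction of `v_T` is the one on `T_star`, with effect `c`. The enumeration costs $O(3^{n})$ evaluations, so this sketch is only practical for a handful of variables.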

Appendix B Common conditions for sparse interactions

Ren et al. [27] have formulated three mathematical conditions for the sparsity of AND interactions, as follows.

Condition 1. The DNN does not encode interactions higher than the $M$-th order: $\forall\ S\in\{S\subseteq N\mid|S|\geq M+1\},\ I_{\text{and}}(S|\bm{x})=0$.

Condition 1 implies that the DNN does not encode extremely high-order interactions. This is because extremely high-order interactions usually represent very complex and over-fitted patterns, which are unnecessary and unlikely to be learned by the DNN in real applications.

Condition 2. Let us consider the average network output $\bar{u}^{(k)}\overset{\text{def}}{=}\mathbb{E}_{|S|=k}[v(\bm{x}_{S})-v(\bm{x}_{\emptyset})]$ over all masked samples $\bm{x}_{S}$ with $k$ unmasked input variables. This average network output monotonically increases when $k$ increases: $\forall\ k^{\prime}\leq k$, we have $\bar{u}^{(k^{\prime})}\leq\bar{u}^{(k)}$.

Condition 2 implies that a well-trained DNN is likely to have higher classification confidence for input samples that are less masked.

Condition 3. Given the average network output $\bar{u}^{(k)}$ of samples with $k$ unmasked input variables, there is a polynomial lower bound for the average network output of samples with $k^{\prime}$ ($k^{\prime}\leq k$) unmasked input variables: $\forall\ k^{\prime}\leq k,\ \bar{u}^{(k^{\prime})}\geq(\frac{k^{\prime}}{k})^{p}\ \bar{u}^{(k)}$, where $p>0$ is a positive constant.

Condition 3 implies that the classification confidence of the DNN does not significantly degrade on masked input samples. The classification/detection of masked/occluded samples is common in real scenarios. In this way, a well-trained DNN usually learns to classify a masked input sample based on local information (which can be extracted from unmasked parts of the input) and thus should not yield a significantly low confidence score on masked samples.
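Conditions 2 and 3 can be tested empirically once the outputs on masked samples are available. The sketch below assumes a hypothetical toy output function `toy_v` on $n=4$ variables in place of a real DNN, and the helper names and the tolerance `tol` are illustrative choices. It computes $\bar{u}^{(k)}$ by exhaustive enumeration and checks both inequalities; $k=0$ is excluded as a denominator in Condition 3 to avoid division by zero.

```python
import math
from itertools import combinations


def average_outputs(v, n):
    """Return [u_bar^(0), ..., u_bar^(n)], where
    u_bar^(k) = mean over |S| = k of v(x_S) - v(x_empty)."""
    base = v(frozenset())
    u_bar = []
    for k in range(n + 1):
        gains = [v(frozenset(S)) - base for S in combinations(range(n), k)]
        u_bar.append(sum(gains) / len(gains))
    return u_bar


def satisfies_condition_2(u_bar, tol=1e-9):
    """Condition 2: u_bar^(k) does not decrease as k grows."""
    return all(u_bar[k] <= u_bar[k + 1] + tol for k in range(len(u_bar) - 1))


def satisfies_condition_3(u_bar, p, tol=1e-9):
    """Condition 3: u_bar^(k') >= (k'/k)^p * u_bar^(k) for all k' <= k, k >= 1."""
    n = len(u_bar) - 1
    return all(
        u_bar[kp] + tol >= (kp / k) ** p * u_bar[k]
        for k in range(1, n + 1)
        for kp in range(k + 1)
    )


def toy_v(T):
    """Hypothetical toy model: confidence grows like sqrt(#unmasked variables)."""
    return math.sqrt(len(T))


u_bar = average_outputs(toy_v, n=4)
cond2 = satisfies_condition_2(u_bar)
cond3_p1 = satisfies_condition_3(u_bar, p=1.0)
cond3_p025 = satisfies_condition_3(u_bar, p=0.25)
```

For this toy function, Condition 2 holds and Condition 3 holds for $p=1$ but fails for $p=0.25$, which shows that the constant $p$ controls how fast the confidence is allowed to drop under masking. For a real DNN, the exhaustive enumeration would typically be replaced by sampling masks for each $k$.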

Appendix C Details of optimizing $\{\gamma_{T}\}$ to extract the sparsest AND-OR interactions

A method was proposed in [20, 4] to simultaneously extract AND interactions $I_{\text{and}}(S|\bm{x})$ and OR interactions $I_{\text{or}}(S|\bm{x})$ from the network output. Given a masked sample $\bm{x}_{T}$, [20] proposed to learn a decomposition $v(\bm{x}_{T})=v_{\text{and}}(\bm{x}_{T})+v_{\text{or}}(\bm{x}_{T})$ towards the sparsest interactions. The component $v_{\text{and}}(\bm{x}_{T})$ was explained by AND interactions, and the component $v_{\text{or}}(\bm{x}_{T})$ was explained by OR interactions. Specifically, they decomposed $v(\bm{x}_{T})$ into $v_{\text{and}}(\bm{x}_{T})=0.5\cdot v(\bm{x}_{T})+\gamma_{T}$ and $v_{\text{or}}(\bm{x}_{T})=0.5\cdot v(\bm{x}_{T})-\gamma_{T}$, where $\{\gamma_{T}:T\subseteq N\}$ is a set of learnable variables that determine the decomposition. In this way, the AND interactions and OR interactions can be computed according to Eq. (2), i.e., $I_{\text{and}}(S|\bm{x})=\sum\nolimits_{T\subseteq S}(-1)^{|S|-|T|}v_{\text{and}}(\bm{x}_{T})$, and $I_{\text{or}}(S|\bm{x})=-\sum\nolimits_{T\subseteq S}(-1)^{|S|-|T|}v_{\text{or}}(\bm{x}_{N\setminus T})$.

The parameters $\{\gamma_{T}\}$ were learned by minimizing the following LASSO-like loss to obtain sparse interactions:

$$\min_{\{\gamma_{T}\}}\sum_{S\subseteq N}|I_{\text{and}}(S|\bm{x})|+|I_{\text{or}}(S|\bm{x})| \tag{11}$$

Removing small noises. A small noise $\delta_S$ in the network output may significantly affect the extracted interactions, especially high-order interactions. Thus, [20] proposed to learn to remove a small noise term $\delta_S$ from the computation of AND-OR interactions. Specifically, the decomposition was rewritten as $v_{\text{and}}(\bm{x}_T)=0.5(v(\bm{x}_T)-\delta_T)+\gamma_T$ and $v_{\text{or}}(\bm{x}_T)=0.5(v(\bm{x}_T)-\delta_T)+\gamma_T$. Thus, the parameters $\{\delta_T\}$ and $\{\gamma_T\}$ are simultaneously learned by minimizing the loss function in Eq. (11). The values of $\{\delta_T\}$ were constrained to $[-\zeta,\zeta]$, where $\zeta=0.02\cdot|v(\bm{x})-v(\bm{x}_{\emptyset})|$.
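
As a rough illustration of the constraint above, the following minimal Python sketch computes the bound $\zeta$, clamps candidate noise terms to $[-\zeta,\zeta]$, and evaluates the sparsity loss of Eq. (11). The function names (`noise_bound`, `clamp_noise`, `sparsity_loss`) and all toy values are hypothetical; the actual optimization procedure of [20] is not reproduced here.

```python
def noise_bound(v_full, v_empty, ratio=0.02):
    """zeta = ratio * |v(x) - v(x_empty)|, with ratio = 0.02 as in the text."""
    return ratio * abs(v_full - v_empty)


def clamp_noise(delta, zeta):
    """Project each noise term delta_T onto the interval [-zeta, zeta]."""
    return {T: max(-zeta, min(zeta, d)) for T, d in delta.items()}


def sparsity_loss(i_and, i_or):
    """Eq. (11): sum of |I_and(S|x)| plus sum of |I_or(S|x)|."""
    return sum(abs(a) for a in i_and.values()) + sum(abs(o) for o in i_or.values())


# Toy values (hypothetical): outputs on the full sample and the fully masked sample.
zeta = noise_bound(v_full=3.0, v_empty=1.0)
delta = {frozenset(): 0.5, frozenset({0}): -0.01, frozenset({0, 1}): -0.2}
clamped = clamp_noise(delta, zeta)
loss = sparsity_loss(
    {frozenset({0}): 0.7, frozenset({0, 1}): -0.1},
    {frozenset({1}): 0.25},
)
```

In a gradient-based setting, the clamp would typically be applied as a projection after each update of $\{\delta_T\}$; that choice is an assumption of this sketch rather than a detail stated in the text.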

Appendix D Where does the coefficient $(-1)^{|S|-|T|}$ in Eq. (2) come from?

In fact, it is proven in [13] and [23] that the coefficient $(-1)^{|S|-|T|}$ in Eq. (2) is the unique coefficient that ensures the interaction satisfies the universal matching property. Recall that the universal matching property means that no matter how we randomly mask an input sample $\bm{x}$, the network output on the masked sample $\bm{x}_S$ can always be accurately mimicked by the sum of interaction effects within $S$. An extension of this property for AND-OR interactions is also mentioned in Theorem 2.

Appendix E OR interactions can be considered a special kind of AND interactions

The OR interaction can be considered a specific kind of AND interaction, when we flip the masked state and presence (unmasked) state of each input variable.

Given an input sample $\bm{x}\in\mathbb{R}^n$, let $\bm{x}_T$ denote the masked sample obtained by masking input variables in $N\setminus T$, while leaving variables in $T$ unchanged. Specifically, the baseline values $\bm{b}\in\mathbb{R}^n$ are used to mask the input variables, which represent the masked states of the input variables. The definition of $\bm{x}_T$ is given as follows.

$$(\bm{x}_T)_i=\begin{cases}x_i, & i\in T\\ b_i, & i\in N\setminus T\end{cases} \tag{12}$$

Based on the above definition, the AND interaction is computed as $I_{\text{and}}(S|\bm{x})=\sum\nolimits_{T\subseteq S}(-1)^{|S|-|T|}v_{\text{and}}(\bm{x}_T)$, while the OR interaction is computed as $I_{\text{or}}(S|\bm{x})=-\sum\nolimits_{T\subseteq S}(-1)^{|S|-|T|}v_{\text{or}}(\bm{x}_{N\setminus T})$. To simplify the analysis, let us assume $v_{\text{and}}(\cdot)=v_{\text{or}}(\cdot)=0.5v(\cdot)$.

Then, let us consider a masked sample $\tilde{\bm{x}}_T$, where we flip the masked state and presence (unmasked) state of each input variable. In this way, $\tilde{\bm{x}}_T$ is defined as follows.

$$(\tilde{\bm{x}}_T)_i=\begin{cases}x_i, & i\in N\setminus T\\ b_i, & i\in T\end{cases} \tag{13}$$

Therefore, the OR interaction $I_{\text{or}}(S|\bm{x})$ in Eq. (2) in the main paper can be represented as an AND interaction $I_{\text{and}}(S|\tilde{\bm{x}})$ on the flipped sample, up to a sign, as follows.

$$\begin{aligned}
I_{\text{or}}(S|\bm{x}) &= -\sum_{T\subseteq S}(-1)^{|S|-|T|}v(\bm{x}_{N\setminus T}), && (14)\\
&= -\sum_{T\subseteq S}(-1)^{|S|-|T|}v(\tilde{\bm{x}}_T), && (15)\\
&= -I_{\text{and}}(S|\tilde{\bm{x}}). && (16)
\end{aligned}$$

In this way, the proof of the sparsity of AND interactions in [27] can also extend to OR interactions. Furthermore, we can simplify our analysis of the DNN’s learning of interactions by only focusing on AND interactions.
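
The identity in Eqs. (14)-(16) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: `toy_model` is a hypothetical scalar function standing in for the network output $v(\cdot)$, the baseline $\bm{b}$ is set to zero, and the helper names (`subsets`, `mask`, `flipped_mask`, `i_or`, `i_and_flipped`) are introduced here for convenience.

```python
from itertools import combinations
import math


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def mask(x, b, keep):
    """Eq. (12): keep x_i for i in `keep`, use the baseline b_i elsewhere."""
    return [x[i] if i in keep else b[i] for i in range(len(x))]


def flipped_mask(x, b, T):
    """Eq. (13): use the baseline b_i for i in T, keep x_i elsewhere."""
    return [b[i] if i in T else x[i] for i in range(len(x))]


def toy_model(z):
    # Hypothetical stand-in for the network output v(.); any scalar function works.
    return math.tanh(z[0] * z[1]) + 0.5 * z[2] - z[1] * z[3]


x = [1.0, -2.0, 0.5, 3.0]
b = [0.0, 0.0, 0.0, 0.0]  # assumed baseline values
N = frozenset(range(len(x)))


def i_or(S):
    """Eq. (14): OR interaction computed on the samples x_{N minus T}."""
    return -sum((-1) ** (len(S) - len(T)) * toy_model(mask(x, b, N - T))
                for T in subsets(S))


def i_and_flipped(S):
    """AND interaction computed on the flipped samples of Eq. (13)."""
    return sum((-1) ** (len(S) - len(T)) * toy_model(flipped_mask(x, b, T))
               for T in subsets(S))


# Eq. (16) states I_or(S|x) = -I_and(S|x~), so the sum below should vanish for every S.
max_gap = max(abs(i_or(S) + i_and_flipped(S)) for S in subsets(N))
```

The check works because `mask(x, b, N - T)` and `flipped_mask(x, b, T)` build the same sample, which is exactly the step from Eq. (14) to Eq. (15).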

Appendix F Proof of theorems

F.1 Proof of Theorem 2

Proof.

(1) Universal matching theorem of AND interactions.

We will prove that the output component $v_{\text{and}}(\bm{x}_S)$ on all $2^n$ masked samples $\{\bm{x}_S: S\subseteq N\}$ can be universally explained by all the interactions within $S\subseteq N$, i.e., $\forall \emptyset\neq S\subseteq N, v_{\text{and}}(\bm{x}_S)=\sum_{\emptyset\neq T\subseteq S}I_{\text{and}}(T|\bm{x})+v(\bm{x}_{\emptyset})$. In particular, we define $v_{\text{and}}(\bm{x}_{\emptyset})=v(\bm{x}_{\emptyset})$ (i.e., we attribute the output on an empty sample to AND interactions).

Specifically, the AND interaction is defined as $I_{\text{and}}(T|\bm{x})=\sum\nolimits_{L\subseteq T}(-1)^{|T|-|L|}v_{\text{and}}(\bm{x}_L)$ in Eq. (2). To compute the sum of AND interactions $\sum_{\emptyset\neq T\subseteq S}I_{\text{and}}(T|\bm{x})=\sum\nolimits_{\emptyset\neq T\subseteq S}\sum\nolimits_{L\subseteq T}(-1)^{|T|-|L|}v_{\text{and}}(\bm{x}_L)$, we first exchange the order of the summation over $L$ and the summation over $T$, subject to $L\subseteq T\subseteq S$. That is, for a given set of input variables $L$, we first sum the coefficients of the model output $v_{\text{and}}(\bm{x}_L)$ over all sets $T$ containing $L$, i.e., $\sum\nolimits_{T:L\subseteq T\subseteq S}(-1)^{|T|-|L|}v_{\text{and}}(\bm{x}_L)$. Then, we sum over all sets $L\subseteq S$.

In this way, we can compute them separately for different cases of $L\subseteq T\subseteq S$. In the following, we consider the cases (1) $L=S=T$, and (2) $L\subseteq T\subseteq S, L\neq S$, respectively.

(1) When $L=S=T$, the linear combination of all subsets $T$ containing $L$ with respect to the model output $v_{\text{and}}(\bm{x}_L)$ is $(-1)^{|S|-|S|}v_{\text{and}}(\bm{x}_L)=v_{\text{and}}(\bm{x}_L)$.

(2) When $L\subseteq T\subseteq S, L\neq S$, the linear combination of all subsets $T$ containing $L$ with respect to the model output $v_{\text{and}}(\bm{x}_L)$ is $\sum\nolimits_{T:L\subseteq T\subseteq S}(-1)^{|T|-|L|}v_{\text{and}}(\bm{x}_L)$. For all sets $T: S\supseteq T\supseteq L$, let us group the sets $T$ by their size $|T|$. Let $m:=|T|-|L|$ ($0\leq m\leq |S|-|L|$); then there are a total of $C_{|S|-|L|}^{m}$ sets $T$ of order $|T|$. Thus, given $L$, accumulating the model outputs $v_{\text{and}}(\bm{x}_L)$ corresponding to all $T\supseteq L$ yields $\sum\nolimits_{T:L\subseteq T\subseteq S}(-1)^{|T|-|L|}v_{\text{and}}(\bm{x}_L)=v_{\text{and}}(\bm{x}_L)\cdot\underbrace{\sum\nolimits_{m=0}^{|S|-|L|}C_{|S|-|L|}^{m}(-1)^{m}}_{=0}=0$. The complete derivation is given in the following formula.

$$\begin{aligned}
&\sum\nolimits_{\emptyset\neq T\subseteq S}I_{\text{and}}(T|\bm{x})\\
=&\sum\nolimits_{\emptyset\neq T\subseteq S}\sum\nolimits_{L\subseteq T}(-1)^{|T|-|L|}v_{\text{and}}(\bm{x}_L)\\
=&\sum\nolimits_{L\subseteq S}\sum\nolimits_{T:L\subseteq T\subseteq S}(-1)^{|T|-|L|}v_{\text{and}}(\bm{x}_L)-v_{\text{and}}(\bm{x}_{\emptyset})\\
=&\underbrace{v_{\text{and}}(\bm{x}_S)}_{L=S}+\sum\nolimits_{L\subseteq S,L\neq S}v_{\text{and}}(\bm{x}_L)\cdot\underbrace{\sum\nolimits_{m=0}^{|S|-|L|}C_{|S|-|L|}^{m}(-1)^{m}}_{=0}-v_{\text{and}}(\bm{x}_{\emptyset})\\
=&v_{\text{and}}(\bm{x}_S)-v_{\text{and}}(\bm{x}_{\emptyset})\\
=&v_{\text{and}}(\bm{x}_S)-v(\bm{x}_{\emptyset})
\end{aligned} \tag{17}$$

Thus, we have $\forall \emptyset\neq S\subseteq N, v_{\text{and}}(\bm{x}_S)=\sum_{\emptyset\neq T\subseteq S}I_{\text{and}}(T|\bm{x})+v(\bm{x}_{\emptyset})$.
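
As a sanity check on the result of part (1), the sketch below evaluates both sides of the universal matching identity for AND interactions on a toy example with $n=4$, and also checks the alternating binomial sum used in case (2). The randomly drawn table `v_and` is a hypothetical stand-in for the outputs $v_{\text{and}}(\bm{x}_L)$, and the helper names are assumptions of this example.

```python
from itertools import combinations
from math import comb
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def and_interaction(T, v):
    """I_and(T|x) = sum over L subseteq T of (-1)^(|T|-|L|) * v_and(x_L)."""
    return sum((-1) ** (len(T) - len(L)) * v[L] for L in subsets(T))


N = frozenset(range(4))
rng = random.Random(0)
# Toy stand-in for v_and(x_L): one arbitrary score per masked sample.
v_and = {L: rng.uniform(-1.0, 1.0) for L in subsets(N)}
I_and = {T: and_interaction(T, v_and) for T in subsets(N)}

# Universal matching: v_and(x_S) = sum over nonempty T subseteq S of I_and(T|x) + v(x_empty).
max_error = max(
    abs(sum(I_and[T] for T in subsets(S) if T) + v_and[frozenset()] - v_and[S])
    for S in subsets(N) if S
)

# Alternating binomial sum from case (2): it vanishes whenever |S| - |L| >= 1.
alternating_sums = [sum(comb(k, m) * (-1) ** m for m in range(k + 1))
                    for k in range(1, 8)]
```

Because the identity holds for any table of outputs, the reconstruction error stays at floating-point level regardless of how `v_and` is drawn.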

(2) Universal matching theorem of OR interactions.

According to the definition of OR interactions, we will derive that $\forall S\subseteq N, v_{\text{or}}(\bm{x}_S)=\sum_{T:T\cap S\neq\emptyset}I_{\text{or}}(T|\bm{x})$, where we define $v_{\text{or}}(\bm{x}_{\emptyset})=0$ (recall that in Step (1), we attribute the output on the empty input to AND interactions).
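
Before walking through the derivation, the claimed identity for OR interactions can be checked numerically on a toy example. In the sketch below, the table `v_or` is a hypothetical stand-in for the outputs $v_{\text{or}}(\bm{x}_S)$ with $v_{\text{or}}(\bm{x}_{\emptyset})=0$ imposed as in the text; the helper names are assumptions of this example.

```python
from itertools import combinations
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def or_interaction(T, v, N):
    """I_or(T|x) = -sum over L subseteq T of (-1)^(|T|-|L|) * v_or(x_{N minus L})."""
    return -sum((-1) ** (len(T) - len(L)) * v[N - L] for L in subsets(T))


N = frozenset(range(4))
rng = random.Random(1)
# Toy stand-in for v_or(x_S): one arbitrary score per masked sample.
v_or = {S: rng.uniform(-1.0, 1.0) for S in subsets(N)}
v_or[frozenset()] = 0.0  # convention from the text: v_or(x_empty) = 0
I_or = {T: or_interaction(T, v_or, N) for T in subsets(N)}

# Universal matching: v_or(x_S) = sum of I_or(T|x) over all T that intersect S.
max_error = max(
    abs(sum(I_or[T] for T in subsets(N) if T & S) - v_or[S])
    for S in subsets(N)
)
```

Note that the sum ranges over all $T\subseteq N$ that intersect $S$, not only over subsets of $S$, which is the main difference from the AND case.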

Specifically, the OR interaction is defined as $I_{\text{or}}(T|\bm{x})=-\sum\nolimits_{L\subseteq T}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})$ in Eq. (2). Similar to the above derivation of the universal matching theorem of AND interactions, to compute the sum of OR interactions $\sum\nolimits_{T:T\cap S\neq\emptyset}I_{\text{or}}(T|\bm{x})=\sum\nolimits_{T:T\cap S\neq\emptyset}\left[-\sum\nolimits_{L\subseteq T}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})\right]$, we first exchange the order of the summation over $L$ and the summation over $T$, subject to $L\subseteq T\subseteq N$ and $T\cap S\neq\emptyset$. That is, for a given set of input variables $L$, we first sum the coefficients of the model output $v_{\text{or}}(\bm{x}_{N\setminus L})$ over all sets $T$ containing $L$, i.e., $\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})$. Then, we sum over all sets $L\subseteq N$.

In this way, we can compute them separately for different cases of $L\subseteq T\subseteq N, T\cap S\neq\emptyset$. In the following, we consider the cases (1) $L=N\setminus S$, (2) $L=N$, (3) $L\cap S\neq\emptyset, L\neq N$, and (4) $L\cap S=\emptyset, L\neq N\setminus S$, respectively.

(1) When $L=N\setminus S$, the linear combination of all subsets $T$ containing $L$ with respect to the model output $v_{\text{or}}(\bm{x}_{N\setminus L})$ is $\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})=\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_S)$. For all sets $T: T\supseteq L, T\cap S\neq\emptyset$ (then $T\neq N\setminus S, T\neq L$), let us group the sets $T$ by their size $|T|$. Let $|T'|:=|T|-|L|$ ($1\leq|T'|\leq|S|$); then there are a total of $C_{|S|}^{|T'|}$ sets $T'$ of order $|T'|$. Thus, given $L$, accumulating the model outputs $v_{\text{or}}(\bm{x}_S)$ corresponding to all $T\supseteq L$ yields $\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})=v_{\text{or}}(\bm{x}_S)\cdot\underbrace{\sum\nolimits_{|T'|=1}^{|S|}C_{|S|}^{|T'|}(-1)^{|T'|}}_{=-1}=-v_{\text{or}}(\bm{x}_S)$.

(2) When $L=N$ (then $T=N$), the linear combination of all subsets $T$ containing $L$ with respect to the model output $v_{\text{or}}(\bm{x}_{N\setminus L})$ is $\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})=(-1)^{|N|-|N|}v_{\text{or}}(\bm{x}_{\emptyset})=v_{\text{or}}(\bm{x}_{\emptyset})$.

(3) When $L\cap S\neq\emptyset, L\neq N$, the linear combination of all subsets $T$ containing $L$ with respect to the model output $v_{\text{or}}(\bm{x}_{N\setminus L})$ is $\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})$. For all sets $T: T\supseteq L, T\cap S\neq\emptyset$, let us group the sets $T$ by their size $|T|$ for the model output $v_{\text{or}}(\bm{x}_{N\setminus L})$. Let us split $|T|-|L|$ into $|T'|$ and $|T''|$, i.e., $|T|-|L|=|T'|+|T''|$, where $T'=\{i\mid i\in T, i\notin L, i\in N\setminus S\}$, $T''=\{i\mid i\in T, i\notin L, i\in S\}$ (then $0\leq|T''|\leq|S|-|S\cap L|$) and $|T'|+|T''|+|L|=|T|$. In this way, there are a total of $C_{|S|-|S\cap L|}^{|T''|}$ sets $T''$ of order $|T''|$.
Thus, given L𝐿Litalic_L, accumulating the model outputs vor(𝒙NL)subscript𝑣orsubscript𝒙𝑁𝐿v_{\text{or}}(\bm{x}_{N\setminus L})italic_v start_POSTSUBSCRIPT or end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_N ∖ italic_L end_POSTSUBSCRIPT ) corresponding to all TL𝐿𝑇T\supseteq Litalic_T ⊇ italic_L, then T:TS,TL(1)|T||L|vor(𝒙NL)=vor(𝒙NL)TNSL|T′′|=0|S||SL|C|S||SL||T′′|(1)|T|+|T′′|=0=0subscript:𝑇formulae-sequence𝑇𝑆𝐿𝑇superscript1𝑇𝐿subscript𝑣orsubscript𝒙𝑁𝐿subscript𝑣orsubscript𝒙𝑁𝐿subscriptsuperscript𝑇𝑁𝑆𝐿subscriptsuperscriptsubscriptsuperscript𝑇′′0𝑆𝑆𝐿superscriptsubscript𝐶𝑆𝑆𝐿superscript𝑇′′superscript1superscript𝑇superscript𝑇′′absent00\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}% (\bm{x}_{N\setminus L})=v_{\text{or}}(\bm{x}_{N\setminus L})\cdot\sum_{T^{% \prime}\subseteq N\setminus S\setminus L}\underbrace{\sum\nolimits_{|T^{\prime% \prime}|=0}^{|S|-|S\cap L|}C_{|S|-|S\cap L|}^{|T^{\prime\prime}|}(-1)^{|T^{% \prime}|+|T^{\prime\prime}|}}_{=0}=0∑ start_POSTSUBSCRIPT italic_T : italic_T ∩ italic_S ≠ ∅ , italic_T ⊇ italic_L end_POSTSUBSCRIPT ( - 1 ) start_POSTSUPERSCRIPT | italic_T | - | italic_L | end_POSTSUPERSCRIPT italic_v start_POSTSUBSCRIPT or end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_N ∖ italic_L end_POSTSUBSCRIPT ) = italic_v start_POSTSUBSCRIPT or end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUBSCRIPT italic_N ∖ italic_L end_POSTSUBSCRIPT ) ⋅ ∑ start_POSTSUBSCRIPT italic_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⊆ italic_N ∖ italic_S ∖ italic_L end_POSTSUBSCRIPT under⏟ start_ARG ∑ start_POSTSUBSCRIPT | italic_T start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT | = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT | italic_S | - | italic_S ∩ italic_L | end_POSTSUPERSCRIPT italic_C start_POSTSUBSCRIPT | italic_S | - | italic_S ∩ italic_L | end_POSTSUBSCRIPT start_POSTSUPERSCRIPT | italic_T start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT | end_POSTSUPERSCRIPT ( - 1 ) start_POSTSUPERSCRIPT | italic_T start_POSTSUPERSCRIPT 
′ end_POSTSUPERSCRIPT | + | italic_T start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT | end_POSTSUPERSCRIPT end_ARG start_POSTSUBSCRIPT = 0 end_POSTSUBSCRIPT = 0.

(4) When $L\cap S=\emptyset, L\neq N\setminus S$, the linear combination of all subsets $T$ containing $L$ with respect to the model output $v_{\text{or}}(\bm{x}_{N\setminus L})$ is $\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})$. Similarly, let us split $|T|-|L|$ into $|T'|$ and $|T''|$, i.e., $|T|-|L|=|T'|+|T''|$, where $T'=\{i \mid i\in T, i\notin L, i\in N\setminus S\}$, $T''=\{i \mid i\in T, i\in S\}$ (then $0\leq|T''|\leq|S|$) and $|T'|+|T''|+|L|=|T|$. In this way, there are a total of $C_{|S|}^{|T''|}$ combinations of all sets $T''$ of order $|T''|$. Thus, given $L$, accumulating the model outputs $v_{\text{or}}(\bm{x}_{N\setminus L})$ corresponding to all $T\supseteq L$, then

$$\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})=v_{\text{or}}(\bm{x}_{N\setminus L})\cdot\sum_{T'\subseteq N\setminus S\setminus L}\underbrace{\sum\nolimits_{|T''|=0}^{|S|}C_{|S|}^{|T''|}(-1)^{|T'|+|T''|}}_{=0}=0.$$
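The vanishing of the underbraced sums in cases (3) and (4) follows from the binomial theorem. A short worked form is sketched below, where $m$ is a generic positive integer introduced here for illustration (it is not the paper's notation): $m$ plays the role of $|S|-|S\cap L|$ in case (3) and of $|S|$ in case (4). The identities are stated only for $m\geq 1$.

```latex
% m is our own placeholder symbol; both identities assume m >= 1
\sum_{k=0}^{m} C_{m}^{k}\,(-1)^{k} = (1-1)^{m} = 0,
\qquad
\sum_{k=1}^{m} C_{m}^{k}\,(-1)^{k} = 0 - C_{m}^{0} = -1 .
```

The second identity, with $m=|S|$, is the one that produces the factor $-1$ multiplying $v_{\text{or}}(\bm{x}_{S})$ for the term $L=N\setminus S$ in Eq. (18).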

The complete derivation is as follows.

$$\begin{aligned}
\sum\nolimits_{T:T\cap S\neq\emptyset}I_{\text{or}}(T|\bm{x})
&=\sum\nolimits_{T:T\cap S\neq\emptyset}\left[-\sum\nolimits_{L\subseteq T}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})\right]\\
&=-\sum\nolimits_{L\subseteq N}\sum\nolimits_{T:T\cap S\neq\emptyset,T\supseteq L}(-1)^{|T|-|L|}v_{\text{or}}(\bm{x}_{N\setminus L})\\
&=-\left[\sum_{|T'|=1}^{|S|}C_{|S|}^{|T'|}(-1)^{|T'|}\right]\cdot\underbrace{v_{\text{or}}(\bm{x}_{S})}_{L=N\setminus S}-\underbrace{v_{\text{or}}(\bm{x}_{\emptyset})}_{L=N}\\
&\quad-\sum_{L\cap S\neq\emptyset,L\neq N}\left[\sum_{T'\subseteq N\setminus S\setminus L}\left(\sum_{|T''|=0}^{|S|-|S\cap L|}C_{|S|-|S\cap L|}^{|T''|}(-1)^{|T'|+|T''|}\right)\right]\cdot v_{\text{or}}(\bm{x}_{N\setminus L})\\
&\quad-\sum_{L\cap S=\emptyset,L\neq N\setminus S}\left[\sum_{T'\subseteq N\setminus S\setminus L}\left(\sum_{|T''|=0}^{|S|}C_{|S|}^{|T''|}(-1)^{|T'|+|T''|}\right)\right]\cdot v_{\text{or}}(\bm{x}_{N\setminus L})\\
&=-(-1)\cdot v_{\text{or}}(\bm{x}_{S})-v_{\text{or}}(\bm{x}_{\emptyset})-\sum_{L\cap S\neq\emptyset,L\neq N}\left[\sum_{T'\subseteq N\setminus S\setminus L}0\right]\cdot v_{\text{or}}(\bm{x}_{N\setminus L})\\
&\quad-\sum_{L\cap S=\emptyset,L\neq N\setminus S}\left[\sum_{T'\subseteq N\setminus S\setminus L}0\right]\cdot v_{\text{or}}(\bm{x}_{N\setminus L})\\
&=v_{\text{or}}(\bm{x}_{S})-v_{\text{or}}(\bm{x}_{\emptyset})\\
&=v_{\text{or}}(\bm{x}_{S})
\end{aligned}\tag{18}$$
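The identity in Eq. (18) can be checked numerically on a small variable set. The sketch below is only an illustration under stated assumptions: it draws a random set function as a stand-in for $v_{\text{or}}$, sets $v_{\text{or}}(\bm{x}_{\emptyset})=0$ (as the last step of Eq. (18) implies), and uses the OR interaction in the form read off from the first line of Eq. (18). The names `subsets`, `or_interaction`, and `max_or_matching_error` are hypothetical helpers, not part of any released code.

```python
import itertools
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in itertools.combinations(items, r):
            yield frozenset(combo)


def or_interaction(T, v_or, N):
    """I_or(T|x) = -sum_{L subseteq T} (-1)^{|T|-|L|} v_or(x_{N minus L})."""
    return -sum((-1) ** (len(T) - len(L)) * v_or[N - L] for L in subsets(T))


def max_or_matching_error(n=4, seed=0):
    """Largest gap between v_or(x_S) and the sum of I_or(T|x) over T with T cap S nonempty."""
    rng = random.Random(seed)
    N = frozenset(range(n))
    # Assumption: an arbitrary random set function stands in for v_or.
    v_or = {S: rng.uniform(-1.0, 1.0) for S in subsets(N)}
    v_or[frozenset()] = 0.0  # assumption: v_or(x_emptyset) = 0
    I_or = {T: or_interaction(T, v_or, N) for T in subsets(N) if T}
    worst = 0.0
    for S in subsets(N):
        total = sum(value for T, value in I_or.items() if T & S)
        worst = max(worst, abs(total - v_or[S]))
    return worst


if __name__ == "__main__":
    print(max_or_matching_error())
```

The check enumerates all $2^{n}$ subsets, so it is only practical for small $n$; it is meant as a sanity check of the algebra, not as a way to compute interactions on a real DNN.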

(3) Universal matching theorem of AND-OR interactions.

With the universal matching theorem of AND interactions and the universal matching theorem of OR interactions, we can easily get $v(\bm{x}_{S})=v_{\text{and}}(\bm{x}_{S})+v_{\text{or}}(\bm{x}_{S})=v(\bm{x}_{\emptyset})+\sum_{\emptyset\neq T\subseteq S}I_{\text{and}}(T|\bm{x})+\sum_{T:T\cap S\neq\emptyset}I_{\text{or}}(T|\bm{x})$. Thus, we obtain the universal matching theorem of AND-OR interactions.

F.2 Proof of Eq. (6) and Eq. (7)

Before we give the derivation of Eq. (6) and Eq. (7), we first prove the following lemma.

Lemma 3.

The effect $I(T|\bm{x})$ of an AND interaction w.r.t. subset $T$ on sample $\bm{x}$ can be rewritten as

$$I(T|\bm{x})=\sum\nolimits_{\bm{\pi}\in Q_{T}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\left.\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}(x_{i}-b_{i})^{\pi_{i}},\tag{19}$$

where $Q_{T}=\{[\pi_{1},\dots,\pi_{n}]^{\top}\mid\forall\ i\in T,\pi_{i}\in\mathbb{N}^{+};\forall\ i\not\in T,\pi_{i}=0\}$.

Note that a similar proof was first introduced in [26].
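Lemma 3 can be sanity-checked on a toy model whose Taylor expansion is finite. The sketch below is an illustration under stated assumptions, not the authors' implementation: the network $v$ is replaced by a small polynomial in $(\bm{x}-\bm{b})$ with made-up coefficients, and Eq. (2), which is not reproduced in this section, is assumed to have the standard Harsanyi form $I(T|\bm{x})=\sum_{L\subseteq T}(-1)^{|T|-|L|}v(\bm{x}_{L})$. All function names and numeric values are hypothetical.

```python
import itertools
import math


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in itertools.combinations(items, r):
            yield frozenset(combo)


def v(z, b, coeffs):
    """Toy model: a polynomial in (z - b). coeffs[pi] is the Taylor coefficient,
    i.e. the partial derivative at b divided by prod(pi_i!)."""
    return sum(
        c * math.prod((z[i] - b[i]) ** p for i, p in enumerate(pi))
        for pi, c in coeffs.items()
    )


def masked(x, b, S):
    """x_S: keep variables in S, replace the others by their baseline values."""
    return [x[i] if i in S else b[i] for i in range(len(x))]


def and_interaction(T, x, b, coeffs):
    """Assumed Harsanyi form of Eq. (2)."""
    return sum(
        (-1) ** (len(T) - len(L)) * v(masked(x, b, L), b, coeffs)
        for L in subsets(T)
    )


def taylor_group(T, x, b, coeffs):
    """Right-hand side of Eq. (19): Taylor terms whose degree vector has support exactly T."""
    return sum(
        c * math.prod((x[i] - b[i]) ** pi[i] for i in T)
        for pi, c in coeffs.items()
        if frozenset(i for i, p in enumerate(pi) if p > 0) == T
    )


def lemma3_max_gap():
    x = [1.0, -2.0, 0.5]   # hypothetical input sample
    b = [0.2, 0.1, -0.3]   # hypothetical baseline values
    coeffs = {             # hypothetical Taylor coefficients
        (0, 0, 0): 0.5, (1, 0, 0): 2.0, (2, 1, 0): -1.5,
        (1, 1, 1): 0.7, (0, 3, 0): 0.25, (0, 1, 2): 1.1,
    }
    return max(
        abs(and_interaction(T, x, b, coeffs) - taylor_group(T, x, b, coeffs))
        for T in subsets(range(3)) if T
    )


if __name__ == "__main__":
    print(lemma3_max_gap())
```

A polynomial is used so that the Taylor series terminates and the two sides can be compared up to floating-point rounding; for a general DNN the series would have to be truncated.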

Proof.

Let us denote the function on the right of Eq. (19) by $K(T|\bm{x})$, i.e., for $T\neq\emptyset$,

$$K(T|\bm{x})\overset{\text{def}}{=}\sum\nolimits_{\bm{\pi}\in Q_{T}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\left.\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}(x_{i}-b_{i})^{\pi_{i}}.\tag{20}$$

Actually, it has been proven in [13] and [23] that the AND interaction $I(T|\bm{x})$ (see definition in Eq. (2)) is the unique metric satisfying the following property (an extension of the property for AND-OR interactions is mentioned in Theorem 2), i.e.,

$$\forall\ S\subseteq N,\ v(\bm{x}_{S})=\sum\nolimits_{\emptyset\neq T\subseteq S}I(T|\bm{x})+v(\bm{x}_{\emptyset}).\tag{21}$$

Thus, as long as we can prove that $K(T|\bm{x})$ also satisfies the above universal matching property, we can obtain $I(T|\bm{x})=K(T|\bm{x})$.

To this end, we only need to prove $K(T|\bm{x})$ also satisfies the property in Eq. (21). Specifically, given an input sample $\bm{x}\in\mathbb{R}^{n}$, let us consider the Taylor expansion of the network output $v(\bm{x}_{S})$ of an arbitrarily masked sample $\bm{x}_{S}$, which is expanded at $\bm{x}_{\emptyset}=\bm{b}=[b_{1},\cdots,b_{n}]^{\top}$. Then, we have

$$\forall\ S\subseteq N,\ v(\bm{x}_{S})=\sum_{\pi_{1}=0}^{\infty}\cdots\sum_{\pi_{n}=0}^{\infty}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\left.\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\ \prod_{i=1}^{n}((\bm{x}_{S})_{i}-b_{i})^{\pi_{i}}\tag{22}$$

where $b_{i}$ denotes the baseline value to mask the input variable $x_{i}$.

According to the definition of the masked sample $\bm{x}_{S}$, all variables in $S$ remain unchanged and the other variables are masked to their baseline values. That is, $\forall\ i\in S$, $(\bm{x}_{S})_{i}=x_{i}$; $\forall\ i\not\in S$, $(\bm{x}_{S})_{i}=b_{i}$. Hence, we obtain $\forall i\not\in S,\ ((\bm{x}_{S})_{i}-b_{i})^{\pi_{i}}=0$ if $\pi_{i}>0$. Then, among all Taylor expansion terms, only terms corresponding to degrees $\bm{\pi}$ in the set $P_{S}=\{[\pi_{1},\cdots,\pi_{n}]^{\top}\mid\forall i\in S,\pi_{i}\in\mathbb{N};\forall i\not\in S,\pi_{i}=0\}$ may be non-zero (we consider the value of $((\bm{x}_{S})_{i}-b_{i})^{\pi_{i}}$ to be always equal to 1 if $\pi_{i}=0$). Therefore, Eq. (22) can be re-written as

$$\forall\ S\subseteq N,\quad v(\bm{x}_{S})=\sum_{\bm{\pi}\in P_{S}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\left.\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in S}(x_{i}-b_{i})^{\pi_{i}}.\tag{23}$$

We find that the set PSsubscript𝑃𝑆P_{S}italic_P start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT can be divided into multiple disjoint sets as PS=TSQTsubscript𝑃𝑆subscript𝑇𝑆subscript𝑄𝑇P_{S}=\cup_{T\subseteq S}\ Q_{T}italic_P start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT = ∪ start_POSTSUBSCRIPT italic_T ⊆ italic_S end_POSTSUBSCRIPT italic_Q start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT, where QT={[π1,,πn]iT,πi+;iT,πi=0}subscript𝑄𝑇conditional-setsuperscriptsubscript𝜋1subscript𝜋𝑛topformulae-sequencefor-all𝑖𝑇formulae-sequencesubscript𝜋𝑖superscriptformulae-sequencefor-all𝑖𝑇subscript𝜋𝑖0Q_{T}=\{[\pi_{1},\cdots,\pi_{n}]^{\top}\mid\forall i\in T,\pi_{i}\in\mathbb{N}% ^{+};\forall i\not\in T,\pi_{i}=0\}italic_Q start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT = { [ italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∣ ∀ italic_i ∈ italic_T , italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_N start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ; ∀ italic_i ∉ italic_T , italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 0 }. Then, we can further write Eq. (23) as

\begin{align}
\forall\ S\subseteq N,\quad v(\bm{x}_S) &= \sum_{T\subseteq S}\sum_{\bm{\pi}\in Q_T}\frac{1}{\prod_{i=1}^{n}\pi_i!}\left.\frac{\partial^{\pi_1+\cdots+\pi_n}v}{\partial x_1^{\pi_1}\cdots\partial x_n^{\pi_n}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}(x_i-b_i)^{\pi_i} \tag{24}\\
&= \sum_{\emptyset\neq T\subseteq S}K(T|\bm{x})+v(\bm{x}_{\emptyset}). \quad\text{// according to the definition of $K(T|\bm{x})$ in Eq.~(20)} \notag
\end{align}

The last step is obtained as follows. When $T=\emptyset$, $Q_T$ has only one element $\bm{\pi}=[0,\cdots,0]^{\top}$, which corresponds to the term $v(\bm{x}_{\emptyset})$.

Thus, $K(T|\bm{x})$ satisfies the property in Eq. (21), and this means $I(T|\bm{x})=K(T|\bm{x})=\sum\nolimits_{\bm{\pi}\in Q_T}\frac{1}{\prod_{i=1}^{n}\pi_i!}\left.\frac{\partial^{\pi_1+\cdots+\pi_n}v}{\partial x_1^{\pi_1}\cdots\partial x_n^{\pi_n}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}(x_i-b_i)^{\pi_i}$.

Then, let us continue the proof of Eq. (6) and Eq. (7).

Proof.

Given a specific sample $\hat{\bm{x}}$, let us consider the following function defined in Eq. (6) and Eq. (7).

\begin{align}
f(\bm{x})=\sum\nolimits_{T\subseteq N}w_T\,J_T(\bm{x}), \tag{25}
\end{align}

where the scalar weight $w_T=I(T|\bm{x}=\hat{\bm{x}})$, and the function $J_T(\bm{x})=\sum\nolimits_{\bm{\pi}\in Q_T}\frac{1}{\prod_{i=1}^{n}\pi_i!}\left.\frac{\partial^{\pi_1+\cdots+\pi_n}v}{\partial x_1^{\pi_1}\cdots\partial x_n^{\pi_n}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}(x_i-b_i)^{\pi_i}\,/\,w_T$.

We will then prove that $\forall S\subseteq N,\ f(\hat{\bm{x}}_S)=v(\hat{\bm{x}}_S)$.

\begin{align}
f(\hat{\bm{x}}_S) &= \sum_{T\subseteq N}w_T\,J_T(\hat{\bm{x}}_S) \tag{26}\\
&= \sum_{T\subseteq N}\sum_{\bm{\pi}\in Q_T}\frac{1}{\prod_{i=1}^{n}\pi_i!}\left.\frac{\partial^{\pi_1+\cdots+\pi_n}v}{\partial x_1^{\pi_1}\cdots\partial x_n^{\pi_n}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}\big((\hat{\bm{x}}_S)_i-b_i\big)^{\pi_i} \quad\text{// $w_T$ cancels out} \tag{27}\\
&= \sum_{T\subseteq S}\sum_{\bm{\pi}\in Q_T}\frac{1}{\prod_{i=1}^{n}\pi_i!}\left.\frac{\partial^{\pi_1+\cdots+\pi_n}v}{\partial x_1^{\pi_1}\cdots\partial x_n^{\pi_n}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}\big((\hat{\bm{x}}_S)_i-b_i\big)^{\pi_i} \tag{28}\\
&\quad\ \text{// if $T\nsubseteq S$, then $\exists j\in T\setminus S$, s.t. $(\hat{\bm{x}}_S)_j-b_j=0$, which makes the whole term zero} \tag{29}\\
&= \sum_{T\subseteq S}\sum_{\bm{\pi}\in Q_T}\frac{1}{\prod_{i=1}^{n}\pi_i!}\left.\frac{\partial^{\pi_1+\cdots+\pi_n}v}{\partial x_1^{\pi_1}\cdots\partial x_n^{\pi_n}}\right|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}(\hat{x}_i-b_i)^{\pi_i} \tag{30}\\
&\quad\ \text{// when $T\subseteq S$, we have $\forall i\in T,\ (\hat{\bm{x}}_S)_i=\hat{x}_i$} \tag{31}\\
&= \sum_{\emptyset\neq T\subseteq S}I(T|\bm{x}=\hat{\bm{x}})+v(\bm{x}_{\emptyset}) \quad\text{// the inverse direction of Lemma 3 we have just proven} \tag{32}\\
&= v(\hat{\bm{x}}_S) \quad\text{// the inverse direction of the universal matching theorem} \tag{33}
\end{align}

Remark. The function $f(\bm{x})$ essentially provides a continuous implementation of Eq. (3) in the universal matching theorem (Theorem 2). The weight $w_T=I(T|\bm{x}=\hat{\bm{x}})$ is the interaction effect w.r.t. the subset $T$ on the unmasked sample $\hat{\bm{x}}$, while the function $J_T(\bm{x})$ is a continuous extension of the indicator function $\mathbbm{1}(\hat{\bm{x}}_S\text{ triggers the AND relation }T)$. We thus call $J_T(\bm{x})$ a triggering function, and the value of this function the triggering strength.
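The identity $f(\hat{\bm{x}}_S)=v(\hat{\bm{x}}_S)$ can be checked numerically on a small example. The following is a minimal sketch, not the authors' code: the polynomial `toy_v`, the sample `x_hat`, and the all-zero baseline `b` are hypothetical choices made only for illustration. It computes the interactions with the inclusion-exclusion formula of Eq. (2) and verifies that summing them over all $T\subseteq S$ recovers the output on every masked sample.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset (including the empty set)."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def mask(x, baseline, keep):
    """Masked sample x_S: keep variables in `keep`, set the others to the baseline."""
    return [x[i] if i in keep else baseline[i] for i in range(len(x))]


def interactions(v, x, baseline):
    """I(T|x) = sum_{S subseteq T} (-1)^{|T|-|S|} v(x_S), for every T subseteq N."""
    result = {}
    for T in subsets(range(len(x))):
        result[T] = sum(
            (-1) ** (len(T) - len(S)) * v(mask(x, baseline, S)) for S in subsets(T)
        )
    return result


def toy_v(z):
    # Hypothetical stand-in for the network output (a polynomial, so it is analytic).
    return 1.0 + 2.0 * z[0] + z[0] * z[1] - 3.0 * z[0] * z[1] * z[2] + z[2] ** 2


x_hat = [1.5, -2.0, 0.5]   # assumed input sample
b = [0.0, 0.0, 0.0]        # assumed baseline values b_i

I = interactions(toy_v, x_hat, b)

# Largest gap between v(x_S) and the sum of interactions over T subseteq S.
# The T = {} term of the sum equals v(x_emptyset).
max_err = max(
    abs(sum(I[T] for T in subsets(S)) - toy_v(mask(x_hat, b, S)))
    for S in subsets(range(len(x_hat)))
)
print("max reconstruction error:", max_err)
```

With a zero baseline, each interaction of this polynomial collects exactly the monomials whose variables are those in $T$, which matches the Taylor-expansion form of $I(T|\bm{x})$ derived above.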

F.3 Proof of Lemma 1

Proof.

Given the inference scores on masked samples $\{\widetilde{v}(\bm{x}_S):S\subseteq N\}$, the interaction between input variables w.r.t. $T\subseteq N$ can be computed as $\widetilde{I}(T|\bm{x})=\sum_{S\subseteq T}(-1)^{|T|-|S|}\,\widetilde{v}(\bm{x}_S)$ (the computation of AND interactions in Eq. (2)).

Since we assume that $\forall S\subseteq N,\ \widetilde{v}(\bm{x}_S)=v(\bm{x}_S)+\Delta v_S$, with $\Delta v_S\sim\mathcal{N}(0,\sigma^2)$, the interaction $\widetilde{I}(T|\bm{x})$ can be written as

\begin{align}
\widetilde{I}(T|\bm{x}) &= \sum_{S\subseteq T}(-1)^{|T|-|S|}\,\widetilde{v}(\bm{x}_S) \tag{34}\\
&= \sum_{S\subseteq T}(-1)^{|T|-|S|}\,\big(v(\bm{x}_S)+\Delta v_S\big) \tag{35}\\
&= \sum_{S\subseteq T}(-1)^{|T|-|S|}\,v(\bm{x}_S)+\sum_{S\subseteq T}(-1)^{|T|-|S|}\,\Delta v_S \tag{36}\\
&= I(T|\bm{x})+\Delta I_T \tag{37}
\end{align}

where $I(T|\bm{x})=\sum_{S\subseteq T}(-1)^{|T|-|S|}v(\bm{x}_S)$ is the noiseless component (not a random variable), and $\Delta I_T=\sum_{S\subseteq T}(-1)^{|T|-|S|}\Delta v_S$ is the noise component on the interaction.

Since the Gaussian noises $\Delta v_S\sim\mathcal{N}(0,\sigma^2),\ \forall S\subseteq N$, are independent and identically distributed, it is easy to see that $\mathbb{E}[\Delta I_T]=\sum_{S\subseteq T}(-1)^{|T|-|S|}\,\mathbb{E}[\Delta v_S]=0$. The variance of $\Delta I_T$ is computed as

\begin{align}
{\rm Var}[\Delta I_T] &= {\rm Var}\Big(\sum_{S\subseteq T}(-1)^{|T|-|S|}\Delta v_S\Big) \tag{38}\\
&= {\rm Var}(\Delta v_{S_1})+{\rm Var}(\Delta v_{S_2})+\cdots+{\rm Var}(\Delta v_{S_{2^{|T|}}}) \tag{39}\\
&= 2^{|T|}\cdot\sigma^2, \tag{40}
\end{align}

because the noise terms are independent, each coefficient $(-1)^{|T|-|S|}$ squares to 1, and there are a total of $2^{|T|}$ subsets $S\subseteq T$.

Furthermore, according to the analytic form of the interaction effect in Eq. (19), we note that the values of $\widetilde{I}(T|\bm{x})$ and $\widetilde{J}_T(\bm{x})$ have a ratio of $w_T$. Therefore, if we write $\widetilde{J}_T(\bm{x})=J_T(\bm{x})+\epsilon_T$, then the noise term satisfies $\epsilon_T=\Delta I_T/w_T$, and thus $\mathbb{E}[\epsilon_T]=0$ and ${\rm Var}[\epsilon_T]\propto 2^{|T|}\sigma^2$.
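The scaling ${\rm Var}[\Delta I_T]=2^{|T|}\sigma^2$ can also be observed empirically. The snippet below is a minimal Monte Carlo sketch under the lemma's assumption of i.i.d. Gaussian noise $\Delta v_S$; the function name `sample_interaction_noise`, the noise level `sigma`, and the number of trials are illustrative assumptions, not values from the paper.

```python
import numpy as np


def sample_interaction_noise(t_size, sigma, num_trials, seed=0):
    """Draw samples of Delta I_T = sum_{S subseteq T} (-1)^{|T|-|S|} Delta v_S.

    Subsets S of T are enumerated as bitmasks 0 .. 2^{|T|}-1, and each
    Delta v_S is drawn i.i.d. from N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    num_subsets = 2 ** t_size
    subset_sizes = np.array([bin(m).count("1") for m in range(num_subsets)])
    signs = (-1.0) ** (t_size - subset_sizes)          # (-1)^{|T|-|S|}
    delta_v = rng.normal(0.0, sigma, size=(num_trials, num_subsets))
    return delta_v @ signs                             # one Delta I_T per trial


sigma = 0.5  # assumed noise level
empirical_var = {}
for order in range(1, 5):
    samples = sample_interaction_noise(order, sigma, num_trials=200_000)
    empirical_var[order] = samples.var()
    print(order, empirical_var[order], 2 ** order * sigma ** 2)
```

The printed empirical variance should roughly double each time the order $|T|$ increases by one, which is the exponential growth of noise with interaction order that the lemma describes.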

F.4 Proof of Theorem 3

Proof.

We concatenate all $\bm{J}(\bm{x}_S)$ (w.r.t. all $2^n$ masked samples $\bm{x}_S,\ S\subseteq N$) into a matrix $\bm{J}=[\bm{J}(\bm{x}_{S_1}),\bm{J}(\bm{x}_{S_2}),\cdots,\bm{J}(\bm{x}_{S_{2^n}})]^{\top}\in\{0,1\}^{2^n\times 2^n}$ to represent the triggering strength of $2^n$ interactions on $2^n$ masked samples. We also concatenate all noise terms on all $2^n$ masked samples into a matrix $\bm{\mathcal{E}}=[\bm{\epsilon}^{(1)},\bm{\epsilon}^{(2)},\cdots,\bm{\epsilon}^{(2^n)}]^{\top}$ to represent the noise term over $\bm{J}$. We concatenate the output scores into a vector $\bm{y}\overset{\text{def}}{=}[y(\bm{x}_{S_1}),y(\bm{x}_{S_2}),\cdots,y(\bm{x}_{S_{2^n}})]^{\top}\in\mathbb{R}^{2^n}$ to represent the finally converged outputs on all $2^n$ masked samples.

The optimal weights $\hat{\bm{w}}$ can be solved by minimizing the loss function $\widetilde{L}(\bm{w})$ in Eq. (9). The loss function can be rewritten as follows:

\begin{align}
\hat{\bm{w}} &= \arg\min_{\bm{w}}\widetilde{L}(\bm{w}) \tag{41}\\
\widetilde{L}(\bm{w}) &= \mathbb{E}_{\bm{\epsilon}}\mathbb{E}_{S\subseteq N}\left[\left(y_S-\bm{w}^{\top}(\bm{J}(\bm{x}_S)+\bm{\epsilon})\right)^2\right] \tag{42}\\
&= \mathbb{E}_{\bm{\mathcal{E}}}\left[\frac{1}{2^n}\|\bm{y}-(\bm{J}+\bm{\mathcal{E}})\bm{w}\|_2^2\right] \tag{43}\\
&= \frac{1}{2^n}\mathbb{E}_{\bm{\mathcal{E}}}\left[(\bm{y}-(\bm{J}+\bm{\mathcal{E}})\bm{w})^{\top}(\bm{y}-(\bm{J}+\bm{\mathcal{E}})\bm{w})\right] \tag{44}\\
&= \frac{1}{2^n}\left(\bm{y}^{\top}\bm{y}-2\bm{y}^{\top}\mathbb{E}_{\bm{\mathcal{E}}}\left[(\bm{J}+\bm{\mathcal{E}})\right]\bm{w}+\bm{w}^{\top}\mathbb{E}_{\bm{\mathcal{E}}}\left[(\bm{J}+\bm{\mathcal{E}})^{\top}(\bm{J}+\bm{\mathcal{E}})\right]\bm{w}\right). \tag{45}
\end{align}
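As a numerical sanity check, the expanded expected loss can be compared with a Monte Carlo estimate of $\mathbb{E}_{\bm{\mathcal{E}}}\big[\frac{1}{2^n}\|\bm{y}-(\bm{J}+\bm{\mathcal{E}})\bm{w}\|_2^2\big]$. The sketch below rests on assumptions that are not fixed by the text at this point: the entries of $\bm{\mathcal{E}}$ are taken to be independent and zero-mean, the variance of the column for interaction $T$ is set to exactly $2^{|T|}\sigma^2$ (proportionality constant 1), $\bm{J}$ is taken as the binary indicator of $T\subseteq S$, and `w`, `y`, and `sigma` are arbitrary test values.

```python
import numpy as np

n = 3
m = 2 ** n  # number of masked samples and of interactions

# Subsets are encoded as bitmasks. J[s, t] = 1 if interaction t is contained
# in the subset s of unmasked variables, i.e. sample x_S triggers interaction T.
J = np.array([[1.0 if (t & s) == t else 0.0 for t in range(m)] for s in range(m)])

rng = np.random.default_rng(0)
w = rng.normal(size=m)   # arbitrary weights (test values)
y = rng.normal(size=m)   # arbitrary target outputs (test values)
sigma = 0.1              # assumed noise level

order = np.array([bin(t).count("1") for t in range(m)])
col_var = (2.0 ** order) * sigma ** 2  # assumed Var[eps_T] = 2^{|T|} sigma^2


def closed_form_loss(weights):
    """Expanded loss, using E[E] = 0 and E[E^T E] = diag(m * col_var)."""
    noise_gram = np.diag(m * col_var)
    quad = weights @ (J.T @ J + noise_gram) @ weights
    return (y @ y - 2.0 * (y @ J @ weights) + quad) / m


def monte_carlo_loss(weights, trials=20_000):
    """Average of ||y - (J + E) w||^2 / m over sampled noise matrices E."""
    noise = rng.normal(size=(trials, m, m)) * np.sqrt(col_var)  # scales each column
    residual = y - (J + noise) @ weights                        # shape (trials, m)
    return float(np.mean(np.sum(residual ** 2, axis=1)) / m)


exact = closed_form_loss(w)
estimate = monte_carlo_loss(w)
print(exact, estimate)
```

Under these assumptions the cross terms vanish in expectation, so the only effect of the noise is the extra diagonal term in the quadratic form, which penalizes the weights of high-order interactions more strongly.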

Taking the derivative with respect to $\bm{w}$ and setting it to zero, we get:

$\frac{\partial\widetilde{L}}{\partial\bm{w}} = \frac{1}{2^{n}}\left(-2\,\mathbb{E}_{\bm{\mathcal{E}}}\left[(\bm{J}+\bm{\mathcal{E}})^{\top}\bm{y}\right]+2\,\mathbb{E}_{\bm{\mathcal{E}}}\left[(\bm{J}+\bm{\mathcal{E}})^{\top}(\bm{J}+\bm{\mathcal{E}})\bm{w}\right]\right)=0,$ (46)
$\Rightarrow \mathbb{E}_{\bm{\mathcal{E}}}\left[(\bm{J}+\bm{\mathcal{E}})^{\top}(\bm{J}+\bm{\mathcal{E}})\right]\bm{w}=\mathbb{E}_{\bm{\mathcal{E}}}\left[(\bm{J}+\bm{\mathcal{E}})^{\top}\bm{y}\right],$ (47)
$\Rightarrow \left(\bm{J}^{\top}\bm{J}+\mathbb{E}_{\bm{\mathcal{E}}}[\bm{\mathcal{E}}^{\top}\bm{J}]+\bm{J}^{\top}\mathbb{E}_{\bm{\mathcal{E}}}[\bm{\mathcal{E}}]+\mathbb{E}_{\bm{\mathcal{E}}}[\bm{\mathcal{E}}^{\top}\bm{\mathcal{E}}]\right)\bm{w}=\bm{J}^{\top}\bm{y},$ (48)
$\Rightarrow \left(\bm{J}^{\top}\bm{J}+\mathbb{E}_{\bm{\mathcal{E}}}[\bm{\mathcal{E}}^{\top}\bm{\mathcal{E}}]\right)\bm{w}=\bm{J}^{\top}\bm{y}.$ // because $\mathbb{E}[\bm{\mathcal{E}}]=\bm{0}$ (49)

Notice that the sample covariance matrix $\frac{1}{m}\bm{\mathcal{E}}^{\top}\bm{\mathcal{E}}$ converges to the true covariance matrix $\text{Cov}(\bm{\mathcal{E}})$ when $m=2^{n}$ is large. Therefore, $\mathbb{E}_{\bm{\mathcal{E}}}[\bm{\mathcal{E}}^{\top}\bm{\mathcal{E}}]=\mathbb{E}_{\bm{\mathcal{E}}}[2^{n}\text{Cov}(\bm{\mathcal{E}})]=2^{n}\text{Cov}(\bm{\mathcal{E}})$.
Because we assume that the noises on different interactions are independent, $\text{Cov}(\bm{\mathcal{E}})$ is a diagonal matrix, denoted by $\text{Cov}(\bm{\mathcal{E}})=\text{diag}(\bm{c})$, where $\bm{c}={\rm vec}(\{{\rm Var}[\epsilon_{T}]:T\subseteq N\})={\rm vec}(\{2^{|T|}\sigma^{2}:T\subseteq N\})\in\mathbb{R}^{2^{n}}$ denotes the vector of variances of the triggering strengths of the $2^{n}$ interactions.

Thus, we have:

$(\bm{J}^{\top}\bm{J}+2^{n}\text{diag}(\bm{c}))\bm{w}=\bm{J}^{\top}\bm{y}.$ (50)
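As a worked instance of Eq. (50), consider $n=2$. The ordering of subsets $\emptyset,\{1\},\{2\},\{1,2\}$ used below is an assumption made only for illustration. The example also uses $J_{T}(\bm{x}_{S})=\mathbbm{1}(T\subseteq S)$ from Lemma 2, which gives $(\bm{J}^{\top}\bm{J})_{T,T'}=2^{n-|T\cup T'|}$, the number of masked samples that trigger both $T$ and $T'$.

```latex
\bm{c}=\sigma^{2}\,[1,\;2,\;2,\;4]^{\top},
\qquad
2^{n}\,\text{diag}(\bm{c})=\sigma^{2}\,\text{diag}(4,\;8,\;8,\;16),
\qquad
\bm{J}^{\top}\bm{J}+2^{n}\,\text{diag}(\bm{c})=
\begin{bmatrix}
4 & 2 & 2 & 1\\
2 & 2 & 1 & 1\\
2 & 1 & 2 & 1\\
1 & 1 & 1 & 1
\end{bmatrix}
+\sigma^{2}
\begin{bmatrix}
4 & 0 & 0 & 0\\
0 & 8 & 0 & 0\\
0 & 0 & 8 & 0\\
0 & 0 & 0 & 16
\end{bmatrix}.
```

The added diagonal grows as $2^{|T|}$, so under this noise model higher-order interactions receive a stronger penalty than lower-order ones.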

Next, we prove that the matrix $\bm{J}^{\top}\bm{J}+2^{n}\text{diag}(\bm{c})$ is always invertible, as follows. (1) $\bm{J}^{\top}\bm{J}$ is positive semi-definite, because $\forall\bm{u}\neq\bm{0}$, $\bm{u}^{\top}\bm{J}^{\top}\bm{J}\bm{u}=\|\bm{J}\bm{u}\|_{2}^{2}\geq 0$. (2) $\bm{J}^{\top}\bm{J}$ is, moreover, positive definite. Let us denote the eigenvalues of $\bm{J}^{\top}\bm{J}$ as $\lambda_{1},\cdots,\lambda_{2^{n}}\in\mathbb{R}$ (because $\bm{J}^{\top}\bm{J}$ is real symmetric, its eigenvalues must be real).
According to Lemma 2, the entries of $\bm{J}$ are $J_{T}(\bm{x}_{S})=\mathbbm{1}(T\subseteq S)$. If rows and columns are indexed by subsets sorted so that no set precedes any of its own subsets (e.g., sorted by order), then $\bm{J}$ is lower triangular with all diagonal elements equal to 1, so $\det(\bm{J})=1$; any other ordering changes $\det(\bm{J})$ at most by its sign. Hence $\prod_{i=1}^{2^{n}}\lambda_{i}=\det(\bm{J}^{\top}\bm{J})=\det(\bm{J})^{2}=1>0$. Combining this with the positive semi-definiteness, we know that the eigenvalues of $\bm{J}^{\top}\bm{J}$ must all be positive, with no zero eigenvalue. This means that $\bm{J}^{\top}\bm{J}$ is positive definite. (3) $\bm{J}^{\top}\bm{J}+2^{n}\text{diag}(\bm{c})$ is positive definite. The diagonal matrix $2^{n}\text{diag}(\bm{c})$ is positive definite, because all its diagonal elements are positive. The sum of two positive definite matrices is still positive definite.
(4) Since $\bm{J}^{\top}\bm{J}+2^{n}\text{diag}(\bm{c})$ is positive definite, it cannot have a zero eigenvalue, and is thus invertible.
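The invertibility of $\bm{J}$ itself can be checked on small cases. The following is a minimal sketch, not taken from the paper, that encodes each subset of $N$ as an $n$-bit integer mask (an assumed encoding) and uses $J_{T}(\bm{x}_{S})=\mathbbm{1}(T\subseteq S)$ from Lemma 2. The helper names `triggering_matrix` and `is_unit_lower_triangular` are hypothetical. Under this encoding $T\subseteq S$ implies that the mask of $T$ is no larger than the mask of $S$, so $\bm{J}$ is lower triangular with a unit diagonal, which gives $\det(\bm{J})=1$.

```python
def triggering_matrix(n):
    """Return J with J[s][t] = 1 if subset t is contained in subset s, else 0.

    Subsets of N = {0, ..., n-1} are encoded as n-bit masks (assumed encoding);
    row index s is the masked sample x_S, column index t is the interaction T.
    """
    size = 1 << n
    return [[1 if (t & s) == t else 0 for t in range(size)] for s in range(size)]


def is_unit_lower_triangular(mat):
    """Check that the diagonal is all ones and everything above it is zero."""
    size = len(mat)
    for r in range(size):
        if mat[r][r] != 1:
            return False
        if any(mat[r][c] != 0 for c in range(r + 1, size)):
            return False
    return True


if __name__ == "__main__":
    for n in range(1, 6):
        # A unit lower-triangular matrix has determinant 1, hence is invertible.
        print(n, is_unit_lower_triangular(triggering_matrix(n)))
```

Because the determinant of a triangular matrix is the product of its diagonal, this check is enough to conclude that $\bm{J}^{\top}\bm{J}$ has no zero eigenvalue in the tested cases.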

So the optimal weights can be solved as

$\hat{\bm{w}}=(\bm{J}^{\top}\bm{J}+2^{n}\text{diag}(\bm{c}))^{-1}\bm{J}^{\top}\bm{y}.$ (51)

Next we will show that $\bm{y}=\bm{J}\bm{w}^{*}$. Recall that the definition of $y(\bm{x}_{S})$ in the main paper is $y(\bm{x}_{S})=v(\bm{x}_{\emptyset})+\sum_{\emptyset\neq T\subseteq S}w^{*}_{T}$. According to Lemma 2, we have $J_{T}(\bm{x}_{S})=\mathbbm{1}(T\subseteq S)$.
Therefore, $y(\bm{x}_{S})$ can be rewritten as $y(\bm{x}_{S})=\sum_{T\subseteq N}J_{T}(\bm{x}_{S})w^{*}_{T}$, where we define $w^{*}_{\emptyset}\overset{\text{def}}{=}v(\bm{x}_{\emptyset})$ for simplicity of notation. Writing the sum in vector form, we obtain $y(\bm{x}_{S})=\bm{J}(\bm{x}_{S})^{\top}\bm{w}^{*}$. Since the rows of $\bm{J}$ are the vectors $\bm{J}(\bm{x}_{S})^{\top}$, stacking these equations over all $S\subseteq N$ gives the whole vector $\bm{y}=\bm{J}\bm{w}^{*}$.
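The stacking step can be illustrated with a small sketch under stated assumptions: subsets are encoded as bit masks, the target weights `w_star` are arbitrary illustrative numbers, and the helper names are hypothetical. With $\bm{J}$ stacked row-wise as $[\bm{J}(\bm{x}_{S_{1}}),\cdots,\bm{J}(\bm{x}_{S_{2^{n}}})]^{\top}$, row $S$ of $\bm{J}$ is $\bm{J}(\bm{x}_{S})^{\top}$, so the outputs on all masked samples are recovered by the product $\bm{J}\bm{w}^{*}$ and not by $\bm{J}^{\top}\bm{w}^{*}$.

```python
def triggering_matrix(n):
    """J[s][t] = 1 if subset t is contained in subset s (subsets as bit masks)."""
    size = 1 << n
    return [[1 if (t & s) == t else 0 for t in range(size)] for s in range(size)]


def masked_outputs(w_star, n):
    """y(x_S) = sum of w*_T over all T contained in S.

    w_star[0] plays the role of w*_emptyset = v(x_emptyset).
    """
    size = 1 << n
    return [sum(w_star[t] for t in range(size) if (t & s) == t) for s in range(size)]


def matvec(mat, vec):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


n = 2
w_star = [1, 2, -1, 3]          # illustrative values, not from the paper
J = triggering_matrix(n)
y = masked_outputs(w_star, n)

print(y == matvec(J, w_star))             # row-wise stacking: y = J w*
print(y == matvec(transpose(J), w_star))  # the transposed product differs
```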

With $\bm{y}=\bm{J}\bm{w}^{*}$, we have $\hat{\bm{w}}=(\bm{J}^{\top}\bm{J}+2^{n}{\rm diag}(\bm{c}))^{-1}\bm{J}^{\top}\bm{J}\bm{w}^{*}=\hat{\bm{M}}\bm{w}^{*}$. ∎
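The closed-form solution can be evaluated exactly for small $n$. The sketch below is an illustration under stated assumptions and is not the authors' code: subsets are encoded as bit masks, `w_star` and `sigma2` are arbitrary illustrative inputs, and `solve` is a hypothetical Gauss-Jordan helper over exact fractions. It builds $\bm{J}^{\top}\bm{J}+2^{n}\text{diag}(\bm{c})$ with $c_{T}=2^{|T|}\sigma^{2}$, sets $\bm{y}=\bm{J}\bm{w}^{*}$, and solves Eq. (50).

```python
from fractions import Fraction


def triggering_matrix(n):
    """J[s][t] = 1 if subset t is contained in subset s (subsets as bit masks)."""
    size = 1 << n
    return [[1 if (t & s) == t else 0 for t in range(size)] for s in range(size)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination over Fractions.

    The matrix a is assumed to be invertible.
    """
    size = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def optimal_weights(w_star, n, sigma2):
    """Return w_hat solving (J^T J + 2^n diag(c)) w = J^T y with y = J w*."""
    size = 1 << n
    J = triggering_matrix(n)
    y = [sum(J[s][t] * w_star[t] for t in range(size)) for s in range(size)]
    lhs = [[sum(J[s][i] * J[s][j] for s in range(size)) for j in range(size)]
           for i in range(size)]
    for t in range(size):
        order = bin(t).count("1")                 # |T|
        lhs[t][t] += size * (2 ** order) * sigma2  # 2^n * c_T
    rhs = [sum(J[s][t] * y[s] for s in range(size)) for t in range(size)]
    return solve(lhs, rhs)


if __name__ == "__main__":
    w_star = [1, 2, -1, 3]  # illustrative target weights
    print(optimal_weights(w_star, 2, 0))               # no noise
    print(optimal_weights(w_star, 2, Fraction(1, 4)))  # noisy triggering
```

With `sigma2 = 0` the regularizer vanishes and the solver returns `w_star` itself, since $\bm{J}$ is invertible. Any positive `sigma2` moves the solution away from `w_star`.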

F.5 Proof of Lemma 2

Proof.

According to Eq. (7), the interaction triggering function for an arbitrarily given sample $\hat{\bm{x}}$ is given by

$J_{T}(\bm{x})=\sum_{\bm{\pi}\in Q_{T}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\Big|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}\left(x_{i}-b_{i}\right)^{\pi_{i}}\Big/w_{T},$ (52)

where $w_{T}=I(T|\bm{x}=\hat{\bm{x}})$, and $Q_{T}=\{[\pi_{1},\dots,\pi_{n}]^{\top}:\forall i\in T,\pi_{i}\in\mathbb{N}^{+};\forall i\not\in T,\pi_{i}=0\}$.

Specifically, we now consider a masked sample $\hat{\bm{x}}_{S}$, and we will prove that $J_{T}(\hat{\bm{x}}_{S})=\mathbbm{1}(T\subseteq S)$. We consider the following two cases.

Case 1: $T\nsubseteq S$. Then, there exists some $j\in T\setminus S$. Since $j\notin S$, according to the masking rule of the sample $\hat{\bm{x}}_{S}$, we have $(\hat{\bm{x}}_{S})_{j}-b_{j}=0$. Since $j\in T$, we have $\pi_{j}\in\mathbb{N}^{+}$. Therefore, $((\hat{\bm{x}}_{S})_{j}-b_{j})^{\pi_{j}}=0$. In this way, we have

$\forall\bm{\pi}\in Q_{T},\quad\prod_{i\in T}\left((\hat{\bm{x}}_{S})_{i}-b_{i}\right)^{\pi_{i}}=0.$ (53)

Since each term in the summation equals zero, we have $J_{T}(\hat{\bm{x}}_{S})=0$.

Case 2: $T\subseteq S$. In this case, every $i\in T$ also satisfies $i\in S$. Therefore, according to the masking rule, we have $(\hat{\bm{x}}_{S})_{i}=\hat{x}_{i}$ for all $i\in T$.

According to the analytic form of $I(T|\bm{x})$ in Eq. (19) in the proof in Appendix F.2, we can derive the value of $w_{T}$ as

$w_{T}=I(T|\bm{x}=\hat{\bm{x}})=\sum_{\bm{\pi}\in Q_{T}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\Big|_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}\left(\hat{x}_{i}-b_{i}\right)^{\pi_{i}}.$ (54)

Therefore, we can derive the value of $J_{T}(\hat{\bm{x}}_{S})$ as follows.

JT(𝒙^S)subscript𝐽𝑇subscript^𝒙𝑆\displaystyle J_{T}(\hat{\bm{x}}_{S})italic_J start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ( over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ) =𝝅QT1i=1nπi!π1++πnvx1π1xnπn|𝒙=𝒙iT((𝒙^S)ibi)πi/wTabsentevaluated-atsubscript𝝅subscript𝑄𝑇1superscriptsubscriptproduct𝑖1𝑛subscript𝜋𝑖superscriptsubscript𝜋1subscript𝜋𝑛𝑣superscriptsubscript𝑥1subscript𝜋1superscriptsubscript𝑥𝑛subscript𝜋𝑛𝒙subscript𝒙subscriptproduct𝑖𝑇superscriptsubscriptsubscript^𝒙𝑆𝑖subscript𝑏𝑖subscript𝜋𝑖subscript𝑤𝑇\displaystyle=\sum_{\bm{\pi}\in Q_{T}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\frac{% \partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots\partial x_{% n}^{\pi_{n}}}\Big{|}_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}\left({(\hat{\bm% {x}}_{S})_{i}-b_{i}}\right)^{\pi_{i}}/w_{T}= ∑ start_POSTSUBSCRIPT bold_italic_π ∈ italic_Q start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ! 
end_ARG divide start_ARG ∂ start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_v end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋯ ∂ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT bold_italic_x = bold_italic_x start_POSTSUBSCRIPT ∅ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_i ∈ italic_T end_POSTSUBSCRIPT ( ( over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT / italic_w start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT (55)
=𝝅QT1i=1nπi!π1++πnvx1π1xnπn|𝒙=𝒙iT((𝒙^S)ibi)πi𝝅QT1i=1nπi!π1++πnvx1π1xnπn|𝒙=𝒙iT(x^ibi)πi // by Eq. (54)absentevaluated-atsubscript𝝅subscript𝑄𝑇1superscriptsubscriptproduct𝑖1𝑛subscript𝜋𝑖superscriptsubscript𝜋1subscript𝜋𝑛𝑣superscriptsubscript𝑥1subscript𝜋1superscriptsubscript𝑥𝑛subscript𝜋𝑛𝒙subscript𝒙subscriptproduct𝑖𝑇superscriptsubscriptsubscript^𝒙𝑆𝑖subscript𝑏𝑖subscript𝜋𝑖evaluated-atsubscript𝝅subscript𝑄𝑇1superscriptsubscriptproduct𝑖1𝑛subscript𝜋𝑖superscriptsubscript𝜋1subscript𝜋𝑛𝑣superscriptsubscript𝑥1subscript𝜋1superscriptsubscript𝑥𝑛subscript𝜋𝑛𝒙subscript𝒙subscriptproduct𝑖𝑇superscriptsubscript^𝑥𝑖subscript𝑏𝑖subscript𝜋𝑖 // by Eq. (54)\displaystyle=\frac{\sum_{\bm{\pi}\in Q_{T}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}% \frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots% \partial x_{n}^{\pi_{n}}}\Big{|}_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}% \left({(\hat{\bm{x}}_{S})_{i}-b_{i}}\right)^{\pi_{i}}}{\sum_{\bm{\pi}\in Q_{T}% }\frac{1}{\prod_{i=1}^{n}\pi_{i}!}\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{% \partial x_{1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\Big{|}_{\bm{x}=\bm{x}_% {\emptyset}}\prod_{i\in T}\left({\hat{x}_{i}-b_{i}}\right)^{\pi_{i}}}\text{% \quad// by Eq.~{}(\ref{eq:apdx-get-wT})}= divide start_ARG ∑ start_POSTSUBSCRIPT bold_italic_π ∈ italic_Q start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ! 
end_ARG divide start_ARG ∂ start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_v end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋯ ∂ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT bold_italic_x = bold_italic_x start_POSTSUBSCRIPT ∅ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_i ∈ italic_T end_POSTSUBSCRIPT ( ( over^ start_ARG bold_italic_x end_ARG start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG ∑ start_POSTSUBSCRIPT bold_italic_π ∈ italic_Q start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ! 
end_ARG divide start_ARG ∂ start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_v end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋯ ∂ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT bold_italic_x = bold_italic_x start_POSTSUBSCRIPT ∅ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_i ∈ italic_T end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG // by Eq. ( ) (56)
=𝝅QT1i=1nπi!π1++πnvx1π1xnπn|𝒙=𝒙iT(x^ibi)πi𝝅QT1i=1nπi!π1++πnvx1π1xnπn|𝒙=𝒙iT(x^ibi)πiabsentevaluated-atsubscript𝝅subscript𝑄𝑇1superscriptsubscriptproduct𝑖1𝑛subscript𝜋𝑖superscriptsubscript𝜋1subscript𝜋𝑛𝑣superscriptsubscript𝑥1subscript𝜋1superscriptsubscript𝑥𝑛subscript𝜋𝑛𝒙subscript𝒙subscriptproduct𝑖𝑇superscriptsubscript^𝑥𝑖subscript𝑏𝑖subscript𝜋𝑖evaluated-atsubscript𝝅subscript𝑄𝑇1superscriptsubscriptproduct𝑖1𝑛subscript𝜋𝑖superscriptsubscript𝜋1subscript𝜋𝑛𝑣superscriptsubscript𝑥1subscript𝜋1superscriptsubscript𝑥𝑛subscript𝜋𝑛𝒙subscript𝒙subscriptproduct𝑖𝑇superscriptsubscript^𝑥𝑖subscript𝑏𝑖subscript𝜋𝑖\displaystyle=\frac{\sum_{\bm{\pi}\in Q_{T}}\frac{1}{\prod_{i=1}^{n}\pi_{i}!}% \frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{1}^{\pi_{1}}\cdots% \partial x_{n}^{\pi_{n}}}\Big{|}_{\bm{x}=\bm{x}_{\emptyset}}\prod_{i\in T}% \left({\hat{x}_{i}-b_{i}}\right)^{\pi_{i}}}{\sum_{\bm{\pi}\in Q_{T}}\frac{1}{% \prod_{i=1}^{n}\pi_{i}!}\frac{\partial^{\pi_{1}+\cdots+\pi_{n}}v}{\partial x_{% 1}^{\pi_{1}}\cdots\partial x_{n}^{\pi_{n}}}\Big{|}_{\bm{x}=\bm{x}_{\emptyset}}% \prod_{i\in T}\left({\hat{x}_{i}-b_{i}}\right)^{\pi_{i}}}= divide start_ARG ∑ start_POSTSUBSCRIPT bold_italic_π ∈ italic_Q start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ! 
end_ARG divide start_ARG ∂ start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_v end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋯ ∂ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT bold_italic_x = bold_italic_x start_POSTSUBSCRIPT ∅ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_i ∈ italic_T end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG ∑ start_POSTSUBSCRIPT bold_italic_π ∈ italic_Q start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ! 
end_ARG divide start_ARG ∂ start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_v end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋯ ∂ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT bold_italic_x = bold_italic_x start_POSTSUBSCRIPT ∅ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_i ∈ italic_T end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_b start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG (57)
// because we have proven $\forall i\in T,\ (\hat{\bm{x}}_S)_i=\hat{x}_i$ (58)
$=1$ (59)

Combining the two cases, we can conclude that $J_T(\hat{\bm{x}}_S)=\mathbbm{1}(T\subseteq S)$.

In this way, no matter how we change the DNN $v(\cdot)$ or the input sample $\bm{x}$, the matrix $\bm{J}=[\bm{J}(\bm{x}_{S_1}),\bm{J}(\bm{x}_{S_2}),\cdots,\bm{J}(\bm{x}_{S_{2^n}})]^\top\in\{0,1\}^{2^n\times 2^n}$ in Eq. (10) is always a fixed binary matrix.
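As an illustration, the fixed binary matrix can be built directly from the subset relation. The following is a minimal sketch, assuming that subsets of $N$ are encoded as bitmasks (so the row and column ordering differs from any ordering by subset size); the helper name `build_J` is hypothetical.

```python
import numpy as np

def build_J(n: int) -> np.ndarray:
    """Return the 2^n x 2^n binary matrix with J[S, T] = 1 iff T is a subset of S.

    Assumption: subsets of N = {0, ..., n-1} are encoded as integer bitmasks,
    so row index S and column index T are both integers in [0, 2^n).
    """
    masks = np.arange(2 ** n)
    S = masks[:, None]  # row index: masked sample x_S
    T = masks[None, :]  # column index: interaction T
    # T is a subset of S exactly when every bit of T is also set in S.
    return ((S & T) == T).astype(np.int64)

J = build_J(3)
```

With this bitmask ordering, the matrix is lower triangular with ones on the diagonal, and it depends only on $n$, not on the DNN or the input sample.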

F.6 Proof of Theorem 4

Proof.

We prove that for any two subsets $T,T'\subseteq N$ of the same order, the vector $\hat{\bm{m}}_T$ is a permutation of the vector $\hat{\bm{m}}_{T'}$.

The proof consists of two steps. First, we show that there exists a symmetric matrix transformation $\mathcal{T}(\cdot)=P_kP_{k-1}\cdots P_1(\cdot)P_1P_2\cdots P_{k-1}P_k$, where each $P_i$ is a permutation matrix, that maps both $\bm{J}^\top\bm{J}$ and $\bm{J}^\top\bm{J}+2^n\mathrm{diag}(\bm{c})$ to themselves, i.e., $\mathcal{T}(\bm{J}^\top\bm{J})=\bm{J}^\top\bm{J}$ and $\mathcal{T}(\bm{J}^\top\bm{J}+2^n\mathrm{diag}(\bm{c}))=\bm{J}^\top\bm{J}+2^n\mathrm{diag}(\bm{c})$. We will show that this transformation $\mathcal{T}(\cdot)$ permutes the rows and columns of the same order.

Second, we show that this transformation also maps $\hat{\bm{M}}$ to itself, i.e., $\mathcal{T}(\hat{\bm{M}})=\hat{\bm{M}}$, implying that row vectors of the same order in $\hat{\bm{M}}$ are permutations of each other.

From Theorem 3, we have:

$(\bm{J}^\top\bm{J}+2^n\mathrm{diag}(\bm{c}))\hat{\bm{M}}=\bm{J}^\top\bm{J}$ (60)

To simplify the notation, we denote $\bm{B}:=\bm{J}^\top\bm{J}$ and $\bm{D}:=2^n\mathrm{diag}(\bm{c})$. Then, we have:

$(\bm{B}+\bm{D})\hat{\bm{M}}=\bm{B}$ (61)

Step 1: We construct a transformation $\mathcal{T}(\cdot)$ which permutes the rows and columns of a $2^n\times 2^n$ matrix based on element selection. Let us first consider the matrix $\bm{B}$. For the matrix $\bm{D}$, the analysis is similar because its diagonal elements $2^{|T|+n}\sigma^2$ are the same for each order. Thus, if $\mathcal{T}(\cdot)$ maps $\bm{B}$ to itself, it also maps $\bm{D}$ to itself.

Given the set $N=\{1,2,\cdots,n\}$, the subsets $S_1,S_2,\cdots,S_{2^n}$ can be regarded as selections from the power set of $N$, denoted as $2^N$. Consider a permutation $\mathcal{P}$ acting on $N$. Under this permutation, the selections $S_1,S_2,\cdots,S_{2^n}$ transform correspondingly. For example, if $N=\{1,2,3\}$ is mapped to $N=\{3,2,1\}$ under the permutation $\mathcal{P}$, the list of subsets $[S_1,S_2,\cdots,S_{2^n}]=[\emptyset,\{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\},\{1,2,3\}]$ is mapped to $[\emptyset,\{3\},\{2\},\{1\},\{3,2\},\{3,1\},\{2,1\},\{3,2,1\}]$.

This permutation induces a transformation $\mathcal{T}(\cdot)=P_kP_{k-1}\cdots P_1(\cdot)P_1P_2\cdots P_{k-1}P_k$ on the matrix $\bm{B}=\bm{J}^\top\bm{J}$ by permuting its rows and columns.

Since the permutation acts on $N$ and preserves the inclusion relation, the matrix $\bm{B}$ is invariant under the transformation $\mathcal{T}(\cdot)$, meaning $\mathcal{T}(\bm{B})=\bm{B}$. Similarly, we have $\mathcal{T}(\bm{B}+\bm{D})=\bm{B}+\bm{D}$.
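The invariance claimed in Step 1 can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: subsets are encoded as bitmasks, variables are indexed from 0, the noise level `sigma2` is an arbitrary hypothetical value, and the helper `subset_permutation_matrix` is not from the paper. It uses the swap of the first and last variable from the example above, for which the induced matrix $P$ is a single symmetric permutation matrix with $P^2=I$ (i.e., $k=1$).

```python
import numpy as np

n = 3
sigma2 = 0.5  # hypothetical noise level, chosen only for illustration
masks = np.arange(2 ** n)
order = np.array([bin(int(m)).count("1") for m in masks])  # |T| for each subset

# J[S, T] = 1 iff T is a subset of S (bitmask encoding is an assumption).
J = ((masks[:, None] & masks[None, :]) == masks[None, :]).astype(float)
B = J.T @ J
D = np.diag(2.0 ** (order + n) * sigma2)  # diagonal elements 2^{|T|+n} sigma^2

def subset_permutation_matrix(perm, n):
    """Permutation matrix on subsets induced by a permutation of the variables.

    perm[i] is the new index of variable i; P[new_subset, old_subset] = 1.
    """
    size = 2 ** n
    P = np.zeros((size, size))
    for old in range(size):
        new = 0
        for i in range(n):
            if (old >> i) & 1:
                new |= 1 << perm[i]
        P[new, old] = 1.0
    return P

perm = [2, 1, 0]  # swap the first and the last variable
P = subset_permutation_matrix(perm, n)
B_mapped = P @ B @ P.T
D_mapped = P @ D @ P.T
```

Here `B_mapped` and `D_mapped` coincide with `B` and `D`, because the entry of $\bm{B}$ indexed by $(T,T')$ depends only on $|T\cup T'|$, which a permutation of $N$ leaves unchanged.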

Step 2: We apply $\mathcal{T}(\cdot)$ to the matrices $\bm{B}+\bm{D}$ and $\bm{B}$ in Eq. (61). Since both matrices are invariant under the transformation, we have:

$\mathcal{T}(\bm{B}+\bm{D})\hat{\bm{M}}=\mathcal{T}(\bm{B})$ (62)

Thus:

$P_kP_{k-1}\cdots P_1(\bm{B}+\bm{D})P_1P_2\cdots P_{k-1}P_k\hat{\bm{M}}=P_kP_{k-1}\cdots P_1(\bm{B})P_1P_2\cdots P_{k-1}P_k$ (63)

We can easily see that if $\hat{\bm{M}}$ is a solution to this equation, then $\mathcal{T}(\hat{\bm{M}})=P_kP_{k-1}\cdots P_1\hat{\bm{M}}P_1P_2\cdots P_{k-1}P_k$ is also a solution, since $P_i^2=I,\ i=1,\cdots,k$, where $I$ is the identity matrix. In addition, because $\bm{B}+\bm{D}$ is invertible (as shown in Appendix F.4), this solution is unique. Therefore:

$\mathcal{T}(\hat{\bm{M}})=\hat{\bm{M}}$ (64)

This shows that the transformation $\mathcal{T}(\cdot)$ also maps $\hat{\bm{M}}$ to itself.

Conclusion: We have shown that, under the transformation $\mathcal{T}(\cdot)$, the affected rows of $\hat{\bm{M}}$ are permutations of each other. Note that only rows of the same order are permuted to each other, because $\mathcal{T}(\cdot)$ is induced by a permutation of $N$, which preserves the size of every subset and hence the order of every row.

For any two subsets $T,T'\subseteq N$ of the same order, we can construct a permutation of indices from $T$ to $T'$ that maps $\hat{\bm{m}}_T$ to $\hat{\bm{m}}_{T'}$. Therefore, $\hat{\bm{m}}_T$ is a permutation of $\hat{\bm{m}}_{T'}$. ∎
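The statement of Theorem 4 can also be sanity-checked numerically for a small $n$. The following sketch is an illustration rather than part of the proof; it assumes a bitmask encoding of subsets, the diagonal entries $2^{|T|+n}\sigma^2$ for $\bm{D}$ as stated in Step 1, and an arbitrary hypothetical value of `sigma2`. It solves Eq. (61) for $\hat{\bm{M}}$ and compares the sorted entries of all rows that share the same order.

```python
import numpy as np

n = 3
sigma2 = 0.5  # hypothetical noise level, chosen only for illustration
masks = np.arange(2 ** n)
order = np.array([bin(int(m)).count("1") for m in masks])  # |T| for each subset

# J[S, T] = 1 iff T is a subset of S (bitmask encoding is an assumption).
J = ((masks[:, None] & masks[None, :]) == masks[None, :]).astype(float)
B = J.T @ J
D = np.diag(2.0 ** (order + n) * sigma2)

# Unique solution of (B + D) M_hat = B, since B + D is invertible.
M_hat = np.linalg.solve(B + D, B)

def rows_of_same_order_are_permutations(M, order):
    """Check that rows indexed by subsets of equal size contain the same values."""
    for k in np.unique(order):
        rows = np.sort(M[order == k], axis=1)  # sort entries within each row
        if not np.allclose(rows, rows[0]):
            return False
    return True

ok = rows_of_same_order_are_permutations(M_hat, order)
```

If the theorem holds, `ok` is true for any positive `sigma2`; changing `n` or `sigma2` in the sketch is a quick way to probe other settings.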

F.7 Proof of Theorem 5

Proof.

From Eq. (10), when there is no noise (i.e., $\sigma=0$), it is obvious that $\hat{\bm{w}}=(\bm{J}^\top\bm{J})^{-1}\bm{J}^\top\bm{J}\bm{w}^*=\bm{w}^*$, which means that the optimal weights $\hat{\bm{w}}$ are the same as the true weights $\bm{w}^*$. So $\forall\ \emptyset\neq T\subseteq N,\ \hat{w}_T=w^*_T$. ∎
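A short numerical check of the noiseless case is sketched below. It assumes a bitmask encoding of subsets, under which $\bm{J}$ is lower triangular with ones on the diagonal, so $\bm{J}^\top\bm{J}$ is invertible. The vector `w_star` is random hypothetical data standing in for the true weights $\bm{w}^*$.

```python
import numpy as np

n = 4
masks = np.arange(2 ** n)
# J[S, T] = 1 iff T is a subset of S (bitmask encoding is an assumption).
J = ((masks[:, None] & masks[None, :]) == masks[None, :]).astype(float)

rng = np.random.default_rng(0)
w_star = rng.normal(size=2 ** n)  # hypothetical "true" weights

# Noiseless case (sigma = 0): w_hat = (J^T J)^{-1} J^T J w_star.
JtJ = J.T @ J
w_hat = np.linalg.solve(JtJ, JtJ @ w_star)
```

Up to floating-point error, `w_hat` equals `w_star`, as stated in the proof.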

Appendix G Experimental details

G.1 Models and datasets

We trained various DNNs on different datasets. Specifically, for image data, we trained VGG-11 on the MNIST dataset (Creative Commons Attribution-Share Alike 3.0 license), VGG-11/VGG-16 on the CIFAR-10 dataset (MIT license), AlexNet/VGG-16 on the CUB-200-2011 dataset (license unknown), and VGG-16 on the Tiny ImageNet dataset (license unknown). For natural language data, we trained BERT-Tiny and BERT-Medium on the SST-2 dataset (license unknown). For point cloud data, we trained DGCNN on the ShapeNet dataset (Custom (non-commercial) license).

For the CUB-200-2011 dataset, we cropped the images to remove the background regions, using the bounding box provided by the dataset. These cropped images were resized to 224×224 and fed into the DNN. For the Tiny ImageNet dataset, due to the computational cost, we selected 50 classes from the total 200 classes at equal intervals (i.e., the 4th, 8th, …, 196th, 200th classes). All these images were resized to 224×224. For the MNIST dataset, all images were resized to 32×32 for classification. To better demonstrate that the learning of higher-order interactions in the second phase was closely related to over-fitting, we added a small ratio of label noise to the MNIST dataset, the CIFAR-10 dataset, and the CUB-200-2011 dataset to make the over-fitting of the DNNs more significant. Specifically, we randomly selected 1% of the training samples in the MNIST dataset and the CIFAR-10 dataset, and randomly reset their labels. We randomly selected 5% of the training samples in the CUB-200-2011 dataset and randomly reset their labels.
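The class selection and label-noise steps described above could be sketched as follows. This is a hedged illustration, not the authors' code: the function name `add_label_noise` is hypothetical, and it assumes that a "reset" label is drawn uniformly from all classes (so it may occasionally coincide with the original label), which the text does not specify.

```python
import numpy as np

def add_label_noise(labels, num_classes, ratio, seed=0):
    """Randomly pick a fraction `ratio` of samples and redraw their labels.

    Assumption: new labels are drawn uniformly from all classes.
    Returns the noisy labels and the indices of the modified samples.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    num_noisy = int(round(ratio * len(labels)))
    idx = rng.choice(len(labels), size=num_noisy, replace=False)
    labels[idx] = rng.integers(0, num_classes, size=num_noisy)
    return labels, idx

# Tiny ImageNet: 50 of the 200 classes at equal intervals
# (1-based class positions 4, 8, ..., 196, 200).
selected_classes = np.arange(4, 201, 4)

# Example: 1% label noise on 1000 hypothetical 10-class labels.
clean = np.zeros(1000, dtype=int)
noisy, noisy_idx = add_label_noise(clean, num_classes=10, ratio=0.01)
```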

G.2 Training settings

We trained all DNNs using the SGD optimizer with a learning rate of 0.01 and a momentum of 0.9. No learning rate decay was used. We trained VGG models, AlexNet models, and BERT models for 256 epochs, and trained the DGCNN model for 512 epochs. The batch size was set to 128 for all DNNs on all datasets.

G.3 Details on computing interactions

First, we provide a summary of the mathematical settings of the hyper-parameters for interactions in Table 1, including the scalar output function of the DNN $v(\cdot)$, the baseline value $\bm{b}$ for masking, and the threshold $\tau$. These settings are uniformly applied to all DNNs. More detailed settings for different datasets can be found below.

Output function $v(\cdot)$: $v(\bm{x})=\log\frac{p(y^{\text{truth}}|\bm{x})}{1-p(y^{\text{truth}}|\bm{x})}$
Threshold $\tau$: $\tau=0.03\ \mathbb{E}_{\bm{x}}[|v(\bm{x})-v(\bm{x}_{\emptyset})|]$
Baseline value $\bm{b}$:
    Image data: using the zero baseline on the feature map after ReLU
    Text data: using the [MASK] token
    Point cloud data: using the cluster center of each point cluster
Table 1: Mathematical setting of hyper-parameters for interactions.
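The output function and the threshold in Table 1 can be computed as in the sketch below. It assumes that $p(y^{\text{truth}}|\bm{x})$ is obtained from a softmax over classification logits, in which case $\log\frac{p}{1-p}$ equals the ground-truth logit minus the log-sum-exp of the remaining logits. The function names are hypothetical.

```python
import numpy as np

def output_score(logits, truth):
    """v(x) = log(p / (1 - p)), where p = softmax(logits)[truth] (assumed).

    Uses the identity log(p / (1 - p)) = z_truth - logsumexp(z_others),
    which avoids computing p explicitly.
    """
    logits = np.asarray(logits, dtype=float)
    others = np.delete(logits, truth)
    m = others.max()  # subtract the max for numerical stability
    return logits[truth] - (m + np.log(np.exp(others - m).sum()))

def threshold(v_full, v_empty, ratio=0.03):
    """tau = ratio * E_x[|v(x) - v(x_empty)|], averaged over the given samples."""
    v_full = np.asarray(v_full, dtype=float)
    v_empty = np.asarray(v_empty, dtype=float)
    return ratio * np.mean(np.abs(v_full - v_empty))

# Example with hypothetical three-class logits: p = 3/5, so v = log(1.5).
v_example = output_score([np.log(3.0), 0.0, 0.0], truth=0)
```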

Image data. For image data, we considered image patches as input variables to the DNN. To generate a masked sample $\bm{x}_S$, we followed [41] to mask the patch on the intermediate-layer feature map corresponding to each image patch in the set $N\setminus S$. Specifically, we considered the feature map after the second ReLU layer for VGG-11/VGG-16 and the feature map after the first ReLU layer for AlexNet. For the VGG models and the AlexNet model, we uniformly partitioned the feature map into 8×8 patches, randomly selected 10 patches from the central 6×6 region (i.e., we did not select patches that were on the edges), and considered each of the 10 patches as an input variable in the set $N$ to calculate interactions. We used a zero baseline value to mask the input variables in the set $N\setminus S$ to obtain the masked sample $\bm{x}_S$.
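The patch selection and masking procedure for image data might look like the following sketch. The function names, the `(C, H, W)` feature-map layout, and the requirement that the spatial size is divisible by the grid size are assumptions made for illustration; patches outside the 10 selected variables are left untouched here.

```python
import numpy as np

def select_patches(rng, grid=8, border=1, num_vars=10):
    """Pick `num_vars` distinct patch coordinates from the central region.

    With grid=8 and border=1, the candidates are the central 6x6 patches.
    """
    inner = [(r, c) for r in range(border, grid - border)
             for c in range(border, grid - border)]
    idx = rng.choice(len(inner), size=num_vars, replace=False)
    return [inner[i] for i in idx]

def mask_feature_map(feat, patches, S, grid=8):
    """Return a copy of `feat` with the patches of variables outside S set to zero.

    feat: array of shape (C, H, W), with H and W divisible by `grid` (assumed).
    patches: list of (row, col) patch coordinates, one per input variable.
    S: set of variable indices (positions in `patches`) that are kept.
    """
    out = feat.copy()
    ph, pw = feat.shape[1] // grid, feat.shape[2] // grid
    for i, (r, c) in enumerate(patches):
        if i not in S:
            out[:, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = 0.0  # zero baseline
    return out

rng = np.random.default_rng(0)
patches = select_patches(rng)
feat = np.ones((2, 16, 16))  # hypothetical feature map
masked = mask_feature_map(feat, patches, S={0, 1})
```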

Natural language data. We considered the input tokens as input variables for each input sentence. Specifically, we randomly selected 10 words that are meaningful (i.e., not including stopwords, special characters, and punctuation) as input variables in the set $N$ to calculate interactions. We used the [MASK] token with the token id 103 to mask the tokens in the set $N\setminus S$ to obtain the masked sample $\bm{x}_S$.
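For text, the masking step amounts to replacing the token ids of the variables outside $S$ with the mask token id. The sketch below is a minimal illustration; the function name and the toy token ids in the example are hypothetical, and only the mask token id 103 comes from the text.

```python
MASK_TOKEN_ID = 103  # id of the mask token, as stated in the text

def mask_tokens(input_ids, variable_positions, S):
    """Return a copy of `input_ids` with variables outside S replaced by the mask id.

    input_ids: list of token ids for one sentence.
    variable_positions: positions (in input_ids) of the selected input variables.
    S: set of variable indices (positions in `variable_positions`) that are kept.
    """
    masked = list(input_ids)
    for i, pos in enumerate(variable_positions):
        if i not in S:
            masked[pos] = MASK_TOKEN_ID
    return masked

# Toy example: two variables at positions 1 and 3, keeping only the first one.
example = mask_tokens([101, 7, 8, 9, 102], variable_positions=[1, 3], S={0})
```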

Point cloud data. We clustered all the points into 30 clusters using K-means clustering, and randomly selected 10 clusters as the input variables in the set $N$ to calculate interactions. We used the average coordinate of the points in each cluster to mask the corresponding cluster in $N\setminus S$ and obtained the masked sample $\bm{x}_S$.
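The point-cloud masking can be sketched as below, assuming that the K-means cluster assignment of every point has already been computed elsewhere. The function name and the array layout are hypothetical.

```python
import numpy as np

def mask_point_cloud(points, cluster_ids, variables, S):
    """Replace the points of each masked cluster with the cluster's mean coordinate.

    points: array of shape (P, 3).
    cluster_ids: array of shape (P,), precomputed cluster index of each point.
    variables: list of cluster indices chosen as input variables (the set N).
    S: set of variable indices (positions in `variables`) that are kept.
    """
    points = np.asarray(points, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    out = points.copy()
    for i, cluster in enumerate(variables):
        if i not in S:
            members = cluster_ids == cluster
            out[members] = points[members].mean(axis=0)  # cluster-center baseline
    return out

# Toy example with two clusters; only the second variable is kept.
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
masked_pts = mask_point_cloud(pts, cluster_ids=[0, 0, 1], variables=[0, 1], S={1})
```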

For all DNNs and datasets, we randomly selected 50 samples from the testing set to compute interactions, and averaged the interaction strength of the $k$-th order over these samples to obtain $I^{(k)}_{\text{real}}$.

G.4 Compute resources

All DNNs can be trained within 12 hours on a single NVIDIA GeForce RTX 3090 GPU (with 24 GB of GPU memory). Computing all interactions on a single input sample usually takes 35-40 seconds, which is acceptable in real applications.

Appendix H Potential limitations of the theoretical proof

In this study, we have assumed that during the training process, the noise on the parameters gradually decreases ($\sigma^2$ gradually becomes smaller). Although experiments in Figure 4 and Figure 8 have verified that the theoretical distribution of interaction strength can well match the real distribution by using a set of decreasing $\sigma^2$ values, it is not exactly clear how the value of $\sigma^2$ is related to the training process. The value of $\sigma^2$ probably does not decrease linearly along with the training epochs/iterations, which calls for a more precise formulation.

Appendix I More discussions about the two-phase dynamics

I.1 Does the model re-learn the initial interactions during the second phase?

Our theory does not claim that in the second phase, a DNN will not re-encode an interaction that is removed in the first phase. Instead, Theorem 4 and Proposition 1 collectively indicate the possibility of a DNN gradually re-encoding a few higher-order interactions in the second phase along with the decrease of the parameter noise.

The key point to this question is that the massive interactions in a fully initialized DNN are all chaotic and meaningless patterns caused by randomly initialized network parameters. Therefore, the crux of the matter is not whether the DNN re-learns the initially removed interactions, but the fact that the DNN mainly removes chaotic and meaningless initial interactions in the first phase, and learns potential target interactions in the second phase. In this way, although a few interactions may be re-encoded later in the second phase, we do not consider this as a problem with the training of a DNN.

I.2 About extending the theoretical analysis to specific network architectures

Our current analysis is agnostic to the network architecture, and aims to explain the common two-phase dynamics of interactions that is shared by different network architectures for various tasks. Fig. 2 and Fig. 5 demonstrate this shared two-phase dynamics.

On the other hand, although DNNs with different architectures all exhibit the two-phase dynamics of interactions, the length of the two phases and the finally converged state of the DNN are influenced by the network architecture and can slightly vary among different architectures. Eq. (10) shows that our current formulation is to use the finally converged state of a DNN to accurately predict the DNN’s learning dynamics of interactions. Therefore, the learning dynamics predicted by our theory also exhibits slight differences among different DNN architectures and datasets accordingly, but it still matches well with the empirical dynamics of interactions. To this end, studying how the network architecture affects the finally converged state of a DNN may be a good future direction.

Appendix J More experimental results

J.1 More results for the two-phase phenomenon

In this subsection, we show the two-phase dynamics of learning interactions on more DNNs and datasets. See Figure 5 and Figure 6 for details.

Figure 5: The distribution of interaction strength $I_{\text{real}}^{(k)}$ over different orders $k$. Each row shows the change of the distribution during the training process. Experiments showed that the two-phase phenomenon widely existed on different DNNs trained on various datasets.
Figure 6: Demonstration of the two-phase dynamics of interactions on more textual datasets.

J.2 More details for the alignment between the two phases and the loss gap

Besides the loss gap, in Figure 7, we also show the training loss and the testing loss separately. In fact, instead of considering underfitting (or learning useful features) and overfitting (or learning overfitted features) as two separate processes, the DNN simultaneously learns both useful features and overfitted features during training. The learning of useful features decreases the training loss and the testing loss, which alleviates underfitting. Meanwhile, the learning of overfitted features gradually increases the loss gap.

Figure 7: Demonstration of the training loss and the testing loss (the last column) in addition to the two-phase dynamics of interactions (1st column to 6th column) and the loss gap (7th column).

J.3 More results for the experimental verification of our theory

In this subsection, we show results of using the theoretical distribution of interaction strength $I_{\text{theo}}^{(k)}$ to match the real distribution of interaction strength $I_{\text{real}}^{(k)}$ on more DNNs and datasets, as shown in Figure 8.

Figure 8: Comparison between the theoretical distribution of interaction strength $I_{\text{theo}}^{(k)}$ and the real distribution of interaction strength $I_{\text{real}}^{(k)}$ in the second phase on more DNNs and datasets.

J.4 Using the theoretical distribution $I_{\text{theo}}^{(k)}$ to predict the real distribution of AND interactions

In this subsection, we show results of using the theoretical distribution of interaction strength $I_{\text{theo}}^{(k)}$ to match the real distribution of AND interactions (rather than the AND-OR interactions), as shown in Figure 9.

Figure 9: Comparison between the theoretical distribution of interaction strength $I_{\text{theo}}^{(k)}$ and the real distribution of interaction strength of AND interactions.

NeurIPS Paper Checklist

  1. Claims

    Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?

    Answer: [Yes]

    Justification: The main claims made in the abstract and introduction accurately reflect our paper’s contributions and scope.

    Guidelines:

    • The answer NA means that the abstract and introduction do not include the claims made in the paper.

    • The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.

    • The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings.

    • It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.

  2. Limitations

    Question: Does the paper discuss the limitations of the work performed by the authors?

    Answer: [Yes]

    Justification: Although we have no room for a separate Limitations section in the main paper, we provide discussion of potential limitations in Appendix H.

    Guidelines:

    • The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper.

    • The authors are encouraged to create a separate "Limitations" section in their paper.

    • The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be.

    • The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated.

    • The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon.

    • The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size.

    • If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness.

    • While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren’t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations.

  11. 3.

    Theory Assumptions and Proofs

  12. Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?

  13. Answer: [Yes]

  14. Justification: We provide the assumptions in the main paper and the proofs of all theorems in Appendix E.

  15. Guidelines:

    • The answer NA means that the paper does not include theoretical results.

    • All the theorems, formulas, and proofs in the paper should be numbered and cross-referenced.

    • All assumptions should be clearly stated or referenced in the statement of any theorems.

    • The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition.

    • Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material.

    • Theorems and Lemmas that the proof relies upon should be properly referenced.

  16. 4. Experimental Result Reproducibility

  17. Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?

  18. Answer: [Yes]

  19. Justification: The contribution of this paper is mainly theoretical. Nevertheless, we provide the detailed experimental settings in Appendix F so that the experimental results can be reproduced. The code will be released when the paper is accepted.

  20. Guidelines:

    • The answer NA means that the paper does not include experiments.

    • If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not.

    • If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable.

    • Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general, releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed.

    • While NeurIPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example:

      (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm.

      (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully.

      (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset).

      (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results.

  21. 5. Open access to data and code

  22. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?

  23. Answer: [No]

  24. Justification: The code will be released when the paper is accepted. All datasets used in this paper are publicly available. Nevertheless, to enhance reproducibility, we provide the detailed experimental settings in Appendix F.

  25. Guidelines:

    • The answer NA means that the paper does not include experiments requiring code.

    • Please see the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.

    • While we encourage the release of code and data, we understand that this might not be possible, so “No” is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark).

    • The instructions should contain the exact command and environment needed to run to reproduce the results. See the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.

    • The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc.

    • The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why.

    • At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable).

    • Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted.

  26. 6. Experimental Setting/Details

  27. Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?

  28. Answer: [Yes]

  29. Justification: Details on dataset preprocessing can be found in Appendix F.1. Details on training settings can be found in Appendix F.2. Details on how to compute interactions can be found in Appendix F.3.

  30. Guidelines:

    • The answer NA means that the paper does not include experiments.

    • The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them.

    • The full details can be provided either with the code, in appendix, or as supplemental material.

  31. 7. Experiment Statistical Significance

  32. Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?

  33. Answer: [No]

  34. Justification: The main contribution of this study is to provide a theoretical proof of the two-phase dynamics phenomenon discovered in previous studies. The experiments in this study mainly serve to reproduce the two-phase dynamics phenomenon for better illustration and to verify that our theory can predict the trend of the interaction dynamics on real DNNs. This study does not propose new methods to boost performance or discover a new phenomenon, so we refrain from reporting error bars for clarity.

  35. Guidelines:

    • The answer NA means that the paper does not include experiments.

    • The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper.

    • The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions).

    • The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.)

    • The assumptions made should be given (e.g., Normally distributed errors).

    • It should be clear whether the error bar is the standard deviation or the standard error of the mean.

    • It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified.

    • For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates).

    • If error bars are reported in tables or plots, the authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.

  36. 8. Experiments Compute Resources

  37. Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?

  38. Answer: [Yes]

  39. Justification: We provide the compute resources needed in Appendix F.4, including the type of GPU and the approximate amount of time for training DNNs and computing interactions.

  40. Guidelines:

    • The answer NA means that the paper does not include experiments.

    • The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage.

    • The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.

    • The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn’t make it into the paper).

  41. 9. Code Of Ethics

  42. Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?

  43. Answer: [Yes]

  44. Justification: The research conducted in the paper conforms to the NeurIPS Code of Ethics.

  45. Guidelines:

    • The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.

    • If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.

    • The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).

  46. 10. Broader Impacts

  47. Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?

  48. Answer: [N/A]

  49. Justification: The contribution of this paper is mainly theoretical and has not yet been applied to real applications, so its societal impact is likely to be small for now.

  50. Guidelines:

    • The answer NA means that there is no societal impact of the work performed.

    • If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.

    • Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.

    • The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster.

    • The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.

    • If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).

  51. 11. Safeguards

  52. Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?

  53. Answer: [N/A]

  54. Justification: All models and datasets used in this paper are already publicly available.

  55. Guidelines:

    • The answer NA means that the paper poses no such risks.

    • Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.

    • Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.

    • We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

  56. 12. Licenses for existing assets

  57. Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?

  58. Answer: [Yes]

  59. Justification: We cite the original paper for all datasets. The name of the license is included for each dataset in Appendix F.1, although some licenses are unknown.

  60. Guidelines:

    • The answer NA means that the paper does not use existing assets.

    • The authors should cite the original paper that produced the code package or dataset.

    • The authors should state which version of the asset is used and, if possible, include a URL.

    • The name of the license (e.g., CC-BY 4.0) should be included for each asset.

    • For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.

    • If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.

    • For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.

    • If this information is not available online, the authors are encouraged to reach out to the asset’s creators.

  61. 13. New Assets

  62. Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?

  63. Answer: [N/A]

  64. Justification: The paper does not release new assets.

  65. Guidelines:

    • The answer NA means that the paper does not release new assets.

    • Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.

    • The paper should discuss whether and how consent was obtained from people whose asset is used.

    • At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

  66. 14. Crowdsourcing and Research with Human Subjects

  67. Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?

  68. Answer: [N/A]

  69. Justification: The paper involves neither crowdsourcing nor research with human subjects.

  70. Guidelines:

    • The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.

    • Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.

    • According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

  71. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects

  72. Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?

  73. Answer: [N/A]

  74. Justification: The paper involves neither crowdsourcing nor research with human subjects.

  75. Guidelines:

    • The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.

    • Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.

    • We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.

    • For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.