We validate our method on three datasets, each with a different input domain and a different type of annotation. Testing the model in these varied settings verifies its usefulness across a range of applications.
4.1 Experiment 1: CelebA Dataset
The first dataset is the CelebA dataset [25]. This is a face attribute dataset containing about 200K celebrity images, each with 40 attribute annotations such as “Eyeglasses” and “Smiling.” In this experiment, we set “Attractive,” “Heavy Makeup,” “Male,” and “Young” as the attributes to be explained by the rest. These global attributes are selected as targets because they can be intuitively explained by combinations of other, more local features.
The model architecture is illustrated in Figure 2. The input is an image. We use VGG16 [38] as the base model for the performer and ResNet152 [13] for the explainer; both are pretrained on ImageNet [7].
F and f serve as the performer’s prediction heads, outputting a prediction for each attribute. The explainer’s prediction heads, g and h, which regress the weight and the factor vector for each attribute, share the same architecture (with different parameters) on top of ResNet152. The layers composing these models are listed in Table 1. The number of explanatory attributes n is 39, and the dimension of the factor vectors l is set to 2.
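To make the architecture concrete, the following is a minimal PyTorch sketch of the performer and explainer used in this experiment. It is our own reconstruction: the single linear layer per head, the feature dimensions, and the class and variable names are placeholders standing in for the layers listed in Table 1, not the exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

N_ATTR = 39      # number of explanatory attributes n
FACTOR_DIM = 2   # dimension of factor vectors l


class Performer(nn.Module):
    """VGG16 backbone with heads F (target attribute) and f (explanatory attributes)."""
    def __init__(self, n_targets=1, n_attr=N_ATTR):  # one target per run (e.g., "Attractive"); our assumption
        super().__init__()
        vgg = models.vgg16(weights="IMAGENET1K_V1")
        # Reuse the VGG16 feature extractor and all but the last classifier layer (4096-d output).
        self.backbone = nn.Sequential(vgg.features, vgg.avgpool, nn.Flatten(),
                                      *list(vgg.classifier.children())[:-1])
        self.head_F = nn.Linear(4096, n_targets)  # target attribute prediction
        self.head_f = nn.Linear(4096, n_attr)     # explanatory attribute predictions

    def forward(self, x):
        z = self.backbone(x)
        return self.head_F(z), self.head_f(z)


class Explainer(nn.Module):
    """ResNet152 backbone with heads g (weights) and h (factor vectors): same architecture, separate parameters."""
    def __init__(self, n_attr=N_ATTR, factor_dim=FACTOR_DIM):
        super().__init__()
        resnet = models.resnet152(weights="IMAGENET1K_V1")
        # Drop the final fully connected layer; the pooled feature is 2048-d.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1], nn.Flatten())
        self.head_g = nn.Linear(2048, n_attr)               # one weight per attribute
        self.head_h = nn.Linear(2048, n_attr * factor_dim)  # one factor vector per attribute

    def forward(self, x):
        z = self.backbone(x)
        w = self.head_g(z)                                  # (batch, n)
        v = self.head_h(z).view(-1, N_ATTR, FACTOR_DIM)     # (batch, n, l)
        return w, v
```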
We first train the performer with a cross-entropy loss and then train the explainer with the loss function of Equation (3). The hyperparameter of PriorLoss is set to 10.
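The two-stage training procedure can be sketched as follows. Here `train_loader` is a hypothetical data loader and `explanation_loss` stands in for Equation (3) (including PriorLoss), which is not reproduced here; the optimizer settings are illustrative only.

```python
import torch
import torch.nn.functional as F_loss  # aliased to avoid clashing with the head name F
from torch.optim import Adam

performer, explainer = Performer(), Explainer()
opt_p = Adam(performer.parameters(), lr=1e-4)
opt_e = Adam(explainer.parameters(), lr=1e-4)

# Stage 1: train the performer with cross-entropy on the binary attribute labels.
for images, target_labels, attr_labels in train_loader:  # hypothetical DataLoader
    target_logits, attr_logits = performer(images)
    loss = F_loss.binary_cross_entropy_with_logits(target_logits, target_labels) \
         + F_loss.binary_cross_entropy_with_logits(attr_logits, attr_labels)
    opt_p.zero_grad(); loss.backward(); opt_p.step()

# Stage 2: freeze the performer and train the explainer with the loss of Equation (3).
performer.eval()
for images, target_labels, attr_labels in train_loader:
    with torch.no_grad():
        target_logits, attr_logits = performer(images)
    w, v = explainer(images)
    loss = explanation_loss(w, v, attr_logits.sigmoid(), target_logits)  # Eq. (3), placeholder
    opt_e.zero_grad(); loss.backward(); opt_e.step()
```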
4.2 Experiment 2: DeepFashion Dataset
The second dataset is the DeepFashion dataset [24]. The DeepFashion database includes many benchmarks for various purposes; we select the Category and Attribute Prediction Benchmark because it contains rich annotations suitable for explanations. The coarse annotation covers five types of attributes: texture, fabric, shape, part, and style. Because the annotation includes as many as 1,000 attributes, we reduce the number of attributes and the amount of data. The benchmark includes many kinds of clothes (denim jacket, long skirt, T-shirt, etc.); to limit the number of attributes, we use only images of tops. The 100 most frequent attributes, for example, “Print,” “Knit,” and “Shirt,” are then selected, and the rest are discarded. As a result, the number of data points is about 140K. In this experiment, we select “Classic,” “Basic,” “Cute,” and “Soft” as the attributes to be explained by the other attributes, as they describe clothes’ global features.
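The data reduction described above can be expressed roughly as follows. The file layout and column names are hypothetical (the benchmark’s actual annotation files are organized differently); this only illustrates the two filtering steps.

```python
import pandas as pd

# Hypothetical flat table: one row per image, one binary column per attribute
# (prefixed "attr_"), plus a clothing-category column.
df = pd.read_csv("deepfashion_attributes.csv")

# Step 1: keep only upper-body clothes (tops).
tops = df[df["category_type"] == "upper-body"]

# Step 2: keep the 100 most frequent attributes and discard the rest.
attr_cols = [c for c in tops.columns if c.startswith("attr_")]
top100 = tops[attr_cols].sum().nlargest(100).index
tops = tops[["image_path", "category_type", *top100]]

print(len(tops))  # roughly 140K images remain
```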
The model architecture used in this experiment and its training process are the same as those of Experiment 1. The input is an image. The number of explanatory attributes n is 99, and the dimension of the factor vectors l is again 2.
4.3 Experiment 3: TV Ads Dataset
The last dataset is the TV ads dataset, a collection of 14,990 commercial videos that were actually broadcast on TV in Japan between January 2006 and April 2016. Each video was evaluated and annotated by 600 participants. The dataset was collected to predict the following four impression- and emotion-related effects:
• Favorability rating (F): how much participants liked the content of the advertisement itself
• Interest rating (I): how much participants became interested in the product/service
• Willingness rating (W): how much participants felt like buying the product/service
• Recognition rating (R): how much participants remembered the advertisement
Besides the videos, the dataset contains metadata such as information about the cast featured in the ad. In addition, scores are given for 26 attributes that describe the ad, such as “Good story” and “Impressive.” In the present experiment, we attempt to explain each of the four effects by these attributes.
Because the effects and the attributes are continuous values rather than binary labels, the performer’s prediction task is a regression problem, in contrast to the previous two experiments; hence, a different architecture is needed. We illustrate the model in Figure 3. The input data consist of deep features extracted from the video frames, sound, metadata, cast data, text in frames, and narration data. As the base model for both the performer and the explainer, we employ a multimodal fusion model with an attention mechanism proposed in previous research [43].
F regresses one of the four effects, and f outputs a vector of predicted attributes (the number of explanatory attributes n is 26). In contrast to the model in Figure 2, F and f output the target prediction and the explanatory-attribute predictions independently. The architecture of the explainer is otherwise similar to that in Figure 2: g and h share the base model, and their branches produce the weights and the factor vectors, respectively. We set the weight of PriorLoss in Equation (3) to 0; that is, we do not employ PriorLoss in this experiment. The dimension of the factor vectors l is again set to 2.
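A minimal sketch of the explainer for this regression variant is given below. The attention-based fusion is a generic stand-in for the multimodal fusion model of [43], and the feature dimension, modality handling, and head shapes are assumptions rather than the actual configuration.

```python
import torch
import torch.nn as nn

N_ATTR, FACTOR_DIM, FEAT_DIM = 26, 2, 512  # n, l, and an assumed per-modality feature size


class AttentionFusion(nn.Module):
    """Generic attention-weighted fusion of per-modality feature vectors (stand-in for [43])."""
    def __init__(self, dim=FEAT_DIM):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                             # feats: (batch, n_modalities, dim)
        attn = torch.softmax(self.score(feats), dim=1)    # attention weights over modalities
        return (attn * feats).sum(dim=1)                  # fused feature: (batch, dim)


class AdExplainer(nn.Module):
    """Explainer for the TV ads experiment: shared fused base with branches g and h."""
    def __init__(self):
        super().__init__()
        self.fusion = AttentionFusion()
        self.head_g = nn.Linear(FEAT_DIM, N_ATTR)               # weights, one per attribute
        self.head_h = nn.Linear(FEAT_DIM, N_ATTR * FACTOR_DIM)  # factor vectors

    def forward(self, modality_feats):
        z = self.fusion(modality_feats)
        return self.head_g(z), self.head_h(z).view(-1, N_ATTR, FACTOR_DIM)
```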
4.4 Results
We report the accuracy or correlation coefficients of each experiment in Tables 2, 4, and 6, and the conditional entropy in Tables 3, 5, and 7. In each table, the first row shows the result of the explainer in the method of Chen et al. [3], the second row shows our method’s explainer, and the third row shows the performer. We compare our method with the previous method to show that feature interaction improves the explainer’s performance. The conditional entropy of explanations proposed in the previous work [3] is not entirely appropriate for evaluating our method, because the weights of attributes and their interactions are not approximated in the same way as in the previous method.
For Experiments 1 and 2, we reimplemented the previous method to use as a baseline. There could be slight differences between our implementation and that of Chen et al. [3], since their paper omits some details of the model; nevertheless, as the first and third rows of Table 2 show, our reimplemented performer and explainer achieve almost the same performance as that reported in the paper. This implies that our implementation accurately reproduces the method of Chen et al. [3].
Table 2 shows the results of the experiment on the CelebA dataset. As mentioned above, our proposed method is compared with the interaction-free method from the literature. The table shows that, whatever the target attribute, the explainer performs better with feature interaction and attains accuracies close to those of the performer. This indicates that feature interaction can increase both explainability and the model’s discriminative power at the same time. To verify that our model using interactions can produce reasonable explanations, we display an example from the CelebA test data in Figures 4 and 5. These are explanations of why the performer judged the image to be “Attractive.” The horizontal axis is the attribute label and the vertical axis is the contribution to the prediction. Figure 4 shows an explanation produced with the method of Chen et al. [3], with the 20 largest contributions sorted in descending order. Figure 5 shows an explanation produced by our method; the top row shows the contributions of single attributes and the bottom row shows those of attribute interactions (the 20 largest of each). The previous method already achieves quantitative and semantic explanations; ours, however, considers not only single-attribute contributions but also interactions, resulting in less biased and more insightful explanations. Examining Figure 5 in more detail, the explainer suggests that attributes such as “Double chin” and “Bushy eyebrows” contribute to “Attractive” for this face image, and so do attribute interactions such as “No beard & Young” and “Male & Young.” The explanation is reasonable and easily interpretable by humans. Moreover, the contributions of feature interactions such as “No beard & Young” are larger than those of single features such as “Double chin.” This suggests that the performer relies on this feature interaction when making its prediction and that our method successfully detects it.
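As an illustration of how such a chart can be assembled from the explainer’s outputs, the following sketch computes single-attribute contributions as the attribute probability times its weight, and interaction contributions as the product of the two attributes’ probabilities and the inner product of their factor vectors, then ranks the 20 largest of each. This follows the spirit of the description above but is our own reconstruction; the exact formulation in Equation (3) may differ.

```python
import numpy as np

def top_contributions(w, v, a, names, k=20):
    """w: (n,) weights, v: (n, l) factor vectors, a: (n,) predicted attribute probabilities."""
    # Single-attribute contributions: weight times attribute probability.
    single = {names[i]: float(w[i] * a[i]) for i in range(len(a))}
    # Interaction contributions: inner product of factor vectors times both probabilities.
    pair = {f"{names[i]} & {names[j]}": float(v[i] @ v[j]) * float(a[i] * a[j])
            for i in range(len(a)) for j in range(i + 1, len(a))}
    top = lambda d: sorted(d.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top(single), top(pair)   # the two rows of the bar chart
```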
Table 4 shows the results of the experiment on the DeepFashion dataset. As can be seen, introducing interactions into the explainer does not improve prediction performance in this domain. There are two possible reasons. The first is that the annotations are so coarse that the interactions contain more error: since an interaction is the product of the probabilities of two attributes and their factor vectors, errors tend to be amplified. The second is that the task itself is simple enough that explanation models easily converge to the optimum regardless of the choice of explanatory variables. We find that conditions such as the number of attributes and the complexity of the model are strongly related to the quality of explanation and to prediction performance, and thus need to be designed carefully. For a more detailed analysis, we present examples of explanations produced by the method of Chen et al. and by ours in Figures 6 and 7, respectively. These explanations were produced to explain why the image displayed at the top is “Classic.” According to Figure 6, for example, the second most important reason for being “Classic” is “New York,” although it is hard to determine whether the garment can be categorized as “New York.” Furthermore, Figure 7 indicates that the interactions “Collar & Button” and “Collar & Pleated” are the most significant factors, although “Button” and “Pleated” are not visible in the image. Other examples similarly contain incorrect attributes in their explanations.
Table 6 compares the prediction results of the experiment on the TV ads dataset. Unlike the previous two experiments, the results are evaluated with Pearson’s correlation coefficients, since the targets are continuous values. The explainer achieves higher performance when interactions are incorporated, except on the Favorability rating. This implies that considering interactions is valid for various tasks, including regression. Figures 8 and 9 give examples of the explanations produced by the two methods for an ad’s Favorability rating. In the interaction-free explanation, attributes such as “Familiar” and “Empathetic” are the dominant causes. The explanation with interactions likewise takes “Familiar” as one of the most important reasons; however, it differs in that the second most emphasized attribute is “Celebrity/Character,” which aligns with our intuition. Although the effects of interaction are much less pronounced here than in the other two experiments, our method still produces reasonable quantitative and semantic explanations, just as in the other cases.
Tables 3, 5, and 7 show that the conditional entropy of our explainer is almost the same as that of the explainer without interactions and that of the performer. However, as pointed out in [3], this measure is not directly related to the ground truth of explanations. We believe that higher accuracy and correlation coefficients are more important, because they indicate that the distillation from the performer is more successful.
For more experimental results, please refer to Figures 10, 11, 12, and 13 in the appendix.