Knowledge-Grounded Natural Language Recommendation Explanation
Figure 1: Illustration of KnowRec. 1) The User’s Item KG Representation Module. 2) The Global and User-Item
Graph Attention Encoder. 3) The Output Module for rating prediction and explanation.
2.3 Knowledge Graph-to-Text Generation

In KG-to-Text, pre-trained language models such as GPT-2 (Radford et al., 2019) and BART (Lewis et al., 2020) have seen success in generating fluent and accurate verbalizations of KGs (Chen et al., 2020; Ke et al., 2021; Ribeiro et al., 2021; Colas et al., 2022). We devise an encoder for user-item KGs and a decoder for both the generation and recommendation tasks. Specifically, we formulate a novel masking scheme for user-item KGs to structurally encode user and item features, while generating a recommendation score from their latent representations. Thus, our task is two-fold, fusing elements from the Graph-to-Text generation and KG recommendation domains.

3 Problem Formulation

Following prior work, we denote U as a set of users, I as a set of items, and the user-item interaction matrix as Y ∈ R^{|U|×|I|}, where y_{uv} = 1 if user u ∈ U and item v ∈ I have interacted. Here, we represent user u as the user's purchase history u = {v_{u_i}}, where v_{u_i} denotes the i-th item purchased by user u in the past. Next, we define a KG as a multi-relational graph G = (V, E), where V is the set of entity vertices and E ⊂ V × R × V is the set of edges connecting entities with a relation from R. Each item v has its own KG, g_v, comprising an entity set V_v and a relation set R_v which contain features of v. We devise a set of item-entity alignments A = {(v, e) | v ∈ I, e ∈ V}, where (v, e) indicates that item v is aligned with an entity e.

Given a user u and an item v represented by its KG g_v, the task is to generate an explanation of natural language sentences E_{u,v} as to why item v was recommended for the user u. As in previous multi-task explainable recommendation models, KnowRec calculates a rating score r_{u,v} that measures u's preference for v. By jointly training on the recommendation and explanation generation tasks, our model can contextualize the embeddings more adequately with training signals from both tasks.

4 Model

Figure 1 illustrates our model with the user-item graph constructed through collaborative filtering signals, an encoder, and inference functions for explanation generation and rating prediction.

4.1 Input

The input of KnowRec comprises a user u represented by the user's purchase history {v_{u_i}} and an item v represented by its KG g_v, as introduced in Section 3. Let v_c denote the item currently considered by the system. The item v_c is aligned with one of the entities through A and becomes the center node of g_v, as shown in Figure 1.

Because our system leverages a Transformer-based encoder, we first linearize the input into a string. For the user u = {v_{u_i}}, we initialize it by mapping each purchased item v_{u_i} into the tokens of the item's name. For the item v represented by g_v, we decompose g_v into a set of tuples {t_{v_j}}, where t_{v_j} = (v_c, r_{v_j}, n_{v_j}), n_{v_j} ∈ V_v, and r_{v_j} ∈ R_v. We linearize each tuple t_{v_j} into a sequence of tokens using the lexicalized names of the nodes and the relation. We then concatenate all the user tokens and the item tokens to form the full input sequence x. For example, suppose the current item v_c is the book Harry Potter, the KG has a single tuple (Harry Potter, author, J.K. Rowling), and the user previously purchased two books, The Lord of the Rings and The Little Prince. In this case, the input sequence x = The Lord of the Rings The Little Prince Harry Potter author J.K. Rowling.

We map the tokens to randomly initialized vectors or pre-trained word embeddings such as those
in BART (Lewis et al., 2020), obtaining X_0 = [. . . ; V_{u_i} ; . . . ; T_{v_j} ; . . . ], where V_{u_i} and T_{v_j} are the word vector representations of v_{u_i} and t_{v_j}, respectively. Unlike previous work on KG recommendation (Wang et al., 2020), where users/items are represented via purchase history and propagated KG information, our system infuses KG components to provide a recommendation and its natural language explanation. Our system also differs from prior studies on explainable recommendation in that, while they focus on reviews and thus encode users/items as random vectors with additional review-based sparse token features as auxiliary information (Li et al., 2021), we directly encapsulate KG information into the input representation.

4.2 Encoder

Collaborative KG Representation. Because KnowRec outputs a natural language explanation grounded on KG facts, as well as a recommendation score for the user-item pair, we need to construct a user-item-linked KG to represent an input through its corresponding lexical graph features. To do so, we leverage collaborative signals from Y, combining u with v by linking previously purchased products v_{u_i} to the current item v_c from g_v, forming a novel lexical user-item KG. Additionally, we connect all previously purchased items together in order to graphically model collaborative filtering effects for rating prediction, as illustrated in Figure 1. Note that the relations between previously purchased items and the current item do not require a lexical representation in our model. The resulting graph goes through the Transformer architecture, as described below.

Global Attention. Transformer architectures have recently been adopted for the personalized explainable recommendation task (Li et al., 2021). We similarly leverage Transformer encoder layers (Vaswani et al., 2017), referred to as Global Attention, to encode the input representation with self-attention:

    X_l = Attn(Q, K, V) = softmax(QK^⊤ / √d_k) V,
    Q = X_{l−1} W_l^Q,  K = X_{l−1} W_l^K,  V = X_{l−1} W_l^V,    (1)

where X_l is the output of the l-th layer of the encoder, and d_k is a tunable parameter. Q, K, and V represent the Query, Key, and Value vectors, respectively, each of which is calculated with the corresponding parameter matrix W in the l-th layer. Note that the Transformer encoder may be initialized via a pre-trained language model.

User-Item Graph Attention. We further propose User-Item Graph Attention encoder layers, which compute graph-aware attention via a mask to capture the user-item graph's topological information and which run in parallel with the Global Attention encoder layers.

We first extract the mask M_g ∈ R^{m×m} from the user-item-linked KG, where m is the number of relevant KG components, i.e., nodes and edges that are lexically expressed in the KG (edges between v_{u_i} and v_c not included). In M_g, each row/column refers to a KG component: M_{ij} = 0 if there is a connection between components i and j (e.g., "J.K. Rowling" and "author") and −∞ otherwise. In addition, we assume all item components, i.e., the previous purchases and the current item, are mutually connected when devising M_g.

For each layer (referred to as the l-th layer), we then transfer its input X_{l−1} into a component-wise representation X_g^{l−1} ∈ R^{m×d}, where d is the word embedding size. Motivated by Ke et al. (2021), we perform this transfer by employing a pooling layer that averages the vector representations of all the word tokens contained in the corresponding node/edge names per relevant KG component. With the transferred input X_g^{l−1}, we proceed to encode it using User-Item Graph Attention with the graph-topology-sensitive mask as follows:

    X̃_g^l = Attn_M(Q′, K′, V′) = softmax(Q′K′^⊤ / √d_k + M_g) V′,    (2)

where the query Q′, key K′, and value V′ are computed from the transferred input and learnable parameters in the same manner as in Equation (1).

Lastly, we combine the outputs of the Global Attention encoder and the User-Item Graph Attention encoder in each layer. As the two outputs have different dimensions, we first expand X̃_g^l to the same dimension as X_l through a gather operation, i.e., broadcasting each KG component-wise representation in X̃_g^l to every encompassing word of the corresponding component and connecting those representations. We then add the expanded X̃_g^l to X_l through element-wise addition, generating the l-th encoding layer's output:

    X̃_l = gather(X̃_g^l) + X_l.    (3)
Note that in this section we illustrate the Global Attention encoder, the User-Item Graph Attention encoder, and their combination with single-head attention. In practice, we implement both encoders with multi-head attention as in Vaswani et al. (2017).

4.3 Rating Prediction

For the rating prediction task, we first separate and isolate the user u and item v features via masking. Once isolated, we perform a mean pool over all their respective tokens and linearly project u and v to perform a dot product between the two new vector representations as follows:

    x̃_u = pool_mean(X̃_L + m_u) W_u,
    x̃_v = pool_mean(X̃_L + m_v) W_v,    (4)
    r̂_{u,v} = dot(x̃_u, x̃_v),

where m_u and m_v are the user and item masks that denote which tokens belong to the user and the item, the W matrices are learnable parameters, and L refers to the last layer of the encoder.

4.4 Explanation Generation

Before generating the final output text for our explanation, we pass the representation through a fully connected linear layer as the encoder hidden state and decode the representation into its respective output tokens through an auto-regressive decoder, following previous work (Lewis et al., 2020).

4.5 Joint-learning Objective

As previously noted, our system produces two outputs: a rating prediction score r̂_{u,v} and a natural language explanation E_{u,v}, which justifies the rating by verbalizing the item's corresponding KG. We thus perform multi-task learning to learn both tasks and manually define regularization weights λ, as in similar multi-task paradigms, to weight the two tasks. Taking L_r and L_e to represent the recommendation and explanation cost functions, respectively, the multi-task cost L then becomes:

    L = λ_r L_r + λ_e L_e,    (5)

where λ_r and λ_e denote the rating prediction and explanation regularization weights, respectively.

We define L_r using Mean Square Error (MSE), in line with conventional item recommendation and review-based explainable systems:

    L_r = (1 / |U||I|) Σ_{u∈U, v∈I} (r_{u,v} − r̂_{u,v})²,    (6)

where r_{u,v} denotes the ground-truth score.

Next, as in other NLG tasks (Lewis et al., 2020; Zhang et al., 2020), we incorporate Negative Log-Likelihood (NLL) as the explanation's cost function L_e. Thus, we define L_e as:

    L_e = −(1 / |U||I|) Σ_{u∈U, v∈I} (1 / |E_{u,v}|) Σ_{t=1}^{|E_{u,v}|} log p_{e_t}^t,    (7)

where p_{e_t}^t is the probability of the decoded token e_t at time step t.

5 Dataset

Although KG-recommendation datasets exist, they do not contain any supervision signals for NL descriptions. Thus, to evaluate our explainable recommendation approach in a KG-aware setting and our KnowRec model, we introduce two new datasets based on the Amazon-Book and Amazon-Movie datasets (He and McAuley, 2016): (1) Book KG-Exp and (2) Movie KG-Exp.

Recall that our task requires an input KG along with an NL explanation and a recommendation score. Because it is more efficient to extract KGs from text, rather than manually annotate each KG with text, we take a description-first approach, automatically extracting KG elements from the corresponding text. Given the currently available data, we leverage item descriptions as a proxy for the NL explanations, while constructing a user-item KG from an item's features and the user's purchase history.

We first extract entities from a given item description via DBpedia Spotlight (Mendes et al., 2011), a tool that detects mentions of DBpedia (Auer et al., 2007) entities in NL text. We then query for each entity's most specific type and use those types as relations that connect the item to its corresponding entities. We construct a user KG from the purchase history, e.g., [Purchase_1, Purchase_2, . . . , Purchase_n], as a complete graph where each purchase is connected. Finally, we connect all the nodes of the user KG to the item KG, treating each user purchase as a one-hop neighbor of the current item. To ensure the KG-explanation correspondence, we filter out any sentences in the explanation in which no entities were found. To measure objectivity, we calculate the proportion of a given KG's entities that appear in the explanation, called entity coverage (EC) (defined in Appendix B.3). We summarize our dataset statistics in Table 1 and present a more comprehensive comparison in Appendix A.2.
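Entity coverage is formally defined in Appendix B.3, which is outside this excerpt; the sketch below encodes one plausible reading of the metric, namely the fraction of a KG's entity names that appear verbatim in the explanation. The case-folded substring matching is an assumption.

```python
def entity_coverage(kg_entities, explanation):
    """Fraction of KG entity names that appear in the explanation text.

    A plausible reading of EC; the exact matching and normalization
    rules are given in Appendix B.3 and may differ.
    """
    if not kg_entities:
        return 0.0
    text = explanation.lower()
    hits = sum(1 for name in kg_entities if name.lower() in text)
    return hits / len(kg_entities)
```

For instance, a KG with entities "Jules Verne" and "magnetic storm" and an explanation mentioning both yields an EC of 1.0.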
Name          #Users    #Items   #Interactions   KG    #Es      #Rs   #Triples   EC     Desc.   Words/Sample
Book KG-Exp   396,114   95,733   2,318,107       Yes   195,110  392   745,699    71.45  Yes     99.96
Movie KG-Exp  131,375   18,107   788,957         Yes   59,036   363   146,772    71.32  Yes     96.35

Table 1: Statistics of our Book KG-Exp and Movie KG-Exp benchmark datasets. #Es, #Rs, and Desc. denote the number of entities, the number of relations, and whether the dataset contains parallel descriptions.
Table 2: Comparison of neural generation models on the Movie KG-Exp and Book KG-Exp datasets.

Dataset       Model        BLEU-1  BLEU-4  USR   R2-F   R2-R   R2-P   RL-F   RL-R   RL-P   EC
Movie KG-Exp  Att2Seq      2.63    0.00    0.00  0.00   0.00   0.00   2.73   4.25   2.63   0.01
(Few-shot)    NRT          8.78    0.32    0.01  1.84   1.08   11.73  7.12   10.17  17.97  0.07
              Transformer  12.23   0.27    0.16  1.24   1.07   3.54   6.97   9.54   12.00  1.18
              PETER        12.28   0.68    0.36  2.33   1.45   12.49  12.00  13.18  18.03  5.44
              PEPLER       12.58   0.41    0.01  1.26   1.44   1.18   10.73  12.63  10.38  0.11
              KnowRec      33.89   7.53    0.87  13.41  12.60  17.67  24.48  25.63  35.66  63.92
Book KG-Exp   Att2Seq      16.58   1.53    0.22  4.68   3.10   15.58  13.30  15.28  21.32  0.26
(Few-shot)    NRT          19.12   2.19    0.01  6.11   4.36   13.99  15.18  20.47  16.78  1.19
              Transformer  12.69   1.22    0.08  3.60   3.16   8.65   9.77   15.64  10.58  1.57
              PETER        18.38   2.87    0.45  7.12   5.07   17.50  14.74  17.66  17.52  4.23
              PEPLER       7.96    0.26    0.02  0.67   0.63   0.83   7.59   10.07  7.04   0.54
              KnowRec      28.93   7.94    0.93  17.28  16.05  22.45  24.84  25.19  36.60  60.46

Table 3: Comparison of neural generation models on the Movie KG-Exp and Book KG-Exp datasets in the few-shot learning setting (1% of training data).
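The BLEU scores reported above follow Papineni et al. (2002). As a dependency-free illustration (not the evaluation script used for these tables), the corpus-level modified n-gram precision with a brevity penalty can be computed as follows; n = 1 gives BLEU-1, while full BLEU-4 additionally combines the n = 1..4 precisions with a geometric mean, which this sketch omits.

```python
import math
from collections import Counter

def bleu_n(references, hypotheses, n):
    # Corpus-level modified n-gram precision with a brevity penalty,
    # over whitespace-tokenized reference/hypothesis string pairs.
    match, total, hyp_len, ref_len = 0, 0, 0, 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        hyp_len, ref_len = hyp_len + len(h), ref_len + len(r)
        # Count n-grams; clip hypothesis counts by reference counts.
        h_grams = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
        r_grams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        match += sum(min(c, r_grams[g]) for g, c in h_grams.items())
        total += sum(h_grams.values())
    precision = match / total if total else 0.0
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * brevity * precision
```

A perfect hypothesis scores 100, while a hypothesis shorter than its reference is penalized by the brevity factor.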
Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871-7880, Online. Association for Computational Linguistics.

Chenliang Li, Cong Quan, Li Peng, Yunwei Qi, Yuming Deng, and Libing Wu. 2019. A capsule network for recommendation and explaining what you like and dislike. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275-284.

Lei Li, Yongfeng Zhang, and Li Chen. 2020. Generate neural template explanations for recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 755-764.

Andriy Mnih and Russ R. Salakhutdinov. 2007. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8).

Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Schütze, and Iryna Gurevych. 2021. Investigating pretrained language models for graph-to-text generation. In Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, pages 211-227, Online. Association for Computational Linguistics.

Shaoyun Shi, Hanxiong Chen, Weizhi Ma, Jiaxin Mao, Min Zhang, and Yongfeng Zhang. 2020. Neural logic reasoning. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 1365-1374.

Peijie Sun, Le Wu, Kun Zhang, Yanjie Fu, Richang Hong, and Meng Wang. 2020. Dual learning for explainable recommendation: Towards unifying user preference prediction and review generation. In Proceedings of The Web Conference 2020, WWW '20, pages 837-847, New York, NY, USA. Association for Computing Machinery.

Nava Tintarev and Judith Masthoff. 2015. Explaining recommendations: Design and evaluation. In Recommender Systems Handbook, pages 353-382. Springer.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In Proceedings of the International Conference on Learning Representations.

Nan Wang, Hongning Wang, Yiling Jia, and Yue Yin. 2018a. Explainable recommendation via multi-task learning in opinionated text data. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 165-174.

Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge graph attention network for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 950-958.

Xiang Wang, Tinglin Huang, Dingxian Wang, Yancheng Yuan, Zhenguang Liu, Xiangnan He, and Tat-Seng Chua. 2021. Learning intents behind interactions with knowledge graph for recommendation. In Proceedings of the Web Conference 2021, pages 878-887.

Xiting Wang, Yiru Chen, Jie Yang, Le Wu, Zhengtao Wu, and Xing Xie. 2018b. A reinforcement learning framework for explainable recommendation. In 2018 IEEE International Conference on Data Mining, pages 587-596. IEEE.

Ze Wang, Guangyan Lin, Huobin Tan, Qinghong Chen, and Xiyang Liu. 2020. CKAN: Collaborative knowledge-aware attentive network for recommender systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 219-228.

Max Welling and Thomas N. Kipf. 2016. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations.

Zhen Wu, Xin-Yu Dai, Cunyan Yin, Shujian Huang, and Jiajun Chen. 2018. Improving review representations with user attention and product attention for sentiment classification. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).

Yikun Xian, Zuohui Fu, Shan Muthukrishnan, Gerard de Melo, and Yongfeng Zhang. 2019. Reinforcement knowledge graph reasoning for explainable recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 285-294.

Lijie Xie, Zhaoming Hu, Xingjuan Cai, Wensheng Zhang, and Jinjun Chen. 2021. Explainable recommendation based on knowledge graph and multi-objective optimization. Complex & Intelligent Systems, 7(3):1241-1252.

Aobo Yang, Nan Wang, Hongbo Deng, and Hongning Wang. 2021. Explanation as a defense of recommendation. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pages 1029-1037.

Wenhao Yu, Chenguang Zhu, Zaitang Li, Zhiting Hu, Qingyun Wang, Heng Ji, and Meng Jiang. 2022. A survey of knowledge-enhanced text generation. ACM Computing Surveys, 54(11s):1-38.

Yongfeng Zhang, Xu Chen, et al. 2020. Explainable recommendation: A survey and new perspectives. Foundations and Trends in Information Retrieval, 14(1):1-101.

Yaxin Zhu, Yikun Xian, Zuohui Fu, Gerard de Melo, and Yongfeng Zhang. 2021. Faithfully explainable recommendation via neural logic reasoning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3083-3090, Online. Association for Computational Linguistics.

A Dataset Details

A.1 Source Data

Amazon product data: The Amazon product dataset is a large-scale, widely used dataset for product recommendation containing product reviews and metadata from Amazon. Data fields include ratings, texts, descriptions, and category information (He and McAuley, 2016). Because the dataset contains item descriptions, we can leverage such data to extract entities and relations to construct a KG that matches the textual description. Thus, these descriptions provide objective, item-distinct explanations as to why a user may have purchased a product. Although a user may not have reviewed an item, the dataset provides an existing description of the item, allowing models to produce explanations for such items. To keep our datasets large-scale, we focus on Amazon Book and Amazon Movie 5-core, the two largest Amazon product datasets.
A.2 Dataset Comparison

Table 6 summarizes existing popular recommendation system datasets utilized for both the explainable recommendation and KG recommendation tasks. We report traditional recommendation features, KG-recommendation features, and explainable recommendation features. Last.FM (Wang et al., 2019), Book-Crossing (Wang et al., 2020), Movie-Lens20M (Wang et al., 2020), and Amazon-book (KG) (Wang et al., 2019) are popular benchmarks for the KG-recommendation task but contain no NL explanation features. Yelp-Restaurant, Amazon Movies & TV, and TripAdvisor-Hotel have been recently experimented with for the explainable recommendation task (Li et al., 2020), but lack KG data and rely on user reviews as proxies for the explanation. In contrast, our datasets, referred to as Book KG-Exp and Movie KG-Exp, contain both KGs and the corresponding parallel item descriptions associated with those KGs as explanations. Compared to Book KG-Exp, the Movie KG-Exp dataset contains fewer unique KG elements, with 59,036 entities and 146,772 triples versus 195,110 entities and 745,699 triples, while having similarly sized explanations.

A.3 Dataset Statistics

We provide detailed statistics on both the Book KG-Exp and Movie KG-Exp datasets in Figure 2. As seen in Figures 2(a) and 2(b), the distributions of KGs with respect to the number of tuples show similar long-tail distributions in both datasets. We observe from Figures 2(c) and 2(d) that a similar trend of long-tail distributions exists for both with respect to explanation lengths, where the lengths in the book dataset tend to skew more right than the lengths in the movie dataset.

B Experiment Details

B.1 Baseline Models

We introduce several baselines in explainable recommendation, describing how to adapt the models to the KG setting, as these models have been primarily formulated for user review data.

Att2Seq (Dong et al., 2017) was designed for review generation, where we adapt it to the item explanation setting. As in (Li et al., 2021), we remove the attention module, as it makes the generated content unreadable.

NRT (Li et al., 2017) is a multi-task model for rating prediction and tip generation, based on user and item IDs. As in previous work, we use our explanations as tips and remove the model's L2 regularizer (Li et al., 2020, 2021), which causes the model to generate identical sentences.

Transformer (Vaswani et al., 2017; Li et al., 2021) treats user and item IDs as words. We adapt the model first introduced for review generation by Li et al. (2021) while integrating the KG entities and relations instead of the review item features.

PETER (Li et al., 2021) utilizes both user/item IDs and corresponding item features extracted from user reviews to generate a recommendation score, an explanation, and context related to the item features. The model also develops a novel PETER mask between item/user IDs and corresponding features/generated text. As our task does not take a feature-based approach, for a fair comparison we remove the context prediction module and input the whole KG into the model as the corresponding item features.

PEPLER (Li et al., 2022) is an extension of PETER, where the transformer is replaced with a pre-trained language model, namely GPT-2, to generate both recommendation scores and explanations. We take the best-performing setting for a fair comparison, namely using the MLP setting for recommendation scores.

In addition to NRT, PETER, and PEPLER, as in previous work, we compare with two traditional baselines for recommendation: PMF (Mnih and Salakhutdinov, 2007) and SVD++ (Koren, 2008).

B.2 Hyper-parameters and Settings

As in (Li et al., 2021), we adapt the baseline codes to our setting and set the vocabulary size for NRT, Att2Seq, and PETER to 20,000 by keeping the most frequent words. For PETER and PEPLER, we set the number of context words to 128. For all approaches, including KnowRec, we set the length of the explanation to 128, as the mean length is about 94 for both datasets. For KnowRec, we use an embedding size of 512, using a Byte-Pair Encoding (BPE) vocabulary (Radford et al., 2019) of size 50,256, with 2 encoding layers. Following KG generation work (Ribeiro et al., 2021), we split the tokens in the linearized graph with their corresponding labels: [user], [graph], [head], [relation], and [tail]. For both datasets, we set the batch size to 128 and max user and KG size to 64
Name               #Users    #Items    #Interactions   KG    #Es      #Rs   #Triples    Desc.   Words/Sample
Last.FM            23,566    48,123    3,034,796       Yes   58,266   9     464,567     No      -
Book-Crossing      276,271   271,379   1,048,575       Yes   25,787   18    60,787      No      -
Movie-Lens20M      138,159   16,954    13,501,622      Yes   102,569  32    499,474     No      -
Amazon-book (KG)   70,679    24,915    847,733         Yes   88,572   39    2,557,746   No      -
Yelp-Restaurant    27,147    20,266    1,293,247       No    -        -     -           No      12.32
Amazon Movies      7,506     7,360     441,783         No    -        -     -           No      14.14
TripAdvisor-Hotel  9,765     6,280     320,023         No    -        -     -           No      13.01
Book KG-Exp        396,114   95,733    2,318,107       Yes   195,110  392   745,699     Yes     99.96
Movie KG-Exp       131,375   18,107    788,957         Yes   59,036   363   146,772     Yes     96.35

Table 6: Comparison of widely used datasets divided by task: KG-Recommendation (top), Explainable Recommendation (middle), and KG Explainable Recommendation (bottom).
(a) Book KGs (b) Movie KGs (c) Book Explanations (d) Movie Explanations
Figure 2: Distributions for number of tuples (Figures 2(a) and 2(b)) and tokens (Figures 2(c) and 2(d)) per sample.
and 192, respectively. We set the max node and edge length to 60. We experiment with λ_r and λ_e and find that 0.01 and 1 give us the best BLEU performance without affecting the recommendation performance.

[Figure: Effect of λ_r on KG-Exp (few-shot): average BLEU-4 score for the Movie and Book datasets.]
C Generated Examples
Table 7 presents some examples generated by KnowRec from the Book and Movie KG-Exp datasets. As discussed in Section 7, we find the examples to be fluent and grammatical, while incorporating both item features and implicit user information based on a user's purchase history. The generated examples closely match the ground truth, while integrating some language derived from the user. Note that our aim here is to illustrate examples that showcase the implicit user preferences, rather than the generated outputs which most closely match the ground-truth descriptions. As with other state-of-the-art NLG models, KnowRec does have a tendency to hallucinate by adding extra information that may not necessarily be accurate. As can be seen from the NLG metrics in Table 2, KnowRec alleviates the hallucination problem by incorporating the user-item KG information. Such limitations may be further mitigated by leveraging denser background KGs to generate from, while also incorporating item features from the user's purchase history.
Example 1 (Movie KG-Exp). Item: Journey to the Center of the Earth. Purchase history: Stitch in Crime; Columbo; The Lord of the Rings, Trilogy; Walt Disney Treasures. Item KG features: Jules Verne (writer), magnetic storm, disease.
Generated: "a scientist (jules verne) investigates a magnetic storm that sends a mysterious beam of light from earth to the center of earth."
Ground truth: "jules verne's professor lindenbrook leads a trip through monsters, mushrooms and a magnetic storm."

Example 2 (Book KG-Exp). Item: Murder in St. Giles. Purchase history: Silent Circle. Item KG features: person, newspaper, pseudonym, NY Times.
Ground truth (excerpt): "... 70 mystery and historical novels."

Example 3 (Book KG-Exp). Item: Black Canary and Zatanna: Bloodspell. Purchase history: Batgirl Vol. 1: Silent Knight. Item KG features: comicscreator, comicscharacter.
Generated (excerpt): "... on batman for dc comics. he is the author of numerous books for young readers, including supergirl, batgirl and ..."
Ground truth (excerpt): "... he has worked on such series as batman adventures, batgirl and kinetic and supergirl for dc ..."

Example 4 (Movie KG-Exp). Item: The Incredible Dr. Pol - Season 2. Purchase history: Jurrasic World; Best of the Incredible Dr. Pol. Item KG features: animal, pet, wide range.
Generated: "your favorite dr. pol vet and his pet dog return for a second season of this hilarious and heartwarming animated adventure."
Ground truth: "from sick goats to sick pet pigs, dr. pol and his colleagues have their hands full with a variety of cases and several animal emergencies."
Table 7: Examples generated by KnowRec on the Book/Movie KG-Exp datasets. In the first column, we follow the
format of user-item KG representation in Figure 1, where red nodes represent a user’s purchase history and blue
nodes represent an item KG. For clarity and brevity, we only show the relevant parts of the item graphs. In the
second column, the bold words are the item features directly coming from the item KG representation, whereas the
underlined words are the features implicitly captured by KnowRec, based on the user’s purchase history.