Text-Enhanced Question Answering Over Knowledge Graph
Ye Ji
goodboyjy@hotmail.com
Nanjing University of Aeronautics and Astronautics
Nanjing, Jiangsu, China

Jiajun Wu
w15250465300@163.com
Nanjing University of Aeronautics and Astronautics
Nanjing, Jiangsu, China
ABSTRACT
Question answering over knowledge graphs is an important area of research within question answering. Existing methods mainly focus on exploiting information inside knowledge graphs and ignore the abundant external information about entities. However, knowledge graphs are usually incomplete, and the entities in them are not completely described. In this paper, we propose a novel text-enhanced question answering model over knowledge graphs that takes advantage of the rich context information in a text corpus. We believe that rich textual context can effectively alleviate the information loss in knowledge graphs and enhance the knowledge representation capability at the answer end. To this end, we apply an attention model to dynamically fuse internal and external information. In addition, a Transformer Encoder network is used to obtain the representations of the input question and the descriptive text. Experiments on the WebQuestions dataset show that, compared with other state-of-the-art QA methods, our method effectively improves accuracy.

CCS CONCEPTS
• Information systems → Question answering.

KEYWORDS
Question answering; Knowledge graph; Embedding model

ACM Reference Format:
Jiaying Tian, Bohan Li, Ye Ji, and Jiajun Wu. 2021. Text-Enhanced Question Answering over Knowledge Graph. In The 10th International Joint Conference on Knowledge Graphs (IJCKG’21), December 6–8, 2021, Virtual Event, Thailand. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3502223.3502241

∗ Corresponding author

IJCKG’21, December 6–8, 2021, Virtual Event, Thailand
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9565-6/21/12...$15.00
https://doi.org/10.1145/3502223.3502241

1 INTRODUCTION
Question answering over knowledge graphs (KGQA) is an important downstream application of knowledge graphs. In recent years, researchers have proposed many methods for the KGQA task. These methods can be roughly categorized into two main groups: semantic parsing based approaches (SP-based approaches) and information retrieval based approaches (IR-based approaches). The goal of SP-based approaches is to construct a semantic parser that converts natural language questions into intermediate logical forms. Traditional semantic parsers [16] need annotated logical forms as supervision for model training. These approaches usually struggle with coverage, since the logical predicates are not comprehensive, and they are limited to specific domains. Recent efforts to overcome these limitations include Abujabal et al. [1] and Hu et al. [10], who construct hand-crafted rules or features, and Liang et al. [12] and Krishnamurthy et al. [11], who apply weak supervision by using question-answer pairs or distant supervision instead of full semantic annotations.

IR-based approaches first construct a set of candidate answers from the knowledge graph, and then map the input question and the candidate answers into a vector space. Finally, similarity scores between the question and the candidate answers are calculated and used to select the final answer. The most important part of IR-based approaches is the method used to obtain low-dimensional vectors for the question and the answers. Most approaches use deep neural networks to learn the representations of the question and candidate answers, although the network architectures vary. Inspired by the progress of deep learning in natural language processing (NLP) tasks, Bordes et al. [4] adopt a simplified bag-of-words model to obtain the question representation and use the subgraph of each candidate answer to help represent it. Dong et al. [8] introduce multi-column convolutional neural networks (MCCNNs) to enhance information acquisition at the question end. Hao et al. [9] apply Bi-LSTM networks to represent the question, with the information about the candidate answer in the knowledge graph used as an aid. Other approaches try to use information outside the knowledge graph; for example, Xu et al. [18] [19] use Wikipedia free text as external knowledge at the question end. However, knowledge graphs are usually incomplete, and the context information in a knowledge graph is often cluttered. We believe that the context information of a candidate answer can be enhanced by the rational use of external text information.

In the field of knowledge graph representation, entity description text effectively improves the performance of knowledge graph
embedding models. Entity description text contains rich entity information. This information can serve as auxiliary information alongside the high-confidence structured information in the knowledge graph, helping the model represent knowledge more accurately. Hence, some researchers have argued that knowledge representation learning needs to incorporate more semantic information. Xie et al. [17] proposed a representation learning model based on hierarchical entity types (TKRL). TKRL assumes that an entity has multiple hierarchical types and that, in different semantic environments, the entity has different representations according to those types. Building on TKRL, Rahman et al. [13] proposed TPRC, a knowledge graph embedding model that uses entity type attributes in a relational context. Wang et al. [15] proposed a text-enhanced representation learning method (TEKE) for knowledge graphs, which improves the quality of knowledge embeddings. TEKE mainly refers to the text description information of entities

2 RELATED WORK
In general, we follow the IR-based approaches. More specifically, the input question and the candidate answers are embedded into a vector space, and similarity scores are then calculated to rank the candidates. At the question end, Bordes et al. [4] adopt a simplified bag-of-words model, Dong et al. [8] introduce multi-column convolutional neural networks (MCCNNs), and Hao et al. [9] apply Bi-LSTM networks. We construct a Transformer Encoder network [14] to obtain the question representation. At the answer end, Bordes et al. [4] first make use of the subgraph of the candidate answer. Dong et al. [8] pay more attention to the answer path, answer context and answer type. Unlike the methods above, Hao et al. [9] use different aspects of the candidate answer to help represent the input question and use the input question to adjust the weights of the different answer aspects. We draw on work in the field of knowledge graph representation and use the text description information of entities to enhance the information of candidate answers.

3 METHODOLOGY
The overall architecture of our approach is shown in Figure 1. Given an input question, we first retrieve its topic entity through the Freebase API [3] and collect all the 2-hop nodes of the topic entity in FB2M as candidate answers. Then the input question is fed into a stacked Transformer neural network to get its vector representation. For each candidate answer, we use a co-occurrence network to get its unique entity description text. The entity description text yields the external information through a word embedding matrix, which is pre-trained with BERT [7]. Internal information contains the entity itself, entity type, entity relation and entity context. Then we apply an attention model to dynamically fuse internal and external information. Finally, we calculate the similarity score for each candidate answer.

[Figure 1: architecture diagram. Recoverable component labels: Question Representation, Answer Representation, Attention Model, Score Function.]

3.1 Question Representation
The input question Q is expressed as a sequence of words $q = (w_1, w_2, \ldots, w_n)$, where $w_i$ denotes the $i$th word. We first construct a word embedding matrix $E_w \in \mathbb{R}^{d \times |w|}$ to get the word embeddings. The word embedding matrix $E_w$ is randomly initialized and its parameters are updated during training. Here, $d$ is the dimension of the word embeddings, and $|w|$ is the vocabulary size. These word embeddings are then fed into a six-layer Transformer network to obtain the question representation $\mathbf{q}$. The dimension of $\mathbf{q}$ is also set to $d$.

3.2 Candidate answer representation
Internal information representation. We employ TransE [6] as the knowledge graph embedding model. For each candidate answer, we use three answer aspects to describe its information in the knowledge graph. The answer entity $\mathbf{a}_e$ denotes the embedding vector of the candidate answer, the answer relation $\mathbf{a}_r$ denotes the average of the embeddings of the relations that appear on the answer path, and the answer context $\mathbf{a}_c$ denotes the average of the embeddings of the entities and predicates that directly connect to the answer entity.

External information representation. Given a knowledge graph $G$ and a textual corpus $T = \{w_1, w_2, \cdots, w_n\}$, we first conduct entity annotation with entity linking tools to label the entities in the knowledge graph and get an entity-annotated text corpus D =
$\{x_1, x_2, \cdots, x_m\}$. Here $m \le n$ because multiple adjacent words could be labeled as one entity. Then we construct a co-occurrence network to bridge the candidate answer entity $a$ and the entity-annotated text corpus $D$.

For each candidate $a$, we use co-occurrence frequency to find its external neighboring nodes, and these nodes constitute the external textual context of the candidate answer entity. The weighted average of the vectors of these external neighboring nodes is used as the external information representation of the candidate answer entity $a$. More specifically, $y_i$ denotes the co-occurrence frequency between the candidate answer entity $a$ and $x_i \in D = \{x_1, x_2, \cdots, x_m\}$. The external textual context of the candidate answer entity $a$ is defined as:

$$n(a) = \{x_i \mid y_i > \theta\} \tag{1}$$

Here, the co-occurrence window is set to 5 and $\theta$ is the threshold used to filter out low-frequency nodes. We use the pre-trained word embedding matrix to obtain the embedding vectors of these external neighboring nodes. Then the external information representation $\mathbf{a}_o$ of the candidate answer entity can be calculated by:

$$\mathbf{a}_o = \frac{1}{\sum_{x_i \in n(a)} y_i} \sum_{x_i \in n(a)} y_i \mathbf{x}_i \tag{2}$$

3.3 Attention model
In order to comprehensively fuse the internal information in the knowledge graph and the external information in the text corpus, we design an attention model to dynamically aggregate vectors. For each candidate answer $a$, the extent of attention is measured by the relatedness between the representation of the input question $\mathbf{q}$ and the different answer aspect embeddings $\mathbf{a}_e, \mathbf{a}_r, \mathbf{a}_c, \mathbf{a}_o$. We use the following formulas to calculate the weights:

$$\alpha_i = \frac{\exp(\eta_i)}{\sum_{j \in \{e,r,c,o\}} \exp(\eta_j)} \tag{3}$$

$$\eta_i = f\left(W^T [\mathbf{q}; \mathbf{a}_i] + b\right) \tag{4}$$

Here, $[\mathbf{m}; \mathbf{n}]$ denotes the concatenation of vector $\mathbf{m}$ and vector $\mathbf{n}$, and $W^T \in \mathbb{R}^{2d \times d}$ is an intermediate matrix which is randomly initialized. $b$ is the offset and is also randomly initialized. The intermediate matrix and the offset are updated during training. We use $\mathbf{q}$ to denote the vector representation of the input question. $f(\cdot)$ is a non-linear activation function and $j \in \{e, r, c, o\}$ indexes the different answer aspects. The similarity score of the question $q$ and candidate answer $a$ can be calculated as follows:

$$S(q, a) = h(\mathbf{q}, \mathbf{a}) \tag{5}$$

$$\mathbf{a} = \sum_{i \in \{e,r,c,o\}} \alpha_i \mathbf{a}_i \tag{6}$$

Here, $h(\cdot)$ is the inner product between the question vector $\mathbf{q}$ and the candidate answer vector $\mathbf{a}$.

3.4 Model Training
For every question $q$, we first construct its correct answer set $R_q$ and wrong answer set $W_q$ from its candidate answer set $C_q$. Then, for every correct answer in $R_q$, we construct negative examples by randomly selecting $k$ wrong answers from $W_q$. Here, $k$ is a hyper-parameter that controls the number of negative examples. We apply pairwise training, and the loss function is defined as follows:

$$L(q, a, a') = \left[m - S(q, a) + S(q, a')\right]_+ \tag{7}$$

Here, $S(\cdot)$ denotes the function used to calculate the similarity score, and $m$ denotes the margin parameter. $[z]_+$ denotes the maximum of 0 and $z$. The whole objective function is defined as follows:

$$\min \sum_{q} \frac{1}{|R_q|} \sum_{a \in R_q} \sum_{a' \in W_q} L(q, a, a') \tag{8}$$

In the training process, we use minibatch stochastic gradient descent (SGD) as the optimizer.

3.5 Inference
In the testing process, we calculate scores for all the answers $a \in C_q$ with the scoring function $S(q, a)$. Since many questions have more than one correct answer, it is inappropriate to choose only the answer with the highest score as the final answer. The margin parameter $m$ is used to help solve this problem. We first find the highest score among all the answers, $S_{\max}$, defined as:

$$S_{\max} = \max_{a \in C_q} S(q, a) \tag{9}$$

Then the candidate answers whose scores are close to the highest score $S_{\max}$ are selected as the final answers. More specifically, the final answer set is determined by:

$$A_q = \{a \mid a \in C_q \text{ and } S_{\max} - S(q, a) < m\} \tag{10}$$

4 EXPERIMENTS
We conduct our experiments on the WebQuestions dataset, which consists of 5810 question-answer pairs: 3778 for training and 2032 for testing. As the knowledge graph, we employ FB2M, a subset of the large knowledge graph Freebase that contains 647,657 entities, 4,641 relations and 1,604,874 triples. For WebQuestions, we further split the training part into a training set and a validation set in an 80%/20% ratio. The text corpus is generated from Wikipedia. Macro F1 score is used as the evaluation metric and is calculated with the official evaluation script provided by Berant et al. [2]. To evaluate our method, we use IR-based approaches proposed in recent years as baselines.

4.1 Settings
We first train the word embedding model and the knowledge graph embedding model; the training of the question answering model then directly uses the embedding vectors they output. We employ BERT [7] to obtain the word embedding matrix of the text corpus, and TransE is used to get the embedding vectors of entities and relations in the knowledge graph. The embedding dimension $d$ is set to 250. The embedding dimension of the input question is also set to 250. In addition, we set the minibatch size to 50 and the learning rate to 0.01. The hyper-parameter $k$ and the margin parameter $m$ are set to 1000 and 0.7, respectively. Finally, we employ the hyperbolic tangent as the activation function $f(\cdot)$.
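As an illustration of Equations (1) to (6), the following is a minimal sketch in plain Python, not the authors' implementation. The function names and the toy vectors are hypothetical. One assumption is made loudly: the paper states the intermediate matrix as $W^T \in \mathbb{R}^{2d \times d}$, but the softmax in Equation (3) needs a scalar relatedness per aspect, so the sketch uses a weight vector `w` of length 2d. The activation is the hyperbolic tangent, as in the settings.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def external_representation(neighbors, theta):
    """Eq. (1)-(2): frequency-weighted average of neighbor vectors with y_i > theta.

    `neighbors` is a list of (y_i, x_i) pairs (co-occurrence frequency, vector).
    """
    kept = [(y, x) for y, x in neighbors if y > theta]
    if not kept:
        raise ValueError("no neighbor passes the frequency threshold")
    total = sum(y for y, _ in kept)
    dim = len(kept[0][1])
    return [sum(y * x[k] for y, x in kept) / total for k in range(dim)]


def attention_score(q, aspects, w, b):
    """Eq. (3)-(6), assuming a scalar relatedness eta_i = tanh(w . [q; a_i] + b)."""
    # Eq. (4): `q + a` concatenates the two lists, giving a vector of length 2d.
    etas = [math.tanh(dot(w, q + a) + b) for a in aspects]
    # Eq. (3): softmax over the answer aspects.
    z = sum(math.exp(e) for e in etas)
    alphas = [math.exp(e) / z for e in etas]
    # Eq. (6): attention-weighted sum of the aspect vectors.
    fused = [sum(al * a[k] for al, a in zip(alphas, aspects)) for k in range(len(q))]
    # Eq. (5): inner product between the question and the fused answer vector.
    return dot(q, fused), alphas


# Toy example with d = 2 (all values are made up for illustration).
q = [1.0, 0.0]
a_e, a_r, a_c = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
a_o = external_representation(
    [(3, [1.0, 0.0]), (1, [0.0, 1.0]), (0, [9.0, 9.0])], theta=0
)
score, alphas = attention_score(q, [a_e, a_r, a_c, a_o], w=[1.0, 0.0, 1.0, 0.0], b=0.0)
```

In this toy run the neighbor with frequency 0 is filtered out by the threshold, so `a_o` is the 3:1 weighted average of the two remaining vectors, and the aspect most aligned with the question receives the largest attention weight.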
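The pairwise margin loss of Equation (7), the per-question term of Equation (8), and the answer selection rule of Equations (9) and (10) can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and the example scores are hypothetical, and the margin value 0.7 is taken from the settings.

```python
def hinge_loss(s_pos, s_neg, margin):
    """Eq. (7): [m - S(q, a) + S(q, a')]_+ for one correct/wrong answer pair."""
    return max(0.0, margin - s_pos + s_neg)


def question_loss(pos_scores, neg_scores, margin):
    """Per-question term of Eq. (8): (1 / |R_q|) * sum over all (a, a') pairs.

    `neg_scores` holds the scores of the sampled wrong answers.
    """
    total = sum(hinge_loss(sp, sn, margin) for sp in pos_scores for sn in neg_scores)
    return total / len(pos_scores)


def select_answers(scores, margin):
    """Eq. (9)-(10): keep every candidate whose score is within `margin` of the best."""
    s_max = max(scores.values())
    return {a for a, s in scores.items() if s_max - s < margin}


# Toy candidate scores (made up for illustration); margin m = 0.7 as in the settings.
candidate_scores = {"a1": 2.0, "a2": 1.5, "a3": 1.0}
answers = select_answers(candidate_scores, margin=0.7)
loss = question_loss(pos_scores=[2.0], neg_scores=[1.5, 1.0], margin=0.7)
```

Using the same margin at training and inference time means that a wrong answer pushed at least $m$ below a correct one during training also falls outside the selection window around $S_{\max}$ at test time.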
4.2 Performance Comparison
As shown in Table 1, our method achieves the highest macro F1 score (43.7) on the WebQuestions dataset among the compared methods. On the one hand, in terms of question representation learning, Bordes et al., 2014 [4] apply a bag-of-words model. Dong et al., 2015 [8] use a multi-column convolutional neural network. Bordes et al., 2015 [5] apply a memory network and Hao et al., 2017 [9] introduce a Bi-LSTM network. The Transformer Encoder network we use can further capture the word-to-word dependencies within the input question and extract information from the input question more efficiently. On the other hand, at the answer end, Bordes et al., 2014 [4] use the subgraph of the candidate answer to help represent it. Dong et al., 2015 [8] use answer path, answer type and answer context to represent the candidate answer. Hao et al., 2017 [9] consider the impact of the question on the candidate answer aspects and use an attention model to aggregate information in the knowledge graph. We further use text information outside the knowledge graph and design an attention model to dynamically aggregate internal and external information. The answer aspects in the knowledge graph effectively provide key information about the candidate answer, such as the relation between the question and the candidate answer, while the descriptive textual context outside the knowledge graph effectively supplements the context information of the candidate answer. At the same time, the specially designed attention model can reasonably fuse this information.

Table 1: Results on the WebQuestions dataset.

Methods                 Macro F1
Bordes et al., 2014     39.2
Dong et al., 2015       40.8
Bordes et al., 2015     42.2
Hao et al., 2017        42.9
Our approach            43.7

4.3 Ablation Study
We also conduct experiments to evaluate the components of our model. Table 2 shows the performance of the model under different combinations of components. Transformer means we only apply the Transformer Encoder network to get the representation of the question, and the multiple-aspect information of the answer is not used. External information means the text corpus is used to get the context information outside the knowledge graph. Internal information means the text corpus is not used while the answer aspects in the knowledge graph are.

Table 2: Results of the model under different combinations of components.

Methods                                             Macro F1
Transformer                                         39.8
Transformer + external information                  41.8
Transformer + internal information                  42.5
Transformer + internal and external information     43.7

From the results in Table 2, we draw the following observations. First, when internal or external information is used alone, internal information achieves the higher F1 score (42.5 versus 41.8). This suggests that the information in the knowledge graph is richer and more conducive to the representation of candidate answers. Second, the external description information screened by the co-occurrence network effectively supplements the context information of candidate answers. Using internal and external information together further improves the performance of the model and achieves the highest score (43.7).

5 CONCLUSION
In this work, we introduce a novel method that uses text information outside the knowledge graph to enhance the representation of candidate answers for the KGQA task. First, we use answer entity, answer relation and answer context to help represent answer information in the knowledge graph. Then, we employ a co-occurrence network to find external neighboring nodes, which constitute the external textual context of the candidate answer entity. Finally, we design an attention model to dynamically fuse internal and external information. In addition, we apply a Transformer Encoder network for question representation learning in order to further capture word-to-word dependencies. Experiments on the WebQuestions dataset demonstrate the effectiveness of our method.

REFERENCES
[1] Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, and Gerhard Weikum. 2017. Automated Template Generation for Question Answering over Knowledge Graphs. In the 26th International Conference.
[2] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013 conference on empirical methods in natural language processing. 1533–1544.
[3] K. Bollacker. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. Proc. SIGMOD’08 (2008).
[4] Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676 (2014).
[5] Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075 (2015).
[6] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Neural Information Processing Systems (NIPS). 1–9.
[7] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (2018).
[8] Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question Answering over Freebase with Multi-Column Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers).
[9] Yanchao Hao, Yuanzhe Zhang, Kang Liu, Shizhu He, and Jun Zhao. 2017. An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[10] Sen Hu, Lei Zou, Jeffrey Xu Yu, Haixun Wang, and Dongyan Zhao. 2017. Answering Natural Language Questions by Subgraph Matching over Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering (2017).
[11] Jayant Krishnamurthy and Tom M. Mitchell. 2012. Weakly Supervised Training of Semantic Parsers. In Joint Conference on Empirical Methods in Natural Language Processing & Computational Natural Language Learning.
[12] Percy Liang, Michael I. Jordan, and Dan Klein. 2013. Learning dependency-based compositional semantics. Computational Linguistics 39, 2 (2013), 389–446.
[13] Md Mostafizur Rahman and Atsuhiro Takasu. 2020. Leveraging Entity-Type Properties in the Relational Context for Knowledge Graph Embedding. IEICE Transactions on Information and Systems E103.D, 5 (2020), 958–968.
[14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. arXiv (2017).
[15] Zhigang Wang, Juanzi Li, Zhiyuan Liu, and Jie Tang. 2016. Text-enhanced representation learning for knowledge graph. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI). 4–17.
[16] Yuk Wah Wong and Raymond J. Mooney. 2007. Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus. In ACL, Meeting of the Association for Computational Linguistics, June, Prague, Czech Republic.
[17] Ruobing Xie, Zhiyuan Liu, Maosong Sun, et al. 2016. Representation Learning of Knowledge Graphs with Hierarchical Types. In IJCAI. 2965–2971.
[18] Kun Xu, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2016. Hybrid question answering over knowledge base and free text. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2397–2407.
[19] Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2016. Question answering on freebase via relation extraction and textual evidence. arXiv preprint arXiv:1603.00957 (2016).