1 Introduction

With the rapid development of cloud computing and artificial intelligence technologies, intelligent question answering (QA) systems based on knowledge graphs (KGs) [1] are widely applied in production scenarios such as customer service. A QA system establishes a domain knowledge base by extracting semantics, including entities and assertions, from various data sources, converts customers' natural-language questions into the entities and assertions described in the knowledge graph, and then answers the questions intelligently with an optimized head-entity query algorithm. Intelligent QA systems reduce manual effort, decrease the fault probability of customer service, provide standard answers to frequent questions, and thus guarantee service quality. Therefore, a well-designed KG can assist customers automatically and intelligently at low cost.

However, frequently updated services and diverse technology stacks pose great challenges for the construction, maintenance and update of KGs. Current methods often employ lexical and grammatical segmentation, association sequence mining and question classification templates to construct KGs. However, they are only applicable to fixed training samples in a limited sample space and cannot handle the dynamic update of knowledge graphs. Furthermore, existing QA systems often introduce extra text sources such as web pages and documents to expand the knowledge base of entities and assertions, but an open knowledge base is prone to search errors and inaccurate answers because of the large amount of collected data. Thus, we adopt several optimization technologies to construct an enterprise-level KG for customer service, and employ semantic enhancement to dynamically update the KG.

2 Methodology

Figure 1 shows the overall framework of our method, which consists of entity recognition, heuristic query and dynamic update. We describe each part in detail below.

Fig. 1. Method overall framework

2.1 Entity Recognition by Combining LSTM and CRF

To overcome the fuzziness of question descriptions in the QA system, we analyze the question text and extract information such as keywords and word order, improving on traditional methods of searching head entities and assertions (e.g., semantic analysis, manual labeling). Because an LSTM alone does not model the constraints between output labels, we employ a model that combines an LSTM with a CRF. The model takes the word sequence of a sentence as input; the LSTM learns the order information and feeds the probability vectors to the CRF layer, which then predicts the best label sequence. We use formal semantics to describe the LSTM model: the input is an LSTM model M and a question \( s_{qi} \); the output is a predicate \( p_{l} \) and an entity \( e_{h} \); the enhanced LSTM model is described as \( M_{ilstm} = \langle p_{l} ,e_{h} \rangle \).

The output of the trained LSTM model should be close to the ground truth, where the assertion represents the user's intention and the entity represents the domain. To handle the variety of in-domain questions, the model mainly uses a bidirectional recurrent network layer (RNN layer) and an attention layer (A-layer), as follows.

Firstly, we segment the question into words, take the question of length L as input, and map its elements into input word vectors \( \left\{ {x_{j} } \right\} (j = 1, \ldots ,L) \). Then we use a bidirectional LSTM to learn the forward and backward hidden state sequences as follows (the backward pass is shown; the forward pass is symmetric).

$$ f_{j} = \sigma \left( {W_{xf} x_{j} + W_{hf} \overleftarrow{h}_{j + 1} + b_{f} } \right) $$
$$ i_{j} = \sigma \left( {W_{xi} x_{j} + W_{hi} \overleftarrow{h}_{j + 1} + b_{i} } \right) $$
$$ o_{j} = \sigma \left( {W_{xo} x_{j} + W_{ho} \overleftarrow{h}_{j + 1} + b_{o} } \right) $$
$$ c_{j} = f_{j} \circ c_{j + 1} + i_{j} \circ \tanh \left( {W_{xc} x_{j} + W_{hc} \overleftarrow{h}_{j + 1} + b_{c} } \right) $$
$$ \overleftarrow{h}_{j} = o_{j} \circ \tanh \left( {c_{j} } \right) $$

where \( f_{j} \), \( i_{j} \), \( o_{j} \) represent the activation vectors of the forget gate, input gate and output gate, respectively; \( c_{j} \) is the cell state vector; \( \sigma \) is the sigmoid function; \( \tanh \) is the hyperbolic tangent function; \( \circ \) represents the Hadamard product.

The model concatenates the forward and backward vectors to obtain \( h_{j} = \left[ {\overrightarrow{h}_{j} ;\overleftarrow{h}_{j} } \right] \), and sets the parameters of the weight connection layer. The attention weight of the \( j^{th} \) word in the word vector sequence \( \left\{ {x_{j} } \right\} (j = 1, \ldots ,L) \) is \( \alpha_{j} \), calculated as follows.

$$ \alpha_{j} = \frac{{exp(q_{j} )}}{{\mathop \sum \nolimits_{i = 1}^{L} exp(q_{i} )}} $$
$$ q_{j} = tanh(W^{T} \left[ {x_{j} ;h_{j} } \right] + b_{q} ) $$

Finally, the model forms a hidden state \( s_{j} = \left[ {x_{j} ;\alpha_{j} h_{j} } \right] \) from the attention weight \( \alpha_{j} \), the state sequence \( h_{j} \) and the specific word \( x_{j} \). We then project this hidden state \( s_{j} \) to obtain the output \( r_{j} \in R^{d \times 1} \), and the entity/assertion vector is calculated as the mean value as follows.

$$ \hat{p}_{\ell } ,\;\hat{e}_{h} = \frac{1}{L}\mathop \sum \limits_{j = 1}^{L} r_{j}^{T} $$

where the weight vector and the bias value are set based on a dataset of training questions and the corresponding manually labeled answers, and the two target vectors are then calculated with the LSTM model.
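For concreteness, the following is a minimal PyTorch sketch of the attention-weighted BiLSTM encoder described above. The module and parameter names (AttentiveBiLSTM, embed_dim, out_dim, etc.) are our own illustration rather than the paper's implementation, and the CRF layer and training loop are omitted.

```python
import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    """Attention-weighted BiLSTM encoder; names and sizes are illustrative."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, out_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.attn = nn.Linear(embed_dim + 2 * hidden_dim, 1)       # q_j = tanh(W^T [x_j; h_j] + b_q)
        self.proj = nn.Linear(embed_dim + 2 * hidden_dim, out_dim) # s_j -> r_j

    def forward(self, tokens):                       # tokens: (batch, L)
        x = self.embed(tokens)                       # (batch, L, embed_dim)
        h, _ = self.bilstm(x)                        # (batch, L, 2*hidden_dim)
        q = torch.tanh(self.attn(torch.cat([x, h], dim=-1)))
        alpha = torch.softmax(q, dim=1)              # attention weights over positions j
        s = torch.cat([x, alpha * h], dim=-1)        # s_j = [x_j; alpha_j h_j]
        r = self.proj(s)                             # outputs r_j
        return r.mean(dim=1)                         # mean over j gives the target vector

model = AttentiveBiLSTM(vocab_size=5000)
question = torch.randint(0, 5000, (1, 12))           # a toy 12-token question
target_vec = model(question)                         # matched against predicate/entity embeddings
```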

2.2 Topic Model Based Semantic Enhancement

To meet the different requirements of various users (e.g., enterprise, family, employee), we select the knowledge base with the greatest tendency, obtain enhanced texts related to the assertion/entity by comparing topic similarity, and identify the entity and assertion with the LSTM model to construct the global KG. The enhancement method uses external knowledge to update the KG's knowledge base online, thus supporting the KG's long-term stable operation and maintenance. The tendency analysis is mainly based on maximum likelihood estimation and least-squares loss estimation over the topic model, as follows.

Firstly, as a typical probabilistic method for text analysis, the Topic Model (TM) is applicable to two text types. One is the graph data used for training the KG, where TM adopts the dataset of questions and their corresponding answers to construct the KG. The other is the query results, where TM takes the questions processed by the LSTM model as query criteria; the enhanced text is the best-matched query result.

We describe the topic distribution with probabilistic latent semantic analysis [2]. In the dataset of query results with N texts, each text \( d_{i} \in \left\{ {d_{1} , \ldots ,d_{N} } \right\} \) is associated with K unobserved topic variables \( z_{k} \in \left\{ {z_{1} , \ldots ,z_{K} } \right\} \), and each topic variable generates different words \( w_{j} \in \left\{ {w_{1} , \ldots ,w_{M} } \right\} \). The joint probability distribution of documents and words \( \left( {d,w} \right) \) is as follows.

$$ P\left( {d_{i} ,w_{j} } \right) = P\left( {d_{i} } \right)\mathop \sum \limits_{k = 1}^{K} P\left( {w_{j} |z_{k} } \right)P\left( {z_{k} |d_{i} } \right) $$

where \( P\left( {w_{j} |z_{k} } \right) \) represents the probability of a word \( w_{j} \) appearing in a topic \( z_{k} \), and \( P\left( {z_{k} |d_{i} } \right) \) represents the probability of a topic \( z_{k} \) appearing in a document \( d_{i} \).

The distribution parameters of the hidden topic model can be calculated by maximum likelihood estimation over the document set as follows.

$$ L\left( D \right) = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{M} n\left( {d_{i} ,w_{j} } \right)\lg \mathop \sum \limits_{k = 1}^{K} P\left( {w_{j} |z_{k} } \right)P\left( {z_{k} |d_{i} } \right) $$
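As a toy illustration of these two formulas, the NumPy sketch below computes the joint probability \( P\left( {d_{i} ,w_{j} } \right) \) and the log-likelihood \( L\left( D \right) \) from randomly generated factor matrices; all sizes and values are assumed for illustration only.

```python
import numpy as np

# Toy PLSA factors (sizes and values assumed for illustration).
N, K, M = 4, 2, 6                                        # documents, topics, words
rng = np.random.default_rng(0)
P_d = np.full(N, 1.0 / N)                                # P(d_i)
P_w_given_z = rng.dirichlet(np.ones(M), size=K)          # P(w_j | z_k), rows sum to 1
P_z_given_d = rng.dirichlet(np.ones(K), size=N)          # P(z_k | d_i), rows sum to 1

# P(d_i, w_j) = P(d_i) * sum_k P(w_j | z_k) P(z_k | d_i)
mix = P_z_given_d @ P_w_given_z                          # (N, M) topic mixture term
P_dw = P_d[:, None] * mix
assert np.isclose(P_dw.sum(), 1.0)

# L(D) = sum_i sum_j n(d_i, w_j) * lg sum_k P(w_j | z_k) P(z_k | d_i)
n = rng.poisson(2.0, size=(N, M))                        # word counts n(d_i, w_j)
L_D = (n * np.log10(mix + 1e-12)).sum()
print(L_D)
```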

We use the classical EM algorithm [3] to estimate \( L\left( D \right) \). However, the PLSA algorithm cannot constrain a document set to a few similar topics, and the size of \( P\left( {z_{k} |d_{i} } \right) \) grows linearly as the knowledge base expands. TM-based methods such as LDA also cannot describe semantic relationships between documents well. Thus, to compare topic similarity effectively, we propose a topic similarity calculation method based on PLSA. If an entity in a KG belongs to the topic of a question and its answer, the tail entities connected to it also have a high probability of belonging to the same topic. We therefore use the least-squares loss between the KG and the query results to express the similarity between entities and topics, as follows.

$$ R_{v} \left( {D_{p} } \right) = \mathop \sum \limits_{i = 1}^{{\left| {D_{p} } \right|}} \mathop \sum \limits_{k = 1}^{K} (P(z_{k} |d_{i} ) - \mathop \sum \limits_{{e_{h} \in V}} \mathop \sum \limits_{{e_{t} \in V}} P\left( {z_{k} |e_{h} } \right)w\left( {e_{h} |e_{t} } \right))^{2} $$

where \( D_{p} \subset D \) is the subset of the query result set matched against the KG; \( w\left( {e_{h} |e_{t} } \right) \) represents the weight of a pair of connected entities in the KG, calculated as follows.

$$ w\left( {e_{h} |e_{t} } \right){ = } - \lg \left( {P\left( {W_{{predicate\left( {e_{h} ,e_{t} } \right)}} } \right)} \right) $$

where \( P\left( {W_{{predicate\left( {e_{h} ,e_{t} } \right)}} } \right) \) represents the probability that the two entities are connected by a semantic relationship; the entities may be connected by different paths.
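A small NumPy sketch of this least-squares similarity, under our own simplifying assumptions (a handful of texts, two entities, one edge, random distributions), might look as follows.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3                                                     # number of topics (assumed)
P_z_given_d = rng.dirichlet(np.ones(K), size=5)           # topics of candidate texts D_p
P_z_given_e = {"e1": rng.dirichlet(np.ones(K)),
               "e2": rng.dirichlet(np.ones(K))}
edges = [("e1", "e2", 0.4)]                               # (head, tail, P(W_predicate))

# KG-side topic vector: sum over connected pairs of P(z_k | e_h) * w(e_h | e_t),
# with w(e_h | e_t) = -lg P(W_predicate(e_h, e_t)) as in the paper
kg_topic = np.zeros(K)
for e_h, e_t, p_pred in edges:
    kg_topic += P_z_given_e[e_h] * (-np.log10(p_pred))

# R_v(D_p): squared distance between each text's topic distribution and kg_topic
R_v = ((P_z_given_d - kg_topic) ** 2).sum()
print(R_v)
```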

Finally, we select the tendency result set that optimizes the following combined objective to construct the local KG.

$$ L_{rp}^{{\prime }} = - \left( {1 - \lambda } \right)L\left( {D_{p} } \right) + \lambda R_{v} \left( {D_{p} } \right) $$

where \( \lambda \) is a bias parameter that balances the topic model and the least-squares loss. If \( \lambda = 0 \), minimizing \( L_{rp}^{{\prime }} \) is equivalent to selecting the result set corresponding to the most probable topic. If \( \lambda = 1 \), it is equivalent to selecting the result set whose topic distribution is closest to that of the entity/assertion in the KG. Thus, by setting a suitable parameter value, we can exploit the semantic knowledge of both external texts and KGs.
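For example, selecting among candidate result subsets by this objective can be sketched as below; the subset names and their (log-likelihood, least-squares loss) values are invented for illustration.

```python
def combined_objective(L_Dp, R_v_Dp, lam=0.5):
    # L'_rp = -(1 - lambda) * L(D_p) + lambda * R_v(D_p); smaller is better
    return -(1.0 - lam) * L_Dp + lam * R_v_Dp

# Toy candidate subsets with (log-likelihood, least-squares loss) pairs
candidates = {"subset_a": (-120.3, 0.8), "subset_b": (-95.1, 2.4)}
best = min(candidates, key=lambda name: combined_objective(*candidates[name]))
print(best)   # subset_b: its higher likelihood outweighs its larger KG loss here
```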

2.3 Heuristic Querying Rules

To query the global KG effectively and avoid the non-optimal answers caused by inefficient graph search algorithms, we employ heuristic query rules to rank the candidate set and select the optimal results from it as the answers. We design the heuristic query rules by considering both the question and the global KG, as follows.

Historical Candidate Answer Count.

By counting historical questions and their answers, we find that questions show relatively aggregated characteristics. The number of times a KG query result has been selected as a candidate in history, and the richness of its text, are two important ranking indexes.

Text Similarity.

Answering a question often involves three types of texts: the question \( Q_{i} \), the query results \( A_{j} \) of the enhanced semantics, and the candidate answers \( C_{k} \). The optimal answer is often similar to both the question and the enhanced semantics. By mapping each word of these texts into the word vector space \( \left\{ {x_{j} } \right\} \), we calculate the cosine similarity between the candidate answer and each of the other two types of texts, and take the sum of these similarities as a ranking metric.
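A minimal sketch of this ranking rule, with randomly generated vectors standing in for the real word-vector representations, could be:

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_score(c_vec, q_vec, a_vec):
    # sum of the candidate's similarity to the question and to the enhanced semantics
    return cos_sim(c_vec, q_vec) + cos_sim(c_vec, a_vec)

rng = np.random.default_rng(2)
q_vec, a_vec = rng.normal(size=50), rng.normal(size=50)   # Q_i and A_j vectors (toy)
candidates = [rng.normal(size=50) for _ in range(3)]      # candidate answers C_k (toy)
best = max(candidates, key=lambda c: rank_score(c, q_vec, a_vec))
```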

2.4 Dynamic Update of KG

To guarantee answer precision at a lower KG update cost, the periodic update layer merges external knowledge according to the results of integer linear programming (ILP), supporting the scale expansion and quality improvement of the global KG. The ILP model quantifies and normalizes user satisfaction and response time to decide whether the corresponding KG for each question is updated. The ILP model thus balances KG cost against service quality as follows.

We maximize the following objective function:

$$ \frac{1}{{\left| {KG_{L} } \right|}}\mathop \sum \limits_{i = 1}^{{\left| {KG_{L} } \right|}} uD_{i} \times uS_{i} - N\lg \frac{{t_{lstm} + t_{augment} + t_{query} }}{M}, $$

where \( KG_{L} \) represents the collection of local KGs of all questions in a period; \( uD_{i} \) is a binary update indicator, set to 1 if the KG is updated and 0 otherwise; \( uS_{i} \in \left[ {0,100} \right] \cap Z^{ + } \) represents the satisfaction scores in the training dataset; \( t_{lstm} \), \( t_{augment} \) and \( t_{query} \) represent the processing times of the LSTM model, semantic enhancement and heuristic query, respectively, which measure the operation and maintenance cost; M is the time reduction factor; and N is an amplification factor adjusted for different configurations.
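A small Python sketch of evaluating this objective for one update decision is given below. The constraint set is not specified here, so we only evaluate the objective rather than optimize it, and all numeric values (M, N, times, scores) are assumed.

```python
import math

def update_objective(uD, uS, t_lstm, t_augment, t_query, M=1000.0, N_amp=2.0):
    # (1/|KG_L|) * sum_i uD_i * uS_i  -  N * lg((t_lstm + t_augment + t_query) / M)
    satisfaction = sum(d * s for d, s in zip(uD, uS)) / len(uD)
    time_penalty = N_amp * math.log10((t_lstm + t_augment + t_query) / M)
    return satisfaction - time_penalty

# Toy decision: update only the two high-scoring questions (all values assumed)
print(update_objective(uD=[1, 0, 1, 0], uS=[80, 35, 92, 10],
                       t_lstm=120, t_augment=200, t_query=60))
```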

By setting the above optimization function and constraints, we can make update choices that maximize user satisfaction and minimize maintenance costs. The actual optimization effect depends on the two scaling factors and the accumulated size of the knowledge bases.

3 Evaluation

3.1 Setting

The experiments validate the effect of our semantic enhancement method on the precision, recall and F-measure of question answering, comparing it with traditional LSTM-based methods for obtaining the head entity/assertion. We adopt a real business dataset of an e-commerce company collected from January to December 2018. The dataset, with six hundred assertion labels and eight thousand entity labels, includes ten thousand items for training the model, three thousand items for validating it, and five thousand items for testing it online. We pre-process and label these data, and then construct the KG to train the model combining LSTM and CRF.

Existing works often employ precision, recall and F-measure to evaluate methods, but they only count binary outcomes. We extend these evaluation metrics by defining the matching degree of entities and assertions as follows.

$$ precision = \frac{TP}{TP + FP} $$
$$ recall = \frac{TP}{TP + FN} $$
$$ F1 = \frac{2 \times precision \times recall}{precision + recall} $$

where TP (true positive) represents the number of correctly labeled entities and assertions, FP (false positive) represents the number of falsely labeled entities and assertions, and FN (false negative) represents the number of missed entities and assertions. We also conduct a series of experiments to compare our approach with [8] in precision, recall and F1.
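For clarity, the metric computation, with the conventional factor of 2 in F1, can be sketched as:

```python
def prf1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g., 420 correctly labeled, 60 falsely labeled, 90 missed entities/assertions (toy counts)
print(prf1(tp=420, fp=60, fn=90))
```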

3.2 Results

The LSTM employs the PReLU activation function and uses regularization to constrain the shared weight parameters, where the penalty value is set to \( 10^{-6} \), the dropout rate of the attention layer to 0.15, the weight of the connection layer to 0.25, and the bias parameter to 0.35. We set these parameters according to our practical experience. We query entities and assertions, and select the latest five query results as enhanced semantics. Figure 2 shows the effects of different bias parameter settings on the precision, recall and F1 of entity assertions on the test dataset. The experimental results show that our semantic enhancement method based on topic modeling and least-squares loss improves the performance metrics on the same dataset: it improves precision by 6.41%, recall by 16.46% and F1 by 11.17%. The results demonstrate that the original topic model cannot fully describe the relevant questions and answers, and that the selected external semantic information effectively supplements the single topic model.

Fig. 2. Result comparison

4 Related Work

Since assertions in natural language have various expressions, KEQA [4] employs LSTM-based semantic perception to discover head entities and assertions, and measures candidate answer sets with a joint distance metric to deal with ambiguous expressions. CAN [5] constructs a GRU-based deep upgrade network over questions, inputs and answers to sense incomplete contextual semantic interactions in interactive systems. Although these methods with limited datasets and static models can improve the precision of query results in QA systems, they cannot be applied well in industrial scenarios, where business logic changes frequently.

Existing works enhance contextual semantics by introducing external knowledge bases; they enhance entities and assertions to answer questions whose related knowledge is not contained in the KG. FreeBase [6] connects the retrieval results of web pages with the KG to enhance semantics. Text2KB [7] takes web search results, questions and answers from communities, and common texts as external knowledge bases. DBpedia [8] proposes topic model-based information retrieval over multiple knowledge bases. Since external knowledge bases have low reliability, high cost and unstable performance, these methods cannot achieve good query results in industrial applications.

5 Conclusion

This paper proposes a semantic enhancement based dynamic construction method for a domain knowledge graph used in question answering. We employ an LSTM-based attention model to overcome the fuzziness of domain question expressions, use topic comparison based semantic enhancement to construct the local KG and expand the knowledge of the global KG, and adopt an ILP-based update strategy to support the dynamic update of the KG. Compared with traditional methods, the experimental results show that our approach improves the precision, recall and F1 of question answering by introducing semantic enhancement.