Besides mainstream models for the different KG construction sub-tasks, there are many other innovative or practical attempts targeting different scenarios. In this part, we present further advances to enlighten readers in designing novel construction solutions.
D.1 Rule-based Methods for Knowledge Acquisition
Many early attempts focus on rules to achieve knowledge acquisition or its sub-tasks. Despite their limited accuracy in big-data environments, rule-based methods are practical solutions for quickly extracting massive amounts of raw knowledge. These methods also work in scenarios where high-performance computing is not available.
Rule-based approaches [291] are general solutions for NER. For semi-structured web data, wrapper induction methods generate rule wrappers that interpret semi-structures such as DOM tree nodes and tags to harvest entities from pages. Some rule-based solutions are unsupervised and require no human annotations, such as Omini [292]. For entities in table form, many approaches build on the property-attribute layouts of Wikipedia, such as the rule-based tools [40][254] behind DBpedia and YAGO. For unstructured data, classic NER systems [293] also rely on manually constructed rule sets for pattern matching. Semi-supervised approaches improve rule-based NER by iteratively generating refined new patterns from pattern seeds and scoring, such as bootstrapping-based NER [294].
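To make the pattern-matching idea concrete, the following is a minimal sketch of a rule-based NER system: hand-written regular expressions map surface forms to entity types. The rules, type names, and example sentence are invented for illustration and are not taken from any cited system.

```python
import re

# Hypothetical hand-crafted rules: each pattern is paired with the
# entity type it extracts. Real systems use far larger rule sets.
NER_RULES = [
    (re.compile(r"\b(?:Mr|Mrs|Dr)\.\s+[A-Z][a-z]+\b"), "PERSON"),
    (re.compile(r"\b[A-Z][a-z]+\s+(?:Inc|Corp|Ltd)\.?\b"), "ORG"),
    (re.compile(r"\b\d{1,2}\s+(?:January|February|March)\s+\d{4}\b"), "DATE"),
]

def rule_based_ner(text):
    """Return (span_text, type) pairs matched by any rule."""
    entities = []
    for pattern, etype in NER_RULES:
        for m in pattern.finditer(text):
            entities.append((m.group(0), etype))
    return entities

print(rule_based_ner("Dr. Smith joined Acme Corp on 3 March 2001."))
# → [('Dr. Smith', 'PERSON'), ('Acme Corp', 'ORG'), ('3 March 2001', 'DATE')]
```

The appeal is clear: no training data or heavy computation is needed, which is exactly why such rules remain practical in low-resource settings; the cost is that the rule set must be rewritten for every new domain.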
Rule-focused methods are the earliest attempts at RE tasks over different kinds of data structures, gathering strings that fit hand-crafted templates, e.g., "$PEOPLE is born in $LOCATION." yields ($PEOPLE, born-in, $LOCATION). However, these unsupervised strategies rely on complex linguistic knowledge to label data. Later, researchers concentrated on automatic pattern discovery for triple mining. Semi-supervision is an enlightening strategy to reduce hand-crafted features and data labeling, uncovering more reliable patterns from a small group of annotated samples: DIPRE [295] iteratively extracts patterns from seeds, while the bootstrapping-based KnowItAll [12] and Snowball [296] equip DIPRE with confidence evaluation. Some rule-based models consider more lexical objects for mining. OLLIE [264] incorporates lexical structure patterns with relational dependency paths in texts. MetaPAD [297] combines lexical segmentation and synonymous clustering into meta patterns that are sufficiently informative, frequent, and accurate for relational triples. For semi-structured tables, researchers design table-structure-based rules to acquire relationships arranged in rows, columns, and table headers, such as [298]. Furthermore, some semi-structured extraction systems use distant supervision that tolerates potential errors, directly querying external databases such as DBpedia and Wikipedia to acquire relationships for the entities found in tabular data, such as [70], [299], and [300]. Similarly, Muñoz et al. [300] look up Wikipedia tables to label relationships in tabular forms. Krause et al. [301] also expand rule sets for RE via distant supervision.
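The seed-driven bootstrapping loop described above (DIPRE/Snowball style) can be sketched in a few lines: seed entity pairs induce textual templates, and the templates in turn harvest new pairs. The corpus, seed pair, and slot names are invented; real systems add the confidence scoring this sketch omits.

```python
import re

# Toy corpus and a single seed (head, tail) pair -- both hypothetical.
corpus = [
    "Mozart was born in Salzburg.",
    "Einstein was born in Ulm.",
    "Curie was born in Warsaw.",
]
seeds = {("Mozart", "Salzburg")}

def induce_patterns(corpus, pairs):
    """Turn sentences containing a known pair into slotted templates."""
    patterns = set()
    for sent in corpus:
        for head, tail in pairs:
            if head in sent and tail in sent:
                patterns.add(sent.replace(head, "$HEAD").replace(tail, "$TAIL"))
    return patterns  # e.g. {"$HEAD was born in $TAIL."}

def apply_patterns(corpus, patterns):
    """Match templates against the corpus to extract new pairs."""
    pairs = set()
    for tpl in patterns:
        regex = re.escape(tpl).replace(r"\$HEAD", r"(\w+)").replace(r"\$TAIL", r"(\w+)")
        for sent in corpus:
            m = re.fullmatch(regex, sent)
            if m:
                pairs.add((m.group(1), m.group(2)))
    return pairs

print(apply_patterns(corpus, induce_patterns(corpus, seeds)))
```

One seed suffices to extract all three (person, birthplace) pairs here; in practice the newly found pairs are fed back as seeds for the next iteration, which is why confidence evaluation (as in Snowball) matters to stop semantic drift.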
Rule-based models that perform end-to-end knowledge acquisition are lightweight solutions for specific domains. However, these designs require extra maintenance work if the domain changes.
D.2 More Embedding-based Models
Embedding-based models lay the foundation for KG completion while providing semantic support for the sub-tasks of knowledge acquisition from semi-structured or unstructured data.
More variants of translation embedding (TransE) models for KG completion have been developed to search the entity-relation feature space via mapping matrices, such as TransR [302] and TransH [303]. Meanwhile, researchers also consider tensor-based empirical models for embedding over a completed large graph, such as RESCAL [304] and DistMult [305]. Some knowledge representation models leverage non-linear neural networks to exploit deep knowledge-embedding features for KG completion, such as ConvE [306], M-DCN [307], and TransGate [308]. Unstructured entity descriptions are also incorporated for feature enhancement, as in the DKRL [309] and ConMask [310] models. GCNs are also employed to encode a KG, such as R-GCN [311], W-GCN [312], and COMPGCN [313]; they can further capture neighborhood information through semantic diffusion mechanisms. ProjE [314] projects an entity and a relation into distinctive feature spaces through neural operations to rank candidates for a missing entity. However, when the relation element is missing, a latent vector space of relation candidates cannot be recovered. SENN [315] bridges this disparity-distribution-space semantic gap with a multi-task embedding-sharing strategy that unifies relation, head-entity, and tail-entity link prediction.
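The translation principle behind this family of models is compact: TransE scores a triple (h, r, t) by the distance ||h + r − t||, treating the relation as a translation vector, and link prediction ranks candidates by that score. The two-dimensional embeddings below are toy values chosen for illustration, not trained parameters.

```python
import numpy as np

# Toy, hand-picked embeddings (a trained model would learn these).
emb = {
    "Paris":      np.array([1.0, 0.0]),
    "France":     np.array([1.0, 1.0]),
    "Berlin":     np.array([0.0, 0.0]),
    "capital_of": np.array([0.0, 1.0]),
}

def transe_score(h, r, t):
    """TransE plausibility: lower distance = more plausible triple."""
    return float(np.linalg.norm(emb[h] + emb[r] - emb[t]))

def predict_tail(h, r, candidates):
    """Rank candidate tails for a (head, relation, ?) query."""
    return min(candidates, key=lambda t: transe_score(h, r, t))

print(predict_tail("Paris", "capital_of", ["France", "Berlin"]))  # → France
```

Variants such as TransR and TransH keep this scoring form but first project h and t through relation-specific matrices or hyperplanes, which is what lets them model one-to-many and many-to-many relations that plain TransE conflates.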
As for ET, novel embedding-based models combine global graph-structure features and background knowledge to predict potential entity types via representations. Researchers report that the classical TransE model performs poorly when directly applied to ET tasks. Moon et al. [154] propose the TransE-ET model, which adjusts TransE by optimizing the Euclidean distance between entity and type representations, but it is limited by insufficient entity-type and triple features. Newer solutions construct various graphs to share diversified features of entity-related objects for learning embeddings with entity-type features. PTE [17] reduces data noise via partial-label embedding, constructing a bipartite graph between entities and all their types while connecting entity nodes to their related extracted text features; it then utilizes the background KG by building a type-hierarchy tree with the derived correlation weights. JOIE [316] embeds entity nodes in ontology-view and instance-view graphs, gathering entity types by top-k ranking between entity and type candidates. Likewise, ConnectE [317] maps entities onto their types and learns knowledge triple embeddings. Practical models improving embeddings on heterogeneous graphs for ET tasks (in the Xlore project [42]) also include [318], [319], [320]. We present graph structures for embedding-model-based ET in Figure 22.
Embedding-based models are also critical solutions for entity linking via entity embeddings. LIEGE [321] derives distributional context representations to link entities for web pages. Early researchers [322] leverage bag-of-words (BoW) contextual embeddings of entity mentions and then perform clustering to gather linked entity pairs. Later, Lasek et al. [323] extend the BoW model with linguistic embeddings for EL tasks. Researchers also focus on deep representations for high-performance linking. DSRM [324] employs a deep neural network to exploit semantic relatedness, combining entity descriptions and relationships with type features to obtain deep entity features for linking. EDKate [325] jointly learns low-dimensional embeddings of entities and words in the KB and textual data, capturing intrinsic entity-mention features beyond the BoW model. Furthermore, Ganea and Hofmann [18] introduce an attention mechanism for joint embedding and pass semantic interactions for disambiguation. Le and Titov [19] model the latent relations between mentions in the context for embedding, utilizing mention-wise and relation-wise normalization to score pair-wise coherence.
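The BoW-based linking idea that the deep models above improve upon can be sketched directly: represent the mention's context and each candidate entity's description as word-count vectors and link to the candidate with the highest cosine similarity. The candidate identifiers and descriptions are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical KB candidates for the ambiguous mention "Washington".
candidates = {
    "Washington_(state)": "state pacific northwest united states seattle",
    "George_Washington":  "first president united states general army",
}

def bow(text):
    """Bag-of-words vector: lowercase token -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link(mention_context):
    """Pick the candidate whose description best matches the context."""
    ctx = bow(mention_context)
    return max(candidates, key=lambda e: cosine(ctx, bow(candidates[e])))

print(link("Washington is a state in the pacific northwest"))
# → Washington_(state)
```

The sketch also shows the BoW model's weakness: only exact token overlap counts ("states" and "state" do not match), which is precisely the gap that embedding-based linkers such as EDKate close.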
Researchers also focus on embedding-based distributional models over multiple semantic structures to handle coreference resolution (CO). Durrett and Klein [326] utilize antecedent representations to enable coreference inference through distributional features. Martschat and Strube [327] explore distributional semantics over mention-pair and tree models to enhance coreference representations, directly picking robust features to optimize the CO task. Chakrabarti et al. [328] further employ the MapReduce framework to resolve anaphoric entity names through query-context similarity.
As for joint RE, novel distributional embedding-based models are proposed to model cross-task distributions and bridge the semantic gaps between NER and RC. Ren et al. [329] propose the knowledge-enhanced distributional CoType model for joint extraction tasks. In this model, entity pairs are first mapped onto their mentions in the KB, then tagged with entity types and all relation candidates provided by the KB. The model learns embeddings of relation mentions with contextualized lexical and syntax features while training embeddings of entity mentions with their types; a contextual relation mention is then derived from its head- and tail-entity embeddings via the TransE [330] model. CoType assumes interactive co-occurrence between entities and their relation labels, filling the distribution discrepancy with knowledge from the external domain and extra type features. Noticeably, this model also effectively mitigates noise in distant-supervised datasets. However, feature engineering and extra KBs are still needed.
D.4 Other Advances
Researchers explore more strategies for flexible NER tasks. Transfer learning shares knowledge between different domains or models. Pan et al. [334] propose Transfer Joint Embedding (TJE) to jointly embed output labels and input samples from different domains, blending intrinsic entity features. Lin et al. [335] apply a neural network with adaptation layers to transfer parameter features from a model pre-trained on a different domain.
Reinforcement learning (RL) lets NER models interact with the domain environment through an agent guided by a reward policy, such as the Markov decision process (MDP)-based model [336] and the Q-network-enhanced model [337]. Noticeably, researchers [338] have also leveraged RL for noise reduction in distant-supervised NER data. Adversarial learning generates counterexamples or perturbations to enforce the robustness of NER models, such as DATNet [339], which imposes perturbations on word representations, and counterexample generators ([340] and [341]). Moreover, active learning, which queries users to annotate selected samples, has also been applied to NER. Shen et al. [342] incrementally choose the most informative samples for NER labeling during training to mitigate the reliance on tagged samples.
Few-shot/zero-shot ET is an intricate and challenging problem. Ma et al. [343] model prototypes of entity-label embeddings for zero-shot fine-grained ET, named Proto-HLE, which combines prototypical features with hierarchical type labels to infer the essential features of a new type. Zhang et al. [344] further propose MZET, which exploits contextual features and word embeddings with a memory network to provide semantic side information for few-shot ET.
More probabilistic models are developed for EL tasks. Guo et al. [345] propose a probabilistic model for unstructured data that leverages the prior probability of an entity, its context, and its name when performing linking. Han et al. [346] employ a reference graph of entities, assuming that entities co-occurring in the same documents should be semantically related.
Joint models for NER and EL reduce the error propagation of pipeline-based entity recognition. NEREL [347] couples NER and EL by ranking extracted mention-entity pairs to exploit the interaction features between entity mentions and their links. Graphical models are also effective designs for combining Named Entity Normalization (NEN) labels, which convert entity mentions into unambiguous forms, e.g., Washington (Person) and Washington (State). Li et al. [348] incorporate EL with NEN using a factor-graph model, forming CRF chains over word entity types and their target nodes. Likewise, MINTREE [349] introduces a tree-based pair-linking model for the collective task.
Cluster-based solutions treat the CO task as pairwise binary classification (co-referring or not). Early cluster models aim at mention-pair features. Soon et al. [350] propose a single-link clustering strategy to detect anaphoric pairs. Recasens et al. [351] further develop a mention-pair-based clustering model that emits either a coreference chain or a singleton leaf. Later, researchers concentrate on entity-based features to exploit complex anaphoric phenomena. Rahman and Ng [352] propose a mention-ranking clustering model that delves into entity characteristics. Stoyanov and Eisner [353] develop agglomerative clustering to merge the best clusters based on entity features.
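The pairwise-classification-plus-single-link view can be made concrete with a small sketch: a pairwise decision function says whether two mentions corefer, and union-find merges linked pairs into entity clusters. The decision function here is a trivial substring heuristic standing in for a learned classifier; the mentions are invented.

```python
# Stand-in for a trained mention-pair classifier (illustration only).
def corefer(m1, m2):
    a, b = m1.lower(), m2.lower()
    return a in b or b in a  # e.g. "Obama" vs "Barack Obama"

def cluster(mentions):
    """Single-link clustering of coreferent mentions via union-find."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if corefer(mentions[i], mentions[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m)
    return list(groups.values())

print(cluster(["Barack Obama", "Obama", "Michelle", "Michelle Robinson"]))
# → [['Barack Obama', 'Obama'], ['Michelle', 'Michelle Robinson']]
```

Single-link merging is greedy: one false-positive pair fuses two entire entities, which is the weakness that motivated the later entity-level and mention-ranking models cited above.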
Early researchers concentrate on statistical features for fast end-to-end joint RE, such as the Integer Linear Programming (ILP)-based algorithm [354], which resolves entities and relations via a conditional probabilistic model; the semi-Markov chain model [355], which jointly decodes global-level relation features; and MLNs [356], which model joint logic rules over entity labels and relationships. These early attempts deliver prototypes of entity-relationship interaction. However, statistical patterns are not expressive enough for intricate contexts.
Few-shot RC designs also employ feature-augmentation strategies to mitigate data deficiency with novel model designs and background knowledge. Similar to [95], Levy et al. [357] turn zero-shot RC into a reading-comprehension problem, handling unseen labels through a template converter. Soares et al. [358] compose a compound relation representation for each sentence from the BERT contextualized embeddings of the entity pairs and the corresponding sentence. GCNs also deliver extra graph-level features for few-shot learning. Satorras and Estrach [359] propose a novel GCN framework that determines the relation tag of a query sample by computing similarities between nodes. Moreover, Qu et al. [360] employ a posterior distribution over prototypical vectors. Some designs also leverage semi-supervised data augmentation based on metric learning. Neural Snowball [121] (based on RSN) labels the query set via a Siamese network while drawing similar sample candidates from external distant-supervised sample sets to enrich the support set.
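The prototype-based classification that several of these few-shot designs build on can be sketched briefly (in the spirit of prototypical networks, not any one cited system): each relation's prototype is the mean of its support-set embeddings, and a query takes the label of the nearest prototype. The embeddings are toy vectors standing in for encoder output.

```python
import numpy as np

# Hypothetical support set: a few encoded sentences per relation.
support = {
    "born_in":   [np.array([1.0, 0.1]), np.array([0.9, 0.0])],
    "works_for": [np.array([0.0, 1.0]), np.array([0.1, 0.9])],
}

def prototypes(support):
    """One prototype per relation: the mean of its support embeddings."""
    return {rel: np.mean(vecs, axis=0) for rel, vecs in support.items()}

def classify(query_vec, protos):
    """Label a query with the relation of the nearest prototype."""
    return min(protos, key=lambda r: np.linalg.norm(query_vec - protos[r]))

protos = prototypes(support)
print(classify(np.array([0.95, 0.05]), protos))  # → born_in
```

Metric-learning approaches such as Neural Snowball replace the plain Euclidean distance with a learned (Siamese) similarity, but the classify-by-nearest-representative structure is the same.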
Many early attempts develop random-walk models for relation-path reasoning that infer relational logic paths in a latent-variable graphical model. The Path-Ranking Algorithm (PRA) [361] generates a feature matrix to sample potential relation paths. However, feature sparsity in the graph impedes random-walk approaches. Semantic-enrichment strategies are proposed to mitigate this bottleneck, such as inducing vector-space similarity [362] and clustering associated relations [363].
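The core of PRA-style path features can be illustrated with a toy graph: enumerate the bounded-length relation paths connecting an entity pair; the multiset of path types is the feature vector a downstream classifier would consume (the random-walk probabilities that weight these features in real PRA are omitted). The graph and entities are invented.

```python
from collections import defaultdict

# Toy KG: node -> [(relation, neighbor)]. All triples are hypothetical.
graph = defaultdict(list)
for h, r, t in [
    ("Alice", "born_in", "Paris"),
    ("Paris", "located_in", "France"),
    ("Alice", "nationality", "France"),
]:
    graph[h].append((r, t))

def relation_paths(src, dst, max_len=2, path=()):
    """Yield every relation-label sequence from src to dst up to max_len hops."""
    if src == dst and path:
        yield path
    if len(path) >= max_len:
        return
    for rel, nxt in graph[src]:
        yield from relation_paths(nxt, dst, max_len, path + (rel,))

print(sorted(relation_paths("Alice", "France")))
# → [('born_in', 'located_in'), ('nationality',)]
```

That the path (born_in, located_in) co-occurs with the direct relation nationality is exactly the kind of regularity PRA learns; the sparsity problem arises because most entity pairs in a large KG share no short paths at all.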
Early attempts aim at the unique attributes of entities for entity matching. Many models apply distance-based approaches to distributional representations of entity descriptions or definitions. VCU [364] proposes first-order and second-order vector models that embed the description words of an entity pair to comprehensively measure their conceptual distance. TALN [365] leverages sense-based embeddings derived from BabelNet to combine the definitional descriptions of words: it first generates the embedding of each filtered definition word, combined with POS tags and syntax features via BabelNet, then averages them into a centroid sense to find the best matching candidates. String-similarity-based models for entity matching also include TF-IDF [366] and I-Sub [367].
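A minimal sketch of the TF-IDF similarity idea for entity matching: entity descriptions are compared by cosine similarity over TF-IDF weights, so shared rare tokens count for more than common ones. The entity IDs and descriptions are invented.

```python
import math
from collections import Counter

# Hypothetical entity descriptions (IDs and texts are invented).
docs = {
    "Q1": "apple technology company cupertino",
    "Q2": "apple fruit tree rosaceae",
    "Q3": "microsoft technology company redmond",
}

def tfidf_vectors(docs):
    """token -> tf * idf weight per document (smoothed idf)."""
    tokenized = {k: v.split() for k, v in docs.items()}
    n = len(docs)
    df = Counter(w for toks in tokenized.values() for w in set(toks))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return {k: {w: c * idf[w] for w, c in Counter(toks).items()}
            for k, toks in tokenized.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = tfidf_vectors(docs)
best = max(("Q2", "Q3"), key=lambda k: cosine(vecs["Q1"], vecs[k]))
print(best)  # → Q3: two shared tokens outweigh the single shared "apple"
```

Such string-level scores are cheap but surface-bound: Q1 and Q2 share "apple" without being the same concept, which motivates the sense-based and graph-based matchers discussed next.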
Graph-based methods achieve feasible performance for entity matching on medium-scale KGs with hierarchical graph structures. ETF [368] learns concept representations through semantic and graph-based features, including Katz similarity, random-walk betweenness centrality, and an information-propagation score. ParGenFS [369] leverages a graph-based fuzzy clustering algorithm to conceptualize a new entity; it models the thematic distribution to acquire distinctive concept clusters and searches for the corresponding location of an entity update in a target KG.
Entity-alignment tasks can also be handled by text-similarity-based models that detect surface similarity between entities, trading off performance against computation cost. Rdf-ai [370] proposes a systematic model to match two entity-node graphs: it leverages string-matching and lexical-feature-similarity algorithms to align the available attributes, then computes entity similarity for alignment. Similarly, Lime [371] further leverages metric spaces to detect aligned entity pairs, first generating entity exemplars to filter alignable candidates before similarity computation for entity fusion. Unlike small-scale KGs, the shaped large KGs contain meaningful relational paths and an enriched concept taxonomy. HolisticEM [372] employs IDF scores to compute the surface similarity of entity names for seed generation and utilizes Personalized PageRank (PPR) to measure distances between entity graphs by traversing their neighbor nodes.
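Personalized PageRank, the graph-proximity measure HolisticEM relies on, can be sketched as standard power iteration with the teleport distribution concentrated on a seed node, so the resulting scores measure proximity to that seed. The four-node graph is a toy example; dangling nodes leak mass in this sketch, which is harmless for ranking.

```python
import numpy as np

# Toy graph (invented): A -> B -> C -> {A, D}; D is a dangling node.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

def personalized_pagerank(seed, alpha=0.85, iters=50):
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    M = np.zeros((n, n))
    for u, v in edges:
        M[idx[v], idx[u]] = 1.0
    out = M.sum(axis=0)
    M = M / np.where(out == 0, 1, out)  # column-stochastic (dangling cols stay zero)
    p = np.zeros(n)
    p[idx[seed]] = 1.0                  # teleport only to the seed node
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = alpha * (M @ r) + (1 - alpha) * p
    return dict(zip(nodes, r.round(3)))

print(personalized_pagerank("A"))
```

With seed A, scores decay along the walk A → B → C → D, so nodes closer to the seed rank higher; in entity alignment, two entities whose neighborhoods yield similar PPR vectors are taken as alignment candidates.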