Unveiling Scholarly Communities over
Knowledge Graphs
Sahar Vahdati1[0000−0002−7171−169X] , Guillermo Palma2[0000−0002−8111−2439] ,
Rahul Jyoti Nath1 , Christoph Lange1,4[0000−0001−9879−3827] , Sören
Auer2,3[0000−0002−0698−2864] , and Maria-Esther Vidal2,3[0000−0003−1160−8727]
arXiv:1807.06816v1 [cs.DL] 18 Jul 2018
1 University of Bonn, Germany
{vahdati,langec}@cs.uni-bonn.de, s6ranath@uni-bonn.de
2 L3S Research Center, Germany {palma,auer,vidal}@L3S.de
3 TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Maria.Vidal@tib.eu
4 Fraunhofer IAIS, Germany
Abstract. Knowledge graphs represent the meaning of properties of
real-world entities and relationships among them in a natural way. Exploiting semantics encoded in knowledge graphs enables the implementation of knowledge-driven tasks such as semantic retrieval, query processing, and question answering, as well as solutions to knowledge discovery tasks including pattern discovery and link prediction. In this paper,
we tackle the problem of knowledge discovery in scholarly knowledge
graphs, i.e., graphs that integrate scholarly data, and present Korona,
a knowledge-driven framework able to unveil scholarly communities for
the prediction of scholarly networks. Korona implements a graph partition approach and relies on semantic similarity measures to determine
relatedness between scholarly entities. As a proof of concept, we built
a scholarly knowledge graph with data from researchers, conferences,
and papers of the Semantic Web area, and applied Korona to uncover
co-authorship networks. Results observed from our empirical evaluation
suggest that exploiting semantics in scholarly knowledge graphs enables
the identification of previously unknown relations between researchers.
By extending the ontology, these observations can be generalized to other
scholarly entities, e.g., articles or institutions, for the prediction of other
scholarly patterns, e.g., co-citations or academic collaboration.
1 Introduction
Knowledge semantically represented in knowledge graphs can be exploited to
solve a broad range of problems in the respective domain. For example, in
scientific domains, such as bio-medicine, scholarly communication, or even in
industries, knowledge graphs enable not only the description of the meaning
of data, but the integration of data from heterogeneous sources and the discovery of previously unknown patterns. With the rapid growth in the number
of publications, scientific groups, and research topics, the availability of scholarly datasets has considerably increased. This generates a great challenge for
researchers, particularly, to keep track of new published scientific results and
potential future co-authors. To alleviate the impact of the explosion of scholarly
S. Vahdati et al.
data, knowledge graphs provide a formal framework where scholarly datasets can
be integrated and diverse knowledge-driven tasks can be addressed. Nevertheless, to exploit the semantics encoded in such knowledge graphs, a deep analysis
of the graph structure as well as the semantics of the represented relations is
required. There have been several attempts considering both of these aspects.
However, the majority of previous approaches rely on the topology of the graphs
and usually omit the encoded meaning of the data. Most such approaches are
also mainly applied to special graph topologies, e.g., ego networks, rather than
general knowledge graphs. To provide an effective solution to the problem of representing scholarly data in knowledge graphs, and exploiting them to effectively
support knowledge-driven tasks such as pattern discovery, we propose Korona,
a knowledge-driven framework for scholarly knowledge graphs. Korona enables
both the creation of scholarly knowledge graphs and knowledge discovery. Specifically, Korona resorts to community detection methods and semantic similarity
measures to discover hidden relations in scholarly knowledge graphs. We have
empirically evaluated the performance of Korona in a knowledge graph of publications and researchers from the Semantic Web area. As a proof of concept,
we studied the accuracy of identifying co-author networks. Further, the predictive capacity of Korona has been analyzed by members of the Semantic Web
area. Experimental outcomes suggest the following conclusions: i) Korona identifies
co-author networks that include researchers that both work on similar topics,
and attend and publish in the same scientific venues. ii) Korona allows for
uncovering scientific relations among researchers of the Semantic Web area. The
contributions of this paper are as follows:
– A scholarly knowledge graph integrating data from DBLP datasets;
– The Korona knowledge-driven framework, which has been implemented on
top of two graph partitioning tools, semEP [8] and METIS [3], and relies on
semantic similarity to identify patterns in a scholarly knowledge graph;
– Collaboration suggestions based on co-author networks; and
– An empirical evaluation of the quality of Korona using semEP and METIS.
This paper includes five additional sections. Section 2 motivates our work
with an example. The Korona approach is presented in Section 3. Section 4
reports on experimental results. Related work is analyzed in Section 5. Finally,
Section 6 concludes and presents ideas for future work.
2 Motivating Example
In this section, we motivate the problem of knowledge discovery tackled in
this paper. We present an example of co-authorship relation discovery between
researchers working on data-centric problems in the Semantic Web area. We
checked the Google Scholar profiles of three researchers between 2015 and 2017,
and compared their networks of co-authorship. By 2016, Sören Auer and Christoph
Lange were part of the same research group and wrote a large number of joint
publications. Similarly, Maria-Esther Vidal, also working on data management
Fig. 1: Motivating Example. Co-authorship communities from the Semantic
Web area working on data-centric problems. Researchers were in different co-authorship communities (2016) (a), started a successful scientific collaboration
in 2016 (b), and, as a result, produced a large number of scholarly artifacts.
Panel (a): Researchers working on similar topics were in two co-authorship
communities. Panel (b): Researchers working on similar topics constitute a
co-authorship community and produce a large number of scholarly artifacts.
topics, was part of a co-authorship community. Figure 1a illustrates the two
co-authorship communities, which were confirmed by the three researchers. After 2016, these three researchers started to work in the same research lab, and a
large number of scientific results, e.g., papers and projects, was produced. An approach able to discover such potential collaborations automatically would allow
for the identification of the best collaborators and, thus, for maximizing the success chances of scholars and researchers working on similar scientific problems.
In this paper, we rely on the natural intuition that successful researchers working
on similar problems and producing similar solutions can collaborate successfully,
and propose Korona, a framework able to discover unknown relations between
scholarly entities in a knowledge graph. Korona implements graph partitioning
methods able to exploit semantics encoded in a scholarly knowledge graph and
to identify communities of scholarly entities that should be connected or related.
3 Our Approach: Korona
3.1 Preliminaries
The definitions required to understand our approach are presented in this section.
First, we define a scholarly knowledge graph as a knowledge graph where nodes
represent scholarly entities of different types, e.g., publications, researchers, publication venues, or scientific institutions, and edges correspond to an association
between these entities, e.g., co-authors or citations.
Definition 1 Scholarly Knowledge Graph. Let U be a set of RDF URI references
and L a set of RDF literals. Given sets Ve and Vt of scholarly entities and types,
respectively, and given a set P of properties representing scholarly relations, a
scholarly knowledge graph is defined as SKG=(Ve ∪ Vt , E, P ), where:
Fig. 2: Korona Knowledge Graph. Scholarly entities and relations. The depicted
example includes researchers (Maria-Esther Vidal, Sören Auer), articles (article#1,
article#2), and a venue (ISWC_2014), connected by properties such as or:isAuthorOf,
or:publishedIn, or:title, or:keyword, or:abstract, or:fullTitle, or:location, or:hasDate, and or:year.
– Scholarly entities and types are represented as RDF URIs, i.e., Ve ∪ Vt ⊆ U;
– Relations between scholarly entities and types are represented as RDF properties, i.e., P ⊆ U and E ⊆ (Ve ∪ Vt) × P × (Ve ∪ Vt ∪ L).
Figure 2 shows a portion of a scholarly knowledge graph describing scholarly entities, e.g., papers, publication venues, researchers, and different relations among
them, e.g., co-authorship, citation, and collaboration.
Definition 2 Co-author Network. A co-author network CAN =(Va , Ea , Pa ) corresponds to a subgraph of SKG=(Ve ∪ Vt , E, P ), where
– Nodes are scholarly entities of type researcher,
Va = {a | (a rdf:type :Researcher) ∈ E}
– Researchers are related according to co-authorship of scientific publications,
Ea = {(ai :co-author aj ) | ∃p . ai , aj ∈ Va ∧ (ai :author p) ∈ E ∧
(aj :author p) ∈ E ∧ (p rdf:type :Publication) ∈ E}
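To make Definitions 1 and 2 concrete, the following minimal Python sketch stores a toy scholarly knowledge graph as a set of triples and derives the co-author network from it. The identifiers (:auer, :paper1, etc.) and the function name co_author_network are illustrative assumptions and are not part of Korona. Self-loops are excluded here, which Definition 2 does not state explicitly.

```python
RDF_TYPE = "rdf:type"

# Toy edge set E of a scholarly knowledge graph: (subject, property, object).
# All identifiers are illustrative; they are not taken from the Korona dataset.
E = {
    (":auer", RDF_TYPE, ":Researcher"),
    (":lange", RDF_TYPE, ":Researcher"),
    (":vidal", RDF_TYPE, ":Researcher"),
    (":paper1", RDF_TYPE, ":Publication"),
    (":paper2", RDF_TYPE, ":Publication"),
    (":auer", ":author", ":paper1"),
    (":lange", ":author", ":paper1"),
    (":vidal", ":author", ":paper2"),
}


def co_author_network(edges):
    """Return (Va, Ea) as in Definition 2; self-loops are left out (assumption)."""
    researchers = {s for s, p, o in edges if p == RDF_TYPE and o == ":Researcher"}
    publications = {s for s, p, o in edges if p == RDF_TYPE and o == ":Publication"}

    # Group the authors of every publication.
    authors_of = {}
    for s, p, o in edges:
        if p == ":author" and s in researchers and o in publications:
            authors_of.setdefault(o, set()).add(s)

    # Two distinct researchers are co-authors if they share a publication.
    co_authors = set()
    for authors in authors_of.values():
        for a_i in authors:
            for a_j in authors:
                if a_i != a_j:
                    co_authors.add((a_i, ":co-author", a_j))
    return researchers, co_authors


Va, Ea = co_author_network(E)
```

In this toy graph, only :auer and :lange share a publication, so Ea contains the :co-author relation between them in both directions, while :vidal remains an isolated node of Va.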
Figure 3 shows scholarly networks that can be generated by Korona. Some
of these networks are among the recommended applications for scholarly data
analytics in [14]. However, the focus of this work is on co-author networks.
3.2 Problem Statement
Let SKG ′ =(Ve ∪ Vt , E ′ , P ) and SKG=(Ve ∪ Vt , E, P ) be two scholarly knowledge graphs, such that SKG ′ is an ideal scholarly knowledge graph that contains
all the existing and successful relations between scholarly entities in Ve , i.e.,
an oracle that knows whether two scholarly entities should be related or not.
SKG=(Ve ∪Vt , E, P ) is the actual scholarly knowledge graph, which only contains
a portion of the relations represented in SKG ′ , i.e., E ⊆ E ′ ; it represents those
relations that are known and is not necessarily complete. Let ∆(E ′ , E) = E ′ − E
be the set of relations existing in the ideal scholarly knowledge graph SKG ′
that are not represented in the actual scholarly knowledge graph SKG. Let
Fig. 3: Scholarly networks. (a) Network of Researchers and Articles: co-author networks from researchers and articles. (b) Networks of Events and Articles: co-citation networks discovered from events and articles.
SKG comp =(Ve ∪ Vt , Ecomp , P ) be a complete knowledge graph, which includes
a relation for each possible combination of scholarly entities in Ve and properties
in P , i.e., E ⊆ E ′ ⊆ Ecomp . Given a relation e ∈ ∆(Ecomp , E), the problem
of discovering scholarly relations consists in determining whether e ∈ E ′ , i.e.,
whether a relation r=(ei p ej ) corresponds to an existing relation in the ideal
scholarly knowledge graph SKG ′ .
In this paper, we specifically focus on the problem of discovering successful co-authorship relations between researchers in scholarly knowledge graph
SKG=(Ve ∪ Vt , E, P ). Thus, we are interested in finding the co-author network
CAN =(Va , Ea , Pa ) composed of the maximal set of relationships or edges that
belong to the ideal scholarly knowledge graph, i.e., the set Ea in CAN that
corresponds to a solution of the following optimization problem:
$$\underset{E_a \subseteq E_{comp}}{\operatorname{argmax}} \; |E_a \cap E'| \qquad (1)$$
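The objective in Equation 1 can be illustrated with a few lines of Python. This is only a sketch: the oracle E′ is not available in practice (which is why the predictions are later validated through a survey), so the edge sets below are hypothetical toy data and the function name objective is an assumption.

```python
def objective(E_a, E_ideal):
    """Value of |Ea ∩ E'|: predicted relations that also occur in the ideal graph."""
    return len(set(E_a) & set(E_ideal))


# Hypothetical oracle E' and two hypothetical candidate edge sets.
E_ideal = {("a", "b"), ("b", "c")}
candidate_1 = {("a", "b"), ("a", "c")}
candidate_2 = {("a", "b"), ("b", "c")}

# The candidate sharing more relations with the oracle is preferred.
best = max([candidate_1, candidate_2], key=lambda E_a: objective(E_a, E_ideal))
```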
3.3 Proposed Solution
We propose Korona to solve the problem of discovering meaningful co-authorship
relations between researchers in scholarly knowledge graphs. Korona relies on
information about relatedness between researchers to identify communities composed of researchers that work on similar problems and publish in similar scientific events. Korona is implemented as an unsupervised machine learning
method able to partition a scholarly knowledge graph into subgraphs or communities of co-author networks. Moreover, Korona applies the homophily prediction principle over the communities of co-author networks to identify successful
co-author relations between researchers in the knowledge graph. The homophily
prediction principle states that similar entities tend to be related to similar entities [6]. Intuitively, the application of the homophily prediction principle enables
Korona to relate two researchers ri and rj whenever they work on similar
research topics or publish in similar scientific venues. The relatedness or similarity between two scholarly entities, e.g., researchers, research topics, or scientific
venues, is represented as RDF properties in the scholarly knowledge graph. Semantic similarity measures, e.g., GADES [10] or Doc2Vec [5], are utilized to
Fig. 4: The Korona Architecture. Korona receives scholarly datasets and
outputs scholarly patterns, e.g., co-author networks. First, a scholarly knowledge
graph is created. Then, community detection methods and similarity measures
are used to compute communities of scholarly entities and scholarly patterns.
The depicted components are: inputs (Repositories, Digital Libraries, Datasets), Format
Transformer, Parser & Filter Engine, Semantifier, Semantic Integrator, Semantic
Similarity-Measure Calculator, Scholarly Knowledge Graph, Intra-type Relatedness
solver (IRs), Intra-type Scholarly Community solver (ISCs), and Scholarly Pattern generator (SPg).
quantify the degree of relatedness between two scholarly entities. This
degree indicates how relevant the entities are to each other and allows for returning the most related ones.
Fig. 5: Intra-type Relatedness solver (IRs). Relatedness across scholarly
entities. (a) Similarity-based Relatedness: relatedness is computed according to the values of a semantic similarity metric, e.g., GADES; in the depicted example, SC={(Sören Auer, Christoph Lange, 0.8), (Maria-Esther Vidal, Louiqa Raschid, 0.9)}. (b) Path-based Relatedness: relatedness is determined based on the number
of paths between two scholarly entities; in the depicted example, SC={(Sören Auer, Christoph Lange, 0.7), (Maria-Esther Vidal, Louiqa Raschid, 0.7)}.
Figure 4 depicts the Korona architecture; it implements a knowledge-driven
approach able to transform scholarly data ingested from publicly available data
sources into patterns that represent discovered relationships between researchers.
Thus, Korona receives scholarly data sources and outputs co-author networks;
it works in two stages: (a) Knowledge graph creation and (b) Knowledge graph
discovery. During the knowledge graph creation stage, a semantic integration
pipeline is followed in order to create a scholarly knowledge graph from data
ingested from heterogeneous scholarly data sources. It utilizes mapping rules between the Korona ontology and the input data sources to create the scholarly
knowledge graph. Additionally, semantic similarity measures are used to compute
Fig. 6: Intra-type Relatedness solver (IRs). Communities of similar researchers are computed. (a) Relatedness Across Researchers: the tabular representation of SC; lower and higher
values of similarity are represented by lighter and darker colors, respectively. (b)
Communities of Researchers: two communities of researchers; each one includes highly similar researchers.
the relatedness between scholarly entities; the results are explicitly represented
in the knowledge graph as scores in the range from 0.0 to 1.0. The knowledge
graph creation stage is executed offline and enables the integration of new entities in the knowledge graph whenever the input data sources change. On the
other hand, the knowledge graph discovery step is executed on the fly over an
existing scholarly knowledge graph. During this stage, Korona executes three
main tasks: (i) Intra-type Relatedness solver (IRs); (ii) Intra-type Scholarly
Community solver (IRSCs); and (iii) Scholarly Pattern generator (SPg).
Intra-type Relatedness solver (IRs). This module quantifies relatedness
between the scholarly entities of the same type in a scholarly knowledge graph
SKG=(Ve ∪ Vt , E, P ). IRs receives as input SKG=(Ve ∪ Vt , E, P ) and a scholarly type Va in Vt ; it outputs a set SC of triples (ei , ej , score), where ei and
ej belong to Va and score quantifies the relatedness between ei and ej . The
relatedness can simply be computed from the similarity values represented in the knowledge graph, e.g., the values of semantic
similarity computed by GADES or Doc2Vec. Alternatively, the values of relatedness can be computed based on the number of paths in the scholarly
knowledge graph that connect the scholarly entities ei and ej. Figure 5 depicts two representations of the relatedness of scholarly entities. As shown in
Figure 5a, IRs generates a set SC according to the GADES values of semantic similarity; thus, IRs includes two triples, (Sören Auer, Christoph Lange, 0.8) and
(Maria-Esther Vidal, Louiqa Raschid, 0.9), in SC. On the other hand, if paths between scholarly entities are considered (Figure 5b), the values of relatedness can
be different; e.g., in this case, Sören Auer and Christoph Lange are as similar to each other
as Maria-Esther Vidal and Louiqa Raschid are.
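The two ways of computing relatedness described for IRs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Korona implementation: similarity values are assumed to be available as 4-tuples (ei, ej, measure, value), "paths" are restricted to paths of length two (shared neighbours), and counts are divided by the largest count to obtain scores in [0, 1]. The paper does not specify how path counts are normalized, so this sketch does not reproduce the value 0.7 shown in Figure 5b. The function names and the entities r1 to r4 are hypothetical.

```python
from itertools import combinations


def similarity_based(similarity_edges, measure):
    """Collect (ei, ej, score) triples for one similarity measure stored in the graph."""
    return {
        (ei, ej, value)
        for ei, ej, name, value in similarity_edges
        if name == measure
    }


def path_based(neighbours, entities):
    """Score each pair by its number of shared neighbours (paths of length two),
    divided by the largest count (assumed normalization)."""
    counts = {
        (ei, ej): len(neighbours[ei] & neighbours[ej])
        for ei, ej in combinations(sorted(entities), 2)
    }
    top = max(counts.values(), default=0)
    # Pairs without any connecting path are left out of SC.
    return {(ei, ej, c / top) for (ei, ej), c in counts.items() if c > 0}


# Similarity values as stated for Figure 5a.
similarity_edges = [
    ("Sören Auer", "Christoph Lange", "GADES", 0.8),
    ("Maria-Esther Vidal", "Louiqa Raschid", "GADES", 0.9),
]
sc_similarity = similarity_based(similarity_edges, "GADES")

# Hypothetical neighbourhoods (papers and venues) of four researchers.
neighbours = {
    "r1": {":paper1", ":venueA"},
    "r2": {":paper1", ":venueA"},
    "r3": {":paper2", ":venueB"},
    "r4": {":paper2", ":venueB"},
}
sc_paths = path_based(neighbours, neighbours.keys())
```

In the path-based toy example, the pairs (r1, r2) and (r3, r4) are connected by the same number of paths and therefore receive the same score, mirroring the situation described for Figure 5b.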
Intra-type Scholarly Community solver (IRSCs). Once the relatedness
between the scholarly entities has been computed, communities of highly related scholarly entities are determined. IRSCs resorts to unsupervised methods
such as METIS or semEP, and to relatedness values stored in SC, to compute
the scholarly communities. Figure 6 depicts scholarly communities computed
by IRSCs based on similarity values; as observed, each community includes
Fig. 7: Co-author network. A network generated from scholarly communities.
researchers that are highly related; for readability, SC is shown as a heatmap
where lower and higher values of similarity are represented by lighter and darker
colors, respectively. For example, in Figure 6a, Sören Auer, Christoph Lange,
and Maria-Esther Vidal are quite similar, and they are in the same community.
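The step from relatedness values to communities can be illustrated with a deliberately simple stand-in. The sketch below is neither METIS nor semEP: it keeps only the pairs whose relatedness reaches a threshold and returns the connected components of the resulting graph. The function name, the threshold, and the entities r1 to r5 with their scores are hypothetical.

```python
def communities(sc, threshold):
    """Group entities linked by relatedness >= threshold (connected components).
    A simple stand-in for the partitioning tools, not their algorithm."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ei, ej, score in sc:
        root_i, root_j = find(ei), find(ej)  # registers both entities
        if score >= threshold and root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for entity in parent:
        groups.setdefault(find(entity), set()).add(entity)
    return sorted(groups.values(), key=sorted)


# Hypothetical relatedness triples.
sc = {
    ("r1", "r2", 0.80),
    ("r2", "r3", 0.85),
    ("r3", "r4", 0.20),
    ("r4", "r5", 0.90),
}
result = communities(sc, threshold=0.7)
```

With a threshold of 0.7, the weak link between r3 and r4 is ignored, so the toy data split into the communities {r1, r2, r3} and {r4, r5}.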
Scholarly Pattern generator (SPg). SPg receives communities of scholarly entities and produces a network, e.g., a co-author network. SPg applies
the homophily prediction principle on the input communities, and connects the
scholarly entities in one community in a network. Figure 7 shows a co-author
network computed based on a scholarly knowledge graph created from DBLP;
as observed, Sören Auer, Christoph Lange, and Maria-Esther Vidal are included
in the same co-author network. In addition to computing the scholarly networks,
SPg scores the relations in a network and computes the weight of connectivity
of a relation between two entities. For example, in Figure 7, thicker lines represent strongly connected researchers in the network. SPg can also filter from
a network the relations labeled with higher values of weight of connectivity. All
the relations in a network correspond to solutions to the problem of discovering
successful co-authorship relations defined in Equation 1. To compute the weights
of connectivity, SPg considers the values of similarity of the scholarly entities
in a community C; weights are computed as aggregated values using an aggregation function f (.), e.g., average or triangular norm. For each pair (ei , ej ) of
scholarly entities in C, the weight of connectivity between ei and ej , φ(ei , ej | C),
is defined as: φ(ei , ej | C) = {f (score) | ez , eq ∈ C ∧ (ez , eq , score) ∈ SC}.
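A minimal Python sketch of SPg is given below, assuming the average as aggregation function f. Read literally, the definition of φ aggregates the scores of all scored pairs in C, so every pair in a community receives the same weight in this sketch; a per-pair variant is an equally plausible reading. The function name pattern_generator and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pattern_generator(community, sc, f=mean):
    """Connect every pair of entities in `community`; each relation is weighted
    with f applied to the scores of all scored pairs inside the community."""
    members = set(community)
    scores = [s for ez, eq, s in sc if ez in members and eq in members]
    weight = f(scores) if scores else 0.0
    return {(ei, ej, weight) for ei, ej in combinations(sorted(members), 2)}


# Hypothetical community and relatedness triples.
community = {"r1", "r2", "r3"}
sc = {("r1", "r2", 0.8), ("r2", "r3", 0.6), ("r3", "r4", 0.9)}
network = pattern_generator(community, sc)
```

Following the homophily principle, the pair (r1, r3) is connected in the resulting network although no score for it is stored in SC; the triple involving r4 is ignored because r4 is outside the community.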
4 Empirical Evaluation
4.1 Knowledge Graph Creation
A scholarly knowledge graph has been crafted using the DBLP collection (7.83
GB in April 2017⁵); it includes researchers, papers, and publication year from
the International Semantic Web Conference (ISWC) 2001–2016. The knowledge
graph also includes similarity values between researchers who have published at
ISWC (2001–2017). Let PC_{e_i} and PC_{e_j} be the number of papers published by researchers e_i and e_j together (as co-authors), respectively at ISWC (2001–2017).
Let TP_{e_i} and TP_{e_j} be the total number of papers that e_i and e_j have in all
conferences of the scholarly knowledge graph, respectively. The similarity measure is defined as:
$$SimR(e_i, e_j) = \frac{PC_{e_i} \cap PC_{e_j}}{TP_{e_i} \cup TP_{e_j}}$$
The similarities between ISWC
5 http://dblp2.uni-trier.de/e55477e3eda3bfd402faefd37c7a8d62/
Fig. 8: Quality of Korona. Communities evaluated in terms of prediction
metrics (higher values are better); percentiles 85, 90, 95, and 98 are reported.
Korona exhibits the best performance at percentile 95 and groups similar researchers according to research topics and events where they publish.
Panels: (a) Percentile 85, (b) Percentile 90, (c) Percentile 95, (d) Percentile 98. Each panel compares Korona_SemEP and Korona_Metis on conductance, coverage, modularity, performance, and total cut (inverse or normalized values as described under Evaluation metrics).
(2002–2016) are represented as well. Let RC_i and RC_j be the number of the authors with papers published in conferences c_i and c_j, respectively. The similarity
measure corresponds to
$$SimC(c_i, c_j) = \frac{RC_i \cap RC_j}{RC_i \cup RC_j}$$
Thus, the scholarly knowledge
graph includes both types of scholarly entities (researchers and conferences) enriched with their values of similarity.
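The two similarity measures SimR and SimC can be sketched in Python as follows. The sketch assumes that PC, TP, and RC are handled as sets (of paper or author identifiers) and that the ratios are taken over set cardinalities; this is an interpretation of the definitions above, and the function names and toy identifiers are hypothetical.

```python
def sim_r(pc_i, pc_j, tp_i, tp_j):
    """SimR: ISWC papers shared by two researchers over all papers of either one."""
    union = tp_i | tp_j
    return len(pc_i & pc_j) / len(union) if union else 0.0


def sim_c(rc_i, rc_j):
    """SimC: authors shared by two conferences over all authors of either one."""
    union = rc_i | rc_j
    return len(rc_i & rc_j) / len(union) if union else 0.0


# Hypothetical paper identifiers for two researchers.
pc_i, pc_j = {"p1", "p2"}, {"p2", "p3"}              # ISWC papers
tp_i, tp_j = {"p1", "p2", "p4"}, {"p2", "p3", "p5"}  # all papers
researcher_similarity = sim_r(pc_i, pc_j, tp_i, tp_j)

# Hypothetical author sets of two conference editions.
conference_similarity = sim_c({"a", "b", "c"}, {"b", "c", "d"})
```

In the toy data, the two researchers share one ISWC paper out of five papers overall, and the two conference editions share two of four authors.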
4.2 Experimental Study
The effectiveness of Korona has been evaluated in terms of the quality of both
the generated communities of researchers and the predicted co-author networks.
Research Questions: We assess the following research questions: RQ1) Does
the semantics encoded in scholarly knowledge graphs impact the quality of scholarly patterns? RQ2) Does the semantics encoded in scholarly knowledge graphs
allow for improving the quality of the predicted co-author relations?
Implementation: Korona is implemented in Python 2.7. The experiments
were executed on a macOS High Sierra 10.13 (64 bits) Apple MacBook Air
machine with an Intel Core i5 1.6 GHz CPU and 8 GB RAM. METIS 5.1⁶ and
semEP⁷ are part of Korona and used to obtain the scholarly patterns.
6 http://glaros.dtc.umn.edu/gkhome/metis/metis/download
7 https://github.com/gpalma/semEP
Q1. Do you know this person? Have you co-authored before? To avoid confusion, the
meaning of “knowing” was kept simple and general. The participants were asked to only consider
if they were aware of the existence of the recommended person in their research community.
Q2. Have you co-authored “before” with this person at any event of the ISWC series?
With the same intent of keeping the survey simple, all types of collaboration on papers in any
edition of this event series were considered as “having co-authored before”.
Q3. Have you co-authored with this person after May 2016? Our study considered scholarly
metadata of publications until May 2016. The objective of this question was to find out whether a
prediction had actually come true, and the researchers had collaborated.
Q4. Have you ever planned to write a paper with the recommended person and you
never made it and why? The aim is to know whether two researchers who had been predicted
to work together actually wanted to but then did not and the reason, e.g., geographical distance.
Q5. On a scale from 1–5, (5 being most likely), how do you score the relevance of your
research with this person? The aim is to discover how close and relevant are the collaboration
recommendations to the survey participant.
Table 1: Survey. Questions to validate the recommended collaborations.
Evaluation metrics: Let Q = {C1 , . . . Cn } be the set of communities obtained
by Korona: Conductance: measures relatedness of entities in a community, and
how different they are to entities outside the community [2]. The inverse of the
conductance 1 − Conductance(S) is reported. Coverage: compares the fraction of
intra-community similarities among entities to the sum of all similarities among
entities [2]. Modularity: is the value of the intra-community similarities among
the entities divided by the sum of all the similarities among the entities, minus
the sum of the similarities among the entities in different communities, in the
case they were randomly distributed in the communities [7]. The value of the
modularity lies in the range [−0.5, 1], which can be scaled to [0, 1] by computing
$\frac{Modularity(Q)+0.5}{1.5}$. Performance: sums the number of intra-community relationships, plus the number of non-existent relationships between communities [2].
Total Cut: sums all similarities among entities in different communities [1]. Values of total cut are normalized by dividing by the sum of the similarities among
the entities; inverse values are reported, i.e., 1 − NormTotalCut(Q).
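Some of these metrics can be sketched directly from the verbal definitions above. The Python code below is a simplified illustration over pairwise similarity triples, assuming that every entity belongs to exactly one community; the exact definitions in [1,2,7] may differ in detail, and all function names and toy values are hypothetical.

```python
def _intra_inter(sc, communities):
    """Sum similarities inside communities and across communities."""
    label = {e: k for k, members in enumerate(communities) for e in members}
    intra = inter = 0.0
    for ei, ej, score in sc:
        if label[ei] == label[ej]:
            intra += score
        else:
            inter += score
    return intra, inter


def coverage(sc, communities):
    """Intra-community similarities divided by the sum of all similarities."""
    intra, inter = _intra_inter(sc, communities)
    total = intra + inter
    return intra / total if total else 0.0


def inv_norm_total_cut(sc, communities):
    """1 - NormTotalCut: one minus the share of similarities across communities."""
    intra, inter = _intra_inter(sc, communities)
    total = intra + inter
    return 1.0 - inter / total if total else 1.0


def scaled_modularity(modularity):
    """Map a modularity value from [-0.5, 1] to [0, 1]."""
    return (modularity + 0.5) / 1.5


# Hypothetical similarity triples and a partition into two communities.
sc = [("a", "b", 0.8), ("c", "d", 0.6), ("b", "c", 0.2)]
partition = [{"a", "b"}, {"c", "d"}]
cov = coverage(sc, partition)
cut = inv_norm_total_cut(sc, partition)
```

Under this simplified pairwise reading, coverage and the inverse normalized total cut coincide for a partition, since every similarity value is counted either inside or across communities.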
Experiment 1: Evaluation of the Quality of Collaboration Patterns.
Prediction metrics are used to evaluate the quality of the communities generated by Korona using METIS and semEP; relatedness of the researchers is
measured in terms of SimR and SimC. Communities are built according to
different similarity criteria; percentiles of 85, 90, 95, and 98 of the values of similarity are analyzed. For example, in percentile 85 only 85% of all similarity values
among entities have scores lower than the similarity value in the percentile 85.
Figure 8 presents the results of the studied metrics. In general, in all percentiles,
the communities include closely related researchers. However, both implementations of Korona exhibit quite good performance at percentile 95, and allow for
grouping together researchers that are highly related in terms of the research
topics on which they work, and the events where their papers are published.
On the contrary, Korona creates many communities of unrelated authors for
percentiles 85 and 90, thus exhibiting low values of coverage and conductance.
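The role of the percentiles can be illustrated with a short Python sketch. It assumes that the similarity value at a given percentile is used as a cut-off, so that only pairs at or above it are considered when building communities; the paper does not spell out this step, and the nearest-rank percentile, the function names, and the toy scores are assumptions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value such that at least p% of the
    values are lower than or equal to it (p is an integer between 1 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def keep_at_or_above(sc, p):
    """Keep the relatedness triples whose score reaches the p-th percentile."""
    cut_off = percentile([score for _, _, score in sc], p)
    return {(ei, ej, score) for ei, ej, score in sc if score >= cut_off}


# Ten hypothetical pairs with scores 0.1, 0.2, ..., 1.0.
sc = {("r%d" % i, "r%d" % (i + 1), i / 10) for i in range(1, 11)}
kept = keep_at_or_above(sc, 90)
```

For the ten toy pairs, percentile 90 corresponds to the score 0.9, so only the two most similar pairs are kept.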
Korona        %   Q.1(a)     Q.1(b)     Q.2        Q.3        Q.4        Q.5
Korona-METIS  85  0.26±0.25  0.72±0.29  0.99±0.04  0.86±0.13  0.86±0.20  3.10±0.59
Korona-semEP  85  0.24±0.21  0.80±0.34  1.00±0.00  0.97±0.07  0.93±0.16  3.35±0.85
Korona-METIS  90  0.39±0.24  0.91±0.19  1.00±0.00  1.00±0.00  0.98±0.04  3.03±0.79
Korona-semEP  90  0.13±0.18  0.89±0.18  1.00±0.00  1.00±0.00  0.85±0.23  3.12±1.06
Korona-METIS  95  0.40±0.34  0.93±0.08  1.00±0.00  0.80±0.45  0.95±0.10  3.20±0.81
Korona-semEP  95  0.14±0.30  0.81±0.40  0.67±0.58  0.60±0.55  0.69±0.47  3.83±0.76
Table 2: Survey results. Aggregated normalized values of negative answers
provided by the study participants during the validation of the recommended
collaborations (Q.1(a), Q.1(b), Q.2, Q.3, and Q.4); average (lower is better) and
standard deviation (lower is better) are reported. For Q.5, average and standard
deviation of the scale from 1–5 are presented; higher average values are better.
Experiment 2: Survey of the Quality of the Prediction of Collaborations among Researchers. Results of an online survey⁸ among 10 researchers
are reported; half of the researchers are from the same research area, while
the other half was chosen randomly. Knowledge subgraphs of each of the participants are part of the Korona research knowledge graph; predictions are
computed from these subgraphs. The predictions for each participant were laid out in an
online spreadsheet along with 5 questions and a comment section. Table 1 lists
the five questions that the survey participants were asked in order to validate the recommended collaborations, while Table 2 reports on the results of the study. The analysis of results
suggests that Korona predictions represent potentially successful co-authorship
relations; thus, they provide a solution to the problem tackled in this paper.
5 Related Work
Xia et al. [14] provide a comprehensive survey of tools and technologies for
scholarly data management, as well as a review of data analysis techniques, e.g.,
social networks and statistical analysis. However, all the proposals have been
made over raw data and knowledge-driven methods were not considered. Wang et
al. [13] present a comprehensive survey of link prediction in social networks, while
Paulheim [9] presents a survey of methodologies used for knowledge graph refinement; both works show the importance of the problem of knowledge discovery.
Traverso-Ribón et al. [12] introduce a relation discovery approach, KOI, able
to identify hidden links in TED talks; it relies on heterogeneous bipartite graphs
and on the link discovery approach proposed in [8]. In the latter work, Palma et al.
present semEP, a semantic-based graph partitioning approach, which was used in
the implementation of Korona-semEP. The graph partitioning of semEP is similar
to that of KOI, with the difference that semEP only considers isolated entities, whereas KOI is
designed for ego networks. However, KOI is only applied to ego networks, whereas
Korona is mainly designed for knowledge graphs. Sachan and Ichise [11] propose a syntactic approach considering dense subgraphs of a co-author network
created from the DBLP. They discover relations between authors and propose
pairs of researchers belonging to the same community. A link discovery tool is
8 https://bit.ly/2ENEg2G
developed for the biomedical domain by Kastrin et al. [4]. Albeit effective, these
approaches focus on the graph structure and ignore the meaning of the data.
6 Conclusions and Future Work
Korona is presented for unveiling unknown relations; it relies on semantic similarity measures to discover hidden relations in scholarly knowledge graphs. Reported and validated experimental results show that Korona retrieves valuable
information that can impact the research direction of a researcher. In the future,
we plan to extend Korona to detect other networks, e.g., affiliation networks,
co-citation networks and research development networks. We plan to extend our
evaluation over big scholarly datasets and study the scalability of Korona;
further, the impact of several semantic similarity measures will be included in
the study. Finally, Korona will be offered as an online service that will enable
researchers to explore and analyze the underlying scholarly knowledge graph.
Acknowledgement This work has been partially funded by the EU H2020
programme for the project iASiS (grant agreement No. 727658).
References
1. Buluç, A., Meyerhenke, H., Safro, I., Sanders, P., Schulz, C.: Recent Advances in
Graph Partitioning. Springer, Cham (2016)
2. Gaertler, M.: Clustering. In: Network Analysis: Method. Found.
3. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning
irregular graphs. Scientific Computing (1998)
4. Kastrin, A., Rindflesch, T.C., Hristovski, D.: Link prediction on the semantic MEDLINE network - an approach to literature-based discovery. In: The Discovery Science Conference (2014)
5. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents.
CoRR abs/1405.4053 (2014)
6. Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks.
JASIST 58(7) (2007)
7. Newman, M.E.: Modularity and community structure in networks. Proceedings of
the national academy of sciences 103(23) (2006)
8. Palma, G., Vidal, M., Raschid, L.: Drug-target interaction prediction using semantic similarity and edge partitioning. In: ISWC (2014)
9. Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation
methods. Semantic Web Journal 8(3) (2017)
10. Ribón, I.T., Vidal, M., Kämpgen, B., Sure-Vetter, Y.: GADES: A graph-based
semantic similarity measure. In: SEMANTICS (2016)
11. Sachan, M., Ichise, R.: Using semantic information to improve link prediction results in network datasets. IJET 2(4) (2010)
12. Traverso-Ribón, I., Palma, G., Flores, A., Vidal, M.E.: Considering semantics on
the discovery of relations in knowledge graphs. In: EKAW (2016)
13. Wang, P., Xu, B., Wu, Y., Zhou, X.: Link prediction in social networks: the state-of-the-art. Link Prediction in Social Networks (SCIS) 58(1) (2015)
14. Xia, F., Wang, W., Bekele, T.M., Liu, H.: Big scholarly data: a survey. IEEE Big
Data (2017)