ClusCite: Effective Citation Recommendation by
Information Network-Based Clustering
Xiang Ren Jialu Liu Xiao Yu Urvashi Khandelwal Quanquan Gu Lidan Wang Jiawei Han
University of Illinois at Urbana-Champaign, Urbana, IL
{xren7, jliu64, xiaoyu1, khndlwl2, qgu3, lidan, hanj}@illinois.edu
ABSTRACT
Citation recommendation is an interesting but challenging
research problem. Most existing studies assume that all papers adopt the same criterion and follow the same behavioral pattern in deciding relevance and authority of a paper.
However, in reality, papers have distinct citation behavioral
patterns when looking for different references, depending on
paper content, authors and target venues. In this study,
we investigate the problem in the context of heterogeneous
bibliographic networks and propose a novel cluster-based citation recommendation framework, called ClusCite, which
explores the principle that citations tend to be softly clustered into interest groups based on multiple types of relationships in the network. Therefore, we predict each query’s
citations based on related interest groups, each having its
own model for paper authority and relevance. Specifically,
we learn group memberships for objects and the significance
of relevance features for each interest group, while also propagating relative authority between objects, by solving a joint
optimization problem. Experiments on both DBLP and
PubMed datasets demonstrate the power of the proposed approach, with 17.68% improvement in Recall@50 and 9.57%
growth in MRR over the best performing baseline.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval
Keywords
Citation Recommendation; Heterogeneous Information Network; Clustering; Citation Behavioral Pattern
1. INTRODUCTION
A research paper needs to cite relevant and important
previous work to help readers understand its background,
context and innovation. However, the already large, and
rapidly growing body of scientific literature makes it hard
for anyone to go through and digest all the papers. It is thus
desirable to design a system that could automatically generate quality citation recommendations. Traditional literature
search engines, such as Google Scholar, can retrieve a list of
relevant papers using keyword-based queries. But casting
one’s rich information needs into a few keywords may not
be feasible. Moreover, a user may be looking for papers that
are not only relevant to their work, but also important and of
high quality. To this end, citation recommendation aims to
suggest a small number of publications that can be used as
high quality references to satisfy such citation requirements.
There exist some interesting studies on citation recommendation. Context-aware recommendation [10, 12] analyzes each citation's local context to capture its specific information needs. However, the local context can be ambiguous or too short as a query, causing inaccurate predictions. Topical similarity-based methods [18, 24] find conceptually related papers by taking advantage of latent topic models.
But solely relying on topic distributions to measure relevance is insufficient. A large number of papers may share
the same topic, making topical similarity weak in indicating
importance of a paper. Both methods primarily focus on
recommending relevant papers based on content, but ignore
critical information related to importance and quality.
Recent studies [16, 20] utilize citation links to derive structural similarity and authority, which serve as good complements to content-based relevance features. With paper text,
authors and target venues as queries, one can further generate a rich set of structural features [5, 27] based on multiple types of relations between different entities. However,
existing hybrid methods have difficulty in handling the diverse information needs since they impose the same citation
behavioral pattern on every query manuscript. Fig. 1 illustrates the diversity of these behavioral patterns using a
toy example. Paper I is on “citation recommendation” and
“link prediction”, which are studied by a relatively compact
group of researchers and venues, and one can find useful
papers through related researchers, venues and key terms
effectively. On the other hand, for “random walk”, relations
through authors and venues would be less informative on paper relevance since this method is widely studied by authors
working on a variety of topics, and published at venues focusing on a variety of fields. Previous hybrid methods learn and
apply the same recommendation model across all queries,
ignoring the variations in citation behaviors when seeking
quality references. Intuitively, paper citations should be organized into different groups and each group should have its
own behavior pattern to identify information of interest.

Figure 1: A toy example showing the diverse information needs of two query manuscripts (Paper I [20], "Recommending Citations for Academic Papers", and Paper II [2], "Supervised Random Walks: Predicting and Recommending Links in Social Networks") and the corresponding citation behavioral patterns.
In this paper, we propose a novel citation recommendation framework to capture citation behaviors for each query
manuscript, based on both paper relevance and importance.
By softly clustering citations into different interest groups,
we aim to study the significance of different relevance features for each interest group, and derive paper relative authority within each group. In doing so, the challenge of satisfying diverse information needs behind a paper’s citations
can be properly tackled by making a paper-specific recommendation according to the query’s interest group membership. Meanwhile, integration of paper importance can be
accurately accomplished using relative authority. This idea,
though interesting, leads to two critical problems: (1) how
to discover hidden interest groups for effective citation recommendation, and (2) how to derive behavioral patterns on
relevance and authority for each group.
To facilitate our study, a heterogeneous bibliographic network, encoding the multiple types of relations between different objects, is constructed (Fig. 2). A rich set of structural features is derived from the network, representing various relation semantics (Table 1) between two papers. We
then formulate a joint optimization problem to learn the proposed model such that prediction error along with graph regularization is minimized over known citations, based on the
network. Specifically, the optimization problem conducts
graph-regularized co-clustering to learn group membership
for attribute objects and weights on relevance features for
each group. It also propagates relative authority between
different objects. An alternating minimization algorithm, called ClusCite, is further designed to iterate between co-clustering and authority propagation. Intuitively, feature
weights and relative authority can be better learned with
high quality interest groups, and in turn they assist in mining higher quality interest groups.
Our experiments on the DBLP and PubMed datasets demonstrate the power of the proposed model. ClusCite achieves
17.68% improvement in Recall@50 and 9.57% growth in MRR
over the best baseline on the DBLP dataset. Our performance analysis shows that ClusCite can achieve even better
results with richer attribute objects, and our case studies
demonstrate the effectiveness of discovered interest groups
and object relative authority for citation recommendation.
The rest of the paper is organized as follows. Sec. 2 gives
background and the problem definition. Sec. 3 introduces our new model. The learning algorithm and its computational complexity analysis are in Sec. 4. We present and analyze our experimental results in Sec. 5, discuss the related work in Sec. 6, and conclude this study in Sec. 7.

Table 1: Meta paths with different semantics.

Meta path     Semantic meaning of the relation
P−A−P         p_i and p_j share the same author(s)
P−T−P         p_i and p_j contain the same term(s)
P−V−P         p_i and p_j are in the same venue
P−T−P→P       p_i shares term(s) with the paper(s) that cite p_j
P−A−P←P       p_i shares the same author(s) with the paper(s) cited by p_j
2. BACKGROUND
This section introduces concepts on heterogeneous bibliographic networks and presents the formal problem definition.
A heterogeneous bibliographic network [27, 23] is a directed graph $G$ that consists of multiple types of objects and relationships, derived from a bibliographic dataset. Suppose there are $n$ papers $\mathcal{P} = \{p_1, \ldots, p_n\}$, $|\mathcal{A}|$ authors $\mathcal{A} = \{a_1, \ldots, a_{|\mathcal{A}|}\}$, $|\mathcal{V}|$ venues (conferences or journals) $\mathcal{V} = \{v_1, \ldots, v_{|\mathcal{V}|}\}$, and $|\mathcal{T}|$ terms $\mathcal{T} = \{e_1, \ldots, e_{|\mathcal{T}|}\}$ in the network. Citations between papers form a directed subgraph, denoted by an adjacency matrix $Y \in \mathbb{R}^{n \times n}$ with $Y_{ij} = 1$ if paper $p_i$ cites paper $p_j$ and $Y_{ij} = 0$ otherwise. For relationships between papers and authors, we use an undirected bipartite graph, denoted by a biadjacency matrix $R^{(A)} \in \mathbb{R}^{n \times |\mathcal{A}|}$, where $R^{(A)}_{ij} = 1$ if paper $p_i$ has the author $a_j$ and $R^{(A)}_{ij} = 0$ otherwise. Similarly, the relationships between papers and venues are represented by a biadjacency matrix $R^{(V)} \in \mathbb{R}^{n \times |\mathcal{V}|}$, with $R^{(V)}_{ij} = 1$ if paper $p_i$ is published in venue $v_j$ and $R^{(V)}_{ij} = 0$ otherwise. We extract a set of term objects $\mathcal{T}$ from the papers' free text and further construct an undirected bipartite subgraph between these terms and papers to represent paper content, using the weight matrix $R^{(T)} \in \mathbb{R}^{n \times |\mathcal{T}|}$, where $R^{(T)}_{ij}$ is the term frequency of term $e_j$ in paper $p_i$.
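As a concrete illustration, these subgraphs can be materialized as sparse matrices. The following is a minimal sketch with made-up toy indices; the variable names and data are ours, not from the datasets or the authors' implementation:

    import numpy as np
    from scipy import sparse

    n_papers, n_authors = 4, 3
    cites = [(0, 1), (0, 2), (3, 2)]              # Y_ij = 1 iff p_i cites p_j
    rows, cols = zip(*cites)
    Y = sparse.csr_matrix((np.ones(len(cites)), (rows, cols)),
                          shape=(n_papers, n_papers))

    writes = [(0, 0), (1, 0), (1, 1), (2, 2)]     # R^(A)_ij = 1 iff a_j wrote p_i
    rows, cols = zip(*writes)
    R_A = sparse.csr_matrix((np.ones(len(writes)), (rows, cols)),
                            shape=(n_papers, n_authors))
    # R^(V) and the term-frequency matrix R^(T) are built the same way.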
We adopt the concept of a network schema to describe the heterogeneous bibliographic network at the meta level [22, 23]. An example is shown in Fig. 2.

Figure 2: Schema for the DBLP bibliographic network, with object types Paper, Author, Venue and Term, connected by Writes, Publishes, Contains and Cites relations.
As shown in [22, 27], meta path-based features in heterogeneous information networks describe a rich set of relation
semantics that can capture textual similarity, conceptual
relevance and several kinds of social relatedness. A meta
path is defined over network schema, where nodes are object types and edges are relation types. Table 1 shows some
examples that use meta paths to measure paper relevance
for the citation recommendation problem. Moreover, structural similarity measures can be defined on each meta path
to generate relevance features, as shown in [22, 27].
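As one concrete example of such a measure, the sketch below computes PathSim [22] along the symmetric meta path P−A−P from a toy paper-author incidence matrix. The data and names are illustrative only, not the authors' feature-generation code:

    import numpy as np

    # Toy paper-author incidence matrix R_A (4 papers x 3 authors).
    R_A = np.array([[1, 0, 0],
                    [1, 1, 0],
                    [0, 0, 1],
                    [0, 1, 0]], dtype=float)

    M = R_A @ R_A.T                    # path-count matrix along P-A-P
    diag = np.diag(M)                  # self path counts M_ii
    pathsim = 2.0 * M / (diag[:, None] + diag[None, :])
    # pathsim[i, j] can serve as one relevance feature between p_i and p_j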
In general, we represent the meta path-based relevance score between $p_i$ and $p_j$ as $\phi(p_i, p_j)$. Suppose we generate $L$ different meta path-based relevance features by combining different meta paths with different structural similarity measures; we can then define a relevance score matrix $S^{(i)} \in \mathbb{R}^{n \times L}$ for every paper $p_i \in \mathcal{P}$, where $S^{(i)}_{jl} = \phi^{(l)}(p_i, p_j)$ is the $l$-th meta path-based relevance score between $p_i$ and $p_j$. (Details of the meta path-based similarity computation in heterogeneous bibliographic networks can be found in [22].)

In this work, we cast citation recommendation as the problem of learning a recommendation score function $s(q, p): \mathcal{Q} \times \mathcal{P} \to \mathbb{R}$ for a query manuscript $q \in \mathcal{Q}$ and a target paper $p \in \mathcal{P}$, based on the heterogeneous bibliographic network. The learned function is then used to compute scores between query and target papers to make a recommendation. Formally, we define the citation recommendation problem as follows.
Definition 1 (Problem Definition). Given a heterogeneous bibliographic network G, and the terms, authors and
target venues for a query manuscript q ∈ Q, we aim to build
a recommendation model specifically for q, and recommend a
small subset of target papers p ∈ P as high quality references
for q, by ranking the papers with the score function s(q, p).
3. THE PROPOSED FRAMEWORK
At a high level, the proposed cluster-based citation recommendation framework consists of two major steps:
1. Learning the model parameters based on known citations
by solving a joint optimization problem (Sec. 4).
2. Making paper-specific recommendations for each query
manuscript based on the learned ClusCite model, which
is introduced in detail in this section.
3.1 Model Overview
We first provide an overview of the proposed model by
defining the major components in the score function s(q, p).
Given a query manuscript q, its citations will focus on several interest groups, each having its own behavioral pattern for finding relevant and high-authority work (Fig. 1).
It is desirable to recommend papers that are highly ranked
in multiple interest groups of the query, since they best capture diverse information needs. We propose a cluster-based
score function to decide relative relevance and importance
of target papers in the context of each interest group. It
assigns a final recommendation score by integrating scores
computed with respect to different interest groups.
Mathematically, suppose paper citations can be softly clustered into $K$ interest groups, based on the multiple types of relationships between objects in the heterogeneous bibliographic network. We then define the score function $s(q,p)$ as follows:

$$ s(q,p) \;=\; \sum_{k=1}^{K} \theta_q^{(k)} \cdot \Big\{ r^{(k)}(q,p) + f_{\mathcal{P}}^{(k)}(p) \Big\}. \qquad (1) $$
The function $s(q,p)$ measures how likely a query manuscript $q \in \mathcal{Q}$ is to cite a target paper $p \in \mathcal{P}$. It decomposes into a set of cluster-based functions: the cluster-based relevance function $r^{(k)}(q,p): \mathcal{Q} \times \mathcal{P} \to \mathbb{R}$ measures the relatedness between $q$ and $p$ according to the $k$-th interest group, and the paper relative authority function $f_{\mathcal{P}}^{(k)}(p): \mathcal{P} \to \mathbb{R}$ computes the relative importance of $p$ within the $k$-th interest group. The weighted combination of these functions defines the final recommendation score, with weights given by the group membership indicators of $q$, i.e., $\{\theta_q^{(k)} : \theta_q^{(k)} > 0\}$, which represent how likely query $q$ is to belong to each of the $K$ interest groups.
3.2 Feature Weights for Paper Relevance

As mentioned in Sec. 2, one can compute a rich set of meta path-based features to describe paper relevance under various relation semantics. Each meta path-based feature can play a distinct role in identifying relevant work in different interest groups.

In Fig. 1, incorporating the meta path P−V−P along with the textual similarity P−T−P can effectively suggest related papers under the interest "Link Prediction (problem)", because only a compact set of venues (e.g., KDD, ICML and ICDM) study this problem. However, if the interest switches to "L-BFGS (algorithm)", using P−V−P will probably hurt the results, since a much broader set of venues study this algorithm, and thus sharing a venue with the query provides very weak evidence for paper relevance.

In order to capture the biased significance of different relevance features across interest groups, we assign feature weights to each interest group individually, leading to the following cluster-based relevance function:

$$ r^{(k)}(q,p) \;=\; \sum_{l=1}^{L} w_k^{(l)} \cdot \phi^{(l)}(q,p). \qquad (2) $$

For each interest group $k$, we use a set of weights $\{w_k^{(l)} : w_k^{(l)} > 0\}$ to measure the significance of the $L$ different meta path-based features $\{\phi^{(l)}(q,p)\}$ for that group.

These $K$ sets of feature weights are estimated through a joint optimization problem (Sec. 4). Table 2 demonstrates the learned feature patterns over 7 meta paths for 4 example interest groups (** and * highlight the first and second most significant values), using the random walk-based similarity measure on DBLP. All 4 groups show distinct weights on the meta paths, justifying the claim that different interest groups hold different feature weights. In particular, we find that meta paths which impose textual similarity (e.g., P−T−P←P) as well as references of co-authors' papers (P−A−P→P) play critical roles in finding relevant papers in these 4 groups, which matches human intuition well.

Table 2: Learned weights on seven different meta paths for four mined interest groups (K = 40).

Meta path   Group 1    Group 2    Group 3    Group 4
P−V−P       0.0024     0.0113     0.0158     0.3076*
P−A−P       0.0054     0.0006     0.0192     0.1243
P−A−P→P     0.6133**   0.2159*    0.2254     0.0213
P−T−P       0.1227     0.0947     0.1579     0.1095
P−T−P→P     0.0442     0.5448**   0.3250*    0.0231
P−T−P←P     0.1938*    0.0870     0.3578**   0.2409**
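As a concrete illustration of Eqs. (1) and (2), the following numpy sketch scores all target papers for one query. Every array is a random or made-up stand-in for the learned parameters, chosen only to mirror the notation:

    import numpy as np

    K, L, n = 3, 5, 100                    # groups, features, target papers
    rng = np.random.default_rng(0)
    theta_q = np.array([0.6, 0.3, 0.1])    # query memberships theta_q^(k)
    W = rng.random((K, L))                 # per-group feature weights w_k^(l)
    phi_q = rng.random((n, L))             # phi^(l)(q, p) for all targets p
    F_P = rng.random((K, n))               # per-group authority f_P^(k)(p)

    r = phi_q @ W.T                        # Eq. (2): r^(k)(q, p), shape (n, K)
    scores = (r + F_P.T) @ theta_q         # Eq. (1): s(q, p) for every target
    top10 = np.argsort(-scores)[:10]       # recommend the top-scoring papers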
3.3 Object Relative Authority
A paper may have very different visibility or authority
among different interest groups even if it has many citations.
In the DBLP dataset [25], paper ObjectRank [3] (132 citations) got 47 citations from VLDB but only 12 from WWW,
while RankSVM [14] (250 citations) obtained only 27 citations from VLDB but 109 from WWW, implying the bias of
authority in different interest groups.
Instead of learning objects' group memberships and deriving relative authority separately, we propose to estimate them jointly using graph regularization, which preserves consistency over each subgraph. By doing so, paper relative
authority serves as a feature for learning interest groups,
and better estimated groups can in turn help derive relative
authority more accurately (Fig. 3).
We adopt the semi-supervised learning framework [8], which leads to iterative update rules that act as authority propagation between different types of objects:

$$ F_P = G_P(F_P, F_A, F_V; \lambda_A, \lambda_V), \quad F_A = G_A(F_P), \quad F_V = G_V(F_P). \qquad (3) $$

We denote the relative authority score matrices for paper, author and venue objects by $F_P \in \mathbb{R}^{K \times n}$, $F_A \in \mathbb{R}^{K \times |\mathcal{A}|}$ and $F_V \in \mathbb{R}^{K \times |\mathcal{V}|}$, respectively. Generally, within an interest group, the relative importance of one type of object can be a combination of the relative importance of different types of objects [23].
In our solution, the propagation function $G_P$ updates paper relative authority scores for all groups, following the intuition that high quality papers from an interest group are often published in highly reputed venues, written by authoritative authors, and related to other high quality papers from this group. Trade-off parameters $\lambda_A$ and $\lambda_V$ control the relative
importance of paper-author and paper-venue relations. On
the other hand, propagation functions GA and GV capture
the rules: highly regarded authors often write good quality papers, and highly reputed venues often publish good
quality papers. We include detailed formulae for the three
propagation functions in Sec. 4.
3.4 Paper-Specific Citation Recommendation
In practice, to derive interest group memberships for newly
emerged queries, one has to re-estimate the model using
these queries and training data, which is highly inefficient.
Moreover, as the number of papers grows rapidly, the size of the model parameter space increases substantially, making model learning even less scalable.
To tackle these two challenges, we leverage group memberships of the query’s related attribute objects, i.e., authors,
terms and target venues, to approximately represent group
membership of the query manuscript.
Intuitively, the terms of the query manuscript describe its information needs based on paper content, whereas its author(s) and venue complement the content with research interests and other conceptual information. Specifically, we represent the query's group membership $\theta_q^{(k)}$ by a weighted integration of the group memberships of its attribute objects:

$$ \theta_q^{(k)} \;=\; \sum_{X \in \{A,V,T\}} \sum_{x \in N_X(q)} \frac{\theta_x^{(k)}}{|N_X(q)|}. \qquad (4) $$

We use $N_X(q)$ to denote the type-$X$ neighbors of query $q$, i.e., its attribute objects of that type. How likely a type-$X$ object $x$ is to belong to the $k$-th interest group is represented by $\theta_x^{(k)}$.
Paper-specific citation recommendation can be efficiently
conducted for each query manuscript q by applying Eq. (1)
along with definitions in Eqs. (2), (3) and (4).
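The following sketch illustrates Eq. (4) on made-up memberships: the query's membership vector is the per-type average of its attribute objects' memberships. All object names here are hypothetical:

    import numpy as np

    K = 4
    theta = {                              # learned theta_x^(k) per object
        "t:graph": np.array([.7, .1, .1, .1]),
        "t:rank":  np.array([.2, .6, .1, .1]),
        "a:smith": np.array([.3, .3, .2, .2]),
        "v:kdd":   np.array([.5, .2, .2, .1]),
    }
    neighbors = {"T": ["t:graph", "t:rank"], "A": ["a:smith"], "V": ["v:kdd"]}

    theta_q = np.zeros(K)
    for X, objs in neighbors.items():      # sum over types X in {A, V, T}
        theta_q += sum(theta[o] for o in objs) / len(objs)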
4. MODEL LEARNING

This section introduces the learning algorithm for the proposed citation recommendation model in Eq. (1).

There are three sets of parameters in our model: group memberships for attribute objects, feature weights for interest groups, and object relative authority within each interest group. A straightforward approach is to first conduct hard clustering of attribute objects based on the network and then derive feature weights and relative authority for each cluster. Such a solution encounters several problems: (1) one object may have multiple citation interests, (2) mined object clusters may not properly capture distinct citation interests as we want, and (3) model performance may not be best optimized by the mined clusters.

In our solution, we formulate a joint optimization problem to estimate all model parameters simultaneously, which minimizes prediction error as well as graph regularization. By doing so, we can softly cluster attribute objects in terms of their citation interests and guarantee that the learned model yields good performance on training data.

We explain the joint optimization problem in Sec. 4.1 and design an efficient algorithm to solve it in Sec. 4.2, along with its computational complexity analysis in Sec. 4.3.

4.1 The Joint Optimization Problem
To learn the model parameters, we use a citation network as training data, where a value of 1 indicates an observed citation relationship while a value of 0 represents a mixture of negative (should not cite) and unobserved (unaware of, and may cite in the future) examples. Traditional learning methods adopt classification [27] or learning-to-rank [1] objective functions and usually treat all 0s in the training data as negative examples, which does not fit the real situation.
Without loss of generality, we adopt the weighted square error [11] on the citation matrix as the loss function measuring prediction performance, defined as follows:

$$ \mathcal{L} \;=\; \sum_{i,j=1}^{n} M_{ij}\Big( Y_{ij} - \sum_{k=1}^{K}\sum_{l=1}^{L} \theta_{p_i}^{(k)} w_k^{(l)} S_{jl}^{(i)} - \sum_{k=1}^{K} \theta_{p_i}^{(k)} F_{P,kj} \Big)^{2} \;=\; \sum_{i=1}^{n} \Big\| M_i \odot \big( Y_i - R_i P (W S^{(i)T} + F_P) \big) \Big\|_2^2. \qquad (5) $$
We define the weight indicator matrix $M \in \mathbb{R}^{n \times n}$ for the citation matrix, where $M_{ij}$ takes value 1 if the citation relationship between $p_i$ and $p_j$ is observed and 0 otherwise. By doing so, the model focuses on positive examples and avoids the noise in the 0 values. One can also define other loss functions to optimize with respect to precision or recall.
For ease of optimization, the loss can be further rewritten in matrix form, where $P \in \mathbb{R}^{(|\mathcal{T}|+|\mathcal{A}|+|\mathcal{V}|) \times K}$ is the group membership indicator for all attribute objects, while $R_i \in \mathbb{R}^{n \times (|\mathcal{T}|+|\mathcal{A}|+|\mathcal{V}|)}$ is the corresponding neighbor indicator matrix such that $R_i P = \sum_{X \in \{A,V,T\}} \sum_{x \in N_X(p_i)} \theta_x^{(k)} / |N_X(p_i)|$, following Eq. (4). Feature weights for each interest group are represented by the rows of the matrix $W \in \mathbb{R}^{K \times L}$, i.e., $W_{kl} = w_k^{(l)}$. The Hadamard product $\odot$ denotes the matrix element-wise product.
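A small numpy sketch of the masked loss in Eq. (5) follows. It treats $R_i$ as the $i$-th row of the neighbor indicator matrix and uses random stand-ins for all inputs, so it only illustrates the computation, not the learned model:

    import numpy as np

    n, d, K, L = 8, 10, 3, 4
    rng = np.random.default_rng(1)
    Y = (rng.random((n, n)) < 0.2).astype(float)     # citation matrix
    M = Y.copy()                                     # mask: observed pairs only
    R = (rng.random((n, d)) < 0.3).astype(float)     # paper-attribute neighbors
    R /= np.maximum(R.sum(1, keepdims=True), 1.0)    # row-normalize, cf. Eq. (4)
    P = rng.random((d, K))                           # attribute memberships
    W = rng.random((K, L))                           # per-group feature weights
    F_P = rng.random((K, n))                         # paper relative authority
    S = rng.random((n, L, n))                        # S[i] stands for S^(i)T

    loss = sum(np.sum((M[i] * (Y[i] - R[i] @ P @ (W @ S[i] + F_P))) ** 2)
               for i in range(n))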
As discussed in Sec. 3.3, to achieve authority learning
jointly, we adopt graph regularization to preserve consistency over the paper-author and paper-venue subgraphs,
which takes the following form:
$$ \mathcal{R} \;=\; \frac{\lambda_A}{2} \sum_{i=1}^{n} \sum_{j=1}^{|\mathcal{A}|} R^{(A)}_{ij} \left\| \frac{F_{P,i}}{\sqrt{D^{(PA)}_{ii}}} - \frac{F_{A,j}}{\sqrt{D^{(AP)}_{jj}}} \right\|_2^2 \;+\; \frac{\lambda_V}{2} \sum_{i=1}^{n} \sum_{j=1}^{|\mathcal{V}|} R^{(V)}_{ij} \left\| \frac{F_{P,i}}{\sqrt{D^{(PV)}_{ii}}} - \frac{F_{V,j}}{\sqrt{D^{(VP)}_{jj}}} \right\|_2^2. \qquad (6) $$
The intuition behind these two terms is natural: linked objects in the heterogeneous network are more likely to share similar relative authority scores [13]. To reduce the impact of node popularity, we apply a normalization technique to the authority vectors, which suppresses popular objects and keeps them from dominating the authority propagation. The $i$-th diagonal element of $D^{(PA)} \in \mathbb{R}^{n \times n}$ is the degree of paper $p_i$ in subgraph $R^{(A)}$, while the $j$-th diagonal element of $D^{(AP)} \in \mathbb{R}^{|\mathcal{A}| \times |\mathcal{A}|}$ is the degree of author $a_j$ in $R^{(A)}$. The two diagonal matrices for subgraph $R^{(V)}$ are defined similarly.
Integrating the loss in Eq. (5) with the graph regularization in Eq. (6), we formulate a joint optimization problem following the semi-supervised learning framework [8]:

$$ \min_{P,\, W,\, F_P,\, F_A,\, F_V} \;\; \frac{1}{2}\mathcal{L} + \mathcal{R} + \frac{c_p}{2}\|P\|_F^2 + \frac{c_w}{2}\|W\|_F^2 \quad \text{s.t.} \;\; P \ge 0; \; W \ge 0. \qquad (7) $$
To ensure stability of the obtained solution, Tikhonov regularizers are imposed on the variables $P$ and $W$ [4], and we use $c_p, c_w > 0$ to control the strength of regularization. In addition, we impose non-negativity constraints to make sure the learned group membership indicators and feature weights provide the semantic meaning we want.

4.2 The ClusCite Algorithm

Directly solving Eq. (7) is not easy because the objective function is non-convex. We develop an alternating minimization algorithm, called ClusCite, which alternately optimizes the problem with respect to each variable.
The learning algorithm essentially accomplishes two things
simultaneously and iteratively: Co-clustering of attribute
objects and relevance features with respect to interest groups,
and authority propagation between different objects. During an iteration, the different learning components mutually enhance each other (Fig. 3): feature weights and relative authority can be more accurately derived with high quality interest groups, while in turn they serve as good features for learning higher quality interest groups.
First, to learn group membership for attribute objects,
we take the derivative of the objective function in Eq. (7)
with respect to P while fixing other variables, and apply
the Karush-Kuhn-Tucker complementary condition to impose the non-negativity constraint [7]. With some simple
algebraic operations, a multiplicative update formula for P
can be derived as follows:
$$ P_{jk} \leftarrow P_{jk}\, \frac{\Big[\sum_{i=1}^{n} R_i^T \tilde{Y}_i S^{(i)} W^T + L_{P1}^{-} + L_{P2}^{-}\Big]_{jk}}{\Big[L_{P0} + L_{P1}^{+} + L_{P2}^{+} + c_p P\Big]_{jk}}, \qquad (8) $$

where the matrices $L_{P0}$, $L_{P1}$ and $L_{P2}$ are defined as follows:

$$ L_{P0} = \sum_{i=1}^{n} R_i^T R_i P W \tilde{S}^{(i)T} \tilde{S}^{(i)} W^T; \qquad L_{P1} = \sum_{i=1}^{n} R_i^T R_i P \tilde{F}_P^{(i)} \tilde{F}_P^{(i)T} + \sum_{i=1}^{n} R_i^T \tilde{Y}_i F_P^T; $$

$$ L_{P2} = \sum_{i=1}^{n} R_i^T R_i P W \tilde{S}^{(i)T} F_P^T + \sum_{i=1}^{n} R_i^T R_i P F_P \tilde{S}^{(i)} W^T. $$

In order to preserve non-negativity throughout the update, $L_{P1}$ is decomposed into $L_{P1}^{-}$ and $L_{P1}^{+}$, where $A_{ij}^{+} = (|A_{ij}| + A_{ij})/2$ and $A_{ij}^{-} = (|A_{ij}| - A_{ij})/2$. Similarly, we decompose $L_{P2}$ into $L_{P2}^{-}$ and $L_{P2}^{+}$, noting that the decomposition is applied to each of the components of $L_{P2}$, respectively. We denote the masked $Y_i$ by $\tilde{Y}_i$, which is the Hadamard product of $M_i$ and $Y_i$; similarly, $\tilde{S}^{(i)}$ and $\tilde{F}_P^{(i)}$ denote the row-wise masked $S^{(i)}$ and $F_P$ by $M_i$.
Second, to learn the feature weights for interest groups, the multiplicative update formula for $W$ can be derived following a similar derivation as that of $P$, taking the form:

$$ W_{kl} \leftarrow W_{kl}\, \frac{\Big[\sum_{i=1}^{n} P^T R_i^T \tilde{Y}_i S^{(i)} + L_W^{-}\Big]_{kl}}{\Big[\sum_{i=1}^{n} P^T R_i^T R_i P W \tilde{S}^{(i)T} \tilde{S}^{(i)} + L_W^{+} + c_w W\Big]_{kl}}, \qquad (9) $$

where $L_W = \sum_{i=1}^{n} P^T R_i^T R_i P F_P \tilde{S}^{(i)}$. Similarly, to preserve the non-negativity of $W$, $L_W$ is decomposed into $L_W^{+}$ and $L_W^{-}$, which can be computed in the same way as before.
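The structure of these updates is the standard KKT-based multiplicative pattern: every mixed-sign term $A$ is split into non-negative parts $A^{+} = (|A| + A)/2$ and $A^{-} = (|A| - A)/2$, and the variable is rescaled entrywise by a ratio of non-negative matrices. The sketch below shows this pattern on a generic non-negative least-squares problem min ||Y − XB||², X ≥ 0, not the full ClusCite objective:

    import numpy as np

    rng = np.random.default_rng(2)
    Y, B = rng.random((6, 5)), rng.random((4, 5))
    X = rng.random((6, 4))                 # initialize with positive values
    eps = 1e-12                            # avoid division by zero
    for _ in range(200):
        numer = Y @ B.T                    # terms that decrease the objective
        denom = X @ (B @ B.T) + eps        # terms that increase the objective
        X *= numer / denom                 # entrywise update keeps X >= 0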
Finally, we derive the authority propagation functions in Eq. (3) by optimizing the objective function in Eq. (7) with respect to the authority score matrices of papers, authors and venues. Specifically, we take the derivative of the objective function with respect to $F_P$, $F_A$ and $F_V$, and follow
traditional semi-supervised learning frameworks [8] to derive
the update rules, which take the form:
$$ F_P = G_P(F_P, F_A, F_V; \lambda_A, \lambda_V) = \frac{1}{\lambda_A + \lambda_V}\Big( \lambda_A F_A S_A^T + \lambda_V F_V S_V^T + L_{F_P} \Big); \qquad (10) $$

$$ F_A = G_A(F_P) = F_P S_A; \qquad (11) $$

$$ F_V = G_V(F_P) = F_P S_V, \qquad (12) $$

where the normalized adjacency matrices and the paper authority guidance term are defined as follows:

$$ S_A = (D^{(PA)})^{-1/2} R^{(A)} (D^{(AP)})^{-1/2}; \qquad S_V = (D^{(PV)})^{-1/2} R^{(V)} (D^{(VP)})^{-1/2}; $$

$$ L_{F_P} = \sum_{i=1}^{n} P^T R_i^T \Big\{ \tilde{Y}_i - R_i P \big( W \tilde{S}^{(i)} + \tilde{F}_P^{(i)} \big) \Big\}. $$

Using the normalized adjacency matrices $S_A$ and $S_V$ to propagate relative authority suppresses popular objects in the network, so that they do not dominate the authority propagation. At each iteration, the guidance term $L_{F_P}$ adjusts paper relative authority such that the model fits the known citations more accurately.
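Putting Eqs. (10)-(12) together, authority propagation is a simple fixed-point iteration. The sketch below builds a degree-normalized S_A from a toy paper-author subgraph and reuses it as a stand-in for S_V; L_FP abstracts the guidance term defined above, and all data are random stand-ins:

    import numpy as np

    K, n, nA = 2, 6, 4
    rng = np.random.default_rng(3)
    R_A = (rng.random((n, nA)) < 0.5).astype(float)    # paper-author subgraph
    d_p = np.maximum(R_A.sum(1), 1.0)                  # paper degrees D^(PA)
    d_a = np.maximum(R_A.sum(0), 1.0)                  # author degrees D^(AP)
    S_A = R_A / np.sqrt(d_p)[:, None] / np.sqrt(d_a)[None, :]
    S_V = S_A.copy()                                   # toy stand-in for venues
    L_FP = rng.random((K, n))                          # citation-fit guidance
    lam_A = lam_V = 0.3

    F_P = rng.random((K, n))
    for _ in range(20):
        F_A = F_P @ S_A                                # Eq. (11)
        F_V = F_P @ S_V                                # Eq. (12)
        F_P = (lam_A * F_A @ S_A.T + lam_V * F_V @ S_V.T + L_FP) / (lam_A + lam_V)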
Algorithm 1 summarizes the ClusCite algorithm.

Algorithm 1 Model Learning by ClusCite
Input: adjacency matrices {Y, S_A, S_V}, neighbor indicator R, mask matrix M, meta path-based features {S^(i)}, parameters {λ_A, λ_V, c_w, c_p}, number of interest groups K
Output: group membership P, feature weights W, relative authority {F_P, F_A, F_V}
1: Initialize P, W with positive values, and {F_P, F_A, F_V} with citation counts from the training set
2: repeat
3:   Update group membership P by Eq. (8)
4:   Update feature weights W by Eq. (9)
5:   Compute paper relative authority F_P by Eq. (10)
6:   Compute author relative authority F_A by Eq. (11)
7:   Compute venue relative authority F_V by Eq. (12)
8: until the objective in Eq. (7) converges

For convergence analysis, ClusCite essentially applies block coordinate descent to the optimization problem in Eq. (7). The proof procedure in [26] can be adopted to prove convergence of ClusCite (to a local minimum); for lack of space, we do not include it here.
Figure 3: Correlation between paper relative authority and the number of ground-truth citations, at iterations 1, 5 and 10.

Fig. 3 illustrates how the quality of the estimated paper relative authority changes. Given an interest group, citations from the test set to training papers serve as the ground truth for the relative authority of that group. We study how the correlation between the estimated relative authority and this ground truth changes across iterations. The initialization (global citation count) shows poor quality in terms of the correlation. As the algorithm iterates, we observe significant improvement in the correlation, which justifies the effectiveness of the proposed authority propagation approach.
4.3 Computational Complexity Analysis
In this section, we analyze the computational complexity of the proposed ClusCite algorithm. Let d denote the
total number of attribute objects and |E| the total number of links in the heterogeneous network. First, it takes
O(K(n + d)) time to initialize all the variables and O(|E|L)
time to pre-compute the constants in the
update formula.
In addition, in our implementation we apply the fact that $\sum_n A_n X B_n = C$ is equivalent to $\big(\sum_n B_n^T \otimes A_n\big)\,\mathrm{vec}(X) = \mathrm{vec}(C)$, so that we can avoid summations over all papers by pre-computing several matrix Kronecker products ($\otimes$). This step takes $O(L^2|\mathcal{E}|^2/n + L|\mathcal{E}|^3/n^2)$ time in total.
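The vec/Kronecker identity is standard; the following snippet verifies it numerically on random matrices, with vec() stacking columns (column-major order):

    import numpy as np

    rng = np.random.default_rng(4)
    A = [rng.random((3, 2)) for _ in range(5)]
    B = [rng.random((2, 4)) for _ in range(5)]
    X = rng.random((2, 2))

    C = sum(a @ X @ b for a, b in zip(A, B))
    Kmat = sum(np.kron(b.T, a) for a, b in zip(A, B))
    vec = lambda Mx: Mx.reshape(-1, order="F")     # column-major vec()
    assert np.allclose(Kmat @ vec(X), vec(C))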
We then study the time complexity at each iteration of
ClusCite with pre-computed matrices. Learning the group
membership matrix $P$ by Eq. (8) takes $O(L|\mathcal{E}|^3/n^3 + L^2|\mathcal{E}|^2/n^2 + Kdn)$ time. Learning the feature weights $W$ by Eq. (9) takes $O(L|\mathcal{E}|^3/n^3 + L^2|\mathcal{E}|^2/n^2 + Kdn)$ time. Updating all three relative authority matrices takes $O(L|\mathcal{E}|^3/n^3 + |\mathcal{E}| + Kdn)$ time. Let $T$ be the number of ClusCite iterations ($T \ll n$). The total time complexity is $O(L|\mathcal{E}|^3/n^2 + L^2|\mathcal{E}|^2/n + T|\mathcal{E}| + TKdn)$. In our experiments, ClusCite usually converges within 50 iterations.
5. EXPERIMENTS
In this section, we evaluate the recommendation performance of the proposed method on real world data and conduct case studies to demonstrate its effectiveness.
5.1 Data Preparation
In the experiments, two different bibliographic datasets
are used: the DBLP dataset2 [25] and the PubMed dataset3 .
Statistics of the two constructed heterogeneous bibliographic
networks are summarized in Table 3.
5.1.1 Heterogeneous Bibliographic Networks
Tang et al. [25] extracted citation information and built a DBLP citation dataset. We generated a subset of this dataset by filtering out papers with incomplete meta information or fewer than 5 citations. Keywords and key phrases are extracted from paper titles and abstracts using the TF-IDF measure and the TextBlob noun phrase extractor^4. The PubMed Central dataset was processed by the same method to generate a subset^5. We converted both datasets into heterogeneous bibliographic networks according to the network schema in Fig. 2.
5.1.2 Training and Evaluation Sets
We split the network to generate training, validation and
testing subsets according to the paper publication year. We
considered three time intervals T0 , T1 and T2 . The subnetwork associated with papers in T0 was used for model
training. Papers in T1 were then used as the validation set
for parameter tuning and papers in T2 were used as the test
set for evaluations. Tables 4(a) and 4(b) summarize the
statistics of the subsets. During evaluation, we consider citations from the evaluation sets (T1 and T2 ) to the training
2 http://arnetminer.org/DBLP_Citation
3 http://www.ncbi.nlm.nih.gov/pmc/
4 http://textblob.readthedocs.org/en/latest/
5 https://github.com/shanzhenren2/PubMed_subset
Table 3: Statistics of two bibliographic networks.

Data sets            DBLP      PubMed
# papers             137,298   100,215
# authors            135,612   212,312
# venues             2,639     2,319
# terms              29,814    37,618
# relationships      ~2.3M     ~3.6M
Paper avg citations  5.16      17.55
Table 4: Training, validation and testing paper subsets from the DBLP and PubMed datasets.

(a) The DBLP dataset
Subsets     Years              # papers
Train       T0 = [1996, 2007]  62.23%
Validation  T1 = [2008]        12.56%
Test        T2 = [2009, 2011]  25.21%

(b) The PubMed dataset
Subsets     Years              # papers
Train       T0 = [1966, 2008]  64.50%
Validation  T1 = [2009]         7.81%
Test        T2 = [2010, 2013]  27.69%
set (T0 ) as the ground truth. Such an evaluation practice
is more realistic because a citation recommendation system
only knows the related attribute objects of a newly written manuscript. Also, it predicts future citations based on
models which are learned from past citations.
5.1.3 Feature Generation
In the experiments, without loss of generality, we selected 15 different meta paths between paper objects, including (P−X−P)^y, P−X−P→P and P−X−P←P, where X ∈ {A, V, T} and y ∈ {1, 2, 3}. Note that (P−X−P)^2 denotes P−X−P−X−P. We used two different structural similarity measures: the PathSim measure [22] and the random walk-based measure [27]. We applied the random walk-based measure to all meta paths and the PathSim measure only to symmetric meta paths, due to its requirement. This provides us with 24 meta path-based relevance features. Note that all the "cited" and "citing" relations in the meta paths were measured only between papers in the training set.
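The following sketch shows how these 15 meta path instances can be enumerated as matrix products on toy data: (P−X−P)^y is the y-th power of B_X = R_X R_X^T, and appending →P or ←P multiplies by the training citation matrix Y or its transpose. All matrices here are random stand-ins:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 6
    Y = (rng.random((n, n)) < 0.3).astype(float)           # training citations
    R = {X: (rng.random((n, 4)) < 0.5).astype(float) for X in "AVT"}

    paths = {}
    for X, R_X in R.items():
        B = R_X @ R_X.T                                    # P-X-P path counts
        for y in (1, 2, 3):
            paths[f"(P-{X}-P)^{y}"] = np.linalg.matrix_power(B, y)
        paths[f"P-{X}-P->P"] = B @ Y                       # ... that cite p_j
        paths[f"P-{X}-P<-P"] = B @ Y.T                     # ... cited by p_j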
5.2 Experimental Settings
We provide details on the experimental settings for conducting evaluations on all the methods.
5.2.1 Compared Methods
We compared the proposed method (ClusCite) with its
variation which considered only relevance features (ClusCiteRel). Several widely deployed or state-of-the-art citation
recommendation approaches were also implemented, including content-based methods, link-based methods and hybrid
methods. All compared methods were first tuned on the validation set to select their tuning parameters.
BM25: BM25 is a text-based method, which computes
similarity scores using only text information.
PopRank [19]: PopRank is a link-based method which
derives an object’s importance based on authority propagation in the heterogeneous bibliographic network.
TopicSim: We measure similarity between papers with
topic modeling technique (LDA) and return the papers with
the most similar topic distribution compared with the query.
Link-PLSA-LDA [18]: Link-PLSA-LDA^6 is a hybrid method that leverages both document text and citation links when modeling topics. The candidates were ranked in terms of the conditional probability of citations from the query manuscript to the candidate papers.

6 https://sites.google.com/site/rameshnallapati/software
Table 5: Recommendation performance comparisons on the DBLP and PubMed datasets in terms of Precision, Recall and MRR. We set the number of interest groups to 200 (K = 200) for ClusCite and ClusCite-Rel.

                                  DBLP                                    PubMed
Method          P@10    P@20    R@20    R@50    MRR     P@10    P@20    R@20    R@50    MRR
BM25            0.1260  0.0902  0.1431  0.2146  0.4107  0.1847  0.1349  0.1754  0.2470  0.4971
PopRank         0.0112  0.0098  0.0155  0.0308  0.0451  0.0438  0.0314  0.0402  0.0814  0.2012
TopicSim        0.0328  0.0273  0.0432  0.0825  0.1161  0.0761  0.0685  0.0855  0.1516  0.3254
Link-PLSA-LDA   0.1023  0.0893  0.1295  0.1823  0.3748  0.1439  0.1002  0.1589  0.2015  0.4079
L2-LR           0.2274  0.1677  0.2471  0.3547  0.4866  0.2527  0.1959  0.2504  0.3981  0.5308
RankSVM         0.2372  0.1799  0.2733  0.3621  0.4989  0.2534  0.1954  0.2499  0.3820  0.5187
MixFea          0.2261  0.1689  0.2473  0.3636  0.5002  0.2699  0.2025  0.2519  0.4021  0.5041
ClusCite-Rel    0.2402  0.1872  0.2856  0.4015  0.5156  0.2786  0.2221  0.2753  0.4305  0.5524
ClusCite        0.2429  0.1958  0.2993  0.4279  0.5481  0.3019  0.2434  0.3129  0.4587  0.5787

Figure 4: Performance comparisons measured by Recall and Precision at different positions: (a) Recall on DBLP; (b) Precision on PubMed.
L2-LR [27]: This technique casts the problem as classification with a linearly weighted combination of meta path-based relevance features. Positive examples are observed citations and negative examples are randomly sampled paper pairs.
RankSVM [14]: RankSVM considers the preference between paper-paper relationships, instead of assuming all unobserved relationships are negative examples.
MixFea: The candidates were ranked by a linear combination of meta path-based relevance features, topic distributions and PopRank features. We used RankSVM to estimate the feature weights.
ClusCite: Candidates were ranked based on the scores computed by Eq. (1). We set the number of interest groups K = 200, c_p = 10^-6, c_w = 10^-7, and λ_A = λ_V = 0.3 after tuning them on the validation sets (Fig. 5 and Sec. 5.6).
ClusCite-Rel: Candidates were ranked based on the proposed model with only meta path-based relevance features, i.e., dropping F_P in Eq. (1). It used the same settings for K, c_p and c_w as ClusCite.
5.2.2 Evaluation Metrics
We employed Precision and Recall at position M (P@M and R@M) as evaluation metrics. Recall@M is defined as the percentage of the query's ground-truth references that appear in the top-M recommended list; a high recall at a low M indicates a better citation recommendation system. Precision@M was also used to measure the effectiveness of the recommendation system, by checking whether the ground-truth references are ranked high for the query manuscript. Furthermore, it is desirable that ground-truth papers appear early in the top-M recommended list. Therefore, the Mean Reciprocal Rank (MRR) was also employed over the target papers, defined as $\mathrm{MRR} = \frac{1}{|\mathcal{Q}_T|} \sum_{q \in \mathcal{Q}_T} \frac{1}{\mathrm{rank}(q)}$, where $\mathcal{Q}_T$ is the test set and $\mathrm{rank}(q)$ denotes the rank of its first ground-truth paper (positive example).
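For concreteness, the following sketch computes P@M, R@M and MRR from a toy ranked list; the data are illustrative only:

    import numpy as np

    def precision_recall_at(ranked, truth, M):
        hits = len(set(ranked[:M]) & truth)
        return hits / M, hits / len(truth)

    def mrr(queries):
        # rank(q): 1-based position of the first ground-truth paper
        rr = [1.0 / (min(r.index(t) for t in g if t in r) + 1)
              for r, g in queries if any(t in r for t in g)]
        return float(np.mean(rr))

    ranked, truth = [3, 7, 1, 9, 4], {7, 4, 8}
    print(precision_recall_at(ranked, truth, 5))   # (0.4, 0.666...)
    print(mrr([(ranked, truth)]))                  # first hit at rank 2 -> 0.5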
5.3 Performance Comparison
We now compare the proposed recommendation model
(ClusCite) with its variation (ClusCite-Rel) and other baselines in terms of the citation recommendation performance.
First, we compare the proposed methods with seven different baselines using Precision@10, 20, Recall@20, 50 and
MRR. Table 5 summarizes the comparison results on both
DBLP and PubMed datasets. Overall, the proposed ClusCite method and its variation ClusCite-Rel outperform other
methods on all metrics. In particular, ClusCite obtains a
17.68% improvement in Recall@50 and 9.57% improvement
in MRR compared to the best baseline on the DBLP dataset.
On the PubMed dataset, it improves Recall@20 by 20.19%
and MRR by 14.79% compared to MixFea. Even though
MixFea has incorporated a rich set of features, ClusCite obtained superior performance because it not only explores
citation behaviors by learning group-based feature weights
over different relation semantics, but also integrates relative
paper authority to augment the recommendation process.
The ClusCite-Rel method outperforms all other baselines
and improves Recall@50 by 10.42% compared to the best
baseline, MixFea, on the DBLP dataset. Comparing ClusCite-Rel with methods such as RankSVM and L2-LR, one can
clearly notice the performance gain from distinguishing relevance feature weights for different interest groups. ClusCite always outperforms ClusCite-Rel, improving MRR by
12.21% and Recall@50 by 6.57% on the DBLP dataset. The
enhancement mainly comes from utilizing paper relative authority with respect to different interest groups. Also, the
derived relative authority can assist recommendation since
it is jointly learned through the unified optimization.
MixFea is another method that incorporates paper authority information, but it does not distinguish paper authority in different interest groups. However, it still obtained
better results than RankSVM and L2-LR did in most cases.
This demonstrates the effectiveness of paper authority information in the citation recommendation process. Furthermore, poor performance of PopRank shows that using only
global authority is not sufficient to conduct good citation
recommendation. Different from the conclusions in [10], we found that Link-PLSA-LDA and TopicSim achieve only 0.0893 and 0.0273 in Precision@20 (compared to 0.1677 with L2-LR), respectively. Also, BM25 outperformed both topic-based methods in all cases. This shows that topic-based features are not good enough for finding relevant papers, since such features may be of too coarse granularity.
For more comprehensive comparisons, we computed the
precision and recall at different positions (5 to 100) to study
the trends in performance changes. Due to space limits,
Fig. 4 only shows the comparison results of Recall on DBLP
and the comparison results of Precision on PubMed, respectively.

Figure 5: Performance change (Recall@50) on validation sets, with respect to the number of interest groups (K), on (a) DBLP and (b) PubMed (MixFea as baseline).

For both precision and recall, the performance gap
between ClusCite and ClusCite-Rel gets slightly larger as
more candidates are returned. This indicates that authority
information played a critical role in identifying papers with
moderate relevance to the query (people may cite relevant
papers even though they are new and less reputed, but they
prefer authoritative ones among the less relevant papers).
5.4 Performance Analysis
In this section, we analyze the performance of ClusCite,
ClusCite-Rel and MixFea in different recommendation scenarios. We ran the following experiments on both datasets
and observed similar performance changes in both. However, in the interest of brevity, we only present results from
the PubMed dataset for some analyses.
First, we studied performance change with respect to the
number of interest groups for ClusCite and ClusCite-Rel.
As presented in Figs. 5(a) and 5(b), although not very sensitive to K, the two methods did perform differently as the number of groups varied. The performance changes were more notable at smaller K, i.e., K < 100, indicating that the proposed methods cannot determine citation behavior well when the number of groups is too small. On the other hand, a large K (e.g., K > 300) caused a performance drop due to the insufficiency of training data for deriving interest groups. We found that ClusCite achieved its best performance with K = 200 groups, while ClusCite-Rel obtained its best performance with a larger K = 300. This suggests that the biomedical domain has more diverse citation behavioral patterns.
In ClusCite and ClusCite-Rel, the paper-specific recommendation makes a prediction for a query based on its attribute objects. Therefore, we want to examine their performance change by studying the correlation between recommendations of the two proposed methods and the number of
attribute objects in the query manuscript. We divided the
test set into 6 groups with respect to the number of attribute
objects. The resulting query groups had an average number of attribute objects ranging from 6.46 (group 1) to 18.98
(group 6). The results by MRR are summarized in Fig. 6(a).
Overall, ClusCite outperformed ClusCite-Rel, and both outperformed MixFea. The proposed methods achieved a larger performance improvement as the number of attribute objects increased (e.g., from 0.02 in group 1 to 0.08 in group 6), while the performance of MixFea was less sensitive across query groups. This demonstrates that with more attribute objects provided by the query manuscript, the proposed method can make better paper-specific recommendations, because richer attribute objects give a better estimate of the query's group membership.
Figure 6: Performance change with (a) the number of attribute objects and (b) the time period of query papers.

Finally, we tested model generalization by evaluating performance on test papers from different time periods. We generated four test subsets using papers in T2 of the
PubMed dataset where each subset consists of papers from
one specific year. By applying the methods on each subset,
we want to study how the model, learned from papers in
T0 , can predict citations for future papers. The study results are shown in Fig. 6(b). Overall, the performance of
both methods dropped when recommending for newer papers but ClusCite always outperformed MixFea. Recall@50
of MixFea decreased by 16.42% from year 2010 to 2013 while
Recall@50 of ClusCite dropped only about 7.72%, which indicates the better generalization of the proposed method.
5.5 Case Studies
To demonstrate the effectiveness of mining hidden interest groups, we conduct two sets of case studies on the DBLP dataset, showing citation behavioral patterns (Fig. 7) and the relative authority rankings of authors and venues (Table 6) within example mined interest groups.
First, we show that the learned interest groups have distinct citation behavioral patterns and can satisfy different
information needs. We apply K-means clustering on all objects’ group membership indicators and derive their most
likely groups (we set K = 40). Two representative groups
were picked where group A contained 8,345 papers and 208
venues and group B contained 10,922 papers and 291 venues.
We found that major venues in group A were database venues
(e.g., “SIGMOD” and “VLDB”) and those in group B were
computer vision venues (e.g., “TPAMI” and “IJCV”). To
study how the four venues were cited by papers in the two
interest groups, we calculated the average number of citations from papers in group A and B to the four venues,
respectively. The results are in Fig. 7(a). One can see that
papers in group A prefer to cite database papers while those
in group B cite computer vision papers more frequently.
Following a similar procedure, we selected two more representative groups and studied their papers' citations of four different authors: data mining researchers "Philip S. Yu" and "Rakesh Agrawal" from group A, and programming language researchers "Thomas W. Reps" and "Ken Kennedy" from group B. The average numbers of citations for these four authors are summarized in Fig. 7(b). Similar behavioral patterns were observed: papers in group A cite data mining researchers more frequently, while papers in group B prefer programming language researchers. The derived interest groups thus show two different citation behavioral patterns, justifying that they can capture different citation interests.

Figure 7: Case studies on citation behavioral patterns among different interest groups, showing the averaged number of citations on four venues and four authors for two groups of papers.
Second, we study the effectiveness of the relative authority
propagation process in the proposed ClusCite algorithm. By
setting the number of interest groups as K = 40, we apply
ClusCite on the training data and obtain relative authority
scores for authors (FA ) and venues (FV ). We can list the
top ranked objects based on their relative authority scores
within different interest groups. Table 6 shows the ranked
lists for two example interest groups. One can easily identify
the research areas that these two interest groups belong to:
Group I is on database and information system while Group
II is on computer vision and multimedia. There is a high
degree of consensus between the ranking list generated by
ClusCite and the top venues and reputed authors in each
research area. This demonstrates that the relative authority
propagation can generate meaningful authority scores with
respect to different interest groups.
Table 6: Top-5 authority venues and authors from two example interest groups derived by ClusCite.

Group I (database and information system)
Rank  Venue    Score   Author                Score
1     VLDB     0.0763  Hector Garcia-Molina  0.0202
2     SIGMOD   0.0653  Christos Faloutsos    0.0187
3     TKDE     0.0651  Elisa Bertino         0.0180
4     CIKM     0.0590  Dan Suciu             0.0179
5     SIGKDD   0.0488  H. V. Jagadish        0.0178

Group II (computer vision and multimedia)
Rank  Venue    Score   Author                Score
1     TPAMI    0.0733  Richard Szeliski      0.0139
2     ACM MM   0.0533  Jitendra Malik        0.0122
3     ICCV     0.0403  Luc Van Gool          0.0121
4     CVPR     0.0401  Andrew Blake          0.0117
5     ECCV     0.0393  Alex Pentland         0.0114
5.6 Parameter Study
In this section, we study the impact of four parameters:
cp and cw in ClusCite and ClusCite-Rel, and λA and λV in
ClusCite, on the validation sets. The number of interest groups is set to K = 200. MixFea, the best performing baseline, is the only baseline shown here. For conciseness, only the DBLP results are presented in Fig. 8, where the x-axes are in log scale.
In the joint optimization problem in Eq. (7), cp and cw
control the strength of Tikhonov regularizers on group membership indicators and relevance feature weights. A larger
value imposes a higher penalty on the magnitude of variable
values. We vary one of these two parameters while fixing the
other at zero. For ClusCite, we set λA = λV = 0.1. Both ClusCite and ClusCite-Rel show robust performance over a large range of cp (Fig. 8(a)) and achieve significant improvement over MixFea. We observe a similar trend when varying cw (Fig. 8(b)), but ClusCite performs slightly better when cw = 10^-7. This is because W plays a role in balancing the relevance and authority scores in the ClusCite model, while scaling P does not affect the ranking results.
ClusCite has two more parameters λA and λV , which
control relative importance of authority information from
authors and venues, respectively. By setting one to zero
and varying the other, we aim to see a performance change
when only one information source is utilized in the authority propagation process. Using ClusCite-Rel and MixFea as
baselines, one can see that both information sources help
improve the performance of ClusCite significantly. ClusCite
achieves the best performance when λA = 0.3 (Fig. 8(c))
and λV = 0.3 (Fig. 8(d)). In particular, we found that applying venue information to authority propagation led to
better results.
6. RELATED WORK
6.1 Citation Recommendation
Existing work leverages different kinds of information to recommend citations for a query manuscript, ranging from paper content and known citations to the authors and venues of a paper.
Traditional keyword-based approaches have difficulty in
finding conceptually similar work due to the ambiguity of
short-text queries [20, 5]. One can notice that the performance of BM25 in our experiments is much worse than that of hybrid methods like L2-LR. Using citation local contexts, i.e., the text surrounding citation positions, context-based methods can capture diverse information needs more
precisely [24, 10, 12]. However, the local context might be
irrelevant to the ideas of cited paper. Moreover, picking the
size of each context window is non-trivial. Also, it will be
interesting to study different intents and purposes behind
the citation contexts to leverage them more accurately.
On the other hand, known citations can be used to measure paper structural similarity. Traditional link prediction techniques [16, 2] and collaborative filtering techniques [11] encounter the cold-start issue, since in practice few or no citations are available for a query manuscript. Heterogeneous link prediction techniques [17, 6] tackle this issue by taking advantage of multiple types of relationships between papers, authors and venues. However, these link-based methods cannot achieve satisfactory results without considering content-based features. Therefore, recent studies have started integrating both content and structure information to improve performance [20]. Latent topic models are used to predict citations for new documents by modeling citation links jointly [18, 24]. However, topical similarity may be too coarse
to serve as good evidence for citation prediction and experimental results on TopicSim and Link-PLSA-LDA show
their limited performance. Yu et al. [27] derive a rich set of
meta-path based features from heterogeneous bibliographic
networks in modeling citation recommendation, which can
capture text-based similarity, conceptual relevance as well
as several types of social relatedness.
The aforementioned methods consider only paper relevance but ignore another critical signal for citation recommendation, namely the importance and quality of target papers [20]. Bethard and Jurafsky [5] built a literature search
system by learning a linearly weighted model over both relevance and authority features.
Our work considers diverse citation interests by imposing different feature patterns according to the interest groups of each query (see the comparison between ClusCite-Rel and L2-LR [27]). Yu et al. [28] study personalized entity recommendation, which shares a similar idea of building a local retrieval model for each cluster. Our work is also
related to [5] in terms of incorporating paper authority, but
we derive paper relative authority within each group specifically (see comparisons between ClusCite and MixFea).
6.2 Authority Ranking on Graphs
Ranking objects on graphs by their importance and popularity has been extensively studied [15, 19] and combined
with keyword search systems [3]. In particular, Sun and
Giles [21] consider both citation impacts as well as venue
influence when propagating paper authority scores in bibliographic networks. With ranking supervision, graph-based
semi-supervised ranking frameworks can be further applied [1, 8]. However, these methods do not capture the bias of authority when the topics or interests of the query change (see the comparison between PopRank [19] and ClusCite).
Haveliwala [9] personalizes the PageRank algorithm by
considering query topics to derive query-specific authority
score. Similar ideas were explored when performing clustering [23] and classification [13] in heterogeneous information
networks, where object relative authority served as features
for representing classes. To the best of our knowledge, the proposed method is the first to learn object relative authority by optimizing the citation recommendation model, based on multiple types of relationships in heterogeneous bibliographic networks.
7. CONCLUSION AND FUTURE WORK
In this paper, we study citation recommendation in the
context of heterogeneous bibliographic networks and propose a novel cluster-based citation recommendation framework to satisfy a user’s diverse citation intents. By organizing paper citations into interest groups, the proposed
method is able to determine the significance of different
structural relevance features for each group, and to derive papers' relative authority within each group. In this way, we
can make paper-specific recommendations to capture each
query’s diverse information needs. We formulate a joint
optimization problem to learn model parameters by taking
advantage of multiple relationships in the network, and develop an efficient algorithm to solve it. Performance evaluation results show a significant improvement compared to
state-of-the-art methods and the case studies demonstrate
the effectiveness of the proposed method.
Interesting future work includes extending the proposed
clustering-based recommendation framework for Web search
tasks or entity recommendation so that one can capture local
relevance and authority jointly. In addition, there is potential to adjust the network structure for each interest group so
that relative authority can more accurately propagate within
the corresponding sub-networks. Finally, one can integrate
object authority information with each meta path instance
to design novel features for citation recommendation.
8. ACKNOWLEDGEMENTS
The work was supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), the U.S. Army Research Office under Cooperative Agreement No. W911NF-13-1-0193, U.S. National Science Foundation grants CNS-0931975, IIS-1017362, IIS-1320617 and IIS-1354329, NASA NRA-NNH10ZDA001N, DTRA, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
9. REFERENCES
[1] A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank
networked entities. In SIGKDD, 2006.
[2] L. Backstrom and J. Leskovec. Supervised random walks:
predicting and recommending links in social networks. In
WSDM, 2011.
[3] A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank:
Authority-based keyword search in databases. In VLDB, 2004.
[4] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold
regularization: A geometric framework for learning from
labeled and unlabeled examples. JMLR, 7:2399–2434, 2006.
[5] S. Bethard and D. Jurafsky. Who should I cite: learning
literature search models from citation behavior. In CIKM, 2010.
[6] Y. Dong, J. Tang, S. Wu, J. Tian, N. V. Chawla, J. Rao, and
H. Cao. Link prediction and recommendation across
heterogeneous social networks. In ICDM, 2012.
[7] Q. Gu, J. Zhou, and C. Ding. Collaborative filtering: Weighted
nonnegative matrix factorization incorporating user and item
graphs. In SDM, 2010.
[8] Z. Guan, J. Bu, Q. Mei, C. Chen, and C. Wang. Personalized
tag recommendation using graph-based ranking on multi-type
interrelated objects. In SIGIR, 2009.
[9] T. H. Haveliwala. Topic-sensitive pagerank. In WWW, 2002.
[10] Q. He, J. Pei, D. Kifer, P. Mitra, and L. Giles. Context-aware
citation recommendation. In WWW, 2010.
[11] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for
implicit feedback datasets. In ICDM, 2008.
[12] W. Huang, S. Kataria, C. Caragea, P. Mitra, C. L. Giles, and
L. Rokach. Recommending citations: translating papers into
references. In CIKM, 2012.
[13] M. Ji, J. Han, and M. Danilevsky. Ranking-based classification
of heterogeneous information networks. In SIGKDD, 2011.
[14] T. Joachims. Optimizing search engines using clickthrough
data. In SIGKDD, 2002.
[15] J. M. Kleinberg. Authoritative sources in a hyperlinked
environment. Journal of the ACM, 46(5):604–632, 1999.
[16] D. Liben-Nowell and J. Kleinberg. The link prediction problem
for social networks. In CIKM, 2003.
[17] Z. Lu, B. Savas, W. Tang, and I. S. Dhillon. Supervised link
prediction using multiple sources. In ICDM, 2010.
[18] R. M. Nallapati, A. Ahmed, E. P. Xing, and W. W. Cohen.
Joint latent topic models for text and citations. In SIGKDD,
2008.
[19] Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-level
ranking: bringing order to web objects. In WWW, 2005.
[20] T. Strohman, W. B. Croft, and D. Jensen. Recommending
citations for academic papers. In SIGIR, 2007.
[21] Y. Sun and C. L. Giles. Popularity weighted ranking for
academic digital libraries. In ECIR. 2007.
[22] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. PathSim: meta
path-based top-k similarity search in heterogeneous
information networks. In VLDB, 2011.
[23] Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of
heterogeneous information networks with star network schema.
In SIGKDD, 2009.
[24] J. Tang and J. Zhang. A discriminative approach to topic-based
citation recommendation. In PAKDD. 2009.
[25] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su.
Arnetminer: extraction and mining of academic social
networks. In SIGKDD, 2008.
[26] P. Tseng. Convergence of a block coordinate descent method
for nondifferentiable minimization. Journal of optimization
theory and applications, 109(3):475–494, 2001.
[27] X. Yu, Q. Gu, M. Zhou, and J. Han. Citation prediction in
heterogeneous bibliographic networks. In SDM, 2012.
[28] X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal,
B. Norick, and J. Han. Personalized entity recommendation: A
heterogeneous information network approach. In WSDM, 2014.