Scientometrics
DOI 10.1007/s11192-016-2034-y
Does high impact factor successfully predict future
citations? An analysis using Peirce’s measure
Gangan Prathap · S. Mini · P. Nishy

Received: 9 April 2015
© Akadémiai Kiadó, Budapest, Hungary 2016
Abstract Journals are routinely evaluated by journal impact factors. However, more
controversially, these same impact factors are often used to evaluate authors and groups as
well. A more meaningful approach would be to use actual citation rates. Since in each journal
there is a very highly skewed distribution of articles according to citation rates, there is
little correlation between journal impact factor and actual citation rate of articles from
individual scientists or research groups. Simply stated, journal impact factor does not
successfully predict high citations in future. In this paper, we propose the use of Peirce’s
measure of predictive success (Peirce in Science 4(93):453–454, 1884) to see if the use of
journal impact factors to predict high citation rates is acceptable or not. It is seen that this
measure is independent of Pearson’s correlation (Seglen 1997) and gives a more quantitative refinement of the Type I and Type II classification of Smith (Financ Manag 133–149,
2004). The measures are used to examine the portfolios of some active scientists. It is clear
that the journal impact factor is not effective in predicting future citations of successful
authors.
Keywords Performance analysis · Bibliometrics · Impact factor · Citations · Peirce's measure
Corresponding author: Gangan Prathap, gp@niist.res.in
S. Mini, mini@niist.res.in
P. Nishy, nishy@niist.res.in

CSIR National Institute for Interdisciplinary Science and Technology, Thiruvananthapuram 695019, India
Introduction
Seglen (1997) observed that evaluating the scientific quality of a result published in a paper
in a recognized standard journal "is a notoriously difficult problem which has no standard
solution.’’ Ideally, each scientific result should be evaluated in a process now known as
peer review in which subject experts assess the work for quality and quantity. However,
this is becoming difficult to perform, and even experts resort to simpler approaches such as quantitative indicators like citation rates and journal impact factors (Seglen 1997).
In terms of causality, it is citation rates which determine journal impact factors and not the
other way round. But this is often forgotten or ignored. Ever since the journal impact factor
was introduced (Garfield 1955, 1999, 2005), it has been used not only to evaluate
articles but also individuals, groups and institutions (Calza and Garbisa 1995; Taubes
1993; Vinkler 1986; Maffulli 1995). Such an approach is badly flawed because within each
journal there is a very highly skewed distribution of articles according to citation rates
(Seglen 1992) and this will imply that there is little correlation between journal impact
factor and actual citation rate of articles from individual scientists or research groups
(Vinkler 1986; Seglen 1994). Simply stated, journal impact factor does not successfully
predict high citations in future.
Smith (2004) looked at the problematic issue of using decision rules based on journal
impact factor to promote or reward scientists or give grants to proposals. Specifically, the
question asked was whether a "top N journals" approach provides a reasonable decision
rule when it comes to identifying top articles in the finance literature. The accuracy of these
decision rules was interpreted in terms of Type I errors (a "top" article is rejected by a particular decision rule, e.g. in top three journals) and Type II errors (a "non-top" article is accepted as a top article) for each journal and combinations of the journals. High error rates were observed, suggesting that identifying top articles required looking beyond the "top N journals".
In this paper, we propose another measure to see if the use of journal impact factors to
predict high citation rates is acceptable or not. This is Peirce’s measure of predictive
success (Peirce 1884). It is seen that this measure is independent of Pearson's correlation (Seglen 1997) and gives a more quantitative refinement of the Type I and Type II classification of Smith (2004).

Fig. 1 The four quadrants of the Predictor-Event space following Peirce (1884)
Peirce’s measure of success of prediction
We are dealing with a problem where we use impact factor (IF) of the journal in which the
article has appeared as a predictor to separate a top article from an article which does not
make the cut (e.g. if IF ≥ IFt, it is a top article, where IFt is the chosen threshold). This is
done a priori and is nothing more than a promise. Much after the event, it is possible to
count the actual citations C, and if a suitable threshold is identified as Ct, then the event is
said to have taken place if C ≥ Ct. What we now need to do is to assess the success of our
decision rule in predicting future highly cited articles. Peirce’s measure does precisely that.
Figure 1 shows the four quadrants of the Predictor-Event space following Peirce (1884).
The TT quadrant signifies all cases where the predictor has promised a top article (T for true) and the event shows that this has been realised (T for true). Similarly, the FF quadrant collects all cases where the predictor has rejected the article (F for false) and the event shows that this has been correctly predicted (F for false, i.e. a non-event). The FT quadrant is
therefore the one that represents Type I error—incorrectly predicted events where we have
rejected a case that should have been accepted (Smith 2004). The TF quadrant represents
all Type II errors, incorrectly predicted non-events where we accepted cases which should
have been rejected. Peirce's measure of "the science of the method" is given by the simple formula

i = TT/(TT + FT) - TF/(TF + FF).
We can show by very simple calculations that i ranges from 1 (the decision rule is 100 % successful and there are no Type I or Type II errors), through 0 (TT/TF = FT/FF), to -1 (all predictions are Type I or Type II errors). In the next section we shall take some examples to
demonstrate the method. The first is taken from Seglen (1994) and the other is real-life data
collected for three highly decorated scientists from the authors’ institution from 1987 to
2014.
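The classification and formula above can be sketched in code. The following is a minimal illustration (the function name and data layout are our own, not from the paper): each article is an (IF, C) pair, the decision rule is IF ≥ IFt, and the event is C ≥ Ct.

```python
def peirce_i(articles, if_t, c_t):
    """Peirce's measure i for the rule 'IF >= if_t' predicting 'C >= c_t'.

    articles: iterable of (impact_factor, citations) pairs.
    Returns i = TT/(TT + FT) - TF/(TF + FF).
    """
    tt = ft = tf = ff = 0
    for impact_factor, citations in articles:
        predicted = impact_factor >= if_t   # the a priori promise
        event = citations >= c_t            # the realised outcome
        if predicted and event:
            tt += 1   # correctly predicted top article
        elif not predicted and event:
            ft += 1   # Type I error: a top article was rejected
        elif predicted and not event:
            tf += 1   # Type II error: a non-top article was accepted
        else:
            ff += 1   # correctly rejected non-top article
    if tt + ft == 0 or tf + ff == 0:
        raise ValueError("i is undefined without both events and non-events")
    return tt / (tt + ft) - tf / (tf + ff)
```

A perfect rule gives i = 1 (every promise kept, every rejection justified), and a rule that is always wrong gives i = -1.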
Demonstration of Peirce’s measure of success of prediction
We start with the example taken from Seglen (1994) and cited in Fig. 3 of Seglen (1997).
The correlation between journal impact (IF) and actual citation rate (C) of articles from
four individual scientists is seen to range from 0.05 to 0.63. Table 1 shows how the
Pearson's correlation r compares with the computed value of i [this is done by counting the cases by inspection of the charts in Fig. 3 of Seglen (1997)]. The thresholds for counting were taken as IFt = 1.0 and Ct = 1.0. We see that there is no pattern of relationship between r and i. Thus even if numerically there is a high correlation between journal impact factor and actual citation rate, it does not mean that decision rules based on IF are effective in separating highly cited work from poorly cited work.

Table 1 The relationship between Pearson's correlation r and Peirce's i for the four authors in Fig. 3 of Seglen (1997)

Author   r      i
1        0.05    0.223
2        0.27   -0.083
3        0.44   -0.118
4        0.63   -0.093
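The disconnect between r and i can be reproduced on toy data. The sketch below uses illustrative numbers of our own, with the thresholds IFt = Ct = 1.0 from the text: the portfolio's IF and citation counts are strongly correlated (r ≈ 0.73), yet Peirce's i is -0.5.

```python
import math

# Hypothetical (IF, C) pairs: IF and C rise together overall,
# but the rule 'IF >= 1.0' misclassifies four of the six papers.
articles = [(0.5, 1.5), (0.7, 2.0), (1.2, 0.3),
            (1.4, 0.5), (2.5, 3.0), (3.0, 4.0)]
if_t, c_t = 1.0, 1.0

# Pearson's correlation r between IF and C.
xs = [a[0] for a in articles]
ys = [a[1] for a in articles]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in articles)
r = cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))

# Peirce's i from the four quadrant counts.
tt = sum(1 for f, c in articles if f >= if_t and c >= c_t)
ft = sum(1 for f, c in articles if f < if_t and c >= c_t)
tf = sum(1 for f, c in articles if f >= if_t and c < c_t)
ff = sum(1 for f, c in articles if f < if_t and c < c_t)
i = tt / (tt + ft) - tf / (tf + ff)

print(round(r, 2), i)   # high correlation, yet negative i
```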
We next look at some real-life data collected for three highly decorated scientists from
the authors’ institution. From 1987 to 2014, the CSIR National Institute for Interdisciplinary Science and Technology has maintained a list of all its papers and the Impact
Factors (IF) for the corresponding year of the journals in which they have appeared. In each
case, we also calculated the total number of citations (C) received by each paper as of
March 2015. The Web of Science Core Collection was used to obtain these figures. Many
exploratory studies were conducted which showed that there was little relationship (both
by way of slope and by Pearson’s correlation) between IF and C. In this paper we report the
results for top three scientists ranked in terms of total citations C who have been active
during this period. Incidentally, these three also ranked as the top three in terms of their h- and g-indices. In each case, after many exploratory studies using various combinations of
thresholds, we found IFt = 2 and Ct = 50 as reasonable thresholds for predicting and
confirming a top article. With these thresholds, 1178 articles out of the 2098 articles
published from NIIST during 2004–2014 appeared in journals with IF ≥ 2. However, only
175 articles had received more than 50 citations. Table 2 shows in each case, the number of
papers P published, the total citations C, the h- and g-indices, of the three leading authors,
the fraction of Type I and Type II articles if these thresholds were used to define predicted
and realized success, and the value of Peirce’s measure of success i. In all three cases, we
see a small fraction of Type I errors (<4 %) and a large fraction of Type II errors (ranging
from nearly 25 to 75 %). In all cases, the measure of success if we use a decision rule based
on IF is poor.
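The Table 2 quantities can be computed in the same way. Below is a minimal sketch, assuming (as we read Table 2) that the Type I and Type II fractions are taken over the author's total paper count P; the portfolio is hypothetical, not one of the three NIIST authors.

```python
def error_profile(articles, if_t=2.0, c_t=50):
    """Return (Type I fraction, Type II fraction, Peirce's i) for a portfolio.

    Fractions are over the total paper count P.
    """
    tt = sum(1 for f, c in articles if f >= if_t and c >= c_t)
    ft = sum(1 for f, c in articles if f < if_t and c >= c_t)
    tf = sum(1 for f, c in articles if f >= if_t and c < c_t)
    ff = sum(1 for f, c in articles if f < if_t and c < c_t)
    p = tt + ft + tf + ff
    i = tt / (tt + ft) - tf / (tf + ff)
    return ft / p, tf / p, i

# Hypothetical portfolio: 1 top paper missed by the rule, 5 over-promising
# papers, 2 correct hits, 12 correct rejections.
portfolio = ([(1.5, 80)] + [(3.0, 10)] * 5
             + [(4.0, 90)] * 2 + [(1.0, 5)] * 12)
type1, type2, i = error_profile(portfolio)
```

Even this small example shows the pattern of Table 2: a small Type I fraction (0.05), a much larger Type II fraction (0.25), and a modest i.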
Concluding remarks
An erroneous and unjustifiable practice that is still followed in many places is the use of
journal impact factors to evaluate the quality of scientific work of authors, groups and
institutions. A more justifiable practice is the use of actual citation rates. An interesting
challenge is to ask if journal impact factors can successfully predict high citations in future.
In this paper, we used Peirce’s measure of predictive success (Peirce 1884) to see if the use
of journal impact factors to predict high citation rates is acceptable or not. It is seen that
this measure is independent of Pearson’s correlation (Seglen 1997) and gives a more
quantitative refinement of the Type I and Type II classification of Smith (2004). The
measures are used to examine the portfolios of some active scientists. Type I errors are seen to be low (<4 %) and Type II errors are significant (ranging from nearly 25 to 75 %). In all cases, the measure of success if we use a decision rule based on IF is poor. It is clear that the journal impact factor is not effective in predicting future citations of successful authors.

Table 2 The indicators for three top scientists from CSIR-NIIST

Author   Field               P     C      h    g    Type I   Type II   i
1        Photochemistry      85    5533   42   74   0.035    0.494     0.044
2        Organic chemistry   104   4118   31   62   0.000    0.760     0.071
3        Biotechnology       214   3971   35   54   0.037    0.248     0.344
References
Calza, L., & Garbisa, S. (1995). Italian professorships. Nature, 374, 492.
Garfield, E. (1955). Citation indexes to science: A new dimension in documentation through association of
ideas. Science, 122(3159), 108–111.
Garfield, E. (1999). Journal impact factor: A brief review. Canadian Medical Association Journal, 161(8),
979–980.
Garfield, E. (2005). The agony and the ecstasy: the history and meaning of the journal impact factor.
International Congress on Peer Review and Biomedical Publication. http://garfield.library.upenn.edu/
papers/jifchicago2005.pdf.
Maffulli, N. (1995). More on citation analysis. Nature, 378, 760.
Peirce, C. S. (1884). The numerical measure of the success of predictions. Science, 4(93), 453–454.
Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43,
628–638.
Seglen, P. O. (1994). Causal relationship between article citedness and journal impact. Journal of the
American Society for Information Science, 45, 1–11.
Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. British
Medical Journal, 314, 498–502.
Smith, S. D. (2004). Is an article in a top journal a top article? Financial Management, 133–149.
Taubes, G. (1993). Measure for measure in science. Science, 260, 884–886.
Vinkler, P. (1986). Evaluation of some methods for the relative assessment of scientific publications.
Scientometrics, 10, 157–177.