1. Introduction
Authors who submit (by their own assumption) high-quality papers to scholarly journals are interested in knowing whether there are factors which may increase the probability that their papers will be accepted. One such factor may be related to the month or day of submission, as recently discussed [1]. Indeed, authors might wonder about editors’ and reviewers’ overload at some times of the year. Moreover, the number of submitted papers is relevant for editors’ and publishers’ handling procedures, to the point that artificial intelligence can be useful for helping journal editors [2,3]. More generally, informetrics and bibliometrics are also interested in manuscript submission timing, especially in light of the enormous increase in the number of electronic journals.
From the author’s point of view, rejection is often frustrating, be it due to an “editor desk rejection” or following a review process. A high editor desk rejection rate has sometimes been explained as an entrance-barrier effect due to editor load [4]. Thus, it is of interest to observe whether there is a high probability of submission during specific months or seasons. In fact, non-uniform submission has already been studied. However, the distribution of acceptances over a year, that is, a “monthly bias”, is rarely studied, because of publisher secrecy. Search engines do not provide any information at all on the timing of rejected papers.
Interestingly, Boja et al. [1] recently examined a large database of journals with high impact factors and reported that a day-of-the-week correlation effect occurs between “when a paper is submitted to a peer-reviewed journal (and) whether that paper is accepted”. However, rejected papers could not be studied because of a lack of data; therefore, one may wonder whether, besides a “day of the week” effect, there is some “seasonal” effect. One may indeed imagine that researchers in academic surroundings do not have a constant occupation rate, due to teaching classes, holidays, congresses, and even budgetary conditions. Researchers have only specific times during the academic year for producing research papers.
From the “seasonal effect” point of view, Shalvi et al. [5] found a discrepancy in the pattern of “submission-per-month” and “acceptance-per-month” for Psychological Science (PS) but not for Personality and Social Psychology Bulletin (PSPB). Summer months inspired authors to submit more papers to PS, but the subsequent acceptance was not related to the effect of seasonal bias (based on a test for percentages). On the other hand, a very low rate of acceptance was recorded for manuscripts sent in November or December. The number of submissions to PSPB, on the contrary, was the greatest during winter months, followed by a reduced “production” in April; however, the rate of acceptance was the highest for papers submitted in the period from August to October. Moreover, a significant “acceptance success dip” was noted for submissions made in winter months. One of the main reasons for such differences between journals was conjectured to lie in different rejection policies; some journals employ desk rejection, whereas others do not.
Schreiber [4] analysed the acceptance rate of a journal, Europhysics Letters (EPL), over a period of 12 years and found that the rate of manuscript submission exceeded the rate of acceptance. The data revealed (Table 2 in [4]) that there is a maximum number of submissions in July, defined as a 10% increase compared to the annual mean, together with a minimum in February, even taking into account the shorter length of this month. He concluded that significant fluctuations exist between months. The acceptance rate ranged from 45% to 55%; in the most recent years, the highest acceptance rate was seen in July and the lowest in January.
Recently, Ausloos et al. [6] studied submission and subsequent acceptance data for two journals, a specialized (chemistry) scientific journal and a multidisciplinary journal, respectively, i.e., the Journal of the Serbian Chemical Society (JSCS) (http://shd.org.rs/JSCS/) and Entropy (https://www.mdpi.com/journal/entropy), each over a 3 year time interval. The authors found that fluctuations, expectedly, occur: the number of submissions to JSCS is the greatest in July and September and the smallest in May and December. The highest rate of paper submission for Entropy was noted in October and December and the lowest in August. Concerning acceptance for JSCS, the proportion of accepted/submitted manuscripts is the greatest in January and October. Concerning acceptance for Entropy, the number of papers steadily increases from January to a peak in May, followed by a marked dip during summer time, before reaching a peak in October of the order of the May peak.
Concerning the number of submitted manuscripts, it was observed that the acceptance rate for JSCS was the highest if papers were submitted in January and February; it was significantly lower if the submission occurred in December. In the case of Entropy, the highest rejection rate was for papers submitted in December and March, thus with a January–February peak in between; the lowest acceptance rate was for manuscripts submitted in June or December, the highest rate being for those sent in spring months, February to May. One recognizes a journal-dependent seasonal shift of the features. Notice that we adopt the word “seasonal” loosely; even though the changes of seasons occur on the 21st of various months, we approximate the season transition as occurring on the 1st day of the following month.
Here, we propose another line of approach in order to study the submission, acceptance, and rejection (number and rate) diversity based on probabilities, with emphasis on the conditional probabilities, thereafter measuring the entropy and other characteristics of the distributions. Indeed, entropy is a measure of disorder and one of several ways to measure diversity. Researchers have their own preferences [7,8] in measuring diversity. Here below, we practically adapt the classical measure of diversity as used in ecology, but other cases of interest pertaining to information science [9,10] can be mentioned.
Let us recall that the general equation of diversity is often written in the form [11,12]
$$ {}^{q}D = \Big( \sum_{i=1}^{N} p_i^{\,q} \Big)^{1/(1-q)}, $$
in which $p_i = x_i / \sum_{i=1}^{N} x_i$, and $x_i$ is the measured variable. For $q \to 1$, ${}^{q}D$ reduces to the exponential of the Shannon entropy [13,14]
$$ {}^{1}D = e^{H}, \qquad H = - \sum_{i=1}^{N} p_i \ln p_i, $$
which is the only case we consider here.
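As a numerical check of these definitions, the diversity of any order and its $q \to 1$ limit can be computed directly; the monthly counts below are hypothetical and serve only to illustrate the formulas.

```python
import math

def hill_diversity(counts, q):
    """Diversity of order q from raw counts (e.g., papers per month):
    qD = (sum_i p_i^q)^(1/(1-q)), with shares p_i = x_i / sum_j x_j.
    For q -> 1 this converges to exp(H), the exponential of the
    Shannon entropy H = -sum_i p_i ln p_i."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:  # limiting case q = 1: exponential Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical monthly submission counts, January to December
counts = [30, 25, 28, 31, 22, 18, 40, 15, 33, 38, 27, 20]
d1 = hill_diversity(counts, 1.0)  # exponential Shannon entropy
d2 = hill_diversity(counts, 2.0)  # inverse Simpson concentration
```

For a perfectly uniform distribution over the 12 months, the diversity equals 12 at every order; any monthly imbalance lowers it, and more strongly so at higher $q$.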
Several inequality measures are commonly used in the literature: in the class of entropy-related measures, one finds the exponential entropy [15], which measures the extent of a distribution, and the Theil index [16], which emerges as the most popular one [17,18], besides the Herfindahl–Hirschman index [19], measuring “concentrations”. Finally, upon ranking the measured variable according to size, the Gini coefficient [20] is a classical indicator of non-uniform distributions.
The Theil index [16] is defined by
$$ Th = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\bar{x}} \ln\frac{x_i}{\bar{x}}, \qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i. $$
It is easily seen that the Theil index can be expressed in terms of the negative entropy, indicating the deviation from the maximum disorder entropy, $\ln N$:
$$ Th = \ln N - H = \ln N + \sum_{i=1}^{N} p_i \ln p_i. $$
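The equivalence between the direct form of the Theil index and its entropy-deficit form, $Th = \ln N - H$, can be verified numerically; the counts are again hypothetical.

```python
import math

def theil_direct(x):
    """Theil index in its direct form: (1/N) sum_i (x_i/mean) ln(x_i/mean).
    Assumes strictly positive values."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi / mean) * math.log(xi / mean) for xi in x) / n

def theil_from_entropy(x):
    """Theil index as the entropy deficit: Th = ln N - H(p), p_i = x_i / sum(x)."""
    total = sum(x)
    p = [xi / total for xi in x]
    h = -sum(pi * math.log(pi) for pi in p)
    return math.log(len(x)) - h

# Hypothetical monthly submission counts
counts = [30, 25, 28, 31, 22, 18, 40, 15, 33, 38, 27, 20]
```

A uniform distribution attains the maximum entropy $\ln N$, so its Theil index vanishes; any imbalance makes it strictly positive.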
The exponential entropy [15] is
$$ E = e^{H} = \exp\Big( - \sum_{i=1}^{N} p_i \ln p_i \Big). $$
The Herfindahl–Hirschman index (HHI) [19] is an indicator of the “concentration” of variables, here, the “amount of competition” between the months. The higher the value of HHI, the smaller the number of months with a large value of (submitted, or accepted, or accepted-if-submitted) papers in a given month. Formally, adapting the HHI notion to the present case,
$$ HHI = \sum_{i=1}^{N} p_i^2. $$
Notice that $1/N \le HHI \le 1$.
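A minimal sketch of the HHI computation, with hypothetical monthly counts, illustrates the two bounds: an even spread over $N$ months gives $1/N$, while full concentration in a single month gives 1.

```python
def hhi(counts):
    """Herfindahl-Hirschman index: the sum of squared shares p_i.
    Bounds: 1/N (counts spread evenly over the N months) <= HHI <= 1
    (all papers concentrated in a single month)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Hypothetical monthly submission counts
counts = [30, 25, 28, 31, 22, 18, 40, 15, 33, 38, 27, 20]
concentration = hhi(counts)
```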
The Gini coefficient $Gi$ [20] has been widely used as a measure of income [21] or wealth inequality [22,23]; nowadays, it is widely used in many other fields. In brief, one first defines the Lorenz curve $L(r)$ as the percentage contributed by the bottom $r$ of the variable population to the total value $\sum_{i=1}^{N} x_i$ of the measured (and now ranked) variable $x_i$, i.e., with $x_1 \le x_2 \le \dots \le x_N$. One then obtains the Gini coefficient as twice the area between this Lorenz curve and the diagonal line in the $(r, L(r))$ plane; such a diagonal represents perfect equality, whence $Gi = 0$ corresponds to perfect equality of the $N$ variables.
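The Gini coefficient can be sketched via the standard ranked-sum identity, which is equivalent to twice the area between the Lorenz curve and the diagonal; the counts are hypothetical.

```python
def gini(values):
    """Gini coefficient via the ranked-sum identity
    G = 2 * sum_i i * x_(i) / (N * sum x) - (N + 1) / N,
    where x_(1) <= ... <= x_(N) are the sorted values; this equals twice
    the area between the Lorenz curve and the diagonal of perfect equality."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    ranked_sum = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * ranked_sum / (n * total) - (n + 1.0) / n

# Hypothetical monthly submission counts
counts = [30, 25, 28, 31, 22, 18, 40, 15, 33, 38, 27, 20]
g = gini(counts)
```

Perfect equality gives $Gi = 0$; for a discrete sample, full concentration in one month gives the maximum $(N-1)/N$.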
Having set up the framework and presented the definitions of the indices to be calculated, we indicate the quantities of interest and turn to the data and data analysis in Section 2 and Section 3, respectively. Their discussion and comments on the present study, together with a remark on its limitations, are found in the conclusion, Section 4.
2. Definitions
In order to develop the method measuring the disorder of the time series, let us recall the necessary data. The raw data can be found in Reference [6]. For completeness, the time series of papers submitted and of papers accepted (if submitted during a given month) to JSCS and to Entropy are recalled through Figure A1 for the years in which the full data is available, that is, for which the final decisions have been made on the submitted papers.
Let us introduce notations:
the number of monthly submissions in a given month (m) in year (y) is called $n_m^{(y)}$;
the percentage of this set is the probability of submission in a given month for a specific year, $p_m^{(y)} = n_m^{(y)} / \sum_{m=1}^{12} n_m^{(y)}$;
similarly, one can define $a_m^{(y)}$ as being the number of accepted papers when submitted in year (y) in a specific month (m),
and for the related percentage, one has $q_m^{(y)} = a_m^{(y)} / \sum_{m=1}^{12} a_m^{(y)}$;
more importantly, for authors, the (conditional) probability of a paper acceptance when submitted in a given month may be considered and estimated before submission, $c_m^{(y)} = a_m^{(y)} / n_m^{(y)}$.
Thereafter, one can deduce the relevant “monthly information entropies”
$$ H_m^{(y)} = - p_m^{(y)} \ln p_m^{(y)} $$
and the overall information entropy:
$$ H^{(y)} = - \sum_{m=1}^{12} p_m^{(y)} \ln p_m^{(y)}, $$
in order to pinpoint whether the yearly distributions are disordered.
Moreover, we can discuss the data not only by comparing different years, but also through the cumulated data per month over the examined time interval, as if all years were “equivalent”: $n_m = \sum_{y} n_m^{(y)}$, from which one deduces $p_m = n_m / \sum_{m=1}^{12} n_m$, and similarly for the accepted papers, $a_m = \sum_{y} a_m^{(y)}$ and $q_m = a_m / \sum_{m=1}^{12} a_m$, leading to the ratio between cumulated monthly data, $c_m = a_m / n_m$, to the corresponding “monthly cumulated entropy”, $H = - \sum_{m=1}^{12} p_m \ln p_m$, and finally to the analogous quantity based on the $c_m$, which will be called the “conditional entropy”.
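The quantities defined above can be assembled as follows; the submission and acceptance counts are hypothetical stand-ins for the data of Reference [6], and the variable names mirror the notation of this section.

```python
import math

def entropy(p):
    """Shannon entropy -sum p ln p; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical cumulated monthly data for one journal, January to December
n = [30, 25, 28, 31, 22, 18, 40, 15, 33, 38, 27, 20]  # submitted papers n_m
a = [12, 10, 9, 14, 8, 6, 15, 5, 13, 16, 11, 7]       # accepted papers a_m

p = [nm / sum(n) for nm in n]                # submission probabilities p_m
q = [am / sum(a) for am in a]                # acceptance probabilities q_m
c = [am / nm for am, nm in zip(a, n)]        # conditional P(accepted | submitted in m)

H_sub = entropy(p)  # monthly cumulated entropy of submissions, at most ln 12
H_acc = entropy(q)  # same for acceptances
```

Both entropies are bounded above by $\ln 12 \approx 2.485$, attained only when the twelve months are equally probable, so the gap from $\ln 12$ quantifies the seasonal imbalance.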
Relevant values are given in Table 1, Table 2, Table 3 and Table 4, both for JSCS and for Entropy. The diversity and inequality index values are given in Table 5. Most of the results stem from the use of a free online software [24].