Power Law Tails in the Italian Personal Income
Distribution∗
F. Clementi (a, c), M. Gallegati (b, c)
(a) Department of Public Economics, University of Rome “La Sapienza”, Via del Castro Laurenziano 9, 00161 Rome, Italy. E-mail address: fabio.clementi@uniroma1.it.
(b) Department of Economics, Università Politecnica delle Marche, Piazzale Martelli 8, 60121 Ancona, Italy. E-mail address: gallegati@dea.unian.it.
(c) S.I.E.C., Università Politecnica delle Marche, Piazzale Martelli 8, 60121 Ancona, Italy. Web address: http://www.dea.unian.it/wehia/.
May 11, 2005
Abstract
We investigate the shape of the Italian personal income distribution using microdata from the Survey on
Household Income and Wealth, made publicly available by the Bank of Italy for the years 1977–2002.
We find that the upper tail of the distribution is consistent with a Pareto power-law type distribution,
while the rest follows a two-parameter lognormal distribution. The results of our analysis show a shift
of the distribution and a change of the indexes specifying it over time. As regards the first issue, we
test the hypothesis that the evolution of both gross domestic product and personal income is governed by
similar mechanisms, pointing to the existence of correlation between these quantities. The fluctuations of
the shape of income distribution are instead quantified by establishing some links with the business cycle
phases experienced by the Italian economy over the years covered by our dataset.
Keywords: Personal income; Pareto law; Lognormal distribution; Income growth rate; Business cycle
JEL Classifications: C16; D31; E32
1 Introduction. In the last decades, an extensive literature has shown that the size of a large number of phenomena can be well described by a power-law type distribution.
The modeling of income distribution originated more than a century ago with the work of Vilfredo Pareto, who observed in his Cours d’Économie Politique (1897) that a plot of the logarithm of the number of income-receiving units above a certain threshold against the logarithm of the income yields points close to a straight line. This power-law behaviour is nowadays known as the Pareto law.
∗ Paper prepared for the International Conference in Memory of Two Eminent Social Scientists: C. Gini and M.O. Lorenz. Their Impact in the XXth Century Development of Probability, Statistics and Economics, to be held in Siena (Italy) on May 23–26, 2005. The authors would like to thank Corrado Di Guilmi and Yoshi Fujiwara for helpful comments and suggestions.
Recent empirical work seems to confirm the validity of the Pareto (power) law. For example, [1] show that the distribution of income and income tax of individuals in Japan for the year 1998 is very well fitted by a power law, even if it gradually deviates as the income approaches lower ranges. It is now acknowledged that the Pareto distribution applies only to high incomes; therefore, other kinds of distributions have been proposed by researchers for the low-middle income region. According to [2], US personal income data for the years 1935–36 suggest a power-law distribution for the high-income range and a lognormal distribution for the rest; a similar shape is found by [3] investigating the Japanese income and income tax data for the high-income range over the 112 years 1887–1998, and for the middle-income range over the 44 years 1955–98.¹ Reference [4] confirms the power-law decay for top taxpayers in the US and Japan from 1960 to 1999, but finds that the middle portion of the income distribution has rather an exponential form; the same is proposed by [5] for the UK during the period 1994–99 and for the US in 1998.
The aim of this paper is to look at the shape of the personal income distribution in Italy by using cross-sectional data samples from the population of Italian households during the years 1977–2002. We find that the personal income distribution follows the Pareto law in the high-income range, while the lognormal pattern is more appropriate in the central body of the distribution. From this analysis we get the result that the indexes specifying the distribution change over time; therefore, we look for factors that might explain this behaviour.
The rest of the paper is organized as follows. Section 2 reports the data utilized in the analysis and describes the shape of the Italian personal income distribution. Section 3 explains the shift of the distribution and the change of the indexes specifying it over the years covered by our dataset. Section 4 concludes the paper.
¹ Reference [6] suggests that a Pareto law may hold also for lower incomes, yielding a so-called double Pareto-lognormal distribution, that is, a distribution with a lognormal body and a double Pareto tail.
2 Lognormal Pattern with Power Law Tail. We use microdata from the Historical Archive (HA) of the Survey on Household Income and Wealth (SHIW) made publicly available by the Bank of Italy for the period 1977–2002 [7].² All amounts are expressed in thousands of lire. Since we are comparing incomes across years, to remove the effect of inflation the data are reported in 1976 prices using the Consumer Price Index (CPI) issued by the National Institute of Statistics [9]. The average number of income-earners surveyed from the SHIW-HA is about 10,000.
² The data for the years preceding 1977 are no longer available. The survey was carried out yearly until 1987 (except for 1985) and every two years thereafter (the survey for 1997 was shifted to 1998). In 1989 a panel section consisting of units already interviewed in the previous survey was introduced in order to allow for better comparison over time. The basic definition of income provided by the SHIW is net of taxation and social security contributions. It is the sum of four main components: compensation of employees; pensions and net transfers; net income from self-employment; property income (including income from buildings and income from financial assets). Income from financial assets started to be recorded only in 1987. See [8] for details on source description, data quality, and main changes in the sample design and income definition.
Fig. 1: The cumulative probability distribution of the Italian personal income in 1998. We take the horizontal axis as the logarithm of the personal income in thousands of lire and the vertical axis as the logarithm of the cumulative probability. The green solid line is the lognormal fit with µ̂ = 3.48 (0.004) and σ̂ = 0.34 (0.006). Gibrat index is β̂ = 2.10. [Legend: income data, lognormal, power law. Axes: Income (thousand £), Cumulative probability.]
Figure 1 shows the profile of the personal income distribution for the year 1998. We take the horizontal axis as the logarithm of the income in thousands of lire and the vertical axis as the logarithm of the cumulative probability. The cumulative probability is the probability of finding a person with an income greater than or equal to x:
\[ P(X \geq x) = \int_{x}^{\infty} p(t)\, dt \]
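As a minimal illustration of the definition above, the following Python sketch computes the empirical counterpart of P(X ≥ x) and the log-log points of a plot in the style of Fig. 1. The function name `empirical_ccdf` and the income values are hypothetical; they are not taken from the SHIW data, and base-10 logarithms are an assumption.

```python
import math

def empirical_ccdf(incomes, x):
    """Share of observations with income greater than or equal to x."""
    return sum(1 for value in incomes if value >= x) / len(incomes)

# Made-up incomes (thousand lire), for illustration only.
sample = [1200, 2500, 2500, 3100, 4800, 9000, 15000, 40000]

# One point per distinct income: (log10 income, log10 cumulative probability).
points = [
    (math.log10(x), math.log10(empirical_ccdf(sample, x)))
    for x in sorted(set(sample))
]

print(empirical_ccdf(sample, 2500))  # 7 of the 8 incomes are >= 2500, so 0.875
```

Because the largest observation always has a cumulative probability of 1/n rather than zero, its logarithm is well defined and the point can be plotted.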
Two facts emerge from this figure. Firstly, the central body of the distribution (almost all of it below the 99th percentile) follows a two-parameter lognormal distribution (green solid line). The probability density function is:
\[ p(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{\log x - \mu}{\sigma} \right)^{2} \right] \]
with 0 < x < ∞, and where µ and σ are the mean and the standard deviation of the normal distribution of log x. The value of the fraction β = 1/(σ√2) returns the so-called Gibrat index; if β has low values (large variance of the global distribution), the personal income is unevenly distributed. From our dataset we obtain the following maximum-likelihood estimates:³ µ̂ = 3.48 (0.004) and σ̂ = 0.34 (0.006);⁴ Gibrat index is β̂ = 2.10.
Fig. 2: The fit to the power-law distribution for the year 1998. The red solid line is the best-fit function. Pareto index, obtained by least-squares fit, is α̂ = 2.76 (0.002); the estimated minimum income is x̂0 = 17,141 thousand lire. The goodness of fit of the OLS estimate in terms of the R² index is 0.9993. [Legend: income data, power law. Axes: Income (thousand £), Cumulative probability.]
Secondly, about the top 1% of the distribution follows a Pareto (power-law) distribution. This power-law behaviour of the tail of the distribution is more evident from Fig. 2, where the red solid line is the best-fit linear function. We extract the power-law slope (Pareto index) by running a simple OLS regression of the logarithm of the cumulative probability on a constant and the logarithm of personal income, obtaining a point estimate of α̂ = 2.76 (0.002). Given this value for α̂, our estimate of x0 (the income level below which the Pareto distribution would not apply) is 17,141 thousand lire. The fit of the linear regression is extremely good, as one can appreciate by noting that the value of the R² index is 0.9993.
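The regression described above rests on the fact that a Pareto tail, P(X ≥ x) = (x0/x)^α for x ≥ x0, becomes a straight line in logarithms, log P(X ≥ x) = α log x0 − α log x, so the slope equals −α. The following Python sketch shows this on synthetic data only. The helper names `ols_line` and `pareto_index_ols`, the sample size, the unit minimum income and the use of base-10 logarithms are all assumptions made for the illustration; the way x̂0 is derived in the paper is not spelled out, so the sketch does not attempt it.

```python
import math
import random

def ols_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)

def pareto_index_ols(tail_incomes):
    """Estimate the Pareto index from the log-log empirical CCDF of tail incomes."""
    ordered = sorted(tail_incomes)
    n = len(ordered)
    log_x = [math.log10(x) for x in ordered]
    # Ignoring ties, the i-th value (0-based, ascending) has n - i observations >= it.
    log_p = [math.log10((n - i) / n) for i in range(n)]
    _, slope, r_squared = ols_line(log_x, log_p)
    return -slope, r_squared

# Synthetic tail: inverse-transform sampling from a Pareto law with index 2.76
# and an arbitrary minimum income of 1.0 (both chosen only for the demo).
rng = random.Random(0)
true_alpha = 2.76
tail = [(1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(20000)]

alpha_hat, r_squared = pareto_index_ols(tail)
print(f"alpha_hat = {alpha_hat:.2f}, R^2 = {r_squared:.4f}")
```

On real data the regression would be run only on the observations above the chosen threshold, after any trimming of extreme values.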
The distribution pattern of the personal income expressed as the lognormal with power-law tails seems to hold all over our time span, as one can easily recognize from Fig. 3, which shows the shape of the income distribution for all the years. The corresponding estimated parameters for the lognormal and Pareto distributions are given in Table 1. The table also shows the values of the Gibrat index and the OLS R². However, the power-law slope and the curvature of the lognormal fit differ from year to year; that is, the indexes specifying the distribution (Pareto and Gibrat indexes) are not constant over time. We therefore try to quantify the fluctuations of the shape of the income distribution in the next section.
³ We exclude from our estimates about the top 1.4% of the distribution, which behaves as an outlier, and about the bottom 0.8%, which corresponds to non-positive entries.
⁴ The number in parentheses following a point estimate represents its standard error.
Fig. 3: Time development of the Italian personal income distribution over the years 1977–2002. [Legend groups: 1977/78/79; 1980/81/82/83/84/86/87/89; 1991/93/95/98; 2000/02. Axes: Income (thousand £), Cumulative probability.]
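To make the link between the lognormal parameters and the Gibrat index concrete, the sketch below estimates µ and σ by maximum likelihood (the mean and the 1/n standard deviation of log income) and then applies β = 1/(σ√2). It runs on synthetic incomes, not on the SHIW microdata; the function names `lognormal_mle` and `gibrat_index`, the sample size and the use of base-10 logarithms are assumptions, and the trimming of the top and bottom of the distribution is not reproduced.

```python
import math
import random

def lognormal_mle(incomes):
    """ML estimates (mu, sigma) of a two-parameter lognormal from log10 incomes."""
    logs = [math.log10(x) for x in incomes]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # ML uses 1/n, not 1/(n-1)
    return mu, sigma

def gibrat_index(sigma):
    """Gibrat index beta = 1 / (sigma * sqrt(2)), as defined in the text."""
    return 1.0 / (sigma * math.sqrt(2.0))

# Synthetic incomes whose base-10 logarithm is normal with the 1998 point estimates.
rng = random.Random(1)
incomes = [10 ** rng.gauss(3.48, 0.34) for _ in range(10000)]

mu_hat, sigma_hat = lognormal_mle(incomes)
beta_hat = gibrat_index(sigma_hat)
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}, beta_hat = {beta_hat:.2f}")
print(f"beta for sigma = 0.34: {gibrat_index(0.34):.2f}")  # 2.08
```

Plugging the rounded σ̂ = 0.34 into the formula gives about 2.08, while the value reported for 1998 is 2.10; a gap of this size is presumably due to the rounding of σ̂ to two decimals.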
Table 1: Estimated lognormal and Pareto distribution parameters for all the years (standard errors are in parentheses). Calculations are conducted using data and methods described in the text. For the lognormal distribution and the Pareto distribution also shown are the values of the Gibrat index and the R² index respectively. The minimum income x̂0 is in thousand lire.

Year   µ̂ (s.e.)       σ̂ (s.e.)       β̂      α̂ (s.e.)       x̂0       R²
1977   3.31 (0.005)   0.34 (0.004)   2.08   3.00 (0.008)   10,876   0.9921
1978   3.33 (0.005)   0.34 (0.004)   2.09   3.01 (0.008)   11,217   0.9933
1979   3.34 (0.005)   0.34 (0.005)   2.08   2.91 (0.009)   11,740   0.9908
1980   3.36 (0.005)   0.33 (0.005)   2.15   3.06 (0.008)   11,453   0.9915
1981   3.36 (0.005)   0.32 (0.005)   2.23   3.30 (0.008)   10,284   0.9939
1982   3.38 (0.004)   0.31 (0.005)   2.27   3.08 (0.005)   11,456   0.9952
1983   3.38 (0.005)   0.30 (0.005)   2.32   3.11 (0.006)   11,147   0.9945
1984   3.39 (0.004)   0.32 (0.005)   2.24   3.05 (0.007)   11,596   0.9937
1986   3.40 (0.004)   0.29 (0.006)   2.40   3.04 (0.005)   11,597   0.9950
1987   3.49 (0.004)   0.30 (0.004)   2.38   2.09 (0.002)   24,120   0.9993
1989   3.53 (0.003)   0.26 (0.003)   2.70   2.91 (0.002)   15,788   0.9995
1991   3.52 (0.004)   0.27 (0.004)   2.58   3.45 (0.008)   14,281   0.9988
1993   3.47 (0.004)   0.33 (0.004)   2.15   2.74 (0.002)   16,625   0.9997
1995   3.46 (0.004)   0.32 (0.003)   2.19   2.72 (0.002)   16,587   0.9996
1998   3.48 (0.004)   0.34 (0.006)   2.10   2.76 (0.002)   17,141   0.9993
2000   3.50 (0.004)   0.32 (0.004)   2.20   2.76 (0.002)   17,470   0.9994
2002   3.52 (0.004)   0.31 (0.005)   2.25   2.71 (0.002)   17,664   0.9997

3 Time Development of the Distribution. We start by considering the change of the distribution over time. As Fig. 3 shows, the distribution shifts over the years covered by our dataset. Macroeconomics argues that the origin of the change consists in the growth of the Gross Domestic Product (GDP). To test this hypothesis we study the fluctuations in the growth rates of GDP and Personal Income (PI), and try to show that similar mechanisms may be responsible for the observed growth dynamics of both country and individuals. The distribution of GDP annual growth rates is shown in Fig. 4. We calculate it using the data from the OECD Statistical Compendium [10], and expressing the rates in terms of their logarithm, R_GDP ≡ log(GDP_{t+1}/GDP_t), where GDP_t and GDP_{t+1} are the GDP of the country at the years t and t + 1 respectively. Data are reported in 1976 prices; moreover, to improve comparison of the values over the years we detrend them by applying the Hodrick-Prescott filter. By means of a non-linear algorithm, we find that the probability density function of annual growth rates is well fitted by a Laplace distribution (the red solid line in the figure), which is expressed as:
\[ p(x) = \frac{1}{\sigma \sqrt{2}} \exp\left( -\frac{|x - \mu|}{\sigma} \right) \]
with −∞ < x < +∞, and where µ and σ are the mean value and the standard deviation.
Fig. 4: Probability density function of Italian GDP annual growth rates for the period 1977–2002 together with the Laplace fit (red solid line). The data have been detrended by applying the Hodrick-Prescott filter. [Legend: GDP data (1977−2002), Laplace fit. Axes: Growth rate, Probability density.]
Table 2: Estimated Kolmogorov-Smirnov test p-values for both GDP and PI growth rate data. The null hypothesis that the two distributions are the same is not rejected at the 5% marginal significance level in any of the cases. Columns R_89/87 to R_93/91 are shown here; only the rows with entries in these columns are listed.

Growth rate   R_89/87   R_91/89   R_93/91
R_GDP         0.872     0.919     0.998
R_89/87                 0.998     0.984
R_91/89                           0.970

This result seems to be in agreement with the growth dynamics of PI, as shown in Fig. 5 for two randomly selected distributions. We calculate them using the panel section of the SHIW-HA, which covers the period 1987–2002. As one can easily recognize, the same functional form describing the probability distribution of GDP annual growth rates is also valid in the case of PI growth rates. These findings lead us to check the possibility that the growth rates of both GDP and PI are drawn from the same distribution. To this end, we perform a two-sample Kolmogorov-Smirnov test and check the null hypothesis that both GDP and PI growth rate data are samples from the same distribution. Before applying this test, to consider almost the same number of data points relating to units with different sizes we draw a 2% random sample of the data we have for individuals, and normalize it together with the data for the GDP annual growth rate using the transformations (R_PI − R̄_PI)/σ_PI and (R_GDP − R̄_GDP)/σ_GDP, where a bar denotes the sample mean. As shown in Table 2, which reports the p-values for all the cases we studied, the null hypothesis of equality of the two distributions cannot be rejected at the usual 5% marginal significance level. Therefore, the data are consistent with the assumption that a common empirical law might describe the growth dynamics of both GDP and PI, as shown in Fig. 6, where all the curves for both GDP and PI growth rate normalized data almost collapse onto the red solid line representing the non-linear Laplace fit.⁵
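The procedure just described (log growth rates, normalization by mean and standard deviation, two-sample Kolmogorov-Smirnov test) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the computation behind Table 2: the data are synthetic Laplace draws, the sample sizes are arbitrary, the function names are hypothetical, natural logarithms are assumed (the base drops out after normalization), and the p-value uses the standard asymptotic Kolmogorov series with a common small-sample correction, since the text does not say how the p-values were obtained.

```python
import math
import random
from bisect import bisect_right

def log_growth_rates(levels):
    """R = log(level_{t+1} / level_t) for consecutive observations."""
    return [math.log(nxt / cur) for cur, nxt in zip(levels, levels[1:])]

def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def ks_two_sample(a, b):
    """Two-sample KS statistic D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # The largest gap between the two empirical CDFs occurs at an observed value.
    d = max(abs(bisect_right(a, v) / n - bisect_right(b, v) / m) for v in a + b)
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly near zero, where its limit is 1
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, 2.0 * series))

rng = random.Random(2)

def laplace_draw():
    """Standard Laplace variate as the difference of two exponential variates."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

# Hypothetical stand-ins for the GDP and PI growth-rate samples.
gdp_like = standardize([laplace_draw() for _ in range(25)])
pi_like = standardize([laplace_draw() for _ in range(150)])

d_stat, p_value = ks_two_sample(gdp_like, pi_like)
print(f"D = {d_stat:.3f}, p-value = {p_value:.3f}")
print(log_growth_rates([100.0, 102.0, 101.0]))
```

A p-value above 0.05 means that equality of the two distributions is not rejected at the 5% level; it does not by itself establish that the distributions coincide.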
Table 2 (continued): Kolmogorov-Smirnov test p-values, columns R_95/93 to R_02/00.

Growth rate   R_95/93   R_98/95   R_00/98   R_02/00
R_GDP         0.696     0.337     0.480     0.955
R_89/87       0.431     0.689     0.860     0.840
R_91/89       0.979     0.995     0.994     0.997
R_93/91       0.839     0.459     0.750     1.000
R_95/93                 0.172     0.459     0.560
R_98/95                           0.703     0.378
R_00/98                                     0.658

We next turn to the fluctuations of the indexes specifying the income distribution, i.e. the Pareto and Gibrat indexes, whose yearly estimates are reported in Fig. 7. Figure 7(a) shows the fluctuations of the Pareto index over the years 1977–2002. The black connected solid line is the time series obtained by excluding income from financial assets, while the red connected solid line refers to the yearly estimates obtained by the inclusion of the above-stated income, which has been regularly recorded only since 1987 (see note 2). The course of the two series is similar, with the more complete definition of income showing a greater inequality because of the strongly concentrated distribution of returns on capital. The same can be said for the time series of the Gibrat index (Fig. 7(b)). Although the frequency of data (initially annual and then biennial from 1987) makes it difficult to establish a link with the business cycle, it seems possible to find a (negative) relationship between the above-stated indexes and the fluctuations of economic activity. For example, Italy experienced a period of economic growth until the late 1980s, but with alternating phases of the internal business cycle: a slowdown of production up to the 1983 stagnation; a recovery in 1984; again a slowdown in 1986. As one can recognize from the figure, the values of the Pareto and Gibrat indexes, inferred from the numerical fitting, tend to decrease in the periods of economic expansion (concentration goes up) and increase during the recessions (income is more evenly distributed). The time pattern of inequality is shown in Fig. 8, which reports the temporal change of the Gini coefficient for the considered years.⁶ In Italy the level of inequality decreased significantly during the 1980s and rose in the early 1990s; it was substantially stable in the following years. In particular, a sharp rise of the Gini coefficient (i.e., of inequality) is encountered in 1987 and 1993, corresponding to a sharp decline of the Pareto index in the former case and of both Pareto and Gibrat indexes in the latter case. We consider that the decline of the Pareto exponent in 1987 corresponds with the peak of the speculative “bubble” begun in the early 1980s, and that the rebound of the index follows its burst on October 19, 1987, when the Dow Jones index lost more than 20% of its value, dragging the other world markets down with it. This assumption seems confirmed by the movement of asset prices in the Italian Stock Exchange (see Fig. 9(a)).⁷ As regards the sharp decline of both indexes in 1993, the level and growth of personal income (especially in the middle-upper income range) were notably influenced by the bad results of the real economy in that year, following the September 1992 lira exchange rate crisis. The effects of recession (visible in Fig. 9(b)) produced a leftwards shift of the distribution and widened its range; this, combined with a concentration of individuals towards the middle income range, induced an increase in inequality.⁸ It would be expected that these facts cause the invalidity of the Pareto law for high incomes. This was indeed the case for the Italian economy during the mentioned years. Fig. 10 shows the power-law region in 1987 and 1993. Compared to other years, one can observe that the data cannot be fitted by the Pareto law in the entire high-income range.
⁵ See [11] for similar findings about GDP and company growth rates.
⁶ Unlike the Pareto and Gibrat indexes, which provide two different measures corresponding to the tail and the rest, the Gini coefficient is a measure of (in)equality of the income for the overall distribution, taking values from zero (completely equal) to one (completely unequal).
⁷ See [3] for a study of the correlation between Pareto index and asset price in Japan.
⁸ In particular, in this year there was a significant reduction of the number of self-employed workers, whose incomes are much more dependent on the business cycle. See [12] for further details on this issue.
Fig. 5: Probability distribution of Italian PI growth rates R_PI ≡ log(PI_{t+i}/PI_t) for the years 1989/1987 (a) and 1993/1991 (b). The red solid line is the fit to the Laplace distribution. The data are taken from the panel section of the SHIW-HA, which covers the years 1987–2002. [Axes: Growth rate, Probability density.]
Fig. 6: Probability distribution of Italian GDP and PI growth rates. All data collapse onto a single curve representing the fit to the Laplace distribution (red solid line), showing that the distributions are well described by the same functional form. [Legend: GDP (1977−2002); PI (1989/1987); PI (1991/1989); PI (1993/1991); PI (1995/1993); PI (1998/1995); PI (2000/1998); PI (2002/2000); Laplace fit. Axes: Growth rate, Probability density.]
Fig. 7: The temporal change of Pareto index (a) and Gibrat index (b) over the years 1977–2002. The black connected solid line is the time series of the two indexes obtained by excluding income from financial assets, while the red connected solid line refers to the yearly estimates obtained by the inclusion of the above-stated income.
Fig. 8: Gini coefficient for Italian personal income during the period 1977–2002. The black connected solid line excludes income from financial assets, while the red connected solid line includes it.
4 Concluding Remarks. In this paper we find that the Italian personal income microdata are consistent with a Pareto (power-law) behaviour in the high-income range, and with a two-parameter lognormal pattern in the low-middle income region.
The numerical fitting over the time span covered by our dataset shows a shift of the distribution, which is claimed to be a consequence of the growth of the country. This assumption is supported by testing the hypothesis that the growth dynamics of both gross domestic product of the country and personal income of individuals is the same; the two-sample Kolmogorov-Smirnov test we perform leads us not to reject the null hypothesis that the growth rates of both quantities are samples from the same probability distribution in all the cases we studied, pointing to the existence of correlation between them.
Moreover, by calculating the yearly estimates of Pareto and Gibrat indexes, we quantify the fluctuations of the shape of the distribution over time by establishing some links with the business cycle phases which the Italian economy experienced over the years of our concern. We find that there exists a negative relationship between the above-stated indexes and the fluctuations of economic activity at least until the late 1980s. In particular, we show that in two circumstances (the 1987 burst of the asset-inflation “bubble” begun in the early 1980s and the 1993 recession year) the data cannot be fitted by a power law in the entire high-income range, causing a breakdown of the Pareto law.
References
[1] H. Aoyama, Y. Nagahara, M.P. Okazaki, W. Souma, H. Takayasu, and M. Takayasu (2000), Pareto’s Law for Income of Individuals and Debt of Bankrupt Companies, Fractals, 8, 3, 293–300.
[2] E.W. Montroll, and M.F. Shlesinger (1983), Maximum Entropy Formalism, Fractals, Scaling Phenomena, and 1/f Noise: A Tale of Tails, Journal of Statistical Physics, 32, 2, 209–230.
[3] W. Souma (2001), Universal Structure of the Personal Income Distribution, Fractals, 9, 4, 463–470.
[4] M. Nirei, and W. Souma (2004), Two Factors Model of Income Distribution Dynamics, SFI Working Paper, http://www.santafe.edu/research/publications/workingpapers/04-10-029.pdf.
[5] A.A. Dragulescu, and V.M. Yakovenko (2001), Exponential and Power-Law Probability Distributions of Wealth and Income in the United Kingdom and the United States, Physica A, 299, 1–2, 213–221.
[6] W.J. Reed (2003), The Pareto Law of Incomes – An Explanation and an Extension, Physica A, 319, 469–486.
[7] Bank of Italy, Survey on Household Income and Wealth, http://www.bancaditalia.it.
[8] A. Brandolini (1999), The Distribution of Personal Income in Post-War Italy: Source Description, Data Quality, and the Time Pattern of Income Inequality, Temi di discussione n. 350, Bank of Italy, Rome.
[9] National Institute of Statistics, ConIstat, http://www.istat.it.
[10] Organisation for Economic Co-operation and Development, OECD Statistical Compendium, ed. 02#2003.
[11] Y. Lee, L.A.N. Amaral, D. Canning, M. Meyer, and H.E. Stanley (1998), Universal Features in the Growth Dynamics of Complex Organizations, Physical Review Letters, 81, 15, 3275–3278.
[12] Bank of Italy (1993), Italian Household Budgets in 1993, Supplements to the Statistical Bulletin, n. 44, Rome.
Fig. 9: Temporal change of the Italian Stock Exchange MIB Index (a) and Italian GDP (b) during the period 1977–2002. Shown are the logarithmic values of the variables. The GDP is in units of million EUR at 1995 prices. The data source is [10].
Fig. 10: The power-law region in 1987 (a) and 1993 (b). The red solid line is the fit to the power-law distribution. The Pareto index for 1987 data is α̂ = 2.09 (0.002), while for 1993 data we have α̂ = 2.74 (0.002). These estimates were obtained by least-squares fit excluding about the top and bottom 1.2% for 1987 data, and about the top 1.4% and bottom 1.3% for 1993 data. OLS R² values are 0.9993 and 0.9997 respectively. As one can note, a large deviation from the Pareto law is seen in both years. [Legend: income data, power law. Axes: Income (thousand £), Cumulative probability.]