Advanced course “Economics of Inequality” (Master APE, M2 year)
Thomas Piketty
Academic year 2009-2010
Course Notes D: Pareto interpolation techniques for income and wealth distribution
1. The interpolation problem
Very often, available income or wealth distribution data takes the form of “grouped data”:
Income (or wealth or wage…) brackets: $[s_1; s_2], \ldots, [s_i; s_{i+1}], \ldots, [s_p; +\infty)$
$N_i$ = number of individuals (or households or taxpayers…) between $s_i$ and $s_{i+1}$
$N^*$ = total population
$f_i = N_i/N^*$ = proportion of population between $s_i$ and $s_{i+1}$
$N_i^* = N_i + N_{i+1} + \ldots + N_p$ = total number of individuals above $s_i$
$p_i = N_i^*/N^*$ = proportion of population above $s_i$
$Y_i$ = total income of population between $s_i$ and $s_{i+1}$
$y_i = Y_i/N_i$ = average income of population between $s_i$ and $s_{i+1}$
$y_i^* = (Y_i + \ldots + Y_p)/N_i^*$ = average income of population above $s_i$
Sometimes there is only information available on the $N_i$, not on the $Y_i$.
Even when micro data is available, the sample size is very often insufficient, especially for the top of the distribution.
In both cases, one needs to make assumptions about the functional form of the distribution f(y) in order to make proper use of the available data: this is the interpolation problem.
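The grouped-data notation above can be made concrete with a short Python sketch. The bracket thresholds, counts and income totals below are made-up numbers used purely for illustration (not real tax data), and the function name `cumulate_brackets` is hypothetical.

```python
def cumulate_brackets(thresholds, counts, incomes, total_population):
    """Compute f_i, p_i, y_i and y_i* for each bracket [s_i; s_{i+1}].

    thresholds[i] = s_i, counts[i] = N_i, incomes[i] = Y_i,
    total_population = N* (it may exceed sum(counts) when the bottom
    of the distribution is not tabulated).
    """
    rows = []
    n_above = 0.0  # running N_i* = N_i + ... + N_p
    y_above = 0.0  # running Y_i + ... + Y_p
    # Walk from the top bracket down so the running sums are "above s_i".
    for s, n, y in reversed(list(zip(thresholds, counts, incomes))):
        n_above += n
        y_above += y
        rows.append({
            "s": s,
            "f": n / total_population,        # f_i = N_i / N*
            "p": n_above / total_population,  # p_i = N_i* / N*
            "y": y / n,                       # y_i = Y_i / N_i
            "y_star": y_above / n_above,      # y_i* = (Y_i+...+Y_p) / N_i*
        })
    rows.reverse()
    return rows


# Hypothetical tabulation with three brackets:
# [50 000; 100 000], [100 000; 200 000], [200 000; +infinity)
table = cumulate_brackets(
    thresholds=[50_000, 100_000, 200_000],
    counts=[80_000, 15_000, 5_000],
    incomes=[5.6e9, 2.0e9, 2.0e9],
    total_population=1_000_000,
)
for row in table:
    print(row)
```

The pairs (s, p) and (s, y_star) produced this way are the only inputs needed by the interpolation methods discussed in these notes.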
2. Lognormal distributions
y follows a lognormal distribution if and only if x = log(y) follows a
normal distribution
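I.e. if the density function g(x) can be written:
$g(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
The density function f(y) of the lognormal distribution can thus be written:
$f(y) = \frac{1}{y\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log y-\mu)^2}{2\sigma^2}\right)$
Where µ and σ are the mean and standard deviation of x = log(y). The mean m, standard deviation s, median me and mode mo of y are then:
$m = \exp(\mu + \sigma^2/2)$, $s^2 = m^2\,(\exp(\sigma^2) - 1)$, $me = \exp(\mu)$, $mo = \exp(\mu - \sigma^2)$
Conversely, one can express µ and σ as:
$\sigma^2 = \log(1 + s^2/m^2)$, $\mu = \log(m) - \sigma^2/2$
(There is no closed-form solution for the distribution functions G(x) and F(y), but they are available in all statistical software.)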
Example:
With m = 30 000€ and s = 20 000€, one gets me = 24 962€, mo = 17 281€, P90 = 54 297€
= fairly reasonable shape for the bottom 90% of the distribution
But P90-100 = 74 931€, P99 = 102 315€, P99-100 = 126 646€ (P90-100 and P99-100 denote the average income of the top 10% and of the top 1%)
= not reasonable for the top 10% of the distribution (and especially the top 1%)
>>> the problem of the lognormal distribution is that the upper tail is not fat enough (i.e. the density function declines too fast toward 0)
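The example can be checked with a few lines of Python. This is a minimal sketch that relies on the standard lognormal formulas σ² = log(1 + s²/m²), µ = log(m) − σ²/2, median = exp(µ), mode = exp(µ − σ²), and on the closed form m·Φ(σ − z)/(1 − p) for the average above the p-th percentile, where Φ is the standard normal distribution function and z its p-th quantile. The function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def lognormal_params(m, s):
    """Return (mu, sigma) of log(y) given the mean m and std deviation s of y."""
    sigma2 = log(1 + (s / m) ** 2)
    return log(m) - sigma2 / 2, sqrt(sigma2)


def lognormal_summary(m, s, percentiles=(0.90, 0.99)):
    """Median, mode, and (threshold, average above threshold) per percentile."""
    mu, sigma = lognormal_params(m, s)
    std_normal = NormalDist()
    out = {"median": exp(mu), "mode": exp(mu - sigma ** 2)}
    for p in percentiles:
        z = std_normal.inv_cdf(p)        # standard normal quantile
        threshold = exp(mu + sigma * z)  # p-th percentile of y
        # Average of y above the threshold: m * Phi(sigma - z) / (1 - p)
        top_mean = m * std_normal.cdf(sigma - z) / (1 - p)
        out[p] = (threshold, top_mean)
    return out


summary = lognormal_summary(30_000, 20_000)
print(f"median = {summary['median']:,.0f}, mode = {summary['mode']:,.0f}")
for p in (0.90, 0.99):
    threshold, top_mean = summary[p]
    print(f"P{100 * p:.0f} = {threshold:,.0f}, average above = {top_mean:,.0f}")
```

The median, mode and percentile thresholds should come out close to the figures quoted in the example; the closed-form top averages can differ slightly from the quoted ones, depending on how the latter were computed.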
3. Pareto distributions for top income & top wealth
The density and distribution functions f(y) and F(y) of a Pareto-distributed variable y can be written as follows:
$G(y) = 1 - F(y) = (k/y)^a$ for $y \ge k$ ($k > 0$, $a > 1$)
$f(y) = a k^a / y^{1+a}$
Key property n°1: ratio average/threshold = constant
Denote by y*(y) the average income (or wealth, or wage) of the population above threshold y. Then y*(y) can be expressed as follows:
$y^*(y) = \frac{\int_{z>y} z f(z)\,dz}{\int_{z>y} f(z)\,dz} = \frac{\int_{z>y} dz/z^{a}}{\int_{z>y} dz/z^{1+a}} = \frac{a y}{a-1}$
I.e. $y^*(y)/y = b = a/(a-1)$ (and $a = b/(b-1)$)
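Property n°1 can be illustrated by simulation. The sketch below draws a large Pareto sample with Python's standard `random.paretovariate` and compares the empirical ratio average/threshold at two thresholds with b = a/(a−1). The values a = 3 and k = 1 000 are illustrative assumptions (a > 2 is chosen only so that the sample average converges quickly), and `mean_above_ratio` is a hypothetical helper name.

```python
import random


def mean_above_ratio(sample, threshold):
    """Empirical y*(y)/y: average of observations above threshold, over threshold."""
    above = [z for z in sample if z > threshold]
    return sum(above) / len(above) / threshold


rng = random.Random(0)  # fixed seed so that the run is repeatable
a, k = 3.0, 1_000.0     # illustrative parameters, not estimates
# paretovariate(a) draws from a Pareto law with k = 1, hence the rescaling by k.
sample = [k * rng.paretovariate(a) for _ in range(200_000)]

b_theory = a / (a - 1)
for threshold in (1_500, 3_000):
    ratio = mean_above_ratio(sample, threshold)
    print(f"threshold {threshold}: empirical b = {ratio:.3f}, theoretical b = {b_theory:.3f}")
```

Up to sampling noise, the ratio should be the same whatever the threshold, which is exactly what distinguishes the Pareto law from distributions with thinner upper tails.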
In
practice :
For top
incomes:
For top
wealth:
Higher b
coefficients = fatter upper-tail = higher concentration
Key property n°2: log-linearity
If one plots log(1-F(y)) against log(y), one gets a straight line with slope -a:
$\log(1-F(y)) = a\log(k) - a\log(y)$
Note: if one plots log(f(y)) against log(y), one gets a straight line with slope -(1+a):
$\log(f(y)) = \log(a k^a) - (1+a)\log(y)$
Intuitive meaning of coefficient a: if one raises y by 1%, by how many % does the proportion above y decline? In practice this coefficient rises from the bottom of the distribution (where it is usually much below 1) to the middle-top of the distribution; the point is that it tends to stabilize around 1.8-2.2 over a large plateau P90-99.99 (= why Pareto needs to complement lognormal: with a full lognormal, coefficient a rises continuously, and b declines continuously to very low levels), before of course going to infinity (b goes to 1) for the very last top incomes.
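The claim that a full lognormal has no such plateau can be checked numerically. The sketch below computes, for the lognormal example of section 2 (m = 30 000€, s = 20 000€), a local coefficient a defined as minus the slope of log(1−F(y)) against log(y), i.e. y·f(y)/(1−F(y)), and a local b defined as the average above the threshold divided by the threshold. These "local" definitions and the function name are assumptions made for the illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def lognormal_local_pareto(m, s, p):
    """Local coefficients (a, b) of a lognormal law at the p-th percentile.

    a = -d log(1 - F) / d log(y) = y f(y) / (1 - F(y))
    b = (average above the threshold) / threshold
    """
    sigma = sqrt(log(1 + (s / m) ** 2))
    mu = log(m) - sigma ** 2 / 2
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)
    threshold = exp(mu + sigma * z)
    # For a lognormal, y f(y) = phi(z) / sigma, and 1 - F(y) = 1 - p.
    a = std_normal.pdf(z) / (sigma * (1 - p))
    b = m * std_normal.cdf(sigma - z) / (1 - p) / threshold
    return a, b


for p in (0.5, 0.9, 0.99, 0.999):
    a, b = lognormal_local_pareto(30_000, 20_000, p)
    print(f"p = {p}: local a = {a:.2f}, local b = {b:.2f}")
```

Across these percentiles the local a keeps rising and the local b keeps falling toward 1, instead of stabilizing as observed in actual top income data.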
4. Estimating Pareto parameters from grouped data
Grouped data:
Income (or wealth or wage…) brackets: $[s_1; s_2], \ldots, [s_i; s_{i+1}], \ldots, [s_p; +\infty)$
$N_i$ = number of individuals (or households or taxpayers…) between $s_i$ and $s_{i+1}$
$N^*$ = total population
$f_i = N_i/N^*$ = proportion of population between $s_i$ and $s_{i+1}$
$N_i^* = N_i + N_{i+1} + \ldots + N_p$ = total number of individuals above $s_i$
$p_i = N_i^*/N^*$ = proportion of population above $s_i$
$Y_i$ = total income of population between $s_i$ and $s_{i+1}$
$y_i = Y_i/N_i$ = average income of population between $s_i$ and $s_{i+1}$
$y_i^* = (Y_i + \ldots + Y_p)/N_i^*$ = average income of population above $s_i$
Sometimes there is only information available on the $N_i$, not on the $Y_i$: then we say that all we have is “frequency information”.
Otherwise we say we have both frequency and income information.
Methodology M1: Average income interpolation methodology
(Piketty 2001, 2003, Piketty-Saez 2003)
This methodology uses both frequency and income information, and relies on property 1.
In order to estimate P99.5 (say), pick $(s_i, p_i)$ and $(s_{i+1}, p_{i+1})$ such that $p_{i+1} < 0.5\% < p_i$, and compute b, a and k using the following formulas:
$b = y_i^* / s_i$
$a = b/(b-1)$
$k = s_i\, p_i^{1/a}$
Then use these coefficients to estimate P99.5 = $k / 0.005^{1/a}$ and P99.5-100 = $\frac{a}{a-1}$ × P99.5 (= b × P99.5)
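The M1 steps can be sketched in Python as follows. The function name `pareto_m1` and the bracket figures in the usage lines are hypothetical; only the formulas come from the text above.

```python
def pareto_m1(s_i, p_i, y_star_i, p=0.005):
    """Methodology M1 (average income interpolation).

    s_i      : bracket threshold
    p_i      : proportion of the population above s_i (must exceed p)
    y_star_i : average income of the population above s_i
    p         : target top fraction (0.005 for P99.5)
    Returns (a, b, k, threshold, top_average).
    """
    b = y_star_i / s_i            # property 1: average / threshold
    a = b / (b - 1)
    k = s_i * p_i ** (1 / a)
    threshold = k / p ** (1 / a)  # e.g. P99.5 when p = 0.005
    top_average = b * threshold   # e.g. P99.5-100
    return a, b, k, threshold, top_average


# Hypothetical bracket: 0.8% of the population lies above 100 000,
# with an average income of 210 000 above that threshold.
a, b, k, p995, p995_100 = pareto_m1(s_i=100_000, p_i=0.008, y_star_i=210_000)
print(f"a = {a:.3f}, b = {b:.3f}, P99.5 = {p995:,.0f}, P99.5-100 = {p995_100:,.0f}")
```

Because the target fraction (0.5%) is smaller than p_i, the estimated P99.5 necessarily lies above s_i; checking that it also falls below the next threshold s_{i+1} is a simple consistency test on the chosen bracket.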
Methodology M2: Standard log-linear interpolation methodology
(Pareto (1896), Kuznets (1953), Feenberg and Poterba (1993))
This methodology uses solely frequency information, and relies on property 2.
In order to estimate P99.5 (say), pick $(s_i, p_i)$ and $(s_{i+1}, p_{i+1})$ such that $p_{i+1} < 0.5\% < p_i$, and compute a, b and k using the following formulas:
$a = \log(p_i/p_{i+1}) / \log(s_{i+1}/s_i)$
$b = a/(a-1)$
$k = s_i\, p_i^{1/a}$
Then use these coefficients to estimate P99.5 = $k / 0.005^{1/a}$ and P99.5-100 = $\frac{a}{a-1}$ × P99.5 (= b × P99.5)
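The M2 steps can be sketched in the same way. The function name `pareto_m2` and the two (threshold, proportion) pairs in the usage lines are hypothetical; the sketch assumes the estimated a is above 1, otherwise b is not defined.

```python
from math import log


def pareto_m2(s_i, p_i, s_next, p_next, p=0.005):
    """Methodology M2 (log-linear interpolation, frequency information only).

    (s_i, p_i) and (s_next, p_next) are two consecutive thresholds with the
    proportions of the population above them, chosen so that p_next < p < p_i.
    Returns (a, b, k, threshold, top_average).
    """
    # Property 2: minus the slope of log(1 - F) against log(y) between the two points
    a = log(p_i / p_next) / log(s_next / s_i)
    b = a / (a - 1)               # requires a > 1
    k = s_i * p_i ** (1 / a)
    threshold = k / p ** (1 / a)  # e.g. P99.5 when p = 0.005
    top_average = b * threshold   # e.g. P99.5-100
    return a, b, k, threshold, top_average


# Hypothetical data: 0.8% of the population above 100 000, 0.2% above 200 000.
a, b, k, p995, p995_100 = pareto_m2(100_000, 0.008, 200_000, 0.002)
print(f"a = {a:.3f}, b = {b:.3f}, P99.5 = {p995:,.0f}, P99.5-100 = {p995_100:,.0f}")
```

By construction the fitted line passes through both bracket points, so the estimated P99.5 always lies between s_i and s_{i+1}.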
Which of these two methodologies should be used?
If one has both frequency and income information, then it is better to use methodology M1.
If one has solely frequency information, then use methodology M2.
There also exist more complicated methodologies that explicitly take into account the fact that Pareto parameters are not constant (see e.g. Cowell 2000, Zucman 2008). See also Atkinson 2007 for an interpolation method with explicit upper and lower bounds & confidence intervals.
Some references on interpolation techniques:
Atkinson (2007), chapter
D. G. Champernowne and F. Cowell, “Economic Inequality and Income Distribution”,
Cowell, Measuring Inequalities, electronic manuscript 2000
Cowell-Mehta, “The estimation and interpolation of inequality measures”, Review of Economic Studies 1982, vol. 49, pp. 273-290
Piketty (2001, Appendix B)
5. Pareto distributions and top shares formulas
TO BE COMPLETED