Journal of Statistical Theory and Applications

Volume 18, Issue 3, September 2019, Pages 270 - 277

On the Influence Function for the Theil-Like Class of Inequality Measures

Authors
Tchilabalo Abozou Kpanzou1, Diam Ba2, Pape Djiby Mergane2, Gane Samb LO2, 3, *
1Kara University, Kara, Togo
2LERSTAD, Gaston Berger University, Saint-Louis, Senegal
3Evanston Drive, NW, Calgary, Canada; Associate Researcher, LSTA, Pierre et Marie University, Paris, France; Professor, African University of Sciences and Technology, Abuja, Nigeria
*Corresponding author. Email: gane-samb.lo@ugh.edu.sn; gslo@aust.edu.ng
Corresponding Author
Gane Samb LO
Received 12 June 2018, Accepted 11 March 2019, Available Online 30 August 2019.
DOI
10.2991/jsta.d.190818.002
Keywords
Influence function; Measures of inequality; Lorenz curve; Quantile function; Inequality measures; Theil; Kolm; Atkinson
Abstract

On one hand, a large class of inequality measures, which includes, for example, the generalized entropy, the Atkinson and the Gini measures, was introduced in P.D. Mergane, G.S. Lo, Appl. Math. 4 (2013), 986–1000. On the other hand, the influence function (IF) is an important tool in the asymptotic theory of nonparametric statistics. This function has been, and is still being, determined and analyzed in various respects for a large number of statistics. We proceed to a unifying study of the IF of all the members of the so-called Theil-like family and regroup those IFs in one formula, so that comparative studies become easier.

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

Over the years, a number of measures of inequality have been developed. Examples include the generalized entropy, the Atkinson, the Gini, the quintile share ratio (QSR) and the Zenga measures (see e.g. [1–5]). Recently, [6] gathered a significant number of inequality measures under the name of the Theil-like family. Such inequality measures are very important in capturing inequality in income distributions. They also have applications in many other branches of science, e.g., in ecology (see e.g. [7]), sociology (see e.g. [8]), demography (see e.g. [9]) and information science (see e.g. [10]).

In order to make the above mentioned measures applicable, one often makes use of estimation. Unfortunately, classical methods rely heavily on assumptions which are not always met in practice. For example, when there are outliers in the data, classical methods often perform very poorly. The idea in robust statistics is to develop estimators that are not unduly affected by small departures from model assumptions, and so, in order to measure the sensitivity of estimators to outliers, the influence function (IF) was introduced (see [11,12]).

Let us begin by making precise the objects and notation of our study, in particular the IF. To ease the reading of what follows, we suppose that we have a probability space $(\Omega,\mathcal{A},\mathbb{P})$ holding a random variable $X$ associated with the cumulative distribution function (cdf) $F(x)=\mathbb{P}(X\le x)$, $x\in\mathbb{R}$, and a sequence of independent copies of $X$: $X_1$, $X_2$, etc. This random variable is considered as an income variable, so that it is nonnegative and $F(0)=0$. The probability density function of $X$ (pdf, with respect to the Lebesgue measure on $\mathbb{R}$), if it exists, is denoted by $f$. Its mean, which we suppose finite and nonzero, and its moments of order $\alpha\ge 1$ are denoted by

\[ \mu_F=\int_0^{+\infty}y\,dF(y)\in(0,\infty)\quad\text{and}\quad \mu_{F,\alpha}=\int_0^{+\infty}y^{\alpha}\,dF(y),\qquad \mu_{F,1}=\mu_F. \]

The quantile function associated with $F$, also called the generalized inverse function, is defined by

\[ Q(p)\equiv F^{-1}(p)=\inf\{z\in\mathbb{R}:\ F(z)\ge p\},\qquad p\in[0,1], \]
and the Lorenz curve of $F$ is given by
\[ L(F,p)=\frac{q(p)}{\mu_F},\quad\text{with}\quad q(p)=\int_0^p Q(s)\,ds,\qquad 0\le p\le 1. \]
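As a quick computational companion (our own sketch, not part of the original paper; the helper names `emp_quantile` and `emp_lorenz` and the lognormal test sample are illustrative assumptions), the generalized inverse and the Lorenz curve can be evaluated from a sample as follows.

```python
import numpy as np

def emp_quantile(x, p):
    """Empirical generalized inverse Q_n(p) = inf{z : F_n(z) >= p}."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    i = np.clip(np.ceil(np.asarray(p) * n).astype(int) - 1, 0, n - 1)
    return xs[i]

def emp_lorenz(x, p):
    """Empirical Lorenz curve L(F_n, p) = q_n(p)/mu_n with q_n(p) = int_0^p Q_n(s) ds."""
    xs = np.sort(np.asarray(x, dtype=float))
    n, mu = len(xs), xs.mean()
    cum = np.concatenate(([0.0], np.cumsum(xs))) / n      # q_n(i/n) for i = 0..n
    k = np.minimum(np.floor(np.asarray(p) * n).astype(int), n)
    frac = np.asarray(p) * n - k
    return (cum[k] + frac * xs[np.minimum(k, n - 1)] / n) / mu

x = np.random.default_rng(0).lognormal(size=1_000)
print(emp_quantile(x, 0.5), emp_lorenz(x, 0.5))           # median and Lorenz ordinate at p = 0.5
```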

A statistical functional $T(F)$ will be studied, as well as its plug-in nonparametric estimator $T(F_n)$, where $F_n$ denotes the empirical cdf based on the sample $X_1,\ldots,X_n$, $n\ge 1$.

The influence function $IF(\cdot,T(F))$ of $T(F)$ is the Gâteaux derivative of $T$ at $F$ in the direction of Dirac measures, in the form

\[ IF(z,T(F))=\lim_{\epsilon\to 0}\frac{T(F_{\epsilon,z})-T(F)}{\epsilon}=\left.\frac{\partial}{\partial\epsilon}T(F_{\epsilon,z})\right|_{\epsilon=0}, \tag{1} \]
where
\[ F_{\epsilon,z}(u)=(1-\epsilon)F(u)+\epsilon\Delta_z(u),\qquad \epsilon\in[0,1], \]

$\Delta_z$ is the cdf of $\delta_z$, the Dirac measure with mass one at $z$, and $z$ is in the value domain of $F$.

It is known that the asymptotic variance of the plug-in estimator $T(F_n)$ of the statistic $T(F)$ is of the form $\sigma^2=\int IF(x,T(F))^2\,dF(x)$ under specific conditions, among them Hadamard differentiability (see [13], Theorem 2.27, p. 19). So the IF gives an idea of what the variance of the Gaussian limit of the estimator might be, if such a limit exists. At the same time, the behavior of its tails (lower and upper) gives indications on how lower and/or upper extreme values impact the quality of the estimation. For example, the sensitivity of a statistic $T(F)$ and the impact of extreme observations on some IFs have recently been studied by, e.g., [1].
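As a small illustration (our own sketch, not from the paper), the limit in definition (1) can be approximated with a finite $\epsilon$; for the mean functional $T(F)=\mu_F$, whose IF is known to be $z-\mu_F$, both the approximation and the variance formula $\sigma^2=\mathbb{E}[IF(X)^2]=\mathrm{Var}(X)$ are easy to check numerically.

```python
import numpy as np

def mean_under_contamination(x, z, eps):
    """T(F_eps) for T = mean and F_eps = (1 - eps) F + eps * delta_z,
    with F replaced by the empirical cdf of the sample x."""
    return (1 - eps) * np.mean(x) + eps * z

def numeric_if_mean(x, z, eps=1e-6):
    """Finite-eps approximation of the Gateaux derivative at the empirical cdf."""
    return (mean_under_contamination(x, z, eps) - np.mean(x)) / eps

x = np.random.default_rng(1).lognormal(size=5_000)
z = 3.0
print(numeric_if_mean(x, z), z - x.mean())     # both are (approximately) z - mu_F
print(np.mean((x - x.mean()) ** 2))            # empirical sigma^2 = E[IF(X)^2] = Var(X)
```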

Another interesting fact is that the IF behaves in nonparametric estimation as the score function does in the parametric setting (see [13], p. 19).

An area of application of the IF is that of measures of inequality (see, e.g., [14–16]). Due to the importance of that key element of nonparametric estimation in econometrics and welfare studies, IFs of inequality measures are being actively collected. To cite a few, the IFs of the following measures are given in the Appendix: the generalized entropy class of measures of inequality $GE(\alpha)$, where $\alpha>0$, the mean logarithmic deviation (MLD), the Theil measure, the Atkinson class of inequality measures of parameter $\alpha\in(0,1]$, the Gini coefficient, and the QSR measure of inequality.

Fortunately, [6] introduced the so-called Theil-like family, which gathers the generalized entropy measure, the MLD [17–19], and the different inequality measures of Atkinson [20], Champernowne [21] and Kolm [22], in the following form:

\[ T_n(F)=\tau\!\left(\frac{1}{h_1(\mu_n)}\,\frac{1}{n}\sum_{j=1}^{n}h(X_j)-h_2(\mu_n)\right), \tag{2} \]
where $\mu_n=\frac{1}{n}\sum_{j=1}^{n}X_j$ denotes the empirical mean, while $h$, $h_1$, $h_2$ and $\tau$ are measurable functions.
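In code, the plug-in estimator (2) is one line once the four functions are supplied. The sketch below is our own illustration (the helper name `theil_like_plugin` and the lognormal test sample are assumptions, not from the paper); it is instantiated with the Theil quadruple $\tau(s)=s$, $h(s)=s\log s$, $h_1(s)=s$, $h_2(s)=\log s$ listed in item 2 below.

```python
import numpy as np

def theil_like_plugin(x, tau, h, h1, h2):
    """T_n(F) = tau( mean(h(X_j)) / h1(mu_n) - h2(mu_n) ), the plug-in estimator (2)."""
    x = np.asarray(x, dtype=float)
    mu_n = x.mean()
    return tau(np.mean(h(x)) / h1(mu_n) - h2(mu_n))

x = np.random.default_rng(2).lognormal(sigma=0.5, size=10_000)
theil = theil_like_plugin(x, lambda s: s, lambda s: s * np.log(s), lambda s: s, np.log)
# Direct formula for Theil's index, E[X log X]/mu - log mu, as a sanity check:
print(theil, np.mean(x * np.log(x)) / x.mean() - np.log(x.mean()))
```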

The inequality measures mentioned above are derived from (2) with the particular values of $\alpha$, $\tau$, $h$, $h_1$ and $h_2$ described below, for all $s>0$:

  1. Generalized entropy

    $\alpha\neq 0$, $\alpha\neq 1$, $\tau(s)=\dfrac{s-1}{\alpha(\alpha-1)}$, $h(s)=h_1(s)=s^{\alpha}$, $h_2(s)\equiv 0$;

  2. Theil's measure

    $\tau(s)=s$, $h(s)=s\log(s)$, $h_1(s)=s$, $h_2(s)=\log(s)$;

  3. Mean logarithmic deviation

    $\tau(s)=s$, $h(s)=h_2(s)=\log(s^{-1})$, $h_1(s)\equiv 1$;

  4. Atkinson's measure

    $\alpha<1$ and $\alpha\neq 0$, $\tau(s)=1-s^{1/\alpha}$, $h(s)=h_1(s)=s^{\alpha}$, $h_2(s)\equiv 0$;

  5. Champernowne's measure

    $\tau(s)=1-\exp(s)$, $h(s)=h_2(s)=\log(s)$, $h_1(s)\equiv 1$;

  6. Kolm's measure

    $\alpha>0$, $\tau(s)=\dfrac{1}{\alpha}\log(s)$, $h(s)=h_1(s)=\exp(-\alpha s)$, $h_2(s)\equiv 0$.

This is simply the plug-in estimator of

\[ T(F)=\tau\!\left(\frac{\mathbb{E}h(X)}{h_1(\mu_F)}-h_2(\mu_F)\right)=\tau(I). \]

The following conditions are required for the asymptotic theory.

B1. The function $\tau$ admits a derivative $\tau'$ which is continuous at $I$, with $\tau'(I)\neq 0$.

B2. The functions $h_1$ and $h_2$ admit derivatives $h_1'$ and $h_2'$ which are continuous at $\mu_F$, with $h_1(\mu_F)\neq 0$.

B3. $\mathbb{E}\,h^j(X)<+\infty$, $j=1,2$.

This offers an opportunity to present a significant number of IFs in a unified approach, which may be an asset for the comparison of inequality measures. This constitutes the main goal of this paper.

Let us add more notation. The lower endpoint and the upper endpoint of the cdf $F$ are denoted by

\[ lep(F)=\inf\{x\in\mathbb{R}:\ F(x)>0\}\quad\text{and}\quad uep(F)=\sup\{x\in\mathbb{R}:\ F(x)<1\}. \]

So the domain of admissible values for $X$, denoted by $V_X$, satisfies $V_X\subset R_F=[lep(F),uep(F)]$, the latter being the range of $F$.

The layout of this paper is as follows: In the next section we state our main result on the IF of the TLIM (Theil-like inequality measures) family members and some particularized forms related to each known member. For members whose IFs are already available, we make a comparison. In Section 3, we give the complete proofs. In Section 4 we provide a conclusion and some perspectives. Section 5 is an appendix gathering the IF expressions of some members of the TLIM family available in the literature.

2. MAIN RESULTS

(A) - The main theorem.

Theorem 2.1.

If conditions (B1)–(B2) hold, then the IF of the TLIM index is given by

\[ IF(z,F)=\tau'(I)\left(-\left(\frac{h_1'(\mu_F)\,\mathbb{E}h(X)}{h_1(\mu_F)^2}+h_2'(\mu_F)\right)(z-\mu_F)+\frac{h(z)-\mathbb{E}h(X)}{h_1(\mu_F)}\right), \]
for $z\in V_X$.

Remark on the asymptotic variance. It was said earlier that the IF should give the asymptotic variance of the Gaussian limit of the plug-in estimator, if it exists, as

\[ \sigma^2=\int_{V_X}IF(x,F)^2\,dP_X(x)=\mathbb{E}\left[IF(X,F)^2\right]. \]

This is exactly what is obtained from the asymptotic normality of the plug-in estimator as established in Theorem 2 of [23].
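To make the theorem concrete, here is a numerical cross-check (our own sketch; all function names are illustrative and the derivatives are supplied by hand): the closed-form IF of Theorem 2.1, evaluated with $F$ replaced by an empirical cdf, is compared with a finite-$\epsilon$ Gâteaux derivative of the plug-in functional for the Theil quadruple.

```python
import numpy as np

def if_theorem21(z, x, tau_p, h, h1, h1_p, h2, h2_p):
    """IF of Theorem 2.1 with F replaced by the empirical cdf of the sample x."""
    x = np.asarray(x, dtype=float)
    mu, Eh = x.mean(), np.mean(h(x))
    I = Eh / h1(mu) - h2(mu)
    return tau_p(I) * (-(h1_p(mu) * Eh / h1(mu) ** 2 + h2_p(mu)) * (z - mu)
                       + (h(z) - Eh) / h1(mu))

def if_numeric(z, x, tau, h, h1, h2, eps=1e-6):
    """Finite-eps Gateaux derivative of the plug-in functional, for comparison."""
    x = np.asarray(x, dtype=float)
    mu, Eh = x.mean(), np.mean(h(x))
    T0 = tau(Eh / h1(mu) - h2(mu))
    mu_e, Eh_e = (1 - eps) * mu + eps * z, (1 - eps) * Eh + eps * h(z)
    return (tau(Eh_e / h1(mu_e) - h2(mu_e)) - T0) / eps

# Theil quadruple: tau(s) = s, h(s) = s log s, h1(s) = s, h2(s) = log s.
x = np.random.default_rng(3).lognormal(sigma=0.6, size=20_000)
z = 4.0
print(if_theorem21(z, x, lambda s: 1.0, lambda s: s * np.log(s), lambda s: s,
                   lambda s: 1.0, np.log, lambda s: 1.0 / s))
print(if_numeric(z, x, lambda s: s, lambda s: s * np.log(s), lambda s: s, np.log))
# The two printed values agree up to O(eps).
```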

Let us move to the illustrations of our results for particular cases.

(B) - Particular forms.

Let us proceed to the study of particular members of the TLIM class. We compare our results with existing ones, if any, in the appendix. When the computations are simple, we only give the result without further details.

(1) Mean logarithmic deviation. We have

\[ \tau(s)=s,\quad h(s)=h_2(s)=\log(s^{-1}),\quad h_1(s)\equiv 1, \]
and next $\tau'(s)\equiv 1$, $h'(s)=h_2'(s)=-1/s$ and $h_1'(s)\equiv 0$. The application of Theorem 2.1 leads to
\[ IF(z,MLD)=\mu_F^{-1}(z-\mu_F)-\left(\log z-\mathbb{E}\log X\right),\qquad z\in R_F. \]

(2) Theil's index. We have

\[ \tau(s)=s,\quad h(s)=s\log s,\quad h_1(s)=s,\quad h_2(s)=\log s, \]
and next $\tau'(s)\equiv 1$, $h'(s)=\log s+1$, $h_1'(s)\equiv 1$ and $h_2'(s)=1/s$. The application of Theorem 2.1 gives, for $z\in R_F$,
\[ IF(z,Theil)=\mu_F^{-1}\left(z\log z-\mathbb{E}[X\log X]\right)-\mu_F^{-2}\left(\mu_F+\mathbb{E}[X\log X]\right)(z-\mu_F). \]

(3) Class of generalized entropy measures of parameter $\alpha$, $\alpha\notin\{0,1\}$. We have

\[ \tau(s)=\frac{s-1}{\alpha(\alpha-1)},\quad \tau'(s)=\frac{1}{\alpha(\alpha-1)},\quad h(s)=h_1(s)=s^{\alpha},\quad h_1'(s)=\alpha s^{\alpha-1},\quad h_2(s)\equiv 0. \]

The application of Theorem 2.1 gives

\[ IF(z,GE(\alpha))=\frac{z^{\alpha}-\mu_{F,\alpha}}{\alpha(\alpha-1)\mu_F^{\alpha}}-\frac{\mu_{F,\alpha}}{(\alpha-1)\mu_F^{\alpha+1}}(z-\mu_F),\qquad z\in R_F. \]

(4) Class of Atkinson measures with parameter $\beta\in(0,1)$. We have

\[ \tau(s)=1-s^{1/\beta},\quad h(s)=h_1(s)=s^{\beta},\quad h_2(s)\equiv 0. \]

If we denote $\|X\|_\beta=\left(\mathbb{E}|X|^{\beta}\right)^{1/\beta}$, the application of Theorem 2.1 yields

\[ IF(z,At(\beta))=\frac{\|X\|_\beta}{\mu_F}\left(\frac{z-\mu_F}{\mu_F}-\frac{z^{\beta}-\mu_{F,\beta}}{\beta\,\mu_{F,\beta}}\right),\qquad z\in R_F. \]

(5) Champernowne's index. We have

\[ \tau(s)=1-\exp(s),\quad h(s)=h_2(s)=\log(s),\quad h_1(s)\equiv 1. \]

The application of Theorem 2.1 implies that

\[ IF(z,Champ)=\frac{\exp\left(\mathbb{E}\log X\right)}{\mu_F}\left(\frac{1}{\mu_F}(z-\mu_F)-\left(\log z-\mathbb{E}\log X\right)\right),\qquad z\in R_F. \]

(6) Kolm's family of inequality measures of parameter $\alpha>0$. We have

\[ \tau(s)=\frac{1}{\alpha}\log(s),\quad h(s)=h_1(s)=\exp(-\alpha s),\quad h_2(s)\equiv 0. \]

By Theorem 2.1, we have

\[ IF(z,Kolm(\alpha))=\frac{1}{\alpha}\left(\alpha(z-\mu_F)+\frac{\exp(-\alpha z)}{\mathbb{E}\exp(-\alpha X)}-1\right),\qquad z\in R_F. \]
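The particular forms above are straightforward to evaluate from a sample. The short sketch below (ours; the function names and the lognormal sample are illustrative assumptions) evaluates the MLD and Kolm IFs and checks the Fisher-consistency property that an IF integrates to zero under $F$, here approximated by a sample average.

```python
import numpy as np

x = np.random.default_rng(4).lognormal(sigma=0.5, size=50_000)
mu, Elog = x.mean(), np.mean(np.log(x))

def if_mld(z):
    """IF(z, MLD) = (z - mu)/mu - (log z - E log X)."""
    return (z - mu) / mu - (np.log(z) - Elog)

def if_kolm(z, alpha=1.0):
    """IF(z, Kolm(alpha)) = (1/alpha)(alpha (z - mu) + exp(-alpha z)/E exp(-alpha X) - 1)."""
    return (z - mu) + (np.exp(-alpha * z) / np.mean(np.exp(-alpha * x)) - 1.0) / alpha

print(if_mld(2.0), if_kolm(2.0))
print(np.mean(if_mld(x)), np.mean(if_kolm(x)))   # both close to 0, as an IF should be
```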

3. PROOF OF THE MAIN THEOREM

In the following proof, we use the method for finding the IF following the argument given in [24]. Suppose that we are interested in estimating $T(P_X)$, where $P_X$ is the image measure of $\mathbb{P}$ by $X$, defined by $P_X(B)=\mathbb{P}(X\in B)$ for $B\in\mathcal{B}(\mathbb{R})$; it is also the Lebesgue–Stieltjes probability law associated with $F$, that is, $P_X((a,b])=F(b)-F(a)$ for all $-\infty\le a\le b\le+\infty$. Here we use integrals based on measures, and thus integrals in $dF$ are integrals in $dP_X$ in the following sense: for any nonnegative and measurable function $\ell:\mathbb{R}\to\mathbb{R}$, we have

\[ \int \ell(X)\,d\mathbb{P}=\int \ell(y)\,dP_X(y)=\int \ell(y)\,dF(y). \]

Suppose that $T(P)$ is defined on a family of probability measures $P_\lambda$, $P_\lambda$ being associated with a random variable $X_\lambda$, with $X=X_{\lambda_0}$ and $F=F_{\lambda_0}$. Suppose that $T$ does not depend on $\lambda$. If we have

\[ \frac{\partial}{\partial\lambda}T(P_\lambda)=\int \ell(y)\,\frac{\partial}{\partial\lambda}dP_\lambda(y), \]
where $\ell$ is measurable and $P_X$-integrable, then the IF at $T(F_{\lambda_0})=T(F)$ is given by
\[ IF(z,F)=\ell(z)-\int \ell(y)\,dF(y)=\ell(z)-\mathbb{E}\,\ell(X). \]

Actually, this rule relies on properties of Gâteaux differentiation and constitutes one of the fastest methods for finding the IF. We are going to apply it.
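For instance (a one-line check added here for clarity), applied to the linear functional $T(P_\lambda)=\int h\,dP_\lambda$ the rule gives $\ell=h$, and hence

\[ IF(z,T(F))=h(z)-\int h(y)\,dF(y)=h(z)-\mathbb{E}\,h(X), \]

which is the classical IF of a mean-type functional and is used implicitly in the centering step below.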

Proof of Theorem 2.1.

We recall the notation

\[ I=\frac{\mathbb{E}h(X)}{h_1(\mu_F)}-h_2(\mu_F). \]

We have

\[ \frac{\partial}{\partial\lambda}\,TLIM(P_\lambda)=\frac{\partial}{\partial\lambda}\,\tau\!\left(\frac{1}{h_1\!\left(\int x\,dP_\lambda\right)}\int h(x)\,dP_\lambda-h_2\!\left(\int x\,dP_\lambda\right)\right). \]

We get

\begin{align*}
\frac{1}{\tau'(I)}\,\frac{\partial}{\partial\lambda}TLIM(P_\lambda)
&=-\frac{h_1'(\mu_F)\,\mathbb{E}h(X)}{h_1(\mu_F)^2}\int x\,\frac{\partial}{\partial\lambda}dP_\lambda+\frac{1}{h_1(\mu_F)}\int h(x)\,\frac{\partial}{\partial\lambda}dP_\lambda-h_2'(\mu_F)\int x\,\frac{\partial}{\partial\lambda}dP_\lambda\\
&=\int\left(-\left(\frac{h_1'(\mu_F)\,\mathbb{E}h(X)}{h_1(\mu_F)^2}+h_2'(\mu_F)\right)x+\frac{h(x)}{h_1(\mu_F)}\right)\frac{\partial}{\partial\lambda}dP_\lambda.
\end{align*}

By centering at expectations, we have

\[ IF(z,F)=\tau'(I)\left(-\left(\frac{h_1'(\mu_F)\,\mathbb{E}h(X)}{h_1(\mu_F)^2}+h_2'(\mu_F)\right)(z-\mu_F)+\frac{h(z)-\mathbb{E}h(X)}{h_1(\mu_F)}\right),\qquad z\in V_X. \]

4. CONCLUSION AND PERSPECTIVES

In this paper, we studied the Theil-like family of inequality measures introduced in [23]. Following the paper establishing the asymptotic normality of that family, we focused on the IF of its members. The results are compared with those of other authors where available. We think that this unified and compact approach will serve as a general tool for comparison purposes. In addition, in computational packages, it allows more compact programs, resulting in more efficiency. A paper on computational aspects will follow soon.

5. APPENDIX: A LIST OF SOME IFS

Here, we list a number of inequality measures and the corresponding IFs.

The generalized entropy measure of inequality $GE(\alpha)$, which depends on a parameter $\alpha>0$ and is defined by

\[ IE_\alpha=\int_0^{\infty}\frac{1}{\alpha(\alpha-1)}\left[\left(\frac{y}{\mu_F}\right)^{\alpha}-1\right]dF(y)=\frac{1}{\alpha(\alpha-1)}\left(\frac{\mu_{F,\alpha}}{\mu_F^{\alpha}}-1\right),\qquad \alpha>0,\ \alpha\neq 1, \]
has the IF (see e.g. [1])
\[ IF(z;IE_\alpha)=\frac{1}{\alpha(\alpha-1)\mu_F^{\alpha}}\left(z^{\alpha}-\mu_{F,\alpha}\right)-\frac{\mu_{F,\alpha}}{(\alpha-1)\mu_F^{\alpha+1}}\left[z-\mu_F\right],\qquad \alpha\notin\{0,1\}. \]

Important remark. Our result on the IF of the $GE(\alpha)$ differs from that of [1] by the multiplicative coefficient $\frac{1}{\alpha(\alpha-1)\mu_F^{\alpha}}$; in other words, that coefficient is missing in [1]. We also obtain the same result by the computation below, which is a direct proof.

\[ \frac{\partial}{\partial\lambda}GE(\alpha)=\frac{\partial}{\partial\lambda}\,\frac{1}{\alpha(\alpha-1)}\left(\frac{\int x^{\alpha}\,dP_\lambda}{\left(\int x\,dP_\lambda\right)^{\alpha}}-1\right)=\frac{1}{\alpha(\alpha-1)}\int\frac{\mu_F^{\alpha}x^{\alpha}-\alpha\mu_F^{\alpha-1}\mu_{F,\alpha}\,x}{\mu_F^{2\alpha}}\,\frac{\partial}{\partial\lambda}dP_\lambda. \]

By the method described in the proof, we may center the integrand to get

\[ IF(z,GE(\alpha))=\frac{1}{\alpha(\alpha-1)}\,\frac{\mu_F^{\alpha}\left(z^{\alpha}-\mathbb{E}X^{\alpha}\right)-\alpha\mu_F^{\alpha-1}\mu_{F,\alpha}\left(z-\mathbb{E}X\right)}{\mu_F^{2\alpha}}, \]
which again gives the result.

The MLD, which is a special case of the GE class with $\alpha=0$, defined by

\[ IE_0=\int_0^{\infty}\log\left(\frac{\mu_F}{y}\right)dF(y)=\log\mu_F-\nu,\qquad \nu=\mathbb{E}\log X, \]

is associated with the IF

\[ IF(z,IE_0)=-\left[\log z-\nu\right]+\frac{1}{\mu_F}\left[z-\mu_F\right]. \]

The Theil measure, which is also a special case of the GE class, for $\alpha=1$,

\[ IE_1=\int_0^{\infty}\frac{y}{\mu_F}\log\left(\frac{y}{\mu_F}\right)dF(y)=\frac{\nu}{\mu_F}-\log\mu_F,\qquad \nu=\mathbb{E}\left[X\log X\right], \]
has the IF
\[ IF(z;IE_1)=\frac{1}{\mu_F}\left[z\log z-\nu\right]-\frac{\nu+\mu_F}{\mu_F^{2}}\left[z-\mu_F\right]. \]

The Atkinson class of inequality measures of parameter $\alpha\in(0,1]$, defined by (see [1])

\[ IA_\alpha=1-\left(\int_0^{\infty}\left(\frac{y}{\mu_F}\right)^{1-\alpha}dF(y)\right)^{1/(1-\alpha)}=1-\frac{\mu_{F,1-\alpha}^{1/(1-\alpha)}}{\mu_F},\qquad \alpha>0,\ \alpha\neq 1, \]
has the IF given by
\[ IF(z;IA_\alpha)=-\frac{\nu^{\alpha/(1-\alpha)}}{(1-\alpha)\mu_F}\left(z^{1-\alpha}-\nu\right)+\frac{\nu^{1/(1-\alpha)}}{\mu_F^{2}}\left(z-\mu_F\right), \]
where $\nu=\mathbb{E}X^{1-\alpha}=\mu_{F,1-\alpha}$.

We notice that for $\alpha=1$, we have

\[ IA_1=1-\frac{1}{\mu_F}\exp\left(\int_0^{\infty}\log y\,dF(y)\right)=1-e^{-IE_0}. \]

The Gini coefficient, defined by (see e.g. [1])

\[ IG=1-2\int_0^1 L(F,p)\,dp, \]
has the IF
\[ IF(z,IG)=2\left(R(F)-\frac{C(F,F(z))}{\mu_F}+\frac{z}{\mu_F}\left(R(F)-\left(1-F(z)\right)\right)\right), \]
where
\[ R(F)=\int_0^1 L(F,p)\,dp \]
and $C$ is the cumulative functional defined by
\[ C(F,p)=\int_0^{Q(p)}x\,dF(x),\qquad 0\le p\le 1. \]
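As an illustration only (our own sketch; the empirical plug-ins for $R$, $C$ and $F$ are the obvious ones but are not spelled out in the paper), the Gini IF can be evaluated from a sample as follows.

```python
import numpy as np

def gini_if(z, x):
    """Plug-in of IF(z, IG) = 2( R - C(F, F(z))/mu + (z/mu)(R - (1 - F(z))) )."""
    xs = np.sort(np.asarray(x, dtype=float))
    n, mu = len(xs), xs.mean()
    F_z = np.mean(xs <= z)                                    # empirical cdf at z
    C_Fz = np.mean(xs * (xs <= z))                            # C(F, F(z)) = int_0^z y dF(y)
    R = np.mean(xs * (1.0 - np.arange(1, n + 1) / n)) / mu    # R(F) = int_0^1 L(F, p) dp
    return 2.0 * (R - C_Fz / mu + (z / mu) * (R - (1.0 - F_z)))

x = np.random.default_rng(5).lognormal(sigma=0.5, size=20_000)
print(gini_if(2.0, x))
print(np.mean([gini_if(zi, x) for zi in x[:2000]]))           # roughly zero (Fisher consistency)
```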

The QSR measure of inequality, defined by

\[ \eta=\frac{\int_{Q(0.8)}^{+\infty}y\,dF(y)}{\int_0^{Q(0.2)}y\,dF(y)}=\frac{\mathbb{E}\left[X\,1_{\{X>Q(0.8)\}}\right]}{\mathbb{E}\left[X\,1_{\{X\le Q(0.2)\}}\right]}, \]
where $1_A$ is the indicator function of a set $A$, is associated with the IF described below (see [16]). Let
\[ N(F)=\int_{Q(0.8)}^{+\infty}x\,dF(x) \]
and
\[ D(F)=\int_0^{Q(0.2)}x\,dF(x). \]

Define the subdivision of $\mathbb{R}_+$: $A_1=[0,Q(0.2)]$, $A_2=(Q(0.2),Q(0.8))$, $A_3=(Q(0.8),+\infty)$, and set

\begin{align*}
I_1(z,\eta)&=\left[-zN(F)+0.2\,Q(0.8)D(F)+0.8\,Q(0.2)N(F)\right]/D^2(F);\\
I_2(z,\eta)&=\left[0.2\,Q(0.8)D(F)-0.2\,Q(0.2)N(F)\right]/D^2(F);\\
I_3(z,\eta)&=\left[zD(F)-0.8\,Q(0.8)D(F)-0.2\,Q(0.2)N(F)\right]/D^2(F).
\end{align*}

The QSR IF is then defined by

\[ IF(z,\eta)=I_1(z,\eta)1_{A_1}(z)+I_2(z,\eta)1_{A_2}(z)+I_3(z,\eta)1_{A_3}(z). \]
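A plug-in evaluation of this piecewise IF might look as follows (our own sketch; empirical quantiles and the empirical N(F), D(F) replace their population counterparts, and the function name is illustrative).

```python
import numpy as np

def qsr_if(z, x):
    """Piecewise IF of the quintile share ratio eta = N(F)/D(F)."""
    x = np.asarray(x, dtype=float)
    q2, q8 = np.quantile(x, [0.2, 0.8])
    N = np.mean(x * (x > q8))                  # N(F) = int_{Q(0.8)}^inf y dF(y)
    D = np.mean(x * (x <= q2))                 # D(F) = int_0^{Q(0.2)} y dF(y)
    if z <= q2:                                # z in A1
        num = -z * N + 0.2 * q8 * D + 0.8 * q2 * N
    elif z < q8:                               # z in A2
        num = 0.2 * q8 * D - 0.2 * q2 * N
    else:                                      # z in A3
        num = z * D - 0.8 * q8 * D - 0.2 * q2 * N
    return num / D ** 2

x = np.random.default_rng(6).lognormal(sigma=0.6, size=20_000)
print(qsr_if(0.5, x), qsr_if(1.0, x), qsr_if(10.0, x))
print(np.mean([qsr_if(zi, x) for zi in x[:2000]]))   # close to zero, as an IF should be
```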

AUTHORS' CONTRIBUTIONS

Each author contributed 25% of the paper.

ACKNOWLEDGMENTS

The authors warmly thank the whole research team for their comments and proof-reading of the paper. Thanks also to the editor of the journal for insightful comments.

REFERENCES

2. F.A. Cowell, E. Flachaire, and S. Bandyopadhyay, Goodness-of-Fit: An Economic Approach, Distributional Analysis Research Programme (DARP 101), The Toyota Centre, London School of Economics and Political Science, London, 2009. http://sticerd.lse.ac.uk/_new/publications/series.asp?prog=DARP
3. B. Hulliger and T. Schoch, presented at the Swiss Statistics Meeting, Geneva, 2009.
4. M. Zenga, Giornale degli Economisti e Annali di Economia, Vol. 43, 1984, pp. 301-326.
17. F.A. Cowell, Theil, Inequality and the Structure of Income Distribution, London School of Economics and Political Science, 2003. http://eprints.lse.ac.uk/2288/
21. D.G. Champernowne and F.A. Cowell, Economic Inequality and Income Distribution, Cambridge University Press, Cambridge, 1998.
24. J. Kahn, Influence Functions for Fun and Profit, Ann Arbor, 2015. Working Paper. http://j-kahn.com/files/influencefunctions.pdf