Journal of Statistical Theory and Applications

Volume 19, Issue 2, June 2020, Pages 148 - 161

Empirical Estimation for Sparse Double-Heteroscedastic Hierarchical Normal Models

Authors
Vida Shantia1, S. K. Ghoreishi2, *
1Department of Statistics, Science and Research branch, Islamic Azad University, Tehran, Iran
2Department of Statistics, Faculty of Sciences, University of Qom, Qom, Iran
*Corresponding author. Email: atty_ghoreishi@yahoo.com
Received 5 March 2019, Accepted 11 June 2019, Available Online 2 June 2020.
DOI
10.2991/jsta.d.200422.001
Keywords
Asymptotic optimality; Heteroscedasticity; Empirical estimators; Sparsity; Stein's unbiased risk estimate (SURE)
Abstract

The available heteroscedastic hierarchical models perform well for a wide range of real-world data, but for data sets that exhibit heteroscedasticity mainly due to nonconstant means rather than unequal variances, the existing models tend to overestimate the variance of the second-level model, which in turn causes substantial bias in the parameter estimates. Therefore, in this study, we develop heteroscedastic hierarchical models, called double-heteroscedastic hierarchical models, that account for the heterogeneity in the means at the second level of the model, in addition to the heterogeneity of variance at the first level. In these models, we assume that the vector of means in the second level is sparse. We derive Stein's unbiased risk estimators (SURE) for the parameters in the model based on a data decomposition and study their risk properties both in theory and in numerical experiments under the squared loss. The comparison between our SURE estimator and classical estimators such as the empirical Bayes maximum likelihood estimator (EBMLE) and the empirical Bayes moment estimator (EBMOM) is illustrated through a simulation study. Finally, we apply our model to a baseball data set.

Copyright
© 2020 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

Hierarchical models have been extensively studied and widely used in many disciplines such as biology, climatology, ecology, medicine, and engineering. A hierarchical model is a multi-level model that integrates information from different sources to achieve coherent inference on unknowns (e.g., [1–3]). Among the many statisticians who have made significant contributions to the theory and applications of hierarchical models, James and Stein [4] and Stein [5] were pioneers, studying simultaneous statistical inference for the means of several normal populations. The joint work of James and Stein has fundamentally increased the use of hierarchical models in recent decades. Stein's shrinkage estimators, which have an interesting empirical Bayes interpretation, were the basis for developing shrinkage estimation in multilevel normal models. Later, Efron and Morris [6] showed further implications of Stein's shrinkage estimator and proposed several competing parametric empirical Bayes estimators.

To date, both parametric and nonparametric empirical Bayes properties of shrinkage estimators have been extensively studied under either a homoscedastic (equal subpopulation variances) or heteroscedastic (unequal subpopulation variances) assumption. For more details, see Berger and Strawderman [7] and Brown and Greenshtein [8]. The majority of these investigations have focused on the risk properties of estimators with various loss functions. For example, the admissible minimax estimators for homoscedastic hierarchical normal models with usual quadratic loss functions were considered by Baranchik [9], a class of proper Bayes minimax estimators was studied by Strawderman [10], and a sufficient condition for admissibility of generalized Bayes estimators was investigated by Brown [11]. Comparisons between those estimators under different loss functions are discussed in Brown [12], Berger [13] and Berger and Strawderman [7].

Recently, heteroscedastic hierarchical normal models have received more attention due to the demands of real applications. Xie et al. [14] proposed a class of shrinkage estimators based on Stein's unbiased risk estimate (SURE) and studied the asymptotic properties of various common estimators as the number of means to be estimated increases. In particular, they established the asymptotic optimality property of the SURE estimators. They further extended their estimators to a class of semi-parametric shrinkage estimators and established corresponding asymptotic optimality results. Ghoreishi and Meshkani [15] considered the estimation of a set of normal population means by assuming heteroscedasticity in both levels of a two-level hierarchical model. They developed weighted shrinkage estimators of the population means based on weighted SURE, first estimating the nuisance variance parameters and then using them in the derivation of the shrinkage estimators of the means. Still in the context of heteroscedastic models, Xie et al. [16] discussed the simultaneous inference of mean parameters in a family of distributions with quadratic variance function. They studied the asymptotic optimality properties of their semi-parametric/parametric shrinkage estimators, defined for the location-scale family and the natural exponential family. Ghoreishi [17] studied the Bayesian analysis of a fully heteroscedastic hierarchical model, using a class of local-global shrinkage priors, namely Dirichlet–Laplace priors, and evaluated the optimal posterior concentration of the corresponding Bayes estimators.

Compared to nonparametric Bayes estimators, parametric empirical Bayes estimators are more commonly used in real data analysis [18]. The application of parametric empirical Bayes usually involves unknown hyper-parameters that need to be estimated. The empirical Bayes maximum likelihood estimator (EBMLE), empirical Bayes moment estimator (EBMOM) and SURE are the common approaches for the hyper-parameter estimation, see Brown [19] and references therein.

Although in general the available heteroscedastic hierarchical models perform well for a wide range of real-world data, they may fit poorly for data sets that exhibit heteroscedasticity mainly due to nonconstant means rather than unequal variances. For such data, the existing heteroscedastic hierarchical models tend to overestimate the variance of the second-level model, which causes substantial bias in the parameter estimates. Accordingly, the present study considers the following points:

  • We develop heteroscedastic hierarchical models that take into account the heterogeneity in the means at the second level of the model, in addition to the heterogeneity of variance at the first level. We call these extended models “double-heteroscedastic hierarchical models.”

  • Throughout the paper, we assume that the vector of the second level means is a sparse vector. So, we derive SURE for the parameters in the model based on data decomposition and study their risk properties both in theory and in numerical experiments under the squared loss. The comparison between our SURE estimator and the classical estimators such as EBMLE and EBMOM is illustrated through a simulation study.

Practically, our sparse double-heteroscedastic hierarchical models can be widely used in signal detection and gene-effect studies. For example, when many gene families (groups of homologous genes that share similar sequences and may have identical functions) are compared between healthy and treated groups, only a few of these sequences, which are known to cause the disease, are detected. The sparse double-heteroscedastic hierarchical models perform well for this comparison.

In the application section, we apply our model to the baseball data analyzed by Brown [19] and compare the batting average of two halves for each player and, finally, detect the players who have had different performances in two halves of the baseball season.

The paper is structured as follows: In Section 2, we first give preliminaries, including notations and definitions, and then introduce a two-level sparse heteroscedastic hierarchical model. Section 3 derives estimators for all parameters in the hierarchical model, and Section 4 studies their risk properties. Extensive simulation studies evaluating our parameter estimates are given in Section 5, and a data example applying our model is presented in Section 6, followed by a brief discussion in Section 7. Technical proofs are given in the Appendix.

2. PRELIMINARIES

2.1. Notations and Definitions

Given a vector $x\in\mathbb{R}^n$, define its $\ell_p$-norm by $\|x\|_p^p=\sum_i|x_i|^p$, and define $\|x\|_0=\sum_i I(x_i\neq 0)$. Given $x\in\mathbb{R}$, define $x_+:=\max(0,x)$. For $i\in\{1,2,\ldots,n\}$, let $\alpha_{(i)}$ stand for the $i$-th largest entry of $\alpha=(\alpha_1,\alpha_2,\ldots,\alpha_n)$ in absolute value; in other words, $|\alpha_{(1)}|\geq|\alpha_{(2)}|\geq\cdots\geq|\alpha_{(n)}|$. Also, $[n]$ is short for the set $\{1,2,\ldots,n\}$, and $|S|$ is the cardinality of a set $S$.

2.2. Two-Level Sparse Heteroscedastic Hierarchical Normal Model

Let $Y_1,Y_2,\ldots,Y_n$ be independent normal random variables with means $\theta_1,\theta_2,\ldots,\theta_n$ and variances $A_1,A_2,\ldots,A_n$, respectively. Here we assume the $A_i$'s are possibly distinct points in $\mathbb{R}^+$. We then assume the $\theta_i$'s are independent and normally distributed with means $\alpha_i$ and constant variance $\lambda$. These assumptions form a two-level heteroscedastic hierarchical model as follows:

$$Y_i\,|\,\theta_i\sim N(\theta_i,A_i),\qquad \theta_i\sim N(\alpha_i,\lambda);\qquad i=1,2,\ldots,n.\tag{2.1}$$

We further assume that the vector $\alpha=(\alpha_1,\alpha_2,\ldots,\alpha_n)$ is a sparse vector of order $k_0$, that is, $\|\alpha\|_0\leq k_0$, where $k_0\ll n$. This sparsity assumption intrinsically guarantees the identifiability of the $\theta_i$'s. Note that the sparsity of the vector $\alpha=(\alpha_1,\ldots,\alpha_n)$ also violates the condition that the means of the $\theta_i$'s are constant.

From model (2.1) and using Bayes' theorem, the posterior distribution of $\theta_i$ is

$$N\!\left(\frac{\lambda}{\lambda+A_i}Y_i+\frac{A_i}{\lambda+A_i}\alpha_i,\ \frac{\lambda A_i}{\lambda+A_i}\right).$$

We opt for the posterior mode as a class of shrinkage estimators for $\theta_i$. Then, given the sparsity condition, this class of estimators becomes

$$\hat\theta_i^S=\begin{cases}\frac{\lambda}{\lambda+A_i}Y_i+\frac{A_i}{\lambda+A_i}\alpha_i & \text{if } i\in S\\[4pt] \frac{\lambda}{\lambda+A_i}Y_i & \text{if } i\in S^c,\end{cases}\tag{2.2}$$
where $S=\{i\,|\,\alpha_i\neq 0\}$ and $|S|=k_0$.

Since the usual heteroscedastic hierarchical models are derived from model (2.1) with the condition $\alpha_1=\alpha_2=\cdots=\alpha_n$, our model (2.1) is more flexible and thus applicable to a wider range of real data. Statistical inference using model (2.1) includes the estimation of the hyper-parameter $\lambda$, as well as the estimation of the local hyper-parameters $\alpha_1,\alpha_2,\ldots,\alpha_n$. Indeed, the latter task is the biggest challenge in practice. Our primary goal is to develop a methodology for estimating the hyper-parameters in model (2.1). In addition, we are interested in the performance of the sparse shrinkage estimators $\hat\theta_i^S$ in (2.2).
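As a quick illustration of model (2.1) and the shrinkage rule (2.2), the following sketch simulates the hierarchy and applies the posterior-mode estimator with the true hyper-parameters plugged in; the sample size, variance ranges, and sparsity pattern are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k0, lam = 1000, 10, 0.4             # sample size, sparsity, second-level variance

# First-level variances A_i and a sparse second-level mean vector alpha
A = rng.uniform(0.1, 1.0, n)
alpha = np.zeros(n)
alpha[:k0] = 5.0                        # S = {1,...,k0}; alpha_i = 0 on S^c

# Generate the hierarchy: theta_i ~ N(alpha_i, lam), then Y_i | theta_i ~ N(theta_i, A_i)
theta = rng.normal(alpha, np.sqrt(lam))
Y = rng.normal(theta, np.sqrt(A))

# Posterior-mode shrinkage rule (2.2), with the true hyper-parameters plugged in
w = lam / (lam + A)
theta_hat = w * Y + (1.0 - w) * alpha   # reduces to w * Y on S^c, where alpha_i = 0

print(np.mean((theta_hat - theta) ** 2), np.mean((Y - theta) ** 2))
```

On simulated data of this kind, the shrinkage estimator typically attains a smaller average squared loss than the naive estimator $Y_i$.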

3. EMPIRICAL PARAMETER ESTIMATORS OF HYPER-PARAMETERS

We introduce empirical methods for estimating the hyper-parameters in model (2.1). We will start with the empirical methods for estimating the hyper-parameter λ, followed by the estimation of sparsity parameter k0 and nonzero set S. Then we discuss the empirical estimation of α1,α2,,αn.

3.1. Empirical Estimators of λ

Although empirical approaches such as EBMLE and EBMOM have been frequently used to estimate the model hyper-parameters, we aim to use SURE to estimate the hyper-parameter λ (and other hyper-parameters), because the SURE estimators possess asymptotic optimality properties within the class of heteroscedastic hierarchical normal models [14,15].

From (2.1), the marginal distribution of Yi is

$$Y_i\sim N(\alpha_i,\ \lambda+A_i).\tag{3.1}$$

Let $\mathcal{J}$ be a collection of subsets of $\{1,2,\ldots,n\}$ of size $\frac{n}{2}$. For simplicity, we assume $n$ to be even. Define

$$X_{J_n}:=\sum_{i\in J_n}\frac{Y_i^2}{\lambda+A_i},\qquad J_n\in\mathcal{J}.$$

It is easy to verify that $X_{J_n}$ follows a chi-square distribution with $\frac{n}{2}$ degrees of freedom and noncentrality parameter $\sum_{i\in J_n}\frac{\alpha_i^2}{\lambda+A_i}$. Given $\lambda$, let $J_n^*$ be the subset such that

$$X_{J_n^*}=\inf_{J_n\in\mathcal{J}}X_{J_n}.$$

By Lemma 1 in Carpentier and Verzelen [20], it is straightforward to see that, for some constant $c>0$,

$$P\left(\frac{1}{16}<\frac{1}{|J_n^*|}X_{J_n^*}<2.2\right)\geq 1-2e^{-c|J_n^*|}.$$

For more details on the proof and the value of $c$, see Boucheron et al. [21]. The above inequality follows by taking a union bound over all $X_{J_n}$, $J_n\in\mathcal{J}$. Therefore, for a relatively large sample size, one approximately has

$$\frac{1}{16}<\frac{1}{\lambda+A_{\max}}\cdot\frac{1}{|J_n^*|}\sum_{i\in J_n^*}Y_i^2\leq\frac{1}{\lambda+A_{\min}}\cdot\frac{1}{|J_n^*|}\sum_{i\in J_n^*}Y_i^2<2.2.$$

Consequently, a range for the empirical estimate of $\lambda$ is given by

$$\left(\frac{1}{2.2\,|J_n^*|}\sum_{i\in J_n^*}Y_i^2-A_{\min}\right)_+<\lambda<\left(\frac{16}{|J_n^*|}\sum_{i\in J_n^*}Y_i^2-A_{\max}\right)_+,\tag{3.2}$$
where $A_{\min}=\min_{i\in J_n^*}A_i$ and $A_{\max}=\max_{i\in J_n^*}A_i$.
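The bounds in (3.2) are simple to compute once a half-sample is fixed. The sketch below evaluates them on simulated null coordinates, with an arbitrary random half-sample standing in for the minimizing subset $J_n^*$; all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 2000, 0.4
A = rng.uniform(0.1, 1.0, n)
Y = rng.normal(0.0, np.sqrt(lam + A))          # null coordinates: Y_i ~ N(0, lam + A_i)

J = rng.choice(n, size=n // 2, replace=False)  # an arbitrary half-sample in place of J_n^*
avg = np.mean(Y[J] ** 2)
lo = max(avg / 2.2 - A[J].min(), 0.0)          # lower bound in (3.2)
hi = max(16.0 * avg - A[J].max(), 0.0)         # upper bound in (3.2)
print(round(lo, 3), round(hi, 3))
```

The true $\lambda$ used in the simulation falls inside the computed range, although the range itself is loose, as expected from the crude constants in (3.2).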

If we are interested in estimating $\lambda$ by the EBMOM approach, we adopt the following estimator:

$$\hat\lambda=\frac{\sum_{i\in J_n^*}\left(Y_i^2-A_i\right)_+}{|J_n^*|}.\tag{3.3}$$

Or, if we are interested in an EBMLE estimator of $\lambda$, it is obtained by maximizing the marginal density of $Y_j$, $j\in J_n^*$, with respect to $\lambda$. The EBMLE estimator satisfies the following equation whenever the root exists:

$$\sum_{i\in J_n^*}\left[\frac{1}{\lambda+A_i}-\frac{Y_i^2}{(\lambda+A_i)^2}\right]=0.\tag{3.4}$$
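A minimal sketch of the EBMOM estimator (3.3) and the EBMLE score equation (3.4), again on simulated null coordinates; the bracketing interval for the root finder is an assumption for this example and should be checked in practice.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
n, lam = 1000, 0.4
A = rng.uniform(0.1, 1.0, n)
Y = rng.normal(0.0, np.sqrt(lam + A))      # marginal model (3.1) with alpha_i = 0

# EBMOM estimator, following (3.3)
lam_mom = np.mean(np.clip(Y ** 2 - A, 0.0, None))

# EBMLE: root of the score equation (3.4), located by bracketing,
# assuming the score changes sign over the bracket
def score(l):
    return np.sum(1.0 / (l + A) - Y ** 2 / (l + A) ** 2)

lam_mle = brentq(score, 1e-6, 100.0)
print(round(lam_mom, 3), round(lam_mle, 3))
```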

We, however, focus on estimating $\lambda$ using the SURE approach. This approach is based on the following squared-loss function:

$$L_1(\hat\theta,\theta)=\sum_{i\in J_n^*}(\hat\theta_i-\theta_i)^2.\tag{3.5}$$

Under this loss function and using (2.2), the shrinkage estimator

$$\hat\theta_i^S=\frac{\lambda}{\lambda+A_i}Y_i$$
can be used for estimating $\theta_i$ over $J_n^*$. Therefore, the corresponding SURE estimator of $\lambda$ is obtained by minimizing the unbiased estimator of the risk function

$$R_1(\hat\theta^S,\theta)=E\,L_1(\hat\theta^S,\theta)=\sum_{i\in J_n^*}E\left(\frac{\lambda}{\lambda+A_i}Y_i-\theta_i\right)^2=\sum_{i\in J_n^*}\frac{A_i}{(\lambda+A_i)^2}\left(A_i\theta_i^2+\lambda^2\right).\tag{3.6}$$

If we define

$$\mathrm{SURE}_1(\lambda)=\sum_{i\in J_n^*}\left[\left(\frac{A_i}{\lambda+A_i}\right)^2Y_i^2+\frac{A_i(\lambda-A_i)}{A_i+\lambda}\right],\tag{3.7}$$
then the SURE estimator of $\lambda$ is obtained as the minimizer of $\mathrm{SURE}_1(\lambda)$, which is also the solution of the following equation whenever the root exists:

$$\sum_{i\in J_n^*}\left[\frac{A_i^2}{(\lambda+A_i)^3}Y_i^2-\frac{A_i^2}{(A_i+\lambda)^2}\right]=0.\tag{3.8}$$

For more details see Xie et al. [14] and Ghoreishi and Meshkani [15].
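In practice, the SURE criterion (3.7) can also be minimized numerically instead of solving (3.8); a sketch with a bounded scalar minimizer, on the same kind of simulated null data as above (the search bounds are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
n, lam = 1000, 0.4
A = rng.uniform(0.1, 1.0, n)
Y = rng.normal(0.0, np.sqrt(lam + A))      # marginal model (3.1) with alpha_i = 0

# SURE_1(lambda), as in (3.7)
def sure1(l):
    return np.sum((A / (l + A)) ** 2 * Y ** 2 + A * (l - A) / (A + l))

res = minimize_scalar(sure1, bounds=(1e-6, 50.0), method="bounded")
lam_sure = res.x
print(round(lam_sure, 3))
```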

3.2. Empirical Sparsity Estimation

The estimation of the sparsity $k_0=\|\alpha\|_0$ relies on the empirical characteristic function of the $Y_i$, $i\in J_n^{c}=[n]\setminus J_n^*$. Define the function $h:\mathbb{R}\to[0,1]$ as $h(x)=1-\frac{2\left(1-\cos(x)\right)}{x^2}$; conventionally, we define $h(0)=0$ for $x=0$. It is easy to see that $h(x)\geq 1-\frac{4}{x^2}$ for large $|x|$ and $h(x)=\frac{x^2}{12}+o(x^2)$ around 0. Therefore, for a given large $t>0$, we have

$$k_0=\|\alpha\|_0\approx\sum_{i\in J_n^{c}}h\!\left(\frac{t\,\alpha_i}{\sqrt{\lambda+A_i}}\right).\tag{3.9}$$

We then employ the above approximation to estimate $k_0=\|\alpha\|_0$.

To derive the estimator, we first consider the kernel built on the empirical characteristic function of the $Y_i$'s, defined as

$$\kappa_t(x):=\int_{-1}^{1}(1-|u|)\,e^{\frac{t^2u^2}{2}}\cos(txu)\,du.$$

Then we define a statistic $Z_{\hat\lambda}(t)$ by

$$Z_{\hat\lambda}(t):=\sum_{i\in J_n^{c}}\left[1-\kappa_t\!\left(\frac{Y_i}{\sqrt{\hat\lambda+A_i}}\right)\right].\tag{3.10}$$

It is straightforward to see that

$$E\!\left[Z_{\hat\lambda}(t)\,\middle|\,\hat\lambda\right]=\sum_{i\in J_n^{c}}\left[1-\frac{2\left(1-\cos\!\left(\frac{t\alpha_i}{\sqrt{\hat\lambda+A_i}}\right)\right)}{\left(\frac{t\alpha_i}{\sqrt{\hat\lambda+A_i}}\right)^2}\right]=\sum_{i\in J_n^{c}}h\!\left(\frac{t\alpha_i}{\sqrt{\hat\lambda+A_i}}\right),$$
which implies that $Z_{\hat\lambda}(t)$ is an empirical estimator of $k_0=\|\alpha\|_0$. Hence, our empirical estimator for $k_0$ is
$$\hat k_0=\left(\left\lfloor Z_{\hat\lambda}(t)\right\rfloor+1\right)_+,$$
where $\lfloor u\rfloor$ stands for the integer part of $u$ and $u_+=\max(0,u)$. This estimator is a function of $t$, which is usually determined using cross-validation with the mean squared prediction error (MSPE) of the $\theta$ estimates as the objective function. Section 5 gives more details on how to numerically determine $t$.
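The quantities $\kappa_t$, $Z_{\hat\lambda}(t)$, and $\hat k_0$ can be computed directly by numerical integration. In the sketch below, the true $\lambda$ is plugged in instead of $\hat\lambda$ and the full sample is used instead of $J_n^c$, purely for simplicity; the integral is approximated by a trapezoid rule.

```python
import numpy as np

def kappa(t, x):
    # kappa_t(x) = int_{-1}^{1} (1 - |u|) exp(t^2 u^2 / 2) cos(t x u) du,
    # evaluated with a trapezoid rule on a fine grid
    u = np.linspace(-1.0, 1.0, 4001)
    w = np.ones_like(u)
    w[0] = w[-1] = 0.5                               # trapezoid end-point weights
    f = (1.0 - np.abs(u)) * np.exp(t ** 2 * u ** 2 / 2.0)
    return (np.cos(t * np.outer(np.atleast_1d(x), u)) * f) @ w * (u[1] - u[0])

rng = np.random.default_rng(4)
n, k0, lam, t = 1000, 10, 0.4, 1.0
A = rng.uniform(0.1, 1.0, n)
alpha = np.zeros(n)
alpha[:k0] = 5.0
Y = rng.normal(alpha, np.sqrt(lam + A))              # marginal model (3.1)

# Statistic (3.10), with the true lambda plugged in for this sketch
Z = np.sum(1.0 - kappa(t, Y / np.sqrt(lam + A)))
k0_hat = max(int(np.floor(Z)) + 1, 0)
print(k0_hat)
```

The resulting $\hat k_0$ fluctuates around the true sparsity; its variability is exactly why the paper pairs the point estimate with a confidence interval.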

In addition, we may be interested in constructing a $1-\delta$ empirical confidence interval for $k_0$. For this purpose, we define

$$w(x_1,x_2,\ldots,x_k)=\sum_{i\leq k}\left[1-\kappa_t(x_i)\right].$$

It is easy to verify that for a given $t$ and $i=1,2,\ldots,k$,

$$\sup_{x_{-i},\,x,\,y}\left|w(x_1,\ldots,x_{i-1},x,x_{i+1},\ldots,x_k)-w(x_1,\ldots,x_{i-1},y,x_{i+1},\ldots,x_k)\right|\leq 2e^{\frac{t^2}{2}},$$
where $x_{-i}=(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_k)$. Therefore, using McDiarmid's inequality (Boucheron et al. [21]), we have the empirical $1-\delta$ confidence statement for $k_0$:
$$P\left(\left|Z_{\hat\lambda}(t)-E\!\left(Z_{\hat\lambda}(t)\right)\right|\leq e^{\frac{t^2}{2}}\sqrt{\ln(\delta^{-1})\,|J_n^{c}|}\right)\geq 1-\delta,$$
or equivalently, the confidence interval can be expressed as
$$k_0:\ \sum_{i\in J_n^{c}}\left[1-\kappa_t\!\left(\frac{Y_i}{\sqrt{\hat\lambda+A_i}}\right)\right]\pm e^{\frac{t^2}{2}}\sqrt{\ln(\delta^{-1})\,|J_n^{c}|}.$$

Note that since $|J_n^*|=\frac{n}{2}$ and $k_0\ll n$, the above confidence interval is rather conservative in practice. This is not surprising, as confidence intervals derived from Chebyshev-type concentration inequalities such as McDiarmid's or Hoeffding's are in general more conservative than confidence intervals constructed from exact distributions. An alternative approach to computing a confidence interval is

$$\hat k_0\pm \mathrm{S.E.}(\hat k_0),\tag{3.11}$$
where
$$\mathrm{S.E.}(\hat k_0)=\sqrt{\sum_{i\in J_n^{c}}\left[1-\kappa_t\!\left(\frac{Y_i}{\sqrt{\hat\lambda+A_i}}\right)-h\!\left(\frac{t\,Y_i}{\sqrt{\hat\lambda+A_i}}\right)\right]^2}.$$

Given $\hat k_0$, the subset $S$ can be estimated by

$$\hat S=\{R_1,R_2,\ldots,R_{\hat k_0}\},\tag{3.12}$$
where $\{R_1,R_2,\ldots,R_{\hat k_0}\}\subset\{1,2,\ldots,n\}$ are the indices corresponding to the largest $|Y_i|$. However, this estimator can be biased unless an iterative estimation procedure is implemented. This is because, under our model (2.1), the quantity $\sum_{i\in S^c}\frac{Y_i^2}{\hat\lambda+A_i}$ in theory has a central chi-square distribution over $S^c$, whereas in practice this quantity often follows a noncentral chi-square distribution due to the randomness of the data. It is well known that a central chi-square distribution is stochastically smaller than any noncentral chi-square distribution with the same degrees of freedom. Therefore, in order to estimate $S^c$ (and thus $S$) accurately, we need to increase the size of $\hat S^c$ via a few iterative steps so that $\sum_{i\in\hat S^c}\frac{Y_i^2}{\hat\lambda+A_i}$ approximately has a central chi-square distribution. Indeed, Proposition 3.1 below provides the intuition for increasing the size of $\hat S^c$ until the central chi-square distribution is approximately reached for $\sum_{i\in\hat S^c}\frac{Y_i^2}{\hat\lambda+A_i}$.

We propose the following iterative procedure for practical purposes:

  1. Randomly choose half of the sample as our initial estimate of $S^c$, i.e., $|\hat S^c|=\frac{n}{2}$.

  2. Randomly choose 5% of the observations in $\hat S$ and add them to $\hat S^c$. If $$X_{\hat S^c}=\frac{1}{|\hat S^c|}\sum_{i\in\hat S^c}\frac{Y_i^2}{\hat\lambda+A_i}\leq 1.1,$$ we accept this new $\hat S^c$. Otherwise, we reject the new $\hat S^c$ and keep the previous one.

  3. Repeat Step 2 until $\hat S^c$ contains at least 90% of the data. This guarantees $\hat k_0/n\leq 0.1$, which empirically satisfies our assumption of $k_0\ll n$.
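The three steps above can be sketched as follows. Two practical liberties are taken (and flagged here as assumptions): the initial $\hat S^c$ is taken as the half-sample with the smallest standardized $|Y_i|$ rather than a purely random half, and an iteration cap is added as a safeguard.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k0, lam = 1000, 10, 0.4
A = rng.uniform(0.1, 1.0, n)
alpha = np.zeros(n)
alpha[:k0] = 5.0
Y = rng.normal(alpha, np.sqrt(lam + A))
lam_hat = lam                                   # plug in an estimate of lambda in practice

# Step 1 (variant): initialize S^c with the n/2 smallest standardized |Y_i|,
# a stand-in for the minimizing half-sample
order = np.argsort(np.abs(Y) / np.sqrt(lam_hat + A))
Sc = set(order[: n // 2].tolist())

# Steps 2-3: grow S^c by random 5% batches while the standardized average stays <= 1.1
for _ in range(1000):                           # iteration cap as a safeguard
    if len(Sc) >= 0.9 * n:
        break
    S = np.array([i for i in range(n) if i not in Sc])
    batch = rng.choice(S, size=int(0.05 * n), replace=False)
    cand = Sc | set(batch.tolist())
    idx = np.fromiter(cand, dtype=int)
    if np.mean(Y[idx] ** 2 / (lam_hat + A[idx])) <= 1.1:
        Sc = cand

print(len(Sc) / n)
```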

The justification of the proposed iterative estimation procedure is stated in the following proposition:

Proposition 3.1.

Given $\hat\lambda$, $\hat S^c$, and under the marginal model (3.1) with the sparsity assumption $\|\alpha\|_0=k_0$, one has, for some constant $C>0$,

$$P\left(\frac{1}{|\hat S^c|}\sum_{i\in\hat S^c}\frac{Y_i^2}{\hat\lambda+A_i}>1.1\ \middle|\ \hat\lambda\right)<e^{-C|\hat S^c|}.$$

The proof for this Proposition is deferred to the Appendix.

3.3. Empirical Estimation of αi's

To estimate the hyper-parameters $\alpha_i$, we apply the squared-loss function

$$L_2(\hat\theta,\theta)=\sum_{i\in S}(\hat\theta_i-\theta_i)^2\tag{3.13}$$
to samples in $S$. Under this loss function, one can compute the risk function of the shrinkage estimators in (2.2):
$$R_2(\hat\theta^S,\theta)=E\,L_2(\hat\theta^S,\theta)=\sum_{i\in\hat S}E\left(\frac{\lambda}{\lambda+A_i}Y_i+\frac{A_i}{\lambda+A_i}\alpha_i-\theta_i\right)^2=\sum_{i\in\hat S}\frac{\lambda^2A_i}{(\lambda+A_i)^2}+\sum_{i\in\hat S}\frac{A_i^2}{(\lambda+A_i)^2}(\alpha_i-\theta_i)^2.\tag{3.14}$$

An unbiased estimator for the risk $R_2(\hat\theta^S,\theta)$ is thus given by

$$\mathrm{SURE}_2(\alpha_1,\alpha_2,\ldots,\alpha_{k_0})=\sum_{i\in\hat S}\frac{\lambda^2A_i}{(\lambda+A_i)^2}-\sum_{i\in\hat S}\frac{A_i^3}{(\lambda+A_i)^2}+\sum_{i\in\hat S}\frac{A_i^2}{(\lambda+A_i)^2}(Y_i-\alpha_i)^2.$$

Given the empirical estimates $\hat\lambda$, $\hat k_0$, and $\hat S$, it is straightforward to see that the SURE estimates of the $\alpha_i$'s are

$$\hat\alpha_i=Y_i,\qquad i\in\hat S,\tag{3.15}$$
obtained by minimizing $\mathrm{SURE}_2(\alpha_1,\alpha_2,\ldots,\alpha_{\hat k_0})$ with respect to $\alpha_1,\alpha_2,\ldots,\alpha_{\hat k_0}$. Combining (3.8) and (3.15), the SURE estimates of the $\theta_i$'s are
$$\hat\theta_i^{SURE}=\begin{cases}Y_i & \text{if } i\in\hat S\\[4pt] \frac{\hat\lambda}{\hat\lambda+A_i}Y_i & \text{if } i\in\hat S^c.\end{cases}\tag{3.16}$$
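Given $\hat\lambda$ and $\hat k_0$ (here plugged in at their true values purely for the sketch), the selection rule (3.12) and the final estimator (3.16) amount to a few lines:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k0, lam = 1000, 10, 0.4
A = rng.uniform(0.1, 1.0, n)
alpha = np.zeros(n)
alpha[:k0] = 5.0
theta = rng.normal(alpha, np.sqrt(lam))
Y = rng.normal(theta, np.sqrt(A))

lam_hat, k0_hat = lam, k0            # plug in the SURE estimates in practice

# S-hat: indices of the k0_hat largest |Y_i|, as in (3.12)
S_hat = np.argsort(-np.abs(Y))[:k0_hat]
in_S = np.zeros(n, dtype=bool)
in_S[S_hat] = True

# theta-hat from (3.16): Y_i on S-hat, shrunk toward 0 on its complement
theta_sure = np.where(in_S, Y, lam_hat / (lam_hat + A) * Y)
print(np.mean((theta_sure - theta) ** 2))
```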

4. RISK PROPERTIES OF SURE ESTIMATOR

We establish properties of the SURE estimator in (3.16) under the following squared-loss function that is applied to the entire data:

$$L(\hat\theta,\theta)=\frac{1}{n}\sum_{i=1}^{n}(\hat\theta_i-\theta_i)^2.\tag{4.1}$$

For ease of notation, in the following we write $\hat\theta=\hat\theta^{SURE}$. Hence, we have

$$L(\hat\theta,\theta)=\frac{1}{n}\sum_{i=1}^{n}(\hat\theta_i-\theta_i)^2=\frac{1}{n}L_1(\hat\theta,\theta)+\frac{1}{n}L_2(\hat\theta,\theta).$$

This decomposition allows us to evaluate the performance of the SURE estimators using estimators derived based on partial data, that is,

$$\mathrm{SURE}(\lambda,\alpha_1,\alpha_2,\ldots,\alpha_{k_0})=\frac{1}{n}\mathrm{SURE}_1(\lambda)+\frac{1}{n}\mathrm{SURE}_2(\alpha_1,\alpha_2,\ldots,\alpha_{k_0}).$$

The following theorem shows how well $\mathrm{SURE}(\lambda,\alpha_1,\ldots,\alpha_{k_0})$ approximates $L(\hat\theta,\theta)$.

Theorem 4.1.

Under Conditions C1 and C2 in the Appendix, we have

$$\sup_{\lambda>0,\ \alpha_1,\alpha_2,\ldots,\alpha_{k_0}}\left|\mathrm{SURE}(\lambda,\alpha_1,\alpha_2,\ldots,\alpha_{k_0})-L(\hat\theta,\theta)\right|\to 0\quad\text{in }L^2,\ \text{as }n\to\infty.$$

The proof of this theorem is deferred to the Appendix.

5. NUMERICAL STUDIES

We carry out simulation studies to evaluate the performance of the SURE estimators for sparse heteroscedastic hierarchical normal models. To measure the performance of the estimators, we compute the risk $R(\hat\theta,\theta)$ of the proposed shrinkage estimators. We set $n=5000$ in our simulation for each model defined in the four scenarios below. All four scenarios share the two-level model

$$Y_i\sim N(\theta_i,A_i),\qquad \theta_i\sim N(\alpha_i,0.4),$$
with sparsity size $k_0=10$ but with different settings for $A_i$ and $\alpha_i$.
  • Scenario I: Baseline model

    $$A_i\sim U(0,1),\ i=1,2,\ldots,5000;\qquad \alpha_i=\begin{cases}5 & i=1,\ldots,k_0=10\\ 0 & i=k_0+1,\ldots,5000\end{cases}$$

  • Scenario II: Assume a two-component mixture model for $\alpha_i$

    $$A_i\sim U(0,1),\ i=1,2,\ldots,5000;\qquad \alpha_i\sim\begin{cases}0.5\,N(3,A_i)+0.5\,N(5,A_i) & i=1,\ldots,k_0=10\\ 0 & i=k_0+1,\ldots,5000\end{cases}$$

    Scenario II introduces a more complex mean structure than Scenario I.

  • Scenario III: Assume a three-component mixture model for $\alpha_i$

    $$A_i\sim U(0,1),\ i=1,2,\ldots,5000;\qquad \alpha_i\sim\begin{cases}\frac{1}{3}N(2,A_i)+\frac{1}{3}N(5,A_i)+\frac{1}{3}N(10,A_i) & i=1,\ldots,k_0=10\\ 0 & i=k_0+1,\ldots,5000\end{cases}$$

  • Scenario IV: Assume three-component mixture models for both $A_i$ and $\alpha_i$

    $$A_i\sim 0.3\,U(0,1)+0.7\,U(1,2),\ i=1,2,\ldots,5000;\qquad \alpha_i\sim\begin{cases}\frac{1}{3}N(2,A_i)+\frac{1}{3}N(5,A_i)+\frac{1}{3}N(10,A_i) & i=1,\ldots,k_0=10\\ 0 & i=k_0+1,\ldots,5000\end{cases}$$

    Scenario IV has a more complex variance structure than Scenarios I-III.

For each simulated data set, we estimate the hyper-parameters $\lambda$, $k_0$, and $\alpha_i$ using Equations (3.8), (3.10), and (3.15), respectively. The 95% confidence interval for $k_0$ is also computed based on (3.11). Given $\hat k_0$, the nonzero set $S$ is estimated by (3.12), and we further compute the misclassification rate (MCR) of $S$, defined as the ratio of misclassified $\alpha_i$'s to $n=5000$. We run the simulation 200 times to obtain 200 sets of hyper-parameter estimates and MCRs, then compute the empirical coverage probability (ECP), defined as the rate at which the true $k_0$ is covered by its confidence interval over all simulation runs, and the risk of the parameter estimates, $R(\hat\theta,\theta)$, as defined in (4.1). As is seen, the parameter estimates depend on the value of $t$. A general rule for determining $t$ is the cross-validation method, that is, one opts for the $t$ that minimizes the MSPE defined as

$$\mathrm{MSPE}(t)=\frac{1}{n}\sum_{j=1}^{n}\left(Y_j-\hat\theta_j(t)\right)^2.$$

Using this criterion, we find t=1 roughly gives the smallest MSPE with our simulation data, so we fix t=1 to compare the performance of different methods.

To evaluate the performance of our estimators, we consider the oracle estimator of $\lambda$, i.e., $\tilde\lambda=\arg\min_{\lambda\geq 0}\sum_{i\in J_n^*}\left(\frac{\lambda}{\lambda+A_i}Y_i-\theta_i\right)^2$, and its corresponding shrinkage estimator $\tilde\theta_i=\frac{\tilde\lambda}{\tilde\lambda+A_i}Y_i$ for $i\in J_n^*$. Here the notation $\tilde\theta_i$, rather than $\hat\theta_i$, is used to emphasize that $\tilde\theta_i$ depends on the unknown $\theta_i$ and hence is not really an estimator. However, since $\tilde\theta_i$ has the smallest loss or risk within the class of estimators of the form $\hat\theta_i=\frac{\hat\lambda}{\hat\lambda+A_i}Y_i$, we simply treat it as a reference for evaluating the performance of our SURE estimators.

In addition to the oracle estimator, we also compute the EBMOM and EBMLE estimates and their MCR, ECP, and risk. The summary of the simulation results over 200 runs is reported in Table 1. In particular, we show the variability of $\hat\lambda$, $\hat k_0$, and MCR over the 200 simulation runs through boxplots in Figures 1, 2 and 3. The results show that under the sparse heteroscedastic hierarchical normal model with nonconstant means considered in this article, our proposed SURE estimator clearly outperforms EBMOM and EBMLE and performs similarly to the oracle estimator in terms of both parameter estimation and identification of the nonzero locations.

Method     λ̂              k̂0             MCR             ECP            R(θ̂,θ)

Scenario I
  Oracle   0.401 (0.023)   9.421 (3.09)   0.137 (0.083)   0.70 (0.173)   0.217
  SURE     0.404 (0.043)   9.381 (3.24)   0.191 (0.092)   0.63 (0.173)   0.249
  EBMOM    0.513 (0.058)   3.299 (3.41)   0.671 (0.173)   0.21 (0.121)   0.734
  EBMLE    0.627 (0.067)   0.632 (4.33)   0.819 (0.192)   0.08 (0.114)   1.117

Scenario II
  Oracle   0.401 (0.022)   7.948 (3.19)   0.212 (0.108)   0.66 (0.165)   0.422
  SURE     0.401 (0.023)   7.700 (3.32)   0.241 (0.123)   0.61 (0.154)   0.437
  EBMOM    0.592 (0.051)   1.890 (3.89)   0.757 (0.208)   0.14 (0.102)   0.801
  EBMLE    0.693 (0.055)   0.860 (3.61)   0.901 (0.203)   0.02 (0.093)   1.115

Scenario III
  Oracle   0.402 (0.027)   9.808 (3.27)   0.163 (0.099)   0.66 (0.163)   0.267
  SURE     0.403 (0.027)   9.592 (3.35)   0.182 (0.104)   0.64 (0.160)   0.298
  EBMOM    0.601 (0.057)   3.978 (3.48)   0.610 (0.123)   0.22 (0.123)   0.721
  EBMLE    0.617 (0.056)   1.154 (3.62)   0.857 (0.173)   0.07 (0.099)   1.112

Scenario IV
  Oracle   0.401 (0.023)   8.883 (3.11)   0.220 (0.008)   0.65 (0.176)   0.333
  SURE     0.403 (0.026)   8.576 (3.23)   0.236 (0.008)   0.61 (0.142)   0.421
  EBMOM    0.577 (0.054)   5.396 (3.88)   0.498 (0.049)   0.42 (0.133)   0.877
  EBMLE    0.611 (0.051)   3.479 (4.64)   0.619 (0.058)   0.24 (0.114)   1.234

EBMLE, empirical Bayes maximum likelihood estimator; EBMOM, empirical Bayes moment estimator; SURE, Stein's unbiased risk estimate; MCR, misclassification rate; ECP, empirical coverage probability.

Table 1

Simulation results under various scenarios. The columns $\hat\lambda$, $\hat k_0$, MCR, and ECP show mean estimates and their standard deviations (in parentheses) over 200 simulation runs.

Figure 1

Boxplots for $\hat\lambda$ with the EBMLE, EBMOM, Oracle, and SURE methods for the four scenarios. The true value of $\lambda$ is 0.4.

Figure 2

Boxplots for k̂0 with EBMLE, EBMOM, Oracle and SURE methods for four scenarios. The true value of k0 is 10.

Figure 3

Boxplots for MCR with EBMLE, EBMOM, Oracle and SURE methods for four scenarios.

To examine whether the true sparsity of the data affects the sparsity parameter estimation, we rerun the simulation by setting $k_0$ in the data-generation process to 15, 20, 30, and 50, respectively. Table 2 summarizes the $k_0$ estimates together with the ECP and MCR at different $k_0$ under the four scenarios. The results show that the $k_0$ estimates in Scenario I are consistently close to the true values across all $k_0$, but when the mean and variance structures become more complex, as in Scenarios II to IV and especially in Scenarios III and IV, $\hat k_0$ tends to underestimate the true $k_0$ as $k_0$ increases. As a consequence, whenever $k_0$ is underestimated, the ECP of the confidence intervals for $k_0$ deteriorates and the MCR of the $S$ estimates is elevated.

               True k0:   10      15      20      30      50

Scenario I     k̂0        9.381   13.612  18.704  28.375  46.441
               ECP        0.63    0.60    0.71    0.68    0.44
               MCR        0.187   0.146   0.104   0.077   0.078

Scenario II    k̂0        7.700   11.890  15.658  24.276  40.442
               ECP        0.61    0.50    0.44    0.31    0.19
               MCR        0.282   0.243   0.228   0.194   0.191

Scenario III   k̂0        9.592   14.705  18.892  19.284  19.709
               ECP        0.64    0.62    0.69    0.14    0.11
               MCR        0.177   0.113   0.357   0.676   0.605

Scenario IV    k̂0        8.576   13.632  17.482  17.574  17.681
               ECP        0.61    0.67    0.56    0.09    0.05
               MCR        0.238   0.142   0.159   0.414   0.646

SURE, Stein's unbiased risk estimate; MCR, misclassification rate; ECP, empirical coverage probability.

Table 2

Estimates of $\hat k_0$, ECP, and MCR with the SURE method under various $k_0$ and scenarios.

6. APPLICATION TO REAL DATA

We analyze the baseball data first introduced by Brown [19] and later analyzed by Xie et al. [14]. This data set contains the batting records of all Major League Baseball players in the 2005 season. We first divide the data set into two half-seasons; our goal is then to compare the batting averages of the two halves for each player and detect the players who performed differently in the two halves of the season. Following Brown [19] and Xie et al. [14], we removed the players with fewer than 10 at-bats to improve the accuracy of estimation. After this screening procedure, 503 players out of 929 remained.

Following the notation in Brown [19] and Xie et al. [14], we use $N$ to denote the number of at-bats and $H$ to denote the number of hits (successful battings). Then we have

$$H_{ij}\sim B(N_{ij},p_{ij}),$$
where $i=1,2$ stands for the half-season and $j=1,2,\ldots,n$ stands for the player. Again following Brown [19], we take a variance-stabilizing transformation of the data to obtain $X$:
$$X_{ij}=\arcsin\sqrt{\frac{H_{ij}+0.25}{N_{ij}+0.5}},$$
which approximately follows the distribution
$$X_{ij}\sim N\!\left(\theta_{ij},\ \frac{1}{4N_{ij}}\right),$$
where $\theta_{ij}=\arcsin\sqrt{p_{ij}}$. Finally, we apply our methodology to the variables
$$Y_j=X_{1j}-X_{2j}\sim N\!\left(\theta_j,\ \frac{1}{4}\left(\frac{1}{N_{1j}}+\frac{1}{N_{2j}}\right)\right),\qquad j=1,2,\ldots,503,$$
where $\theta_j=\theta_{1j}-\theta_{2j}$.
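The transformation and the two-sample statistic of this section can be sketched as follows; the at-bat and hit counts below are hypothetical, used only in place of the real records.

```python
import numpy as np

def arcsine_transform(H, N):
    # Brown's variance-stabilizing transform: X = arcsin(sqrt((H + 0.25) / (N + 0.5)))
    return np.arcsin(np.sqrt((H + 0.25) / (N + 0.5)))

# Hypothetical records (N = at-bats, H = hits) for one player's two half-seasons
N1, H1 = np.array([150.0]), np.array([45.0])
N2, H2 = np.array([160.0]), np.array([40.0])

X1, X2 = arcsine_transform(H1, N1), arcsine_transform(H2, N2)
Y = X1 - X2                            # Y_j ~ N(theta_j, A_j), approximately
A = 0.25 * (1.0 / N1 + 1.0 / N2)       # known first-level variance
Z = Y / np.sqrt(A)                     # the two-sample Z statistic of this section
print(round(float(Z[0]), 3))
```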

Our analysis assumes independence between each player's performance in the two half-seasons. This assumption may seem unreasonable at first glance. However, if we look at each player's performance in the two half-seasons, the Yule association measure, defined as

$$\frac{\frac{H_{1j}}{N_{1j}}-\frac{H_{2j}}{N_{2j}}}{\frac{H_{1j}}{N_{1j}}+\frac{H_{2j}}{N_{2j}}},$$
is a small number around zero, because for each player $\frac{H_{1j}}{N_{1j}}\approx\frac{H_{2j}}{N_{2j}}$. Therefore, the independence assumption for each individual is not unrealistic.

The cross-validation yields $t=0.973$; our SURE estimate of $\lambda$ is $1.6\times10^{-4}$, and that of $k_0$ is 1, with a 95% confidence interval of $[0,10]$. We conclude that at most 10 players performed differently in the two half-seasons. We also compute the SURE estimates of the $\theta_j$'s and select the 10 players corresponding to the largest $\hat\theta_j^{SURE}$ in absolute value. The names of these 10 players (i.e., the estimate of $S$), their total numbers of at-bats and hits for the two half-seasons, a two-sample $Z$ statistic defined by

$$Z=\frac{X_1-X_2}{\sqrt{\frac{1}{4}\left(\frac{1}{N_1}+\frac{1}{N_2}\right)}},$$
and the SURE estimates of the $\theta_j$'s are presented in Table 3. These 10 players include 9 nonpitchers and 1 pitcher. According to $\hat\theta^{SURE}$, the most different performances are those of nonpitchers. For example, Ordonez and Blanco are the two nonpitchers who performed most differently over the two half-seasons, although their differences are in opposite directions.

Full Name Pitchers (1)/Nonpitchers (0) N1 N2 H1 H2 Z-Stat. θ̂SURE
Magglio Ordonez 0 10 208 0 92 2.716 0.4379
Brandon Webb 1 27 35 0 6 2.622 0.3375
Victor Martinez 0 252 442 61 106 2.864 0.1281
Todd Helton 0 269 414 71 92 2.769 0.1278
Rafael Furcal 0 304 507 71 104 2.625 0.1109
Aaron Hill 0 131 273 47 52 2.609 0.0258
Moises Alou 0 225 341 73 37 −3.257 −0.1632
Desi Relaford 0 175 210 45 2 −2.976 −0.2781
Tony Blanco 0 44 56 11 0 −2.921 −0.4202
Table 3

Information on the 10 players and their $\theta$ estimates.

7. DISCUSSION

We developed heteroscedastic hierarchical normal models with a sparse and unequal mean structure, and derived SURE for such models, to meet the demand of data sets that exhibit heteroscedasticity mainly due to nonconstant means rather than unequal variances. Our model fills a gap, as no existing literature is devoted to heteroscedastic hierarchical normal models with a sparse and unequal mean structure. Both the theory and the simulation study show that our SURE estimator has nice properties and outperforms the classical EBMLE and EBMOM estimators for our proposed model. However, we also notice that the SURE estimates are sensitive to the mean and variance structures as the sparsity decreases. We leave the investigation of improved parameter estimates for future research.

ACKNOWLEDGMENTS

We wish to thank the editor and two anonymous referees whose comments greatly improved the article.

APPENDIX

To establish the asymptotic results, we assume two mild conditions, following Ghoreishi and Meshkani [15] and Xie et al. [14]:

  • C1) $\limsup_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}A_i<\infty$.

  • C2) $\limsup_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\theta_i^2<\infty$.

Moreover, without loss of generality, we assume that the sub-populations are re-indexed such that $0<A_1\leq A_2\leq\cdots\leq A_n$.

Proof of Proposition 3.1

Define $u_i=\frac{Y_i^2}{\hat\lambda+A_i}$ and $X=\sum_{i=1}^{|\hat S^c|}u_{(i)}$. Let $\hat S^c$ be a set of size $|\hat S^c|$ that does not intersect the support of $\alpha$. Then

$$X=\sum_{i=1}^{|\hat S^c|}u_{(i)}\sim\chi^2_{|\hat S^c|}.$$

That is, $X$ follows a $\chi^2$ distribution with $|\hat S^c|$ degrees of freedom. By Cai and Guo [22], we know that

$$P\left(X>1.1\,|\hat S^c|\right)\leq e^{-C|\hat S^c|},$$
and this completes the proof. Recall that $|\hat S^c|$ is of order $n$.

Proof of Theorem 4.1

Consider the squared-loss function

$$L(\hat\theta,\theta)=\frac{1}{n}\sum_{i=1}^{n}(\hat\theta_i-\theta_i)^2=\frac{1}{n}L_1(\hat\theta,\theta)+\frac{1}{n}L_2(\hat\theta,\theta).$$

It is easy to see that the corresponding risk function of this loss function is given by

$$R(\hat\theta,\theta)=\frac{1}{n}\sum_{i\in J_n^*}\frac{A_i}{(\lambda+A_i)^2}\left(A_i\theta_i^2+\lambda^2\right)+\frac{1}{n}\sum_{i\in\hat S}\frac{\lambda^2A_i}{(\lambda+A_i)^2}+\frac{1}{n}\sum_{i\in\hat S}\frac{A_i^2}{(\lambda+A_i)^2}(\alpha_i-\theta_i)^2.$$

The SURE unbiased estimator of R(θ̂,θ) is

$$\mathrm{SURE}(\lambda,\alpha_1,\alpha_2,\ldots,\alpha_{k_0})=\frac{1}{n}\mathrm{SURE}_1(\lambda)+\frac{1}{n}\mathrm{SURE}_2(\alpha_1,\alpha_2,\ldots,\alpha_{k_0}).$$

For a given $k_0\ll n$, one has the following decomposition:

$$\mathrm{SURE}(\lambda,\alpha_1,\ldots,\alpha_{k_0})-L(\hat\theta,\theta)=\frac{1}{n}\left[\mathrm{SURE}_1(\lambda)-L_1(\hat\theta,\theta)\right]+\frac{1}{n}\left[\mathrm{SURE}_2(\alpha_1,\ldots,\alpha_{k_0})-L_2(\hat\theta,\theta)\right].$$

Hence, taking absolute values on both sides of the above equation, one has

$$\sup_{\lambda>0,\ \alpha_1,\ldots,\alpha_{k_0}}\left|\mathrm{SURE}(\lambda,\alpha_1,\ldots,\alpha_{k_0})-L(\hat\theta,\theta)\right|\leq\frac{1}{n}\sup_{\lambda>0}\left|\mathrm{SURE}_1(\lambda)-L_1(\hat\theta,\theta)\right|+\frac{1}{n}\sup_{\alpha_1,\ldots,\alpha_{k_0}}\left|\mathrm{SURE}_2(\alpha_1,\ldots,\alpha_{k_0})-L_2(\hat\theta,\theta)\right|.$$

We consider the two terms on the right-hand side separately. For the first term, we have

$$\mathrm{SURE}_1(\lambda)-L_1(\hat\theta,\theta)=\sum_{i\in J_n^*}\left[Y_i^2-A_i-\theta_i^2-\frac{2\lambda}{\lambda+A_i}\left(Y_i^2-\theta_iY_i-A_i\right)\right].$$

It is known that, for the decreasing sequence of positive numbers $c_i=\frac{\lambda}{\lambda+A_i}$,

$$\sup_{\lambda>0}\left|\mathrm{SURE}_1(\lambda)-L_1(\hat\theta,\theta)\right|\leq\left|\sum_{i\in J_n^*}\left(Y_i^2-A_i-\theta_i^2\right)\right|+\sup_{\lambda>0}\left|\sum_{i\in J_n^*}\frac{2\lambda}{\lambda+A_i}\left(Y_i^2-\theta_iY_i-A_i\right)\right|\leq\left|\sum_{i\in J_n^*}\left(Y_i^2-A_i-\theta_i^2\right)\right|+\sup_{c_1\geq c_2\geq\cdots\geq c_n}\left|\sum_{i\in J_n^*}2c_i\left(Y_i^2-\theta_iY_i-A_i\right)\right|.$$

Applying Theorem 3.1 in Ghoreishi and Meshkani [15] together with Conditions C1) and C2), we have

$$\sup_{\lambda>0}\frac{1}{n}\left|\mathrm{SURE}_1(\lambda)-L_1(\hat\theta,\theta)\right|\to 0\quad\text{in }L^2,\ \text{as }|S^c|\to\infty\ \text{or }n\to\infty.\tag{A.1}$$

For the second term, it is obvious that

$$\mathrm{SURE}_2(\alpha_1,\ldots,\alpha_{k_0})-L_2(\hat\theta,\theta)=\sum_{i\in S}\left[(Y_i-\alpha_i)^2-A_i-(\theta_i-\alpha_i)^2-\frac{2\lambda}{\lambda+A_i}\left((Y_i-\alpha_i)^2-(\theta_i-\alpha_i)(Y_i-\alpha_i)-A_i\right)\right].$$

Now, defining $X_i=Y_i-\alpha_i$ and $\beta_i=\theta_i-\alpha_i$, we see again that

$$\sup_{\alpha_1,\ldots,\alpha_{k_0}}\left|\mathrm{SURE}_2(\alpha_1,\ldots,\alpha_{k_0})-L_2(\hat\theta,\theta)\right|\leq\left|\sum_{i\in S}\left(X_i^2-A_i-\beta_i^2\right)\right|+\sup_{\lambda>0}\left|\sum_{i\in S}\frac{2\lambda}{\lambda+A_i}\left(X_i^2-\beta_iX_i-A_i\right)\right|.$$

In this case, our hierarchical model on $S$ is defined as

$$X_i\sim N(\beta_i,A_i),\qquad \beta_i\sim N(0,\lambda).$$

Obviously, from Conditions C1) and C2), we have

$$\sup_{\alpha_1,\ldots,\alpha_{k_0}}\frac{1}{n}\left|\mathrm{SURE}_2(\alpha_1,\ldots,\alpha_{k_0})-L_2(\hat\theta,\theta)\right|\to 0\quad\text{in }L^2,\ \text{as }n\to\infty.\tag{A.2}$$

Equations (A.1) and (A.2) complete the proof.

REFERENCES

4. W. James and C.M. Stein, in Proceedings of the 4th Berkeley Symposium on Probability and Statistics (Berkeley, CA, USA), Vol. I, University of California Press, 1961, pp. 367-379.