Journal of Statistical Theory and Applications

Volume 17, Issue 3, September 2018, Pages 439 - 461

Parameter Estimation Using the EM Algorithm for Symmetric Stable Random Variables and Sub-Gaussian Random Vectors

Authors
Mahdi Teimouri, teimouri@aut.ac.ir
Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Ave., Tehran 15914, Iran Department of Statistics, Faculty of Science, Gonbad Kavous University, Gonbad Kavous, Iran
Saeid Rezakhah*, rezakhah@aut.ac.ir
Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Ave., Tehran 15914, Iran
Adel Mohammadpour, adel@aut.ac.ir
Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Ave., Tehran 15914, Iran
*Corresponding author: Saeid Rezakhah, rezakhah@aut.ac.ir
Received 18 November 2016, Accepted 2 August 2017, Available Online 30 September 2018.
DOI
10.2991/jsta.2018.17.3.4
Keywords
EM algorithm; Markov Chain Monte Carlo; Symmetric α-stable distribution (SαS); Sub-Gaussian α-stable distribution
Abstract

Applying some well-known properties of the class of symmetric α-stable (SαS) distributions, the EM algorithm is extended to estimate the parameters of SαS distributions. Furthermore, we extend this algorithm to the multivariate sub-Gaussian α-stable distributions. Comparative studies, based on simulations and some real data sets, show the performance of the proposed EM algorithm relative to well-known methods, including the empirical characteristic function, maximum likelihood, and sample quantile approaches, in the univariate and multivariate cases.

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

1. Introduction

The theory of stable (or α-stable) distributions has received much interest in fields such as physics, economics, finance, and telecommunications, since this class of distributions can account for tail thickness, peakedness, and skewness. The empirical density functions of many real-life data sets in the aforementioned fields have tails so heavy that normal models are clearly inappropriate. For comprehensive accounts of applications of stable distributions, we refer the readers to [13], [19], [20], [26], [28], and [31]. There are several parameterizations for the characteristic function of a stable random variable. In the following, Definition 1.1 gives the S1 parameterization, which is the most popular.

Definition 1.1.

The characteristic function of a stable random variable, say X, has the form

\varphi_X(t) = E\exp(itX) =
\begin{cases}
\exp\left\{-|\sigma t|^{\alpha}\left[1 - i\beta\,\mathrm{sgn}(t)\tan\left(\tfrac{\pi\alpha}{2}\right)\right] + i\mu t\right\}, & \alpha \neq 1,\\[4pt]
\exp\left\{-|\sigma t|\left[1 + i\beta\,\mathrm{sgn}(t)\tfrac{2}{\pi}\log|t|\right] + i\mu t\right\}, & \alpha = 1,
\end{cases}
where sgn(.) is the well-known sign function.

A stable distribution is described by four parameters: the index of stability α ∈ (0, 2] (also called the tail parameter), the skewness parameter β ∈ [−1, 1], the scale parameter σ ∈ IR+, and the location parameter μ ∈ IR. We write S(α, β, σ, μ) to denote a stable distribution in this parameterization. When β = 0, we obtain the class of symmetric α-stable (SαS) distributions. We assume throughout that β = 0, in which case φX(t) = exp{−|σt|^α}.
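As a quick numerical illustration of the β = 0 case above (not part of the original paper; parameter values are arbitrary), the following Python sketch compares the empirical characteristic function of samples drawn with scipy.stats.levy_stable against exp{−|σt|^α}.

```python
# Minimal sketch: the SαS characteristic function exp(-|σt|^α) (β = 0 case of
# Definition 1.1), checked against the empirical characteristic function of
# samples drawn with scipy.stats.levy_stable. Parameter values are illustrative.
import numpy as np
from scipy.stats import levy_stable

alpha, sigma, mu = 1.5, 1.0, 0.0

def sas_cf(t, alpha, sigma, mu):
    """Characteristic function of S(alpha, 0, sigma, mu)."""
    return np.exp(-np.abs(sigma * t) ** alpha + 1j * mu * t)

rng = np.random.default_rng(0)
x = levy_stable.rvs(alpha, 0.0, loc=mu, scale=sigma, size=20000, random_state=rng)

t = np.linspace(-2, 2, 9)
empirical_cf = np.exp(1j * np.outer(t, x)).mean(axis=1)
for ti, e, th in zip(t, empirical_cf, sas_cf(t, alpha, sigma, mu)):
    print(f"t={ti:+.2f}  empirical={e.real:+.3f}  theoretical={th.real:+.3f}")
```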

In the following, we mention three well-known methods for estimating the parameters of this family. The maximum likelihood (ML) estimation for stable distributions was approximated theoretically by [9] and evaluated numerically by [21]. Although such approaches lead to efficient estimates, both involve numerical complexities. The software proposed by [21], called STABLE, uses a spline interpolation of stable densities for its numerical method. This package estimates all four parameters of a stable distribution efficiently when α ≥ 0.4. The empirical characteristic function (CF) method is suggested in [12]. As another approach, the sample quantile (SQ) technique proposed in [15] presents consistent estimators for all parameters based on five specific sample quantiles.

Definition 1.2.

Let Y be a d-dimensional sub-Gaussian α-stable (sub-Gaussian SαS) random vector. Then, the characteristic function of Y can be written as

\varphi_{\mathbf{Y}}(\mathbf{t}) = E\exp\left(i\mathbf{t}^{T}\mathbf{Y}\right) = \exp\left\{-\left(\mathbf{t}^{T}\Sigma\mathbf{t}\right)^{\alpha/2} + i\mathbf{t}^{T}\boldsymbol{\mu}\right\},
where Σ is a d × d positive definite matrix and μ ∈ IRd is the location parameter.

Computing the maximum likelihood (ML) estimate of a sub-Gaussian SαS distribution directly, by maximizing the log-likelihood of the sub-Gaussian SαS density (known as full MLE in the literature), has been performed in [24]. Finding the full MLEs is computationally expensive. So, a projection method is suggested in which a sub-Gaussian SαS random vector is projected onto different directions. Since each projection follows a univariate SαS distribution, the parameters of each projection are estimated using methods such as ML, CF, and SQ, which were developed for the univariate case, [24]. The estimates obtained in this way are known as projected estimates, and these methods are called the projected ML, CF, and SQ, respectively. In the projection approach, the dispersion matrix of a sub-Gaussian SαS distribution is estimated term-by-term, and so there is no guarantee that the reconstructed matrix is positive definite. The issue with Σ possibly not being positive definite is addressed in [24]. Also, the projected ML approach is still applicable only for α ≥ 0.4.

Although the proposed EM algorithm in this work is more computationally expensive than the ML, CF, and SQ methods, the following reasons motivate us to propose it. Contrary to the ML approach, it works satisfactorily for all ranges of α in the univariate and multivariate cases. Also, it gives better performance than the ML, CF, and SQ approaches when α is near one. In the multivariate case, it estimates all entries of the dispersion matrix simultaneously, so that the estimated dispersion matrix is always positive definite. The proposed EM algorithm can be developed for modelling mixtures of SαS distributions. This advantage of the proposed EM algorithm has received much interest in robust mixture modelling, since the SαS is a heavy-tailed distribution. Modelling mixtures of sub-Gaussian SαS distributions was suggested in [29] using Gibbs sampling. The EM algorithm introduced in this work can be developed for modelling mixtures of sub-Gaussian SαS distributions and is faster than the Gibbs sampling approach.

This paper is organized as follows. Some preliminaries are given in Section 2. In Section 3, we propose the EM algorithm for estimating the parameters of a univariate SαS distribution. The methodology given in Section 3 is developed for the sub-Gaussian SαS distribution in Section 4. In Section 5, some simulation studies are considered to compare the performance of the presented EM algorithms in the univariate and multivariate cases versus the ML, CF, and SQ approaches. Our method will be applied to some sets of real data in this section. We conclude the paper in Section 6 with a summary of the contributions of this work.

2. Preliminaries

2.1. EM algorithm and extension

Missing or incomplete observations occur frequently in the statistical literature. The EM algorithm, introduced in [7], is a popular inferential tool for such situations. Applications of the EM algorithm also include cases with latent variables or models with random parameters, provided that they can be formulated as missing value problems. For a brief description of the EM methodology, let x = (x1,…, xn) denote the complete observations, which follow the density function f(x|θ). We write x = (y, z), where z denotes the missing samples and y the available observations, [16]. The computation and maximization of

f(\mathbf{y}|\theta) = \int_{\mathbf{z}} f\left((\mathbf{y},\mathbf{z})|\theta\right)\,d\mathbf{z},
are difficult in such a case, so we use the EM algorithm. Let Lc(θ) = f((y, z)|θ) be the complete data likelihood function. The EM algorithm works by maximizing the conditional expectation of the complete log-likelihood function Q(θ|θ(t)) = E(lc(θ; x)|y, θ(t)) given the available data and a current estimate θ(t) of the parameter (here, lc(θ; x) denotes the logarithm of Lc(θ)). Since missing values are unobservable, we replace them by their expectation, given the observed values and a current guess (or estimate) of θ. Each iteration of the EM algorithm involves two steps:
  1. Expectation (E) step: Compute Q(θ|θ(t)) at the t-th iteration.

  2. Maximization (M) step: Maximize Q(θ|θ(t)) with respect to θ to get θ(t+1).

The E and M steps above are repeated until convergence occurs. When the M step of the EM algorithm becomes analytically intractable, we follow the idea of implementing this step through a sequence of conditional maximizations, known as CM steps. The resulting procedure is known as the ECM algorithm, [18]. A faster extension of the EM algorithm is the so-called ECME algorithm introduced in [14]. It should be noted that the EM, ECM, and ECME algorithms all have the same E step. The ECME algorithm works by maximizing the constrained Q(θ|θ(t)) via some CM steps and maximizing the constrained marginal likelihood function, [1]. Sometimes implementation of the EM algorithm is difficult. In this situation, another extension of this algorithm, called the stochastic EM (SEM), is useful, [2], [3], and [4]. Each SEM iteration works by simulating the missing data from the conditional density f(zi|θ(t), yi); for i = 1,…, n, and substituting them into the complete likelihood function. Then, we apply the EM algorithm to the pseudo-complete sample, where (Z1(t),…, Zn(t)) are the simulated values. The updated parameter is then used to simulate from f(zi|θ(t+1), yi); for i = 1,…, n. This process is repeated until convergence occurs for the distribution of {θ(t+1)}. Under some mild regularity conditions, {θ(t+1)} constitutes a Markov chain that converges to a stationary distribution, [11]. The number of iterations is determined via an exploratory data analysis approach such as a graphical display. An example presented in [11] considers a total of 300 iterations of the SEM for estimating the parameters of the modulus of rupture of 50 observations of fir wood after fitting a two-parameter Weibull model under a Type-I censoring scheme (the data were censored at 1.6, which corresponds to a censoring rate of 34%). As pointed out by the authors, the SEM is generally very robust to the starting values. Using a burn-in of M0 iterations, the sequence {θ(t)} is expected to be close to some stationary point. After a sufficiently large number of iterations, say M, the SEM estimate of θ is given by

\hat{\theta} = \frac{1}{M - M_0}\sum_{t=M_0+1}^{M}\theta^{(t)},
where M0 is the burn-in size. Summarizing the above, each iteration of the SEM algorithm consists of two steps as follows.
  1. Stochastic imputation (S) step: Impute the simulated missing values and constitute the pseudo-complete log-likelihood function at the t-th iteration.

  2. Maximization (M) step: Find a θ, say θ(t+1), which maximizes the pseudo-complete log-likelihood function at the t-th iteration.

The S and M steps above are repeated until convergence occurs.
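The following is a generic, hedged Python sketch of the SEM loop just described (not the authors' implementation); simulate_missing and m_step are hypothetical user-supplied callables standing in for the S and M steps of a particular model.

```python
# A generic sketch of the SEM loop: alternate an S step that imputes missing data
# given theta^(t) and an M step that maximizes the pseudo-complete log-likelihood,
# then average the chain after a burn-in of M0 iterations.
import numpy as np

def sem(y, theta0, simulate_missing, m_step, M=300, M0=100, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    theta, chain = theta0, []
    for t in range(M):
        z = simulate_missing(y, theta, rng)   # S step: draw z_i ~ f(z | theta^(t), y_i)
        theta = m_step(y, z)                  # M step: maximize the pseudo-complete likelihood
        chain.append(theta)
    return np.mean(chain[M0:], axis=0)        # SEM estimate: average after burn-in
```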

2.2. Properties of symmetric and positive stable laws

Suppose that Y1 and Y2 > 0 are independent stable random variables, Y1 ~ S(α1, 0, σ, 0) and Y2 ~ S(α2, 1, (cos(πα2/2))^{1/α2}, 0) for 0 < α2 < 1. Then Y1 Y2^{1/α1} ~ S(α1α2, 0, σ, 0), which is known as the product property in the literature. Based on the product property, one can write

Y \stackrel{d}{=} \sqrt{P}\,N_1 + \mu,
where Y ~ S(α, 0, σ, μ), P ~ S(α/2, 1, (cos(πα/4))^{2/α}, 0), and N1 is a zero-mean Gaussian random variable with variance 2σ². So,
Y \,|\, P = p \sim N_2,
where N2 is a Gaussian random variable with mean μ and variance 2pσ². As the second useful property, assume that P ~ S(α/2, 1, (cos(πα/4))^{2/α}, 0). Then,
\frac{E}{P} \stackrel{d}{=} W(\alpha/2),
where E is an exponential random variable with mean one and W(ν) is a Weibull random variable, independent of E, with density function fW(w) = νw^{ν−1} exp{−w^ν}; for w > 0. Since 0 < α/2 < 1 and β = 1, P is sometimes called a positive stable random variable. For a proof of (2.4), we refer readers to [17] and [30, p. 15].
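As an illustrative check of the product property (2.2) and of (2.4) (not the authors' code), the following Python sketch simulates the positive stable variable P with Kanter's formula, one standard generator for positive stable laws; the paper itself refers to the simulation method of [5].

```python
# Minimal sketch: simulate P ~ S(alpha/2, 1, (cos(pi*alpha/4))^(2/alpha), 0), which
# has Laplace transform exp(-s^(alpha/2)), via Kanter's formula, and check that
# (i)  sqrt(P) * N(0, 2*sigma^2) + mu behaves like S(alpha, 0, sigma, mu)   (2.2)
# (ii) E / P is Weibull with shape alpha/2                                   (2.4)
import numpy as np

rng = np.random.default_rng(1)
alpha, sigma, mu, n = 1.3, 1.0, 0.0, 200000
nu = alpha / 2.0

# Kanter's generator for a positive stable variable with E[exp(-s*P)] = exp(-s^nu)
u = rng.uniform(0.0, np.pi, n)
e = rng.exponential(1.0, n)
p = (np.sin(nu * u) / np.sin(u) ** (1.0 / nu)) * (np.sin((1.0 - nu) * u) / e) ** ((1.0 - nu) / nu)

# (i) product property: compare the empirical CF of Y with exp(-|sigma*t|^alpha)
y = np.sqrt(p) * rng.normal(0.0, np.sqrt(2.0) * sigma, n) + mu
for t in (0.5, 1.0, 2.0):
    print(t, np.exp(1j * t * y).mean().real, np.exp(-abs(sigma * t) ** alpha))

# (ii) E/P should be Weibull(alpha/2): its survival function at x is exp(-x^(alpha/2))
x = 1.5
w = rng.exponential(1.0, n) / p
print((w > x).mean(), np.exp(-x ** (alpha / 2.0)))
```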

3. EM Algorithm for SαS

In this section, we introduce two approaches to extend the EM algorithm for SαS distribution. Presenting the first algorithm in Subsection 3.1, we consider a computational method for the first approach in Subsection 3.2. The second approach is presented in Subsection 3.3.

3.1. The first approach

Let y = (y1,…, yn) be a random sample from a SαS distribution. Refer to (2.3) and consider p = (p1,…, pn) as the vector of missing observations. The complete data likelihood Lc(Θ) is factored into the product of the marginal densities of Pi and the conditional densities of Yi given Pi. So, the complete data log-likelihood can be written as

l_c(\Theta) = l_{1c}(\theta) + l_{2c}(\alpha),
where Θ = (θT, α)T, θ = (μ, σ)T. It follows that
l_{1c}(\theta) = C - n\log\sigma - \frac{1}{2}\sum_{i=1}^{n}\frac{(y_i-\mu)^2}{2\sigma^2 p_i},
and
l_{2c}(\alpha) = \sum_{i=1}^{n}\log f_P(p_i|\alpha),
where C is a constant independent of the parameter θ and fP(.) denotes the density function of the positive stable random variable P. Conditional expectation of the complete data log-likelihood Q(Θ|Θ(t)) = E(lc(Θ; p)|y, Θ(t)) is given by
Q(\Theta|\Theta^{(t)}) = Q(\theta|\Theta^{(t)}) + Q(\alpha|\Theta^{(t)}) = C - n\log\sigma - \frac{1}{2}\sum_{i=1}^{n}\frac{(y_i-\mu)^2\, e_i^{(t,s)}}{2\sigma^2} + \sum_{i=1}^{n}E\left[\log f_P(P_i|\alpha)\,\big|\,y_i,\Theta^{(t)}\right],
where e_i^{(t,s)} = E(P^{-1}|y_i, μ^{(t)}, σ^{(s)}, α^{(s)}), Θ^{(t)} = (θ^{(t)T}, α^{(t)})^T and θ^{(t)} = (μ^{(t)}, σ^{(t)})^T. Here, e_i^{(t,s)} is the posterior mean of 1/P given y_i and Θ^{(t)}; for i = 1,…,n. Thus, the EM algorithm's steps are
  • E step: Given the current estimate Θ^{(t)} of Θ, compute e_i^{(t,t)}.

  • CM step: Update θ(t) as θ(t+1) by maximizing Q(Θ|Θ(t)) with respect to θ to obtain

    \mu^{(t+1)} = \frac{\sum_{i=1}^{n} y_i\, e_i^{(t,t)}}{\sum_{i=1}^{n} e_i^{(t,t)}},
    \sigma^{(t+1)} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \mu^{(t+1)}\right)^2\, e_i^{(t+1,t)}}{2n}}.

  • CML step: Update the value of α^{(t)} by maximizing the marginal log-likelihood function as

    \alpha^{(t+1)} = \arg\max_{\alpha}\sum_{i=1}^{n}\log f_Y\left(y_i\,|\,\mu^{(t+1)},\sigma^{(t+1)},\alpha\right),
    where fY(.) is the SαS density function. Under the methodology that we follow here, the CML step itself contains an SEM algorithm (this is because the density function of a SαS random variable has no closed-form expression). In the following, the steps of this SEM are described.

  • The first step of CML: Let E1, E2,…, En be independent exponentially distributed random variables with mean one. At the (t + 1)-th iteration, assume that μ^{(t+1)} and σ^{(t+1)} are known. Let Y'_i = (Y_i − μ^{(t+1)})/σ^{(t+1)} and Y''_i = Y'_i/√E_i; for i = 1,…,n. Then it follows, from (2.2) and (2.4), that

    Y''_i\,|\,W_i = w_i \sim N\left(0,\, 2w_i^{-2}\right), \qquad W_i \sim W(\alpha).
    Considering w1,…, wn as missing observations, the complete data likelihood Lc(Θ) is factored into the product of the marginal densities of Wi and the conditional densities of Y″i given Wi. So, the complete data log-likelihood can be written as
    l_c(\alpha) = C + n\log\alpha + (\alpha-1)\sum_{i=1}^{n}\log w_i - \sum_{i=1}^{n} w_i^{\alpha}.

  • The second step of CML: Simulate the vector w = (w1,…, wn) from the conditional distribution of W_i given y''_i; for i = 1,…,n, and Θ^{(t)}, using the Monte Carlo method at the m-th cycle of the S step of the SEM.

  • The third step of CML: Substitute w in (3.3) and maximize it with respect to α to find α^{(t+1)}. Then, repeat the algorithm from the E step M times to obtain Θ̂ as

    \hat{\Theta} = \frac{1}{M - M_0}\sum_{t=M_0+1}^{M}\Theta^{(t)},
    where M0 is the burn-in size. Maximizing (3.3) with respect to α in each cycle is equivalent to computing the ML estimate of a Weibull distribution with shape parameter α (see the sketch below).
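A minimal Python sketch of this Weibull-shape maximization, assuming the pseudo-complete w_1,…, w_n are already available (the bracket for the root search is our own choice), is:

```python
# Hedged sketch of the final maximization in the CML step: given pseudo-complete
# Weibull observations w_1, ..., w_n (scale fixed at one), the shape alpha solves
# n/alpha + sum(log w_i) - sum(w_i^alpha * log w_i) = 0, i.e. the score equation
# obtained by differentiating (3.3).
import numpy as np
from scipy.optimize import brentq

def weibull_shape_mle(w, lower=0.05, upper=2.0):
    """Return the alpha in (lower, upper) maximizing (3.3) for the given w."""
    w = np.asarray(w, dtype=float)
    logw = np.log(w)

    def score(a):
        return len(w) / a + logw.sum() - np.sum(w ** a * logw)

    # the bracket (lower, upper) is an assumption; widen it if no sign change occurs
    return brentq(score, lower, upper)

rng = np.random.default_rng(2)
w = rng.weibull(1.4, size=5000)      # W(alpha) observations with scale one
print(weibull_shape_mle(w))          # should be close to 1.4
```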

3.2. Computational considerations of the first approach

In the E step of the algorithm, we need to calculate the conditional means e_i^{(t,t)} and e_i^{(t+1,t)} in the iteration steps (3.1) and (3.2), respectively. Unfortunately, these quantities cannot be calculated analytically. At the (t + 1)-th iteration of the E step, we have

e_i^{(t,t)} = \frac{\displaystyle\int_{0}^{\infty} \frac{u^{-3/2}\, f_P(u|\alpha^{(t)})}{\sqrt{4\pi}\,\sigma^{(t)}} \exp\left\{-\left(\frac{y_i-\mu^{(t)}}{2\sqrt{u}\,\sigma^{(t)}}\right)^{2}\right\}\,du}{\displaystyle\int_{0}^{\infty} \frac{u^{-1/2}\, f_P(u|\alpha^{(t)})}{\sqrt{4\pi}\,\sigma^{(t)}} \exp\left\{-\left(\frac{y_i-\mu^{(t)}}{2\sqrt{u}\,\sigma^{(t)}}\right)^{2}\right\}\,du},
and after obtaining μ^{(t+1)} in the CM step, we obtain e_i^{(t+1,t)} by replacing μ^{(t)} with μ^{(t+1)} in (3.4). Then, we update σ^{(t+1)} in the CM step. Direct computation of e_i^{(t,t)} using numerical integration runs into problems when |y_i|, for i = 1,…,n, is large. To avoid computational issues, we use the series representation of the density function f_P(.|α^{(t)}) of S(α^{(t)}/2, 1, (cos(πα^{(t)}/4))^{2/α^{(t)}}, 0) as
f_P\left(p|\alpha^{(t)}\right) \approx \frac{1}{\pi}\sum_{j=1}^{k} (-1)^{j-1}\,\frac{\Gamma\left(\frac{j\alpha^{(t)}}{2}+1\right)}{\Gamma(j+1)}\sin\left(\frac{j\pi\alpha^{(t)}}{2}\right) p^{-\frac{j\alpha^{(t)}}{2}-1},
for large k, [10] and [25]. Substituting the right-hand side of (3.5) into (3.4) and making a suitable change of variable and simplifications, we get
e_i^{(t,t)} \approx \frac{\displaystyle\sum_{j=1}^{k} \frac{(2\sigma^{(t)})^{j\alpha^{(t)}+2}}{|y_i-\mu^{(t)}|^{j\alpha^{(t)}+2}}\, (-1)^{j-1}\, \frac{\Gamma\left(\frac{j\alpha^{(t)}}{2}+1\right)\Gamma\left(\frac{j\alpha^{(t)}}{2}+\frac{3}{2}\right)\sin\left(\frac{j\pi\alpha^{(t)}}{2}\right)}{\Gamma(j+1)}}{\displaystyle\sum_{j=1}^{k} \frac{(2\sigma^{(t)})^{j\alpha^{(t)}}}{|y_i-\mu^{(t)}|^{j\alpha^{(t)}}}\, (-1)^{j-1}\, \frac{\Gamma\left(\frac{j\alpha^{(t)}}{2}+1\right)\Gamma\left(\frac{j\alpha^{(t)}}{2}+\frac{1}{2}\right)\sin\left(\frac{j\pi\alpha^{(t)}}{2}\right)}{\Gamma(j+1)}},
where Γ(.) denotes the well-known gamma function. It should be noted that the series in the numerator and denominator of (3.6) converge for
\frac{|y_i-\mu^{(t)}|}{2\sigma^{(t)}} \in \left(\left[\frac{\Gamma\left(\frac{k\alpha^{(t)}+\alpha^{(t)}}{2}+1\right)\Gamma\left(\frac{k\alpha^{(t)}+\alpha^{(t)}}{2}+\frac{3}{2}\right)}{(k+1)\,\Gamma\left(\frac{k\alpha^{(t)}}{2}+1\right)\Gamma\left(\frac{k\alpha^{(t)}}{2}+\frac{3}{2}\right)}\right]^{1/\alpha^{(t)}},\ \infty\right),
where k is sufficiently large. This makes the computations fast. We found that choosing k = [min(168, 168/α^{(t)})], where [x] denotes the integer part of x, yields an almost exact evaluation of f_P(.|α^{(t)}) in (3.5) and hence of e_i^{(t,t)} in (3.6). For those y_i for which the series in (3.5) does not converge, we use the Monte Carlo based approximation of e_i^{(t,t)} given by
e_i^{(t,t)} \approx \left(\sum_{j=1}^{N} \frac{P_j^{-3/2}}{\sqrt{4\pi}\,\sigma^{(t)}}\exp\left\{-\left(\frac{y_i-\mu^{(t)}}{2\sqrt{P_j}\,\sigma^{(t)}}\right)^{2}\right\}\right) \div \left(\sum_{j=1}^{N} \frac{P_j^{-1/2}}{\sqrt{4\pi}\,\sigma^{(t)}}\exp\left\{-\left(\frac{y_i-\mu^{(t)}}{2\sqrt{P_j}\,\sigma^{(t)}}\right)^{2}\right\}\right),
where Pj ~ S(α^{(t)}/2, 1, (cos(πα^{(t)}/4))^{2/α^{(t)}}, 0); for j = 1,…, N, is simulated using the method given in [5]. During our simulations, we set N = 2000.
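The following hedged Python sketch illustrates this Monte Carlo fallback for e_i^{(t,t)}; the positive stable variables are generated here with Kanter's formula rather than the method of [5], and all parameter values are illustrative.

```python
# Minimal sketch of the Monte Carlo approximation of e_i^(t,t) = E(P^{-1} | y_i, ...):
# ratio of averages of P_j^{-3/2} and P_j^{-1/2} weights over N simulated positive
# stable variables; the common factor 1/(sqrt(4*pi)*sigma) cancels in the ratio.
import numpy as np

def positive_stable(nu, size, rng):
    """P with Laplace transform exp(-s^nu), 0 < nu < 1 (Kanter's formula)."""
    u = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(nu * u) / np.sin(u) ** (1.0 / nu)) * (np.sin((1.0 - nu) * u) / e) ** ((1.0 - nu) / nu)

def e_weight_mc(y_i, mu_t, sigma_t, alpha_t, N=2000, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    p = positive_stable(alpha_t / 2.0, N, rng)
    z = ((y_i - mu_t) / (2.0 * np.sqrt(p) * sigma_t)) ** 2
    num = np.sum(p ** -1.5 * np.exp(-z))
    den = np.sum(p ** -0.5 * np.exp(-z))
    return num / den

print(e_weight_mc(y_i=3.2, mu_t=0.0, sigma_t=1.0, alpha_t=1.2, rng=np.random.default_rng(3)))
```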

As the CML step itself contains an SEM algorithm, simulation from the conditional distribution of W_i given y''_i = (y_i − μ^{(t+1)})/(σ^{(t+1)}√e_i) (here, e_i denotes an observation from the standard exponential distribution) and α^{(t)} can be performed using the Metropolis-Hastings algorithm. We base the simulation on a Weibull distribution with shape parameter α^{(t)} as the candidate. The acceptance rate, A_{w_i}, becomes

A_{w_i} = \min\left\{1,\ \frac{w_i^{\mathrm{new}}\exp\left\{-\frac{\left(y''_i\, w_i^{\mathrm{new}}\right)^2}{4}\right\}}{w_i^{(t)}\exp\left\{-\frac{\left(y''_i\, w_i^{(t)}\right)^2}{4}\right\}}\right\}.
The main drawback of the Metropolis-Hastings algorithm is that the new values may have little chance of acceptance in each iteration. In our study, the acceptance rate decays when y''_i gets large. Due to the heavy-tail property of the stable distribution, large values are likely to occur. Therefore, we adopt a rejection sampling scheme in the m-th cycle of the SEM algorithm of the CML step to generate from the posterior distribution of W_i given y''_i and α^{(t+1)}; for i = 1,…, n. To describe this scheme, we note that the density function
f_{Y''_i|W_i}\left(y''_i|w_i\right) = \frac{w_i}{\sqrt{4\pi}}\exp\left\{-\frac{\left(y''_i w_i\right)^2}{4}\right\},
as a part of the conditional (posterior) density function
f_{W_i|Y''_i}\left(w_i|y''_i,\alpha^{(t+1)}\right) \propto f_{Y''_i|W_i}\left(y''_i|w_i\right) f_{W_i}(w_i) = \frac{\alpha^{(t+1)}\, w_i^{\alpha^{(t+1)}}}{\sqrt{4\pi}}\exp\left\{-\frac{\left(y''_i w_i\right)^2}{4} - w_i^{\alpha^{(t+1)}}\right\},
is bounded by some constant independent of wi. More precisely
f_{Y''_i|W_i}\left(y''_i|w_i\right) \leq \frac{\exp\{-0.5\}}{\sqrt{2\pi}\,|y''_i|}.
Hence, a rejection sampling is employed to generate from the posterior distribution by the following steps.
  1. Simulate a sample from a Weibull distribution, say w_i, with shape parameter α^{(t+1)} and scale unity.

  2. Generate a sample from a uniform distribution U(0, exp{−0.5}/(√(2π)|y''_i|)), say u.

  3. If u < (w_i/√(4π)) exp{−(y''_i)² w_i²/4}, then accept w_i as a generation from f_{W_i|Y''_i}(w_i|y''_i); otherwise, go to step 1.

While we expect the rejection sampling to work satisfactorily, it runs into problems when |y''_i| is very large. In this case, the machine sets the quantity (w_i/√(4π)) exp{−(y''_i)² w_i²/4} to zero, and consequently generation from the posterior density becomes very time consuming; we then use the mode of the posterior density as a generation from the posterior distribution. It is not hard to check that the mode of the posterior distribution of W_i|Y''_i = y''_i is obtained as a solution of the equation

h(w_i) = \alpha^{(t)} - \frac{\left(w_i\, y''_i\right)^2}{2} - \alpha^{(t)}\, w_i^{\alpha^{(t)}} = 0.
Our studies reveal that if |y''_i| is so large that the machine sets the quantity (w_i/√(4π)) exp{−(y''_i)² w_i²/4} to zero, then choosing the mode of the posterior distribution described by (3.7) as an observation from this distribution not only improves the accuracy of the simulation, but also accelerates the implementation of the EM algorithm.
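A hedged Python sketch of the rejection sampler above, including the posterior-mode fallback (3.7) for very large |y''_i|, could look as follows; the function name, the maximum number of tries, and the root-search bracket are our own choices.

```python
# Sketch of drawing from f(w | y'', alpha) ∝ w^alpha * exp(-(y'' w)^2 / 4 - w^alpha)
# by rejection from a Weibull(alpha) proposal, with the mode of (3.7) as fallback.
import numpy as np
from scipy.optimize import brentq

def sample_w_given_y(y2, alpha, rng, max_tries=10000):
    bound = np.exp(-0.5) / (np.sqrt(2.0 * np.pi) * abs(y2))     # bound on f(y''|w)
    for _ in range(max_tries):
        w = rng.weibull(alpha)                                   # step 1: Weibull(alpha) proposal
        u = rng.uniform(0.0, bound)                              # step 2: uniform on (0, bound)
        target = w / np.sqrt(4.0 * np.pi) * np.exp(-(y2 * w) ** 2 / 4.0)
        if u < target:                                           # step 3: accept / reject
            return w
    # fallback when |y''| is so large that the target underflows: posterior mode (3.7)
    h = lambda w: alpha - (w * y2) ** 2 / 2.0 - alpha * w ** alpha
    return brentq(h, 1e-12, np.sqrt(2.0 * alpha) / abs(y2) + 1.0)

rng = np.random.default_rng(4)
print([round(sample_w_given_y(0.8, 1.5, rng), 3) for _ in range(5)])
```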

3.3. The second approach

The first approach works efficiently, but it can be improved to reduce computational complexity by relaxing some CM steps for computing μ^{(t+1)} and σ^{(t+1)}. This approach significantly reduces the processing time for implementing the EM algorithm. In the CM step of the second approach, we only update μ, while both σ and α are updated through the SEM in the CML step. As in the previous approach, the location parameter is updated through (3.1). Let E1, E2,…, En be independent exponentially distributed random variables with mean one. At the (t + 1)-th iteration of the first step of the CML, assume that μ^{(t+1)} is known. Let Y'_i = Y_i − μ^{(t+1)} and Y''_i = Y'_i/√E_i; for i = 1,…, n. It follows that

Y''_i\,|\,W_i = w_i \sim N\left(0,\, 2\sigma^2 w_i^{-2}\right), \qquad W_i \sim W(\alpha).
The complete data log-likelihood can be written as
l_c(\alpha,\sigma) = C - n\log\sigma - \frac{1}{2}\sum_{i=1}^{n}\frac{\left(y''_i w_i\right)^2}{2\sigma^2} + n\log\alpha + \alpha\sum_{i=1}^{n}\log w_i - \sum_{i=1}^{n} w_i^{\alpha}.
By maximizing (3.8) with respect to σ and α, we find
\sigma^{(t+1)} = \sqrt{\frac{\sum_{i=1}^{n}\left(y''_i w_i\right)^2}{2n}},
and α(t+1) is the solution of
h(\alpha) = \frac{n}{\alpha} + \sum_{i=1}^{n}\log w_i - \sum_{i=1}^{n} w_i^{\alpha}\log w_i = 0,
where w_i is simulated from the conditional distribution of W_i given y''_i and Θ^{(t)}; for i = 1,…,n, using rejection sampling at the m-th cycle of the SEM step. Once σ^{(t)} and α^{(t)} are updated, we implement the E step and repeat the algorithm M times to obtain Θ̂ as
\hat{\Theta} = \frac{1}{M - M_0}\sum_{t=M_0+1}^{M}\Theta^{(t)}.
In the first approach, we need to compute e_i^{(t,t)} for updating μ^{(t)} and e_i^{(t+1,t)} for updating σ^{(t)}; for i = 1,…,n. In the second approach, the sequence e_i^{(t,t)} only updates μ^{(t)}, and σ^{(t)} is updated in the second step of the CML. This yields a considerable reduction in the implementation time of the EM algorithm compared with the first approach.
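The following Python sketch illustrates the closed-form σ update and the root-finding α update of the second approach on simulated pseudo-complete data; the bracket for the root search and the data-generation step are our own assumptions for illustration.

```python
# Minimal sketch of the CML update in the second approach: given the centred,
# exponential-scaled observations y''_i and the simulated w_i, update sigma in
# closed form and alpha by solving the score equation h(alpha) = 0 given above.
import numpy as np
from scipy.optimize import brentq

def second_approach_update(y2, w, alpha_bracket=(0.05, 2.0)):
    y2, w = np.asarray(y2, float), np.asarray(w, float)
    n = len(y2)
    sigma_new = np.sqrt(np.sum((y2 * w) ** 2) / (2.0 * n))
    logw = np.log(w)
    h = lambda a: n / a + logw.sum() - np.sum(w ** a * logw)
    alpha_new = brentq(h, *alpha_bracket)
    return sigma_new, alpha_new

rng = np.random.default_rng(5)
w = rng.weibull(1.5, 1000)
y2 = rng.normal(0.0, np.sqrt(2.0) * 1.0 / w)   # y''_i | w_i ~ N(0, 2*sigma^2 / w_i^2), sigma = 1
print(second_approach_update(y2, w))           # approximately (1.0, 1.5)
```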

4. EM Algorithm for the Multivariate case

Let Y denote a sub-Gaussian SαS random vector as defined in (1.2). The representation

\mathbf{Y} = \boldsymbol{\mu} + \sqrt{\Lambda}\,\mathbf{Z},
holds, where Z is a zero-mean Gaussian random vector with some d × d covariance matrix Σ, Λ follows S(α/2, 1, (cos(πα/4))^{2/α}, 0), μ ∈ IRd is the location parameter, and 0 < α < 2. Here, Λ and Z are statistically independent. The stochastic representation (4.1) is the multivariate counterpart of the representation given in (2.2). Hence, the methodology introduced in the previous section can be applied to the sub-Gaussian SαS distribution. We have
\mathbf{Y}\,|\,\Lambda = \lambda \sim N(\boldsymbol{\mu}, \lambda\Sigma), \qquad \Lambda \sim S\left(\alpha/2,\, 1,\, (\cos(\pi\alpha/4))^{2/\alpha},\, 0\right).
In the following, we summarize the EM algorithm for a sub-Gaussian SαS distribution based on the second approach presented in Subsection 3.3. Let y = (y1,…, yn) denote n i.i.d. samples from a sub-Gaussian SαS distribution. The complete data log-likelihood is
l_c(\Theta) = l_c(\alpha,\boldsymbol{\mu},\Sigma) = C - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}\frac{(\mathbf{y}_i-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{y}_i-\boldsymbol{\mu})}{\lambda_i} + \sum_{i=1}^{n}\log f_{\Lambda}(\lambda_i),
where f_Λ(.) denotes the density function of the positive stable random variable Λ. The conditional expectation of the complete data log-likelihood, Q(Θ|Θ^{(t)}) = E(l_c(Θ; λ)|y, Θ^{(t)}), is given by
Q(\Theta|\Theta^{(t)}) = C - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(\mathbf{y}_i-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{y}_i-\boldsymbol{\mu})\, e_i^{(t,s)} + \sum_{i=1}^{n}E\left[\log f_{\Lambda}(\lambda_i|\alpha)\,\big|\,\mathbf{y}_i,\Theta^{(t)}\right],
where e_i^{(t,s)} = E(Λ^{-1}|y_i, μ^{(t)}, Σ^{(s)}, α^{(s)}). So, at the (t + 1)-th iteration of the CM step we obtain
\boldsymbol{\mu}^{(t+1)} = \frac{\sum_{i=1}^{n}\mathbf{y}_i\, e_i^{(t,t)}}{\sum_{i=1}^{n} e_i^{(t,t)}}.
Let E1, E2,…, En be independent exponentially distributed random variables with mean one. At the (t+1)-th iteration, for known μ^{(t+1)}, define Y'_i = Y_i − μ^{(t+1)} and Y''_i = Y'_i/√E_i; for i = 1,…, n. We have that
\mathbf{Y}''_i\,|\,W_i = w_i \sim N\left(\mathbf{0},\, w_i^{-2}\Sigma\right),
where the W_i's are independent with density f_{W_i}(w) = αw^{α−1} exp{−w^α}; for i = 1,…,n. The complete data log-likelihood becomes
l_c(\alpha,\Sigma) = C - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(\mathbf{y}''_i)^{T}\Sigma^{-1}\mathbf{y}''_i\, w_i^{2} + n\log\alpha + \alpha\sum_{i=1}^{n}\log w_i - \sum_{i=1}^{n} w_i^{\alpha},
where y''_i = (y_i − μ^{(t+1)})/√E_i. By simulating w from the conditional distribution of W_i given y''_i and α^{(t)}; for i = 1,…, n, at the m-th cycle of the SEM, we follow some iterated steps to find the Σ and α in the M step of the SEM algorithm that maximize the pseudo-complete log-likelihood function. Hence, the updated Σ becomes
\Sigma^{(t+1)} = \frac{\sum_{i=1}^{n}\mathbf{y}''_i\, (\mathbf{y}''_i)^{T} w_i^{2}}{n},
and α(t+1) is a solution of
h(\alpha) = \frac{n}{\alpha} + \sum_{i=1}^{n}\log w_i - \sum_{i=1}^{n} w_i^{\alpha}\log w_i = 0.
The updates of μ^{(t+1)}, Σ^{(t+1)}, and α^{(t+1)} are implemented through (4.2), (4.4), and (4.5), respectively, for M iterations.
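A hedged Python sketch of one cycle of these multivariate updates follows; the helper name, the root-finding bracket, and the synthetic inputs are our own assumptions.

```python
# Minimal sketch of one multivariate update cycle: mu via the weighted mean (4.2),
# Sigma via (4.4) from the pseudo-complete data, and alpha from the score equation
# (4.5). y and y2 are (n, d) arrays; e holds the weights e_i^(t,t); w the simulated
# Weibull-type variables.
import numpy as np
from scipy.optimize import brentq

def multivariate_m_step(y, e, y2, w, alpha_bracket=(0.05, 2.0)):
    n = len(y)
    mu_new = (y * e[:, None]).sum(axis=0) / e.sum()                    # (4.2)
    sigma_new = (y2 * (w ** 2)[:, None]).T @ y2 / n                    # (4.4)
    logw = np.log(w)
    h = lambda a: n / a + logw.sum() - np.sum(w ** a * logw)
    alpha_new = brentq(h, *alpha_bracket)                              # (4.5)
    return mu_new, sigma_new, alpha_new

rng = np.random.default_rng(6)
n, d = 500, 2
y = rng.normal(size=(n, d))
e = rng.uniform(0.5, 1.5, n)
w = rng.weibull(1.6, n)
y2 = rng.normal(size=(n, d)) / w[:, None]
mu_new, sigma_new, alpha_new = multivariate_m_step(y, e, y2, w)
print(mu_new, alpha_new)
print(np.linalg.eigvalsh(sigma_new))   # all positive: Sigma^(t+1) is positive definite
```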

Corollary 4.1.

The updated dispersion matrix Σ^{(t+1)} in (4.4) is positive definite, since it is an average of the positive semi-definite rank-one matrices w_i² y''_i (y''_i)^T and, for n ≥ d, the vectors y''_i span IRd with probability one.

4.1. Computational considerations

At the (t + 1)-th iteration, updating μ^{(t)} in (4.2) requires computing e_i^{(t,t)}. For this, when D_i^{(t)} = (y_i − μ^{(t)})^T (Σ^{(t)})^{-1} (y_i − μ^{(t)}) is large, e_i^{(t,t)} is evaluated approximately as

e_i^{(t,t)} \approx \frac{\displaystyle\sum_{j=1}^{k}\frac{(-1)^{j}}{\Gamma(j+1)}\,\Gamma\left(\frac{j\alpha^{(t)}+2}{2}\right)\Gamma\left(\frac{j\alpha^{(t)}+d+2}{2}\right)\sin\left(\frac{j\pi\alpha^{(t)}}{2}\right)\left(D_i^{(t)}\right)^{-\frac{j\alpha^{(t)}+d+2}{2}}}{\displaystyle\sum_{j=1}^{k}\frac{(-1)^{j}}{\Gamma(j+1)}\,\Gamma\left(\frac{j\alpha^{(t)}+2}{2}\right)\Gamma\left(\frac{j\alpha^{(t)}+d}{2}\right)\sin\left(\frac{j\pi\alpha^{(t)}}{2}\right)\left(D_i^{(t)}\right)^{-\frac{j\alpha^{(t)}+d}{2}}},
where
D_i^{(t)} \in \left(\left[\frac{\Gamma\left(\frac{k\alpha^{(t)}+\alpha^{(t)}+2}{2}\right)\Gamma\left(\frac{k\alpha^{(t)}+\alpha^{(t)}+2+d}{2}\right)}{(k+1)\,\Gamma\left(\frac{k\alpha^{(t)}+2}{2}\right)\Gamma\left(\frac{k\alpha^{(t)}+2+d}{2}\right)}\right]^{\frac{2}{\alpha^{(t)}}},\ \infty\right),
and k = [min(168, 168/α(t))] in (4.6). If Di(t) does not lie in (4.7), we use the Monte Carlo based approximation of ei(t,t) as
e_i^{(t,t)} \approx \left(\sum_{j=1}^{N}\Lambda_j^{-d/2-1}\exp\left\{-\frac{D_i^{(t)}}{2\Lambda_j}\right\}\right)\div\left(\sum_{j=1}^{N}\Lambda_j^{-d/2}\exp\left\{-\frac{D_i^{(t)}}{2\Lambda_j}\right\}\right),
where Λ_j ~ S(α^{(t)}/2, 1, (cos(πα^{(t)}/4))^{2/α^{(t)}}, 0) for large N. As before, we set N = 2000 for implementing the EM algorithm. To construct the pseudo-complete data, the w_i's in (4.3) are generated through a rejection sampling scheme with the following steps.
  1. Simulate a sample from a Weibull distribution, say w_i, with shape parameter α^{(t+1)} and scale unity.

  2. Define
b = \frac{d^{d/2}\left((\mathbf{y}''_i)^{T}\Sigma^{-1}\mathbf{y}''_i\right)^{-d/2}\exp\{-d/2\}}{(2\pi)^{d/2}|\Sigma|^{1/2}}
and generate a sample from a uniform distribution U(0, b), say u.

  3. If
u < \frac{w_i^{d}\exp\left\{-\frac{1}{2}\left((\mathbf{y}''_i)^{T}\Sigma^{-1}\mathbf{y}''_i\right)w_i^{2}\right\}}{(2\pi)^{d/2}|\Sigma|^{1/2}},
then accept w_i as a generation from f_{W_i|\mathbf{Y}''_i}(w_i|\mathbf{y}''_i); otherwise, go to step 1.

If (y''_i)^T Σ^{-1} y''_i is very large, we impute w_i = √d ((y''_i)^T Σ^{-1} y''_i)^{-1/2}.
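The following Python sketch illustrates the d-dimensional rejection step together with the mode imputation used when the acceptance target underflows; the parameter values and function name are illustrative assumptions.

```python
# Minimal sketch of the d-dimensional rejection sampler for W_i | Y''_i = y''_i,
# with the imputation w_i = sqrt(d) * ((y''_i)^T Sigma^{-1} y''_i)^{-1/2} as fallback.
import numpy as np

def sample_w_multivariate(y2_i, sigma, alpha, rng, max_tries=10000):
    d = len(y2_i)
    q = y2_i @ np.linalg.solve(sigma, y2_i)          # (y''_i)^T Sigma^{-1} y''_i
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(sigma))
    b = d ** (d / 2.0) * q ** (-d / 2.0) * np.exp(-d / 2.0) / norm
    for _ in range(max_tries):
        w = rng.weibull(alpha)                       # step 1: Weibull(alpha) proposal
        u = rng.uniform(0.0, b)                      # step 2: uniform on (0, b)
        if u < w ** d * np.exp(-0.5 * q * w ** 2) / norm:   # step 3: accept / reject
            return w
    return np.sqrt(d) / np.sqrt(q)                   # mode imputation for very large q

rng = np.random.default_rng(7)
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
print(sample_w_multivariate(np.array([0.4, -0.7]), sigma, 1.5, rng))
```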

5. Simulation study and Real Data Analysis

This section has five parts. In the first part, we analyze the performance of the proposed EM algorithm compared with other estimators in the univariate case through a simulation study. In the second part, we follow the analyses for the performance of the proposed EM algorithm using real data. Simulation studies in the bivariate case are performed in the third subsection. The fourth subsection is devoted to real data analysis for a bivariate case. Simulation studies in three-variate and higher dimensions are given in the fifth subsection. The set of real data used in the univariate and multivariate cases is 9 years of daily returns of 22 major worldwide market indices, which consists of 2535 observations from 1/4/2000 to 9/22/2009. These indices are from four different areas including America (S&P500, NASDAQ, TSX, Merval, Bovespa and IPC), Europe and Middle East (AEX, ATX, FTSE, DAX, CAC40, SMI, MIB and TA100), and East Asia and Oceania (HgSg, Nikkei, StrTim, SSEC, BSE, KLSE, KOSPI and AllOrd). This data set is available at the website of Yahoo finance, [8]. For simplicity, hereafter, we call this data set WMI (Worldwide Market Indices).

5.1. Simulation study in univariate case

We compare the performance of the ML, SQ, CF, and EM methods for estimating the parameters of the SαS distribution through simulation. Comparisons are based on the square root of the mean-squared error (RMSE) criterion. In the simulation, the RMSE was computed for sample sizes of n = 200, 500, and 1000. The number of replications is N = 1000. The parameters were selected as: μ = 0, σ = 0.5, 1, 2, and α = 0.4, 0.6, 0.8, 0.9, 1.1, 1.2, 1.4, 1.6, 1.8, 1.9. The plots of the computed RMSE for all estimators are shown in Figures 1-3. It should be noted that the estimation results for the evaluated ML approach (Nolan's method [21]) are not reliable for α < 0.4 and so are not shown in Figures 1-3. The smoothed version of the actual values plotted in these figures is obtained using the lowess package (see [6]) with the following settings: the smoothing span is 2/3, the number of iterations is 3, and the speed of computations is determined by 0.01 of the range of the α values.

Fig. 1.

RMSE of α̂ for different values of σ = 0.5, 1, 2 and for 0.4 ≤ α ≤ 1.9. In all sub-figures, the x-axis shows the value of α and the y-axis shows the RMSE of α̂.

Fig. 2.

RMSE of σ̂ for different values of σ = 0.5, 1, 2 and for 0.4 ≤ α ≤ 1.9. In all sub-figures, the x-axis shows the value of α and the y-axis shows the RMSE of σ̂.

Fig. 3.

RMSE of μ̂ for different values of σ = 0.5, 1, 2 and for 0.4 ≤ α ≤ 1.9. In all sub-figures, the x-axis shows the value of α and the y-axis shows the RMSE of μ̂.

In each sub-figure of Figures 1-3, the vertical line indicates that the proposed EM algorithm works better than the CF and SQ methods for those α lying to the left of the line. When the EM algorithm always works better, no such line is shown. Taking the location parameter to be zero, the following results are obtained from the simulation and show how the RMSE of the estimators varies in terms of σ, α, and n:

  1. The EM-based α̂ and σ̂ have better performance than the SQ-based and CF-based α̂ and σ̂ for α < 1.1 in all cases, as shown in Figures 1 and 2, respectively.

  2. The EM-based μ̂ shows better performance than the SQ and CF estimates for all α, see Figure 3.

  3. The RMSE of the EM-based α̂ and σ̂ is smaller than that of the SQ-based α̂ and σ̂.

  4. We know that the ML method gives efficient estimates, particularly when the sample size gets large. But sometimes the EM-based α̂ and σ̂ outperform their ML counterparts. This occurs because STABLE gives an approximate ML estimate, not the exact ML estimate.

  5. The RMSE of all estimators decreases as the sample size n increases.

Along with the simulation, we investigate the execution time for different sample sizes and scale parameters in the univariate case. The results are given in Table 1. All runs are performed on a machine with a 3.5 GHz Intel(R) Core(TM) i7-2700K processor and 8 GB of RAM. It should be noted that implementing the ML, CF, and SQ approaches, with the settings given in Table 1, takes less than 0.5 seconds.

               Scale parameter
Sample size    σ = 0.5      σ = 1        σ = 2
500            30 (62)      26 (64)      22 (62)
1000           70 (127)     55 (122)     43 (124)
5000           334 (627)    283 (639)    223 (621)
Table 1.

Average execution time in seconds for 50 runs of the EM algorithm when it is applied to the SαS distribution. The program was run on a 3.5 GHz Intel Core(TM) i7 processor with 8 GB of RAM. Note that values outside (inside) parentheses are obtained for α = 0.5 (α = 1.5). We set M = 120 and M0 = 70 for implementing the EM algorithm.

5.2. Real data analysis in the univariate case

For the real data application, we select the Bovespa (São Paulo Stock Exchange), CAC40 (Bourse de Paris), and DAX (German Stock Index) indices from the WMI data set. The empirical distributions of these three data sets show a symmetric pattern around the origin, with tails heavier than the normal model. We apply the ML, CF, SQ, and EM methods to estimate the parameters of the fitted stable distributions. It should be noted that the sample median is used as an initial value for the location parameter to implement the EM algorithm. Initial values for the tail and scale parameters are obtained by applying the method of logarithmic moments. For a symmetric stable distribution, this approach estimates the parameters using the first- and second-order moments of the logarithm of a zero-location symmetric stable random variable Y, i.e. L1 = E(log |Y|) and L2 = E(log |Y| − L1)², respectively. Assuming that the data, after subtracting the sample median, follow a zero-location symmetric stable distribution, the parameters α and σ are estimated by equating the sample logarithmic moments to the quantities L1 and L2, [20] and [32]. Figure 4 displays cycles of the EM algorithm for M = 120 and M0 = 70.
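A minimal Python sketch of this logarithmic-moment initialization, assuming the standard SαS log-moment relations E log|Y| = γ_E(1/α − 1) + log σ and Var(log|Y|) = (π²/6)(1/α² + 1/2) (γ_E denotes Euler's constant), is:

```python
# Sketch of the log-moment starting values: centre the data at the sample median,
# match the sample mean and variance of log|y| to the SαS log-moment relations,
# and solve for (alpha, sigma). This is an illustrative initialization, not the
# authors' exact routine.
import numpy as np
from scipy.stats import levy_stable

def log_moment_init(y):
    y0 = np.asarray(y, float) - np.median(y)
    logy = np.log(np.abs(y0[y0 != 0.0]))
    L1, L2 = logy.mean(), logy.var()
    euler_gamma = 0.5772156649015329
    alpha0 = 1.0 / np.sqrt(max(6.0 * L2 / np.pi ** 2 - 0.5, 1e-6))
    alpha0 = min(alpha0, 2.0)                              # keep within the stable range
    sigma0 = np.exp(L1 - euler_gamma * (1.0 / alpha0 - 1.0))
    return alpha0, sigma0

y = levy_stable.rvs(1.6, 0.0, loc=0.0, scale=1.0, size=5000, random_state=8)
print(log_moment_init(y))   # rough starting values for (alpha, sigma)
```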

Fig. 4.

Estimated values of α, σ, and μ with M = 120 cycles each with n = 2535 samples for Bovespa, CAC40 and DAX indices selected from WMI data set.

The estimated parameters, along with the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) test statistics, are given in Table 2. Further analysis reveals that the stable model captures the general shape of the empirical histogram well, and the main difference between the fits occurs at the origin, in the height of the peak. As can be seen, the EM algorithm gives satisfactory results.

Fig. 5.

Scatterplot of the IPC index versus the S&P500 index (a) shows that the distribution of points on the plane is roughly elliptical. The points in the scatterplot are n = 2535 daily returns from April 1, 2000 through September 22, 2009. The marginal density plots of the S&P500 (b) and IPC (c) show that the stable distribution (black dashed line) provides a better fit than the normal (red dotted line) distribution. The blue solid line is the fitted empirical density.

             Bovespa                                     CAC40
Parameters   ML        CF        EM        SQ            ML        CF        EM        SQ
α            1.8616    1.8976    1.78017   1.7544        1.7074    1.7898    1.6629    1.6076
β           −0.5908   −0.828     —        −0.3021        0.0578    0.3089    —         0.0974
σ            0.6407    0.6431    0.6151    0.6171        0.5461    0.5561    0.5148    0.5287
μ           −0.0572   −0.0541   −0.0155   −0.0805        0.0222    0.0487    0.0111    0.0193
K-S          0.0314    0.0302    0.0305    0.0388        0.0237    0.0251    0.0192    0.0191
A-D          1.7552    1.9720    2.4888    2.5601        1.2075    2.1783    1.6430    1.0461

             DAX
Parameters   ML        CF        EM        SQ
α            1.8541    1.8856    1.8140    1.6642
β           −0.2978   −0.508     —        −0.1368
σ            0.6401    0.6378    0.6048    0.6029
μ           −0.0507   −0.0550   −0.0289   −0.0493
K-S          0.0268    0.0251    0.0220    0.0171
A-D          1.9657    2.1389    2.1374    1.1655
Table 2.

Estimation results for the return of Bovespa, CAC40, and DAX data through the ML, SQ, CF, and EM approaches applied to the data.

5.3. Simulation study in bivariate case

Estimating the parameters of a sub-Gaussian SαS distribution is computationally expensive, [24]. Usually, the parameters of this distribution are estimated using the projection method. Hereafter, we write ML, CF, and SQ for the projected ML, CF, and SQ methods, respectively. In the following, we perform a simulation based on a sample of size n = 500 to measure the performance of the ML, CF, EM, and SQ approaches in modelling a bivariate sub-Gaussian SαS distribution. For this, we set α = 1.5, 1.7, 1.9, μ = (μ1, μ2)T = (0,0)T, and dispersion matrix

\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{pmatrix} = 10^{-6}\times\begin{pmatrix} 5.9552 & 4.0783\\ 4.0783 & 13.9861\end{pmatrix},
as used in [27]. We follow the CF, SQ, and ML approaches by applying the STABLE software [21]. The boxplots, based on 250 runs for all methods, are depicted in Figure 6. It should be noted that the ML estimates are used here as starting values for the EM algorithm. However, our investigations show that the proposed EM algorithm is robust with respect to the initial values.

Fig. 6.

Boxplots of the ML, CF, EM, and SQ estimates of α, μ = (μ1, μ2)T, and the entries of Σ when n = 500 vectors are generated from a bivariate sub-Gaussian SαS distribution with α = 1.5, 1.7, 1.9, μ = (μ1, μ2)T = (0,0)T, σ11 = 0.000059552, σ12 = 0.000040783, and σ22 = 0.000139861. In each sub-figure, the horizontal line denotes the true value of the parameter. Each boxplot is constructed based on N = 250 runs. The color scheme used for the boxplots under each method is: blue, red, and green for α = 1.5, 1.7, and 1.9, respectively.

5.4. Real data analysis in bivariate case

Here, we apply the ML, SQ, CF, and EM methods to the S&P500 and IPC indices selected from the WMI data set. The empirical distributions of both the S&P500 and IPC indices show a symmetric pattern around the origin. The scatterplot shown in Figure 5 is roughly elliptically contoured. Furthermore, the estimated tail parameters after fitting a stable distribution to the S&P500 and IPC indices through the ML approach are 1.9003 and 1.9143, respectively, which are assumed to be equal. Thus, the assumption that X = (S&P500, IPC)T follows a sub-Gaussian SαS distribution is acceptable, [23]. The results of estimating the parameters through the ML, CF, EM, and SQ methods are shown in Table 3. The log-likelihood values indicate that the EM algorithm gives better performance than the CF and SQ approaches.

Method   α        Σ (dispersion matrix)                     μ                         Log-likelihood
ML       1.9073   (0.45396  0.41991; 0.41991  0.45626)      (0.00528, 0.00697)        −4967.706
CF       1.9530   (0.45668  0.41957; 0.41957  0.46008)      (0.02505, 0.02437)        −4978.067
EM       1.7116   (0.37830  0.34573; 0.34573  0.37589)      (−0.02494, −0.02215)      −4975.581
SQ       1.8179   (0.42751  0.35074; 0.35074  0.43147)      (0.01559, 0.02313)        −5176.572
Table 3.

Estimation results for the return of (S&P500, IPC) when the ML, CF, EM, and SQ approaches are applied to the data.

5.5. Analysis in higher dimensions

In this subsection, we first perform a simulation to compare the performance of the ML, CF, EM, and SQ methods, where a sample of size n = 1500 is generated from a three-variate sub-Gaussian SαS distribution. For this purpose, the settings considered are: α = 1.86, μ = (μ1, μ2, μ3)T = (0, 0, 0)T, and

\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12} & \sigma_{13}\\ \sigma_{21} & \sigma_{22} & \sigma_{23}\\ \sigma_{31} & \sigma_{32} & \sigma_{33}\end{pmatrix} = 10^{-5}\times\begin{pmatrix}6.293 & 3.289 & 3.643\\ 3.289 & 9.133 & 3.921\\ 3.643 & 3.921 & 7.871\end{pmatrix},
which were used by [24]. Secondly, we fit a thirty-dimensional sub-Gaussian SαS distribution to the 30 indices of the Dow Jones data described in [24].

For the three-variate case, the results of the simulation are displayed in Figure 7, where each boxplot is based on 250 runs. If we define the length of each box as a performance criterion, it follows that the EM algorithm works better than the SQ method in all cases. The performance of the EM and CF approaches is almost the same, except in the case of estimating α, where the EM algorithm performs much better than the CF method. In the thirty-dimensional case, implementing the EM algorithm takes 303 seconds when M = 200 and M0 = 150. To avoid unnecessary details, we only focus on estimating the location parameter and the diagonal entries of the dispersion matrix using the EM and ML methods. The results for estimating these parameters are given in Figures 8 and 9, respectively. As can be seen, the EM algorithm gives results close to the ML approach for estimating the diagonal entries of the dispersion matrix. The results of the simulation show that the computational cost increases as the dimension increases, but the differences are not significant. For example, the execution times for implementing the EM and ML methods are given in Table 4. For this, we generated n = 1000 random vectors from a sub-Gaussian SαS distribution.

Fig. 7.

Boxplot of the ML, CF, EM, and SQ estimations for α, μ = (μ1, μ2, μ3)T, and entries of Σ when n = 1500 vectors generated from a three-variate sub-Gaussian SαS with α = 1.86, μ = (μ1, μ2, μ3)T = (0, 0, 0)T, σ11 = 0.0006293, σ12 = 0.0003289, σ13 = 0.0003643, σ22 = 0.0009133, σ23 = 0.0003921, and σ33 = 0.0007871. In each sub-figure, horizontal line denotes the real value of parameter. Each boxplot is constructed based on N = 250 runs.

Fig. 8.

Estimated values for the diagonal entries of the dispersion matrix when the EM and ML methods are applied to the 30 indices of the Dow Jones data. The signs × and + correspond to the EM and ML methods, respectively.

Fig. 9.

Estimated values for the location parameter when the EM and ML methods are applied to the 30 indices of the Dow Jones data. The signs × and + correspond to the EM and ML methods, respectively.

d      α = 1.5      α = 1
5      201 (1.2)    185 (0.6)
10     222 (3.3)    200 (3.5)
20     230 (7.6)    223 (7.8)
50     275 (42)     258 (40)
100    456 (193)    389 (173)
Table 4.

Average execution time in seconds for 50 runs of the EM and ML methods when they are applied to a d-dimensional sub-Gaussian SαS distribution. The program was run on a 3.5 GHz Intel Core(TM) i7 processor with 8 GB of RAM. During the simulation, we set M = 200 and M0 = 150, and Σ is a positive definite matrix whose eigenvalues are randomly generated from a uniform distribution on (0, 2). The values inside the parentheses correspond to the ML method.

6. Concluding remarks

We propose an EM algorithm for estimating the parameters of SαS and sub-Gaussian SαS distributions. For the SαS case, the performance of the proposed EM algorithm is compared with known approaches such as ML, CF, and SQ through a comprehensive simulation study. The comparisons are based on the square root of the mean-squared error (RMSE). It follows that the proposed EM algorithm is a good competitor of the ML, CF, and SQ methods. Furthermore, in the univariate case, its performance is demonstrated using three sets of real data. For the sub-Gaussian SαS case, the performance of the proposed EM algorithm is investigated via simulation and real data modelling. To avoid unnecessary details, we consider only the two-, three-, and thirty-dimensional cases. The simulations reveal that the proposed EM algorithm provides satisfactory performance. In particular, in the two-dimensional case, it gives better performance than the CF and SQ methods in modelling a set of real data. The proposed EM algorithm has some advantages over the other methods studied in this work. Among them we mention: 1- it works for all ranges of the parameters; 2- in the multivariate case, it can estimate all entries of the dispersion matrix simultaneously, so that the estimated dispersion matrix is always positive definite; 3- it motivates further research for many other distributions where dividing by an auxiliary random variable in the CML step simplifies the M step of the EM algorithm (this idea can be applied to the exponential power distribution); 4- it outperforms the CF and SQ methods in estimating the location parameter when α is near one; 5- it can be developed for modelling mixtures of SαS distributions. This last advantage is a privilege of the proposed EM algorithm, as the evaluated ML, SQ, and CF methods are not applicable to mixture models. As possible future work, we are developing the proposed EM algorithm to estimate the parameters of a general stable distribution, mixtures of SαS distributions, and mixtures of sub-Gaussian SαS distributions.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments that greatly improved the paper. The authors would also like to especially thank Prof. John P. Nolan for his suggestions and encouragement throughout this work. The R source code implementing the EM algorithm is freely available from the corresponding author on request.

References

[1]RM Basso, VH Lachos, CRB Cabral, and P Ghosh, Robust mixture modeling based on scale mixtures of skew-normal distributions, Computational Statistics and Data Analysis, Vol. 54, 2010, pp. 2926-2941.
[2]M Broniatowski, G Celeux, and J Diebolt, Reconnaissance de Mélanges de densités par un algorithme d’apprentissage probabiliste, Data Analysis and Informatics, Vol. 3, 1983, pp. 359-374.
[3]G Celeux and J Diebolt, The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Computational Statistics Quarterly, Vol. 2, No. 1, 1985, pp. 73-82.
[4]G Celeux and J Diebolt, Une version de type recuit simulé de l’algorithme EM, Notes aux Comptes Rendus de l’Académie des Sciences, Vol. 310, 1990, pp. 119-124.
[5]JM Chambers, CL Mallows, and BW Stuck, A method for simulating stable random variables, Journal of the American Statistical Association, Vol. 71, No. 354, 1976, pp. 340-344.
[6]WS Cleveland, LOWESS: A program for smoothing scatterplots by robust locally weighted regression, The American Statistician, Vol. 35, 1981, pp. 54.
[7]AP Dempster, NM Laird, and DB Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, Vol. 39, 1977, pp. 1-38.
[8]Y Dominicy and D Veredas, The Method of Simulated Quantiles, Journal of Econometrics, Vol. 172, 2013, pp. 235-247.
[9]WH DuMouchel, Stable distributions in statistical inference, Ph.D. Thesis, University Microfilms, Ann Arbor, Mich., 1971.
[10]W Feller, An Introduction to Probability Theory and its Applications, John Wiley, New York, Vol. 2, 1971.
[11]EHS Ip, A stochastic EM estimator for handling missing data. Unpublished Ph. D. Thesis, Department of Statistics, Stanford University, 1994.
[12]SM Kogon and DB Williams, Characteristic function based estimation of stable parameters, R Adler, R Feldman, and M Taqqu (editors), A Practical Guide to Heavy Tails, Birkhäuser, Boston, 1998, pp. 311-338.
[14]C Liu and DB Rubin, The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence, Biometrika, Vol. 81, 1994, pp. 633-648.
[15]JH McCulloch, Simple consistent estimators of stable distribution parameters, Communications in Statistics - Simulation and Computation, Vol. 15, 1986, pp. 1109-1136.
[16]GJ McLachlan and T Krishnan, The EM Algorithm and Extensions, John Wiley, New York, 2008.
[17]SG Meintanis, Test for exponentiality against Weibull and gamma decreasing hazard rate alternatives, Kybernetika, Vol. 43, No. 3, 2007, pp. 307-314.
[18]XL Meng and DB Rubin, Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika, Vol. 80, 1993, pp. 267-278.
[19]S Mittnik and MS Paolella, Prediction of financial downside risk with heavy tailed conditional distributions, ST Rachev (editor), Handbook of Heavy Tailed Distributions in Finance, Elsevier Science, Amsterdam, 2003.
[21]JP Nolan, Maximum likelihood estimation of stable parameters, OE Barndorff-Nielsen, T Mikosch, and I Resnick (editors), Lévy Processes: Theory and Applications, Birkhäuser, Boston, 2001, pp. 379-400.
[22]JP Nolan, Modeling financial distributions with stable distributions, Volume 1 of Handbooks in Finance, Elsevier, Amsterdam, 2013, pp. 105-130. Chapter 3
[24]JP Nolan, Multivariate elliptically contoured stable distributions: theory and estimation, Computational Statistics, Vol. 28, 2013, pp. 2067-2089.
[25]JP Nolan, Stable Distributions-Models for Heavy Tailed Data, Birkhäuser, Boston, 2016. Unfinished manuscript, Chapter 1 online at academic2.american.edu/~jpnolan
[26]S Ortobelli, ST Rachev, and FJ Fabozzi, Risk management and dynamic portfolio selection with stable Paretian distributions, Journal of Empirical Finance, Vol. 17, No. 2, 2010, pp. 195-211.
[27]ST Rachev, Handbook of Heavy Tailed Distributions in Finance, Elsevier, Amsterdam, 2003.
[29]D Salas-Gonzalez, EE Kuruoglu, and DP Ruiz, Modelling with mixture of symmetric stable distributions using Gibbs sampling, Signal Processing, Vol. 90, 2010, pp. 774-783.
[30]G Samorodnitsky and MS Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models and Infinite Variance, Chapman and Hall, London, 1994.
[31]VV Uchaikin and VM Zolotarev, Chance and Stability: Stable Distributions and Their Applications, VSP Press, Utrecht, 1999.
[32]VM Zolotarev, One-Dimensional Stable Distributions, American Mathematical Society, Providence, R. I., 1986.