Project Owner : Shyam.C
Created Date : Sat, 17/03/2012 - 11:01
Project Description :


In probability theory, the central limit theorem (CLT) states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.

 The central limit theorem has a number of variants. In its common form, the random variables must be identically distributed. In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions, given that they comply with certain conditions.

A simple example of the central limit theorem is rolling a large number of identical, biased dice. The distribution of the sum (or average) of the rolled numbers will be well approximated by a normal distribution. Since real-world quantities are often the balanced sum of many unobserved random events, the central limit theorem also provides a partial explanation for the prevalence of the normal probability distribution. It also justifies the approximation of large-sample statistics to the normal distribution in controlled experiments.
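The dice example can be checked exactly rather than by simulation. The sketch below is only an illustration: the face probabilities in FACE_PROBS are a hypothetical biased die chosen for this example, and the helper names are not from any library. It computes the exact distribution of the sum of n dice by repeated convolution and reports the largest gap between its cdf and the cdf of a normal distribution with the same mean and variance (using a continuity correction of 0.5).

```python
import math

# Hypothetical face probabilities for one biased die (faces 1..6); an assumption for illustration.
FACE_PROBS = [0.1, 0.1, 0.1, 0.2, 0.2, 0.3]

def sum_pmf(n_dice, face_probs=FACE_PROBS):
    """Exact pmf of the sum of n_dice independent dice, as {total: probability}."""
    pmf = {0: 1.0}
    for _ in range(n_dice):
        nxt = {}
        for total, p in pmf.items():
            for face, q in enumerate(face_probs, start=1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p * q
        pmf = nxt
    return pmf

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def max_cdf_gap(n_dice, face_probs=FACE_PROBS):
    """Largest gap between the exact cdf of the sum and the matching normal cdf."""
    mu = sum(f * q for f, q in enumerate(face_probs, start=1))
    var = sum((f - mu) ** 2 * q for f, q in enumerate(face_probs, start=1))
    mean, sd = n_dice * mu, math.sqrt(n_dice * var)
    pmf = sum_pmf(n_dice, face_probs)
    gap, cum = 0.0, 0.0
    for total in sorted(pmf):
        cum += pmf[total]
        # Continuity correction: compare P(sum <= total) with the normal cdf at total + 0.5.
        gap = max(gap, abs(cum - normal_cdf(total + 0.5, mean, sd)))
    return gap

for n in (1, 5, 30):
    print(n, round(max_cdf_gap(n), 4))
```

The gap for 30 dice should come out much smaller than for a single die, even though each die is biased.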


Central limit theorems for independent sequences

Figure: A distribution being "smoothed out" by summation, showing the original density of the distribution and three subsequent summations.

Classical CLT

Let \{X_1,\ldots,X_n\} be a random sample of size n, that is, a sequence of independent and identically distributed random variables with expected value \mu and variance \sigma^2. Suppose we are interested in the behavior of the sample average of these random variables: S_n:=(X_1+\cdots+X_n)/n. Then the central limit theorem asserts that as n gets larger, the distribution of S_n is approximately normal with mean \mu and variance \sigma^2/n. The true strength of the theorem is that S_n approaches normality regardless of the shapes of the distributions of the individual X_i. Formally, the theorem can be stated as follows:

Lindeberg–Lévy CLT: suppose {X_i} is a sequence of iid random variables with E[X_i] = µ and Var[X_i] = σ^2. Then as n approaches infinity, the random variable √n(S_n − µ) converges in distribution to a normal N(0, σ^2):

    \sqrt{n}\bigg(\bigg(\frac{1}{n}\sum_{i=1}^n X_i\bigg) - \mu\bigg)\ \xrightarrow{d}\ \mathcal{N}(0,\;\sigma^2).

Convergence in distribution means that the cumulative distribution function of √n(S_n − µ) converges pointwise to the cdf of the N(0, σ^2) distribution: for any real number z,

    \lim_{n\to\infty} \Pr[\sqrt{n}(S_n-\mu) \leq z] = \Phi(z/\sigma),

where Φ(x) is the standard normal cdf.
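The limit statement can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the theorem: the data are taken to be Exponential(1) (so µ = σ = 1), the sample size, replication count and seed are arbitrary, and the function names are invented for this example. It compares the empirical value of Pr[√n(S_n − µ) ≤ z] with Φ(z/σ) at a few points z.

```python
import math
import random

def scaled_mean_samples(n, reps, seed=0):
    """Draw reps values of sqrt(n) * (S_n - mu) for Exponential(1) data (mu = sigma = 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        s_n = sum(rng.expovariate(1.0) for _ in range(n)) / n  # sample average
        out.append(math.sqrt(n) * (s_n - 1.0))
    return out

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def empirical_cdf(samples, z):
    return sum(1 for s in samples if s <= z) / len(samples)

samples = scaled_mean_samples(n=200, reps=4000)
sigma = 1.0
for z in (-1.0, 0.0, 1.0):
    # Columns: z, empirical probability, limiting value Phi(z / sigma)
    print(z, round(empirical_cdf(samples, z), 3), round(std_normal_cdf(z / sigma), 3))
```

The exponential distribution is strongly skewed, so the agreement of the two columns reflects the theorem's claim that the shape of the individual distribution does not matter in the limit.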

Lyapunov CLT

The theorem is named after the Russian mathematician Aleksandr Lyapunov. In this variant of the central limit theorem the random variables X_i have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables |X_i| have moments of some order (2 + δ), and that the rate of growth of these moments is limited by the Lyapunov condition given below.

Lyapunov CLT: let {X_i} be a sequence of independent random variables, each having a finite expected value μ_i and variance σ_i^2. Define s_n^2 = ∑_{i=1}^n σ_i^2. If for some δ > 0, Lyapunov’s condition

     \lim_{n\to\infty} \frac{1}{s_{n}^{2+\delta}} \sum_{i=1}^{n} \operatorname{E}\big[\,|X_{i} - \mu_{i}|^{2+\delta}\,\big] = 0
is satisfied, then the sum of the (X_i − μ_i)/s_n converges in distribution to a standard normal random variable as n goes to infinity:

     \frac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i) \ \xrightarrow{d}\ \mathcal{N}(0,\;1).

In practice it is usually easiest to check Lyapunov’s condition for δ = 1. If a sequence of random variables satisfies Lyapunov’s condition, then it also satisfies Lindeberg’s condition. The converse implication, however, does not hold.
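As a worked illustration of checking the condition with δ = 1 (an example under added assumptions, not part of the theorem), suppose the variables are uniformly bounded, |X_i − μ_i| ≤ C for some constant C, and that s_n → ∞. Then

     \frac{1}{s_n^{3}} \sum_{i=1}^{n} \operatorname{E}\big[\,|X_i - \mu_i|^{3}\,\big] \ \le\ \frac{C}{s_n^{3}} \sum_{i=1}^{n} \operatorname{E}\big[\,(X_i - \mu_i)^{2}\,\big] = \frac{C\, s_n^{2}}{s_n^{3}} = \frac{C}{s_n} \ \to\ 0,

so Lyapunov’s condition holds and the standardized sum is asymptotically standard normal.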

Lindeberg CLT

In the same setting and with the same notation as above, the Lyapunov condition can be replaced with the following weaker one (due to Lindeberg, 1920). Suppose that for every ε > 0

  \lim_{n \to \infty} \frac{1}{s_n^2}\sum_{i = 1}^{n} \operatorname{E}\big[
   (X_i - \mu_i)^2 \cdot \mathbf{1}_{\{ | X_i - \mu_i | > \varepsilon s_n \}}
  \big] = 0

where 1{…} is the indicator function. Then the distribution of the standardized sums \frac{1}{s_n}\sum_{i = 1}^n \left( X_i - \mu_i \right) converges towards the standard normal distribution N(0,1).
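To see why Lyapunov’s condition implies this one, a short derivation may help. On the event {|X_i − μ_i| > ε s_n} one has (X_i − μ_i)^2 ≤ |X_i − μ_i|^{2+δ} / (ε s_n)^δ, and therefore

  \frac{1}{s_n^2}\sum_{i = 1}^{n} \operatorname{E}\big[ (X_i - \mu_i)^2 \cdot \mathbf{1}_{\{ | X_i - \mu_i | > \varepsilon s_n \}} \big] \ \le\ \frac{1}{\varepsilon^{\delta}}\cdot\frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} \operatorname{E}\big[\,|X_i - \mu_i|^{2+\delta}\,\big].

If Lyapunov’s condition holds for some δ > 0, the right-hand side tends to 0 for every fixed ε > 0, which is exactly Lindeberg’s condition.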

Multidimensional CLT

Proofs that use characteristic functions can be extended to cases where each individual X_1, X_2, X_3, ..., X_n is an independent and identically distributed random vector in \mathbb{R}^k, with mean vector \mu=E\left ( X_i\right ) and covariance matrix Σ (amongst the individual components of the vector). Now, if we take the summations of these vectors as being done componentwise, then the multidimensional central limit theorem states that when scaled, these converge to a multivariate normal distribution.


Let \mathbf{X_i}=\begin{bmatrix} X_{i(1)} \\ \vdots \\ X_{i(k)} \end{bmatrix} be the i-th random vector. The bold in \mathbf{X_i} means that it is a random vector, not a random (univariate) variable.

Then the sum of the random vectors will be

\begin{bmatrix} X_{1(1)} \\ \vdots \\ X_{1(k)} \end{bmatrix}+\begin{bmatrix} X_{2(1)} \\ \vdots \\ X_{2(k)} \end{bmatrix}+...+\begin{bmatrix} X_{n(1)} \\ \vdots \\ X_{n(k)} \end{bmatrix}= \begin{bmatrix} \sum_{i=1}^{n} \left [ X_{i(1)} \right ] \\ \vdots \\ \sum_{i=1}^{n} \left [ X_{i(k)} \right ] \end{bmatrix} =\sum_{i=1}^{n} \left [ \mathbf{X_i} \right ]

and the average will be

\left (\frac{1}{n}\right)\sum_{i=1}^{n} \left [ \mathbf{X_i} \right ]= \frac{1}{n}\begin{bmatrix} \sum_{i=1}^{n} \left [ X_{i(1)} \right ] \\ \vdots \\ \sum_{i=1}^{n} \left [ X_{i(k)} \right ] \end{bmatrix}=\begin{bmatrix} \bar X_{n(1)} \\ \vdots \\ \bar X_{n(k)} \end{bmatrix}=\mathbf{\bar X_n}

and therefore

\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left [ \mathbf{X_i} - E\left ( X_i\right ) \right ]=\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left [ \mathbf{X_i} - \mu \right ]=\sqrt{n}\left(\mathbf{\overline{X}}_n - \mu\right) .

The multivariate central limit theorem states that

\sqrt{n}\left(\mathbf{\overline{X}}_n - \mu\right)\ \stackrel{D}{\rightarrow}\ \mathcal{N}_k(0,\Sigma)

where the covariance matrix \Sigma is equal to

\Sigma=\begin{bmatrix}
{\color{Red}Var \left (X_{1(1)} \right)} & {\color{OliveGreen}Cov \left (X_{1(1)},X_{1(2)} \right)} & Cov \left (X_{1(1)},X_{1(3)} \right) & \cdots & Cov \left (X_{1(1)},X_{1(k)} \right) \\
{\color{OliveGreen}Cov \left (X_{1(2)},X_{1(1)} \right)} & {\color{Turquoise}Var \left (X_{1(2)} \right)} & {\color{RubineRed}Cov \left (X_{1(2)},X_{1(3)} \right)} & \cdots & Cov \left (X_{1(2)},X_{1(k)} \right) \\
Cov \left (X_{1(3)},X_{1(1)} \right) & {\color{RubineRed}Cov \left (X_{1(3)},X_{1(2)} \right)} & Var \left (X_{1(3)} \right) & \cdots & Cov \left (X_{1(3)},X_{1(k)} \right) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
Cov \left (X_{1(k)},X_{1(1)} \right) & Cov \left (X_{1(k)},X_{1(2)} \right) & Cov \left (X_{1(k)},X_{1(3)} \right) & \cdots & Var \left (X_{1(k)} \right)
\end{bmatrix}
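A two-dimensional simulation can make the statement concrete. The sketch below is illustrative only: the choice X_i = (U_i, U_i + V_i) with U_i, V_i independent Uniform(0, 1), the sample sizes, the seed and all function names are assumptions made for this example. For this choice μ = (1/2, 1) and Σ has entries 1/12, 1/12, 1/12, 1/6. The code compares the sample covariance of √n(X̄_n − μ) with Σ, and checks one normal feature of the first coordinate (about 68.27% of the mass within one standard deviation).

```python
import math
import random

def scaled_mean_vectors(n, reps, seed=0):
    """Draw reps values of sqrt(n) * (Xbar_n - mu) for X_i = (U_i, U_i + V_i)."""
    rng = random.Random(seed)
    mu = (0.5, 1.0)
    out = []
    for _ in range(reps):
        s1 = s2 = 0.0
        for _ in range(n):
            u, v = rng.random(), rng.random()
            s1 += u          # first component
            s2 += u + v      # second component, correlated with the first
        root_n = math.sqrt(n)
        out.append((root_n * (s1 / n - mu[0]), root_n * (s2 / n - mu[1])))
    return out

def sample_covariance(points):
    """Unbiased 2x2 sample covariance matrix of a list of (x, y) pairs."""
    m = len(points)
    m1 = sum(p[0] for p in points) / m
    m2 = sum(p[1] for p in points) / m
    c11 = sum((p[0] - m1) ** 2 for p in points) / (m - 1)
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (m - 1)
    c22 = sum((p[1] - m2) ** 2 for p in points) / (m - 1)
    return [[c11, c12], [c12, c22]]

# Sigma for X = (U, U + V) with U, V independent Uniform(0, 1).
SIGMA = [[1 / 12, 1 / 12], [1 / 12, 1 / 6]]

points = scaled_mean_vectors(n=50, reps=3000)
cov = sample_covariance(points)
# Share of first coordinates within one limiting standard deviation of zero.
frac_within = sum(1 for p in points if abs(p[0]) <= math.sqrt(SIGMA[0][0])) / len(points)

print(cov)
print(SIGMA)
print(round(frac_within, 3))
```

Note that the covariance of √n(X̄_n − μ) equals Σ for every n; the content of the theorem is that the shape of the distribution becomes multivariate normal, which the last printed number probes only in one coordinate.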

Central limit theorems for dependent processes

CLT under weak dependence

A useful generalization of a sequence of independent, identically distributed random variables is a mixing random process in discrete time; "mixing" means, roughly, that random variables temporally far apart from one another are nearly independent. Several kinds of mixing are used in ergodic theory and probability theory. See especially strong mixing (also called α-mixing), defined by α(n) → 0, where α(n) is the so-called strong mixing coefficient.

A simplified formulation of the central limit theorem under strong mixing is given in (Billingsley 1995, Theorem 27.4):

Theorem. Suppose that X_1, X_2, … is stationary and α-mixing with α_n = O(n^{−5}) and that E(X_n) = 0 and E(X_n^{12}) < ∞. Denote S_n = X_1 + … + X_n; then the limit σ^2 = lim_{n→∞} n^{−1} E(S_n^2) exists, and if σ ≠ 0 then  S_n / (\sigma \sqrt n)  converges in distribution to N(0, 1).

In fact, σ^2 = E(X_1^2) + 2 ∑_{k=1}^{∞} E(X_1 X_{1+k}), where the series converges absolutely.

The assumption σ ≠ 0 cannot be omitted, since asymptotic normality fails for X_n = Y_n − Y_{n−1}, where Y_n is another stationary sequence.
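The reason is that the sum telescopes. As a short worked step (assuming, for the variance bound, that E(Y_0^2) < ∞):

    S_n = \sum_{k=1}^{n} (Y_k - Y_{k-1}) = Y_n - Y_0, \qquad \operatorname{E}(S_n^2) \le 2\operatorname{E}(Y_n^2) + 2\operatorname{E}(Y_0^2) = 4\operatorname{E}(Y_0^2),

so n^{-1} \operatorname{E}(S_n^2) \to 0, that is, σ = 0, and S_n / \sqrt n tends to 0 in probability rather than to a nondegenerate normal limit.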

For the theorem in full strength see (Durrett 1996, Sect. 7.7(c), Theorem (7.8)); the assumption E(X_n^{12}) < ∞ is replaced with E(|X_n|^{2+δ}) < ∞, and the assumption α_n = O(n^{−5}) is replaced with  \sum_n \alpha_n^{\frac\delta{2(2+\delta)}} < \infty.  Existence of such δ > 0 ensures the conclusion. For an encyclopedic treatment of limit theorems under mixing conditions see (Bradley 2005).

Martingale difference CLT

Theorem. Let a martingale Mn satisfy

  •  \frac1n \sum_{k=1}^n \mathrm{E} ((M_k-M_{k-1})^2 | M_1,\dots,M_{k-1}) \to 1    in probability as n tends to infinity,
  • for every ε > 0,    \frac1n \sum_{k=1}^n \mathrm{E} \Big( (M_k-M_{k-1})^2; |M_k-M_{k-1}| > \varepsilon \sqrt n \Big) \to 0    as n tends to infinity,

then  M_n / \sqrt n  converges in distribution to N(0,1) as n tends to infinity.

See (Durrett 1996, Sect. 7.7, Theorem (7.4)) or (Billingsley 1995, Theorem 35.12).

Caution: The restricted expectation E(X; A), that is, E(X 1_A), should not be confused with the conditional expectation E(X|A) = E(X; A)/P(A).


Proof of classical CLT

For a theorem of such fundamental importance to statistics and applied probability, the central limit theorem has a remarkably simple proof using characteristic functions. It is similar to the proof of a (weak) law of large numbers. For any random variable, Y, with zero mean and a unit variance (var(Y) = 1), the characteristic function of Y is, by Taylor's theorem,

\varphi_Y(t) = 1 - {t^2 \over 2} + o(t^2), \quad t \rightarrow 0

where o(t^2) is "little o notation" for some function of t that goes to zero more rapidly than t^2. Letting Y_i be (X_i − μ)/σ, the standardized value of X_i, it is easy to see that the standardized mean of the observations X_1, X_2, ..., X_n is

Z_n = \frac{n\overline{X}_n-n\mu}{\sigma \sqrt{n}} = \sum_{i=1}^n {Y_i \over \sqrt{n}}.

By simple properties of characteristic functions, the characteristic function of Zn is

\left[\varphi_Y\left({t \over \sqrt{n}}\right)\right]^n = \left[ 1 - {t^2
\over 2n} + o\left({t^2 \over n}\right) \right]^n \, \rightarrow \, e^{-t^2/2}, \quad n \rightarrow \infty.

But this limit is just the characteristic function of a standard normal distribution N(0, 1), and the central limit theorem follows from the Lévy continuity theorem, which confirms that the convergence of characteristic functions implies convergence in distribution.
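The convergence of characteristic functions can be checked numerically in a simple special case. As an illustrative assumption (not taken from the text above), let Y_i take the values +1 and −1 with probability 1/2 each, so that Y_i has zero mean, unit variance and characteristic function cos(t). The characteristic function of Z_n is then cos(t/√n)^n, which the sketch compares with e^{−t²/2}; the function names are invented for this example.

```python
import math

def cf_standardized_sum(t, n):
    """Characteristic function of Z_n when Y_i = +1 or -1 with probability 1/2 each."""
    # For this Y, phi_Y(t) = cos(t), so phi_{Z_n}(t) = cos(t / sqrt(n)) ** n.
    return math.cos(t / math.sqrt(n)) ** n

def cf_standard_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2.0)

t = 1.5
for n in (10, 100, 10000):
    print(n, cf_standardized_sum(t, n), cf_standard_normal(t))
```

The first printed column should approach the second as n grows, in line with the limit in the displayed formula.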

Convergence to the limit

The central limit theorem gives only an asymptotic distribution. For a finite number of observations, it provides a reasonable approximation only close to the peak of the normal distribution; a very large number of observations is required for the approximation to extend into the tails.

If the third central moment E((X_1 − μ)^3) exists and is finite, then the above convergence is uniform and the speed of convergence is at least on the order of n^{−1/2} (see Berry–Esseen theorem).
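The n^{−1/2} rate can be observed exactly for a sum of fair coin flips. The following sketch is an illustration under assumptions of its own (Bernoulli(1/2) summands, an invented function name): it computes the largest distance between the cdf of the standardized Binomial(n, p) sum and the standard normal cdf, and prints that distance multiplied by √n, which should stay roughly constant as n grows.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance_binomial(n, p=0.5):
    """sup_x |P(Z_n <= x) - Phi(x)| for Z_n the standardized Binomial(n, p) sum."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    dist, cum = 0.0, 0.0
    for k in range(n + 1):
        phi = std_normal_cdf((k - mean) / sd)
        # The cdf of Z_n jumps at this point, so check both sides of the jump.
        dist = max(dist, abs(cum - phi))
        cum += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        dist = max(dist, abs(cum - phi))
    return dist

for n in (16, 64, 256):
    d = kolmogorov_distance_binomial(n)
    # Columns: n, distance, distance * sqrt(n)
    print(n, round(d, 5), round(d * math.sqrt(n), 4))
```

Because the binomial cdf is a step function and the normal cdf is increasing, the supremum is attained at a jump point, which is why only those points are examined.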

The convergence to the normal distribution is monotonic, in the sense that the entropy of Zn increases monotonically to that of the normal distribution, as proven in Artstein, Ball, Barthe and Naor (2004).

The central limit theorem applies in particular to sums of independent and identically distributed discrete random variables. A sum of discrete random variables is still a discrete random variable, so that we are confronted with a sequence of discrete random variables whose cumulative probability distribution function converges towards a cumulative probability distribution function corresponding to a continuous variable (namely that of the normal distribution). This means that if we build a histogram of the realisations of the sum of n independent identical discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the histogram converges toward a Gaussian curve as n approaches infinity; this relation is known as the de Moivre–Laplace theorem. The binomial distribution article details such an application of the central limit theorem in the simple case of a discrete variable taking only two possible values.

Relation to the law of large numbers

The law of large numbers as well as the central limit theorem are partial solutions to a general problem: "What is the limiting behavior of Sn as n approaches infinity?" In mathematical analysis, asymptotic series are one of the most popular tools employed to approach such questions.

Suppose we have an asymptotic expansion of ƒ(n):

f(n)= a_1 \varphi_{1}(n)+a_2 \varphi_{2}(n)+O(\varphi_{3}(n)) \qquad  (n \rightarrow \infty).

Dividing both parts by φ1(n) and taking the limit will produce a1, the coefficient of the highest-order term in the expansion, which represents the rate at which ƒ(n) changes in its leading term.


Informally, one can say: "ƒ(n) grows approximately as a1 φ1(n)". Taking the difference between ƒ(n) and its approximation and then dividing by the next term in the expansion, we arrive at a more refined statement about ƒ(n):

\lim_{n\to\infty}\frac{f(n)-a_1 \varphi_{1}(n)}{\varphi_{2}(n)}=a_2

Here one can say that the difference between the function and its approximation grows approximately as a2 φ2(n). The idea is that dividing the function by appropriate normalizing functions, and looking at the limiting behavior of the result, can tell us much about the limiting behavior of the original function itself.
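A standard worked example of this procedure (added here for illustration) is the harmonic number f(n) = H_n = \sum_{k=1}^{n} 1/k, whose expansion is

    H_n = \ln n + \gamma + O(1/n) \qquad (n \rightarrow \infty),

so that \varphi_1(n) = \ln n, a_1 = 1, \varphi_2(n) = 1 and a_2 = \gamma \approx 0.5772 (the Euler–Mascheroni constant). Dividing by \varphi_1 gives \lim_{n\to\infty} H_n / \ln n = 1, and subtracting the leading term gives \lim_{n\to\infty} (H_n - \ln n) = \gamma.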

Informally, something along these lines is happening when the sum, Sn, of independent identically distributed random variables, X1, ..., Xn, is studied in classical probability theory. If each Xi has finite mean μ, then by the law of large numbers, Sn/n → μ.[6] If in addition each Xi has finite variance σ2, then by the central limit theorem,

 \frac{S_n-n\mu}{\sqrt{n}} \rightarrow \xi

where ξ is distributed as N(0, σ2). This provides values of the first two constants in the informal expansion

S_n \approx \mu n+\xi \sqrt{n}. \,

In the case where the Xi's do not have finite mean or variance, convergence of the shifted and rescaled sum can also occur with different centering and scaling factors:

\frac{S_n-a_n}{b_n} \rightarrow \Xi,

or informally

S_n \approx a_n+\Xi b_n. \,

Distributions Ξ which can arise in this way are called stable.

Clearly, the normal distribution is stable, but there are also other stable distributions, such as the Cauchy distribution, for which the mean and variance are not defined. The scaling factor b_n may be proportional to n^c, for any c ≥ 1/2; it may also be multiplied by a slowly varying function of n.
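The Cauchy case can be illustrated by simulation. For standard Cauchy variables the average S_n/n is again standard Cauchy, which corresponds to a_n = 0 and b_n = n (so c = 1). The sketch below is a rough check under its own assumptions (inverse-cdf sampling, arbitrary seed and replication count, invented function name): a standard Cauchy variable satisfies P(|X| ≤ 1) = 1/2, so the median of |S_n/n| should stay near 1 for every n instead of shrinking as it would under the law of large numbers.

```python
import math
import random
import statistics

def cauchy_sample_means(n, reps, seed=0):
    """Return reps sample means, each of n standard Cauchy draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # Inverse-cdf sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
        total = sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n))
        means.append(total / n)
    return means

for n in (1, 10, 100):
    means = cauchy_sample_means(n, reps=4000)
    # Columns: n, median of |S_n / n|
    print(n, round(statistics.median(abs(m) for m in means), 3))
```

Averaging does not concentrate the distribution here, which is why a scaling other than √n is needed.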

The law of the iterated logarithm tells us what is happening "in between" the law of large numbers and the central limit theorem. Specifically, it says that the normalizing function  \sqrt{n\log\log n},  intermediate in size between n of the law of large numbers and √n of the central limit theorem, provides a non-trivial limiting behavior.
