CENTRAL LIMIT THEOREM
In probability theory, the central limit theorem (CLT) states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.
The central limit theorem has a number of variants. In its common form, the random variables must be identically distributed. In variants, convergence of the mean to the normal distribution also occurs for nonidentical distributions, given that they comply with certain conditions.
A simple example of the central limit theorem is rolling a large number of identical, biased dice. The distribution of the sum (or average) of the rolled numbers will be well approximated by a normal distribution. Since real-world quantities are often the balanced sum of many unobserved random events, the central limit theorem also provides a partial explanation for the prevalence of the normal probability distribution. It also justifies the approximation of large-sample statistics to the normal distribution in controlled experiments.
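The dice example is easy to check numerically. The sketch below (the bias weights and sample sizes are illustrative assumptions, not from the text) simulates averages of many rolls of a biased die; the averages cluster tightly around the true mean, as the theorem predicts:

```python
import random
import statistics

random.seed(0)

# Illustrative biased six-sided die: face k comes up with probability
# weights[k-1] / sum(weights); this heavily favours 5s and 6s.
faces = [1, 2, 3, 4, 5, 6]
weights = [1, 1, 1, 1, 2, 4]

def average_of_rolls(n_rolls):
    """Average of n_rolls independent rolls of the biased die."""
    rolls = random.choices(faces, weights=weights, k=n_rolls)
    return sum(rolls) / n_rolls

# The distribution of the average over many repetitions concentrates around
# the true mean and looks increasingly bell-shaped as n_rolls grows.
samples = [average_of_rolls(1000) for _ in range(2000)]
true_mean = sum(f * w for f, w in zip(faces, weights)) / sum(weights)
print(round(statistics.mean(samples), 2), round(true_mean, 2))
```

Plotting a histogram of `samples` (not done here) would show the bell shape directly.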
Central limit theorems for independent sequences
Classical CLT
Let X_{1}, …, X_{n} be a random sample of size n, that is, a sequence of independent and identically distributed random variables with expected value μ and variance σ^{2}. Suppose we are interested in the behavior of the sample average of these random variables: S_{n} = (X_{1} + … + X_{n})/n. Then the central limit theorem asserts that as n gets larger, the distribution of S_{n} approximates normal with mean μ and variance (1/n)σ^{2}. The true strength of the theorem is that S_{n} approaches normality regardless of the shapes of the distributions of the individual X_{i}'s. Formally, the theorem can be stated as follows:

Lindeberg–Lévy CLT: suppose {X_{i}} is a sequence of iid random variables with E[X_{i}] = μ and Var[X_{i}] = σ^{2}. Then as n approaches infinity, the random variable √n(S_{n} − μ) converges in distribution to a normal N(0, σ^{2}):

    √n(S_{n} − μ) → N(0, σ^{2})  (in distribution).

Convergence in distribution means that the cumulative distribution function of √n(S_{n} − μ) converges pointwise to the cdf of the N(0, σ^{2}) distribution: for any real number z,

    lim_{n→∞} Pr[√n(S_{n} − μ) ≤ z] = Φ(z/σ),

where Φ(x) is the standard normal cdf.
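This convergence in distribution can be illustrated numerically. A minimal sketch, assuming Uniform(0, 1) summands as an arbitrary example, compares the empirical cdf of the standardized mean with Φ at a few points:

```python
import math
import random

random.seed(1)

def phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative choice: X_i ~ Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12.
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
n, reps = 100, 5000

# Many realisations of sqrt(n) * (S_n - mu) / sigma, with S_n the sample mean.
zs = []
for _ in range(reps):
    s_n = sum(random.random() for _ in range(n)) / n
    zs.append(math.sqrt(n) * (s_n - mu) / sigma)

# The empirical cdf should be close to Phi at every point; spot-check a few.
for z in (-1.0, 0.0, 1.0):
    ecdf = sum(1 for v in zs if v <= z) / reps
    print(z, round(ecdf, 3), round(phi(z), 3))
```

Dividing by σ here standardizes the limit to N(0, 1), so the comparison is against the standard normal cdf directly.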
Lyapunov CLT
The theorem is named after the Russian mathematician Aleksandr Lyapunov. In this variant of the central limit theorem the random variables X_{i} have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables X_{i} have moments of some order (2 + δ), and that the rate of growth of these moments is limited by the Lyapunov condition given below.

Lyapunov CLT: let {X_{i}} be a sequence of independent random variables, each having a finite expected value μ_{i} and variance σ_{i}^{2}. Define s_{n}^{2} = ∑_{i=1}^{n} σ_{i}^{2}. If for some δ > 0 Lyapunov's condition

    lim_{n→∞} (1/s_{n}^{2+δ}) ∑_{i=1}^{n} E[|X_{i} − μ_{i}|^{2+δ}] = 0

is satisfied, then the sum of (X_{i} − μ_{i})/s_{n} converges in distribution to a standard normal random variable as n goes to infinity:

    (1/s_{n}) ∑_{i=1}^{n} (X_{i} − μ_{i}) → N(0, 1)  (in distribution).
In practice it is usually easiest to check Lyapunov's condition for δ = 1. If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition. The converse implication, however, does not hold.
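As a sketch of how the δ = 1 check works, the following computes the Lyapunov ratio (1/s_{n}^{3}) ∑ E|X_{i} − μ_{i}|^{3} exactly for independent Bernoulli variables with cycling, non-identical success probabilities (an illustrative choice); the ratio decays like 1/√n, so the condition holds:

```python
def lyapunov_ratio(ps):
    """Lyapunov ratio for delta = 1: (1/s_n^3) * sum_i E|X_i - mu_i|^3,
    computed exactly for independent Bernoulli(p_i) variables X_i."""
    s2 = sum(p * (1 - p) for p in ps)  # s_n^2
    # E|X_i - p_i|^3 = p(1-p)((1-p)^2 + p^2) for a Bernoulli(p) variable.
    third = sum(p * (1 - p) * ((1 - p) ** 2 + p ** 2) for p in ps)
    return third / s2 ** 1.5

# Non-identically distributed: success probabilities cycle through 3 values.
def ps_up_to(n):
    return [(0.2, 0.5, 0.8)[i % 3] for i in range(n)]

# The ratio shrinks roughly as 1/sqrt(n), verifying Lyapunov's condition.
for n in (10, 100, 1000, 10000):
    print(n, round(lyapunov_ratio(ps_up_to(n)), 4))
```

Because Bernoulli variables are bounded, the third absolute moments are controlled by the variances, which is why the ratio must vanish whenever s_{n} → ∞.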
Lindeberg CLT
In the same setting and with the same notation as above, the Lyapunov condition can be replaced with the following weaker one (from Lindeberg in 1920). For every ε > 0,

    lim_{n→∞} (1/s_{n}^{2}) ∑_{i=1}^{n} E[(X_{i} − μ_{i})^{2} · 1_{{|X_{i} − μ_{i}| > ε s_{n}}}] = 0,

where 1_{{…}} is the indicator function. Then the distribution of the standardized sums (1/s_{n}) ∑_{i=1}^{n} (X_{i} − μ_{i}) converges towards the standard normal distribution N(0, 1).
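For two-point distributions the Lindeberg sum can be computed exactly rather than simulated. A minimal sketch, assuming identical Bernoulli(0.3) variables purely for illustration: for small n the whole variance sits beyond the ε s_{n} threshold, while for large n the threshold exceeds the (bounded) deviations and the sum drops to zero:

```python
import math

def lindeberg_sum(ps, eps):
    """Lindeberg sum (1/s_n^2) * sum_i E[(X_i - mu_i)^2 ; |X_i - mu_i| > eps*s_n],
    computed exactly for independent Bernoulli(p_i) variables from their
    two-point laws: |X_i - p_i| equals 1-p_i with prob p_i, or p_i otherwise."""
    s_n = math.sqrt(sum(p * (1 - p) for p in ps))
    total = 0.0
    for p in ps:
        for value, prob in ((1 - p, p), (p, 1 - p)):
            if value > eps * s_n:  # the indicator 1{|X - mu| > eps * s_n}
                total += value ** 2 * prob
    return total / s_n ** 2

# As n grows, eps * s_n eventually exceeds every possible deviation,
# so the truncated sum falls from 1 (all of the variance) to 0.
for n in (10, 100, 500, 1000):
    print(n, round(lindeberg_sum([0.3] * n, 0.05), 3))  # 1.0, 1.0, 0.7, 0.0
```

This is the general mechanism for uniformly bounded summands: once ε s_{n} exceeds the bound, the indicator vanishes identically, so Lindeberg's condition holds whenever s_{n} → ∞.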
Multidimensional CLT
Proofs that use characteristic functions can be extended to cases where each individual X_{i} is an independent and identically distributed random vector in ℝ^{k}, with mean vector μ and covariance matrix Σ (amongst the individual components of the vector). Now, if we take the summations of these vectors as being done componentwise, then the multidimensional central limit theorem states that when scaled, these converge to a multivariate normal distribution.
Let

    X_{i} = (X_{i(1)}, …, X_{i(k)})ᵀ

be the i-th random vector. The bold in X_{i} means that it is a random vector, not a random (univariate) variable. Then the sum of the random vectors will be

    X_{1} + X_{2} + … + X_{n} = ∑_{i=1}^{n} X_{i}

and the average will be

    X̄_{n} = (1/n) ∑_{i=1}^{n} X_{i}

and therefore

    √n (X̄_{n} − μ) = (1/√n) ∑_{i=1}^{n} (X_{i} − μ).

The multivariate central limit theorem states that

    √n (X̄_{n} − μ) → N_{k}(0, Σ)  (in distribution),

where the covariance matrix Σ is equal to

    Σ = E[(X_{1} − μ)(X_{1} − μ)ᵀ].
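A small simulation makes the multivariate statement concrete. The sketch below assumes, purely as an example, 2-vectors (U, U + V) with U, V independent Uniform(0, 1), so μ = (1/2, 1) and Σ = [[1/12, 1/12], [1/12, 2/12]]; the empirical covariance of the scaled averages √n(X̄_{n} − μ) should approximate Σ:

```python
import math
import random

random.seed(2)

# Example random vector: (U, U + V), U, V independent Uniform(0, 1).
# Then mu = (0.5, 1.0) and Sigma = [[1/12, 1/12], [1/12, 2/12]].
mu = (0.5, 1.0)
n, reps = 200, 4000

scaled = []
for _ in range(reps):
    sx = sy = 0.0
    for _ in range(n):
        u, v = random.random(), random.random()
        sx += u
        sy += u + v
    # sqrt(n) * (sample mean - mu), componentwise.
    scaled.append((math.sqrt(n) * (sx / n - mu[0]),
                   math.sqrt(n) * (sy / n - mu[1])))

# Empirical covariance matrix of the scaled vectors, entry by entry.
mx = sum(a for a, _ in scaled) / reps
my = sum(b for _, b in scaled) / reps
cxx = sum((a - mx) ** 2 for a, _ in scaled) / reps
cxy = sum((a - mx) * (b - my) for a, b in scaled) / reps
cyy = sum((b - my) ** 2 for _, b in scaled) / reps
print(round(cxx, 3), round(cxy, 3), round(cyy, 3))  # ~ 1/12, 1/12, 2/12
```

Note that Σ captures the dependence *between components* of a single vector; the vectors themselves remain independent across i.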
Central limit theorems for dependent processes
CLT under weak dependence
A useful generalization of a sequence of independent, identically distributed random variables is a mixing random process in discrete time; "mixing" means, roughly, that random variables temporally far apart from one another are nearly independent. Several kinds of mixing are used in ergodic theory and probability theory. See especially strong mixing (also called α-mixing) defined by α(n) → 0, where α(n) is the so-called strong mixing coefficient.
A simplified formulation of the central limit theorem under strong mixing is given in (Billingsley 1995, Theorem 27.4):
Theorem. Suppose that X_{1}, X_{2}, … is stationary and α-mixing with α_{n} = O(n^{−5}) and that E(X_{n}) = 0 and E(X_{n}^{12}) < ∞. Denote S_{n} = X_{1} + … + X_{n}; then the limit

    σ^{2} = lim_{n} n^{−1} E(S_{n}^{2})

exists, and if σ ≠ 0 then S_{n}/(σ√n) converges in distribution to N(0, 1).
In fact, σ^{2} = E(X_{1}^{2}) + 2∑_{k=1}^{∞}E(X_{1}X_{1+k}), where the series converges absolutely.
The assumption σ ≠ 0 cannot be omitted, since asymptotic normality fails for X_{n} = Y_{n} − Y_{n−1}, where Y_{n} is another stationary sequence: the sum then telescopes to S_{n} = Y_{n} − Y_{0}, whose variance stays bounded, so σ = 0.
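The degeneracy is easy to see numerically. A minimal sketch, taking Y_{n} to be iid Uniform(0, 1) (an illustrative stationary sequence): the telescoping sum has bounded variance, so S_{n}/√n collapses to 0 instead of approaching a normal law:

```python
import random
import statistics

random.seed(3)

def s_n(n):
    """S_n for X_k = Y_k - Y_{k-1} with Y_k iid Uniform(0, 1);
    the sum telescopes to Y_n - Y_0."""
    ys = [random.random() for _ in range(n + 1)]
    return sum(ys[k] - ys[k - 1] for k in range(1, n + 1))

# Var(S_n) = Var(Y_n - Y_0) = 2/12 is bounded in n, so Var(S_n / sqrt(n))
# shrinks like 1/(6n): here sigma^2 = lim n^{-1} E(S_n^2) = 0, and the
# normalized sums degenerate to the constant 0 rather than N(0, 1).
for n in (10, 1000):
    samples = [s_n(n) / n ** 0.5 for _ in range(2000)]
    print(n, round(statistics.pvariance(samples), 4))
```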
For the theorem in full strength see (Durrett 1996, Sect. 7.7(c), Theorem (7.8)); the assumption E(X_{n}^{12}) < ∞ is replaced with E(|X_{n}|^{2+δ}) < ∞, and the assumption α_{n} = O(n^{−5}) is replaced with ∑_{n} α_{n}^{δ/(2+δ)} < ∞. Existence of such δ > 0 ensures the conclusion. For an encyclopedic treatment of limit theorems under mixing conditions see (Bradley 2005).
Martingale difference CLT
Theorem. Let a martingale M_{n} satisfy

    (1/n) ∑_{k=1}^{n} E((M_{k} − M_{k−1})^{2} | M_{1}, …, M_{k−1}) → 1 in probability as n tends to infinity,

    (1/n) ∑_{k=1}^{n} E((M_{k} − M_{k−1})^{2}; |M_{k} − M_{k−1}| > ε√n) → 0 for every ε > 0, as n tends to infinity;

then M_{n}/√n converges in distribution to N(0,1) as n tends to infinity.
See (Durrett 1996, Sect. 7.7, Theorem (7.4)) or (Billingsley 1995, Theorem 35.12).
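The simplest martingale satisfying both conditions is the fair ±1 random walk, used here as an illustrative sanity check: each increment has conditional mean 0 and conditional variance exactly 1, and the increments are bounded, so the truncated-variance condition holds trivially. The sketch confirms that M_{n}/√n looks standard normal:

```python
import math
import random
import statistics

random.seed(4)

def m_n(n):
    """A fair +/-1 random walk after n steps: a martingale whose increments
    have conditional mean 0 and conditional variance 1 given the past."""
    m = 0
    for _ in range(n):
        m += random.choice((-1, 1))
    return m

# M_n / sqrt(n) should have mean ~ 0 and variance ~ 1, matching N(0, 1).
n, reps = 400, 3000
zs = [m_n(n) / math.sqrt(n) for _ in range(reps)]
print(round(statistics.mean(zs), 3), round(statistics.pvariance(zs), 3))
```

Here the martingale structure adds nothing beyond the iid case; the point of the theorem is that the same conclusion survives when the increments depend on the past, as long as the averaged conditional variances stabilize.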
Caution: The restricted expectation E(X; A) should not be confused with the conditional expectation E(X | A) = E(X; A)/P(A).
Remarks
Proof of classical CLT
For a theorem of such fundamental importance to statistics and applied probability, the central limit theorem has a remarkably simple proof using characteristic functions. It is similar to the proof of the (weak) law of large numbers. For any random variable Y with zero mean and unit variance (var(Y) = 1), the characteristic function of Y is, by Taylor's theorem,

    φ_{Y}(t) = E[e^{itY}] = 1 − t^{2}/2 + o(t^{2}),  t → 0,

where o(t^{2}) is "little o notation" for some function of t that goes to zero more rapidly than t^{2}. Letting Y_{i} be (X_{i} − μ)/σ, the standardized value of X_{i}, it is easy to see that the standardized mean of the observations X_{1}, X_{2}, …, X_{n} is

    Z_{n} = √n(S_{n} − μ)/σ = (1/√n) ∑_{i=1}^{n} Y_{i}.

By simple properties of characteristic functions, the characteristic function of Z_{n} is

    φ_{Z_{n}}(t) = [φ_{Y}(t/√n)]^{n} = [1 − t^{2}/(2n) + o(t^{2}/n)]^{n} → e^{−t^{2}/2}  as n → ∞.
But this limit is just the characteristic function of a standard normal distribution N(0, 1), and the central limit theorem follows from the Lévy continuity theorem, which confirms that the convergence of characteristic functions implies convergence in distribution.
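The key limit in this proof can be checked numerically. The sketch below drops the o(t^{2}/n) term (a simplifying assumption) and verifies that [1 − t^{2}/(2n)]^{n} approaches exp(−t^{2}/2), the characteristic function of N(0, 1), at a fixed t:

```python
import math

# Verify [1 - t^2/(2n)]^n -> exp(-t^2/2) as n grows, at t = 1.5.
# (The o(t^2/n) remainder from the Taylor expansion is omitted here.)
t = 1.5
target = math.exp(-t * t / 2.0)
for n in (10, 100, 10000):
    approx = (1.0 - t * t / (2.0 * n)) ** n
    print(n, round(approx, 6), round(target, 6))
```

This is just the classical limit (1 + x/n)^{n} → e^{x} with x = −t^{2}/2, which is where the Gaussian's e^{−t²/2} shape enters the proof.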
Convergence to the limit
The central limit theorem gives only an asymptotic distribution. As an approximation for a finite number of observations, it provides a reasonable approximation only when close to the peak of the normal distribution; it requires a very large number of observations to stretch into the tails.
If the third absolute moment E(|X_{1} − μ|^{3}) exists and is finite, then the above convergence is uniform and the speed of convergence is at least on the order of 1/n^{1/2} (see Berry–Esseen theorem).
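The 1/√n rate can be glimpsed by simulation. A rough sketch, assuming Exp(1) summands (mean 1, variance 1, visibly skewed) as an arbitrary example: the simulated sup-distance between the cdf of the standardized mean and Φ shrinks as n grows, with Monte Carlo noise of order 1/√reps setting a floor:

```python
import math
import random

random.seed(5)

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_gap(n, reps=4000):
    """Simulated sup-distance between the cdf of the standardized sum of n
    Exp(1) variables (mu = sigma = 1) and the standard normal cdf."""
    # Exp(1) draws via inverse cdf: -log(1 - U); standardize the sum.
    zs = sorted((sum(-math.log(1.0 - random.random()) for _ in range(n)) - n)
                / math.sqrt(n) for _ in range(reps))
    # Kolmogorov-style distance between the empirical cdf and Phi.
    return max(abs((i + 1) / reps - phi(z)) for i, z in enumerate(zs))

# The gap shrinks roughly like 1/sqrt(n), as Berry-Esseen predicts.
for n in (4, 16, 64):
    print(n, round(max_cdf_gap(n), 3))
```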
The convergence to the normal distribution is monotonic, in the sense that the entropy of Z_{n} increases monotonically to that of the normal distribution, as proven in Artstein, Ball, Barthe and Naor (2004).
The central limit theorem applies in particular to sums of independent and identically distributed discrete random variables. A sum of discrete random variables is still a discrete random variable, so that we are confronted with a sequence of discrete random variables whose cumulative probability distribution function converges towards a cumulative probability distribution function corresponding to a continuous variable (namely that of the normal distribution). This means that if we build a histogram of the realisations of the sum of n independent identical discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the histogram converges toward a Gaussian curve as n approaches infinity; this relation is known as the de Moivre–Laplace theorem. The binomial distribution article details such an application of the central limit theorem in the simple case of a discrete variable taking only two possible values.
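In the binomial case the de Moivre–Laplace approximation can be checked exactly, since both sides are computable in closed form. A minimal sketch (n = 400 and p = 0.3 are illustrative choices) compares the Binomial(n, p) pmf with the normal density of matching mean np and standard deviation √(np(1−p)):

```python
import math

def binom_pmf(n, k, p):
    """Exact Binomial(n, p) probability mass at k."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# de Moivre-Laplace: for large n, P(Bin(n, p) = k) is close to the normal
# density at k with mu = n*p and sigma = sqrt(n*p*(1-p)).
n, p = 400, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
for k in (100, 120, 140):
    print(k, round(binom_pmf(n, k, p), 5), round(normal_pdf(k, mu, sigma), 5))
```

The agreement is tightest near the mean np = 120 and degrades in the tails, consistent with the remark above that the normal approximation is best near the peak.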
Relation to the law of large numbers
The law of large numbers as well as the central limit theorem are partial solutions to a general problem: "What is the limiting behavior of S_{n} as n approaches infinity?" In mathematical analysis, asymptotic series are one of the most popular tools employed to approach such questions.
Suppose we have an asymptotic expansion of ƒ(n):

    ƒ(n) = a_{1}φ_{1}(n) + a_{2}φ_{2}(n) + O(φ_{3}(n)).
Dividing both parts by φ_{1}(n) and taking the limit will produce a_{1}, the coefficient of the highest-order term in the expansion, which represents the rate at which ƒ(n) changes in its leading term.
Informally, one can say: "ƒ(n) grows approximately as a_{1}φ_{1}(n)". Taking the difference between ƒ(n) and its approximation and then dividing by the next term in the expansion, we arrive at a more refined statement about ƒ(n):

    (ƒ(n) − a_{1}φ_{1}(n))/φ_{2}(n) → a_{2}.
Here one can say that the difference between the function and its approximation grows approximately as a_{2} φ_{2}(n). The idea is that dividing the function by appropriate normalizing functions, and looking at the limiting behavior of the result, can tell us much about the limiting behavior of the original function itself.
Informally, something along these lines is happening when the sum, S_{n}, of independent identically distributed random variables, X_{1}, …, X_{n}, is studied in classical probability theory. (Here S_{n} = X_{1} + … + X_{n} denotes the sum, rather than the average.) If each X_{i} has finite mean μ, then by the law of large numbers, S_{n}/n → μ.^{[6]} If in addition each X_{i} has finite variance σ^{2}, then by the central limit theorem,

    (S_{n} − nμ)/√n → ξ  (in distribution),

where ξ is distributed as N(0, σ^{2}). This provides values of the first two constants in the informal expansion

    S_{n} ≈ μn + ξ√n.
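The two-term picture S_{n} ≈ μn + ξ√n can be checked by simulation. A minimal sketch, assuming Uniform(0, 2) summands as an arbitrary example (μ = 1, σ² = 1/3): the fluctuation (S_{n} − nμ)/√n should have mean about 0 and variance about σ²:

```python
import math
import random
import statistics

random.seed(6)

# Illustrative choice: X_i ~ Uniform(0, 2), so mu = 1 and sigma^2 = 1/3.
# LLN: S_n / n -> mu.  CLT: (S_n - n*mu)/sqrt(n) -> N(0, sigma^2).
mu = 1.0
n, reps = 1000, 3000

fluct = []
for _ in range(reps):
    s = sum(2.0 * random.random() for _ in range(n))
    fluct.append((s - n * mu) / math.sqrt(n))

print(round(statistics.mean(fluct), 3))       # mean of the fluctuation, ~ 0
print(round(statistics.pvariance(fluct), 3))  # variance, ~ sigma^2 = 1/3
```

The LLN fixes the first-order coefficient μ of the "expansion" of S_{n}; the CLT identifies the next term as a random coefficient ξ times √n.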
In the case where the X_{i}'s do not have finite mean or variance, convergence of the shifted and rescaled sum can also occur with different centering and scaling factors:

    (S_{n} − a_{n})/b_{n} → Ξ  (in distribution),

or informally

    S_{n} ≈ a_{n} + Ξ b_{n}.
Distributions Ξ which can arise in this way are called stable.
Clearly, the normal distribution is stable, but there are also other stable distributions, such as the Cauchy distribution, for which the mean and variance are not defined. The scaling factor b_{n} may be proportional to n^{c} for any c ≥ 1/2; it may also be multiplied by a slowly varying function of n.
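The Cauchy case shows the failure of the usual √n scaling vividly: the average of n standard Cauchy variables is again standard Cauchy (here b_{n} ∝ n, i.e. c = 1), so the spread of the sample mean does not shrink at all as n grows. A minimal sketch, measuring spread by the interquartile range since the variance does not exist:

```python
import math
import random

random.seed(7)

def cauchy():
    """Standard Cauchy draw via the inverse cdf: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (random.random() - 0.5))

# The average of n standard Cauchy draws is again standard Cauchy, so its
# interquartile range stays near 2 = tan(pi/4) - tan(-pi/4) for every n,
# instead of shrinking like 1/sqrt(n) as the CLT would require.
for n in (10, 1000):
    means = sorted(sum(cauchy() for _ in range(n)) / n for _ in range(2001))
    iqr = means[1500] - means[500]
    print(n, round(iqr, 2))
```

This is stability in action: the Cauchy law reproduces itself under averaging, just with a different (here, absent) contraction of scale.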
The law of the iterated logarithm tells us what is happening "in between" the law of large numbers and the central limit theorem. Specifically it says that the normalizing function √(n log log n), intermediate in size between n of the law of large numbers and √n of the central limit theorem, provides a nontrivial limiting behavior.
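A single long path gives a feel for this intermediate scaling. For iid ±1 steps (σ = 1), the law of the iterated logarithm says limsup S_{n}/√(2n log log n) = 1 almost surely; the sketch below (path length and starting index are arbitrary choices, and convergence is notoriously slow) just checks that the normalized walk wanders on the order of 1, neither collapsing to 0 nor growing without bound:

```python
import math
import random

random.seed(8)

# Track the running peak of |S_k| / sqrt(2 k log log k) along one long
# +/-1 random walk; the LIL says this scale captures the extreme
# fluctuations, with limsup equal to sigma = 1 almost surely.
s = 0
peak = 0.0
for k in range(1, 200001):
    s += random.choice((-1, 1))
    if k >= 1000:  # start late so that log log k is comfortably defined
        peak = max(peak, abs(s) / math.sqrt(2 * k * math.log(math.log(k))))
print(round(peak, 2))  # of order 1 on a typical path
```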