The Central Limit Theorem

We are now ready to ascend the mountain and prove the central limit theorem, taking inspiration from this paper by Calvin Wooyoung Chin.

Lemma 1. Let X be a real-valued random variable with finite expectation. For any \delta > 0,

\displaystyle \mathbb P(|X| \geq \delta) \leq \frac{\mathbb E[|X|]}{\delta}.

This is called Markov’s inequality. Furthermore, if X has finite variance, then

\displaystyle \mathbb P(|X - \mathbb E[X]| \geq \delta ) \leq \frac{\mathrm{Var}(X)}{\delta^2}.

This is called Chebyshev’s inequality.

Proof. For the first inequality,

\begin{aligned} \mathbb E[|X|] &= \mathbb E[|X| \cdot \mathbb I_{\{|X| \geq \delta\}}] + \mathbb E[|X| \cdot \mathbb I_{\{|X| < \delta\}}] \\ &\geq \mathbb E[\delta \cdot \mathbb I_{\{|X| \geq \delta\}}] + 0\\ &= \delta \cdot \mathbb E[\mathbb I_{\{|X| \geq \delta\}} ] \\ &= \delta \cdot \mathbb P(|X| \geq \delta). \end{aligned}

Dividing by \delta yields the desired result. For Chebyshev’s inequality, apply Markov’s inequality to the random variable (X - \mathbb E[X])^2, with \delta^2 in place of \delta, to get

\begin{aligned} \mathbb P(|X - \mathbb E[X]| \geq \delta) &= \mathbb P( (X - \mathbb E[X])^2 \geq \delta^2 ) \\ &\leq \frac{\mathbb E[(X - \mathbb E[X])^2]}{\delta^2} \\ &= \frac{\mathrm{Var}(X)}{\delta^2}. \end{aligned}
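To see these bounds in action, here is a minimal numerical sanity check in Python; the choice of distribution, threshold, and sample size are illustrative assumptions, not part of the argument.

import numpy as np

# Monte Carlo sanity check of Markov's and Chebyshev's inequalities for an
# Exponential(1) random variable (illustrative choice: mean 1, variance 1).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
delta = 2.0
print(np.mean(np.abs(x) >= delta), "<=", np.mean(np.abs(x)) / delta)      # Markov
print(np.mean(np.abs(x - x.mean()) >= delta), "<=", x.var() / delta**2)   # Chebyshev

Both estimated probabilities should land comfortably below their respective bounds.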

Chebyshev’s inequality also underpins our intuition that probabilities arise as long-run frequencies in repeated experiments.

Corollary 1. Fix an event K \subseteq \Omega and denote p := \mathbb P(K). Let \xi_1, \xi_2, \dots be i.i.d. random variables with \xi_i \sim \mathrm{Ber}(p); intuitively, each \xi_i is an independent copy of the indicator \mathbb I_K, recording whether K occurs on the i-th repetition of the experiment. Define

\displaystyle \bar \xi_n := \frac 1n \cdot \sum_{i=1}^n \xi_i.

Then for any \delta > 0, \mathbb P(|\bar \xi_n - p| > \delta) \to 0 as n \to \infty. In this case, we say that \bar \xi_n \to p in probability. (This is the weak law of large numbers for Bernoulli trials.)

Proof. Fix \delta > 0. Since the \xi_i are independent with \mathbb E[\xi_i] = p and \mathrm{Var}(\xi_i) = p(1-p),

\displaystyle \mathbb E[\bar \xi_n] = p,\quad \mathrm{Var}(\bar \xi_n) = \frac{p(1-p)}{n}.

By Chebyshev’s inequality,

\begin{aligned} \mathbb P(|\bar \xi_n - p| > \delta) &\leq \frac{\mathrm{Var}(\bar \xi_n)}{\delta^2} = \frac{p(1-p)}{n \cdot\delta^2}. \end{aligned}

Taking n \to \infty, \mathbb P(|\bar \xi_n - p| > \delta) \to 0.
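To make the bound concrete, here is a quick worked example with illustrative numbers: suppose we want \mathbb P(|\bar \xi_n - p| > 0.01) \leq 0.05 without knowing p. Since p(1-p) \leq 1/4,

\displaystyle \mathbb P(|\bar \xi_n - p| > 0.01) \leq \frac{p(1-p)}{n \cdot (0.01)^2} \leq \frac{1}{4 n \cdot (0.01)^2} \leq 0.05 \quad \text{whenever } n \geq 50000.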

Let Z \sim \mathcal N(0, 1) and denote its c.d.f. by \Phi.

Lemma 2. Let X_1, X_2,\dots be real-valued random variables. Suppose for any thrice-differentiable f : \mathbb R \to \mathbb R such that f, f', f'', f''' are bounded,

\displaystyle \lim_{n \to \infty} \mathbb E[f(X_n)] = \mathbb E[f(Z)].

Then for any z \in \mathbb R,

\displaystyle \lim_{n \to \infty} \mathbb P(X_n \leq z) = \Phi(z).

Proof. Fix z \in \mathbb R and \epsilon > 0. Since Z has a p.d.f., its c.d.f. \Phi is continuous, so there exists \delta > 0 such that

|\Phi(z \pm \delta) - \Phi(z)| < \epsilon.

In particular, \Phi(z+\delta) < \Phi(z) + \epsilon and \Phi(z-\delta) > \Phi(z) - \epsilon. Define thrice-differentiable functions f, F : \mathbb R \to [0, 1] with bounded first, second, and third derivatives such that

f|_{(-\infty, z-\delta]} = 1,\quad  f|_{[z, \infty)} = 0,\quad  F|_{(-\infty, z]} = 1,\quad  F|_{[z+\delta, \infty)} = 0.
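Such cutoff functions can be written down explicitly; one standard construction (an illustrative choice, since any smooth monotone interpolation works) sets g(t) := e^{-1/t} for t > 0 and g(t) := 0 for t \leq 0, and

\displaystyle h(t) := \frac{g(t)}{g(t) + g(1-t)},\quad f(x) := 1 - h\!\left( \frac{x - (z-\delta)}{\delta} \right),\quad F(x) := 1 - h\!\left( \frac{x - z}{\delta} \right),

so that h is smooth, equal to 0 on (-\infty, 0] and to 1 on [1, \infty), with all derivatives bounded.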

By hypothesis and since f \geq \mathbb I_{(-\infty, z-\delta]} point-wise,

\displaystyle \lim_{n \to \infty} \mathbb E[f(X_n)] = \mathbb E[f(Z)] \geq \mathbb P(Z \leq z -\delta) = \Phi(z-\delta) > \Phi(z) - \epsilon.

Similarly, \lim_{n \to \infty} \mathbb E[F(X_n)] = \mathbb E[F(Z)] \leq \Phi(z+\delta) < \Phi(z) + \epsilon. Since f \leq \mathbb I_{(-\infty, z]} \leq F point-wise, for sufficiently large n,

\Phi(z) - \epsilon < \mathbb E[f(X_n)] \leq \underbrace{ \mathbb E[ \mathbb I_{(-\infty, z]}(X_n) ] }_{ \mathbb P(X_n \leq z)} \leq \mathbb E[F(X_n)] < \Phi(z) + \epsilon,

so that |\mathbb P(X_n \leq z) - \Phi(z)| < \epsilon for all sufficiently large n. Since \epsilon > 0 was arbitrary, \displaystyle \lim_{n \to \infty} \mathbb P(X_n \leq z) = \Phi(z), as required.

Theorem 1 (Central Limit Theorem). Let X_1,X_2,\dots be i.i.d. with mean 0 and variance 1, and write \bar X_n := \frac 1n \cdot \sum_{i=1}^n X_i. Then for any z \in \mathbb R,

\displaystyle \lim_{n \to \infty} \mathbb P(\sqrt n \cdot \bar X_n \leq z) = \Phi(z).

In this case, we say that \sqrt n \cdot \bar X_n \to Z in distribution.

Proof. Fix i.i.d. Y_1,Y_2,\dots \sim \mathcal N(0, 1) that are independent of X_1,X_2,\dots. For each n \in \mathbb N and 0 \leq i \leq n, define

\displaystyle Z_{n,i} := \frac 1{\sqrt{n}} \cdot \left( \sum_{j=1}^i X_j + \sum_{j=i+1}^n Y_j \right),\quad S_{n,i} := Z_{n,i} - \frac 1{\sqrt{n}} \cdot X_i \quad (1 \leq i \leq n).
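For instance, when n = 3 the interpolating sequence reads

\displaystyle Z_{3,0} = \frac{Y_1 + Y_2 + Y_3}{\sqrt 3},\quad Z_{3,1} = \frac{X_1 + Y_2 + Y_3}{\sqrt 3},\quad Z_{3,2} = \frac{X_1 + X_2 + Y_3}{\sqrt 3},\quad Z_{3,3} = \frac{X_1 + X_2 + X_3}{\sqrt 3},

so consecutive terms differ in exactly one summand; this replacement scheme is often called Lindeberg’s swapping argument.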

By direct computation, Z_{n,n} = \sqrt{n} \cdot \bar X_n, and Z_{n,0} = \frac 1{\sqrt n} \cdot \sum_{j=1}^n Y_j \sim \mathcal N(0, 1) for every n. The key idea is to compare these two ends of the interpolation using Lemma 2. Fix any thrice-differentiable f : \mathbb R \to \mathbb R such that f, f', f'', f''' are bounded. We claim that

\displaystyle \lim_{n \to \infty} \big( \mathbb E[f(Z_{n,n})] - \mathbb E[f(Z_{n,0})] \big) = 0.

Since Z_{n,0} \sim \mathcal N(0, 1) for every n, we have \mathbb E[f(Z_{n,0})] = \mathbb E[f(Z)], so the claim gives \lim_{n \to \infty} \mathbb E[f(\sqrt n \cdot \bar X_n)] = \mathbb E[f(Z)], and Lemma 2 then finishes the proof. Fix \epsilon > 0. We aim to find N \in \mathbb N such that for n > N,

\displaystyle |\mathbb E[f(Z_{n,n})] - \mathbb E[f(Z_{n,0})]| < \epsilon.

Fix 1 \leq i \leq n. Applying Taylor’s theorem to f on the interval with endpoints S_{n,i} and Z_{n,i}, there exists C_{n,i} between them such that

\displaystyle f(Z_{n,i}) = f(S_{n,i}) + f'(S_{n,i}) \cdot (Z_{n,i} - S_{n,i}) + \frac{f''(C_{n,i})}{2!} \cdot (Z_{n,i} - S_{n,i})^2.

Substituting Z_{n,i} - S_{n,i} = X_i / \sqrt n and adding and subtracting f''(S_{n,i}) \cdot X_i^2 / (2n) yields

\displaystyle f(Z_{n,i}) - f(S_{n,i}) - f'(S_{n,i}) \cdot \frac{X_i}{\sqrt n}  - f''(S_{n,i}) \cdot \frac{X_i^2}{2 n} = \underbrace{ \frac{ ( f''(C_{n,i}) - f''(S_{n,i}) ) X_i^2}{2n} }_{R_{n,i}}.

Since f''' is bounded, there exists \delta > 0 (for instance \delta := \epsilon / (1 + \| f''' \|_\infty)) such that

\delta \cdot \| f''' \|_\infty \leq \epsilon.

By the mean value theorem, whenever |x - y| \leq \delta,

|f''(x) - f''(y)| \leq \| f''' \|_\infty \cdot |x - y| \leq \delta \cdot \| f''' \|_\infty \leq \epsilon.

Consider the “good event” G := \{ |X_i| \leq \delta \sqrt{n} \} and its complement

G^c := \Omega \backslash G = \{ |X_i| > \delta \sqrt{n} \}.

On the “good event” G, the point C_{n,i} lies between S_{n,i} and Z_{n,i}, so |C_{n,i} - S_{n,i}| \leq |Z_{n,i} - S_{n,i}| = |X_i| / \sqrt n \leq \delta, and the mean value theorem bound above gives

\displaystyle |f''(C_{n,i}) - f''(S_{n,i})| \leq \epsilon \quad \Rightarrow \quad \mathbb E[|R_{n,i}| \cdot \mathbb I_G] \leq \frac{\epsilon}{2n} \cdot \mathbb E[X_i^2] = \frac{\epsilon}{2n}.

On the other hand, on the “bad event” G^c we only have the crude bound |f''(C_{n,i}) - f''(S_{n,i})| \leq 2 \| f'' \|_\infty =: 2M, so

\displaystyle \mathbb E[|R_{n,i}| \cdot \mathbb I_{G^c}] \leq \frac{M}{n} \cdot \mathbb E[X_i^2 \cdot \mathbb I_{G^c}] = \frac{M}{n} \cdot \mathbb E\big[ X_1^2 \cdot \mathbb I_{\{|X_1| > \delta \sqrt n\}} \big],

where the last equality uses that the X_i are identically distributed. Since \mathbb E[X_1^2] = 1 < \infty, the dominated convergence theorem gives

\displaystyle \mathbb E\big[ X_1^2 \cdot \mathbb I_{\{|X_1| > \delta \sqrt n\}} \big] \to 0 \quad \text{as } n \to \infty.

Therefore,

\displaystyle \mathbb E[|R_{n,i}|] = \mathbb E[|R_{n,i}| \cdot \mathbb I_G] + \mathbb E[|R_{n,i}| \cdot \mathbb I_{G^c}] \leq \left( \frac{\epsilon}{2} + M \cdot \mathbb E\big[ X_1^2 \cdot \mathbb I_{\{|X_1| > \delta \sqrt n\}} \big] \right) \cdot \frac 1n.

Thus, selecting N \in \mathbb N so large that M \cdot \mathbb E[ X_1^2 \cdot \mathbb I_{\{|X_1| > \delta \sqrt n\}} ] \leq \epsilon/2 for all n > N yields \mathbb E[|R_{n,i}|] \leq \epsilon/n for all n > N and 1 \leq i \leq n. Taking expectations on both sides of the identity defining R_{n,i}, and using that X_i is independent of S_{n,i} with \mathbb E[X_i] = 0 and \mathbb E[X_i^2] = 1 (so that \mathbb E[f'(S_{n,i}) \cdot X_i] = 0 and \mathbb E[f''(S_{n,i}) \cdot X_i^2] = \mathbb E[f''(S_{n,i})]), we obtain

\displaystyle \left| \mathbb E[f(Z_{n,i})] - \mathbb E[f(S_{n,i})] - \frac{\mathbb E[f''(S_{n,i})]}{2n} \right| \leq \frac{\epsilon}{n}.

The argument is essentially unchanged with Z_{n,i} replaced by Z_{n,i-1} = S_{n,i} + Y_i / \sqrt{n}, since Y_i is likewise independent of S_{n,i} with mean 0 and variance 1 (enlarging N if necessary to control the corresponding Gaussian tail term), so that for n > N we also have the bound

\displaystyle \left| \mathbb E[f(Z_{n,i-1})] - \mathbb E[f(S_{n,i})] - \frac{\mathbb E[f''(S_{n,i})]}{2n} \right| \leq \frac{\epsilon}{n}.

Applying the triangle inequality, for all n > N and 1 \leq i \leq n,

\displaystyle |\mathbb E[f(Z_{n,i})] - \mathbb E [f(Z_{n,i-1})]|  \leq \frac{\epsilon}{n} + \frac{\epsilon}{n} = \frac{2\epsilon}{n}.

Finally, telescoping over i and applying the triangle inequality once more, for all n > N,

\begin{aligned} |\mathbb E[f(Z_{n,n})] - \mathbb E [f(Z_{n,0})]| &\leq \sum_{i=1}^n |\mathbb E[f(Z_{n,i})] - \mathbb E [f(Z_{n,i-1})]| \leq \sum_{i=1}^n \frac{2\epsilon}n = 2\epsilon. \end{aligned}

Replacing \epsilon with \epsilon/3 throughout gives a bound strictly below \epsilon; since \epsilon > 0 was arbitrary, the claim, and with it the theorem, follows.
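As a sanity check, here is a minimal Monte Carlo sketch in Python comparing the empirical c.d.f. of \sqrt n \cdot \bar X_n with \Phi; the summand distribution, n, and number of trials are illustrative assumptions.

import numpy as np
from scipy.stats import norm

# Illustrative check of the CLT: standardized Uniform(0,1) summands
# (mean 0, variance 1 after scaling by sqrt(12)).
rng = np.random.default_rng(0)
n, trials = 1_000, 100_000
x = (rng.uniform(size=(trials, n)) - 0.5) * np.sqrt(12.0)
s = np.sqrt(n) * x.mean(axis=1)                 # sqrt(n) * Xbar_n, one value per trial
for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, np.mean(s <= z), norm.cdf(z))      # empirical c.d.f. vs Phi(z)

The two columns should agree to roughly two decimal places.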

With that, we are done with probability…for now. There are many directions we could take from this point. We could also have proven this result using characteristic functions or even Brownian motion, but those topics would each require entire blog sections of their own to discuss properly. We conclude with the famous normal approximations to the binomial and Poisson distributions.

Corollary 2. Fix p \in (0, 1). For X_n \sim \mathrm{Bin}(n, p) and any z \in \mathbb R,

\displaystyle  \lim_{n \to \infty} \mathbb P \left( \frac{X_n - np}{\sqrt{np(1-p)}} \leq z \right) = \Phi(z).

This is the normal approximation to the binomial distribution.

Proof. For each n, write X_n = \sum_{i=1}^n \xi_i for i.i.d. \xi_i \sim \mathrm{Ber}(p) (this does not change the distribution of X_n), and define

\displaystyle Y_n := \frac{ X_n - np }{ \sqrt{p(1-p)} } .

Then

\displaystyle \bar Y_n := \frac 1n \cdot Y_n = \frac 1n \cdot \sum_{i=1}^n \frac{ \xi_i - p }{\sqrt{p(1-p)}} = \frac{X_n - np}{n \sqrt{p(1-p)}}.

Since the (\xi_i - p)/\sqrt{p(1-p)} are i.i.d. with mean 0 and variance 1, by the central limit theorem,

\displaystyle  \lim_{n \to \infty} \mathbb P \left( \frac{X_n - np}{\sqrt{np(1-p)}} \leq z \right) = \lim_{n \to \infty} \mathbb P ( \sqrt n \cdot \bar Y_n \leq z) = \Phi(z).
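A quick numerical illustration of this approximation, using SciPy’s exact binomial c.d.f.; the values of n, p, and z below are illustrative assumptions.

from scipy.stats import binom, norm
import numpy as np

# Compare P((X_n - np)/sqrt(np(1-p)) <= z) with Phi(z) for X_n ~ Bin(n, p).
n, p, z = 200, 0.3, 1.0
threshold = n * p + z * np.sqrt(n * p * (1 - p))   # X_n <= threshold  <=>  standardized X_n <= z
print(binom.cdf(np.floor(threshold), n, p), norm.cdf(z))

The agreement improves as n grows.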

Corollary 3. For any z \in \mathbb R,

\displaystyle \lim_{\lambda \to \infty} \mathbb P \left( \frac{X - \lambda}{\sqrt{\lambda}} \leq z \right) = \Phi(z), \quad X \sim \mathrm{Pois}(\lambda).

This is the normal approximation to the Poisson distribution.

Proof. Assume for simplicity that \lambda \in \mathbb N^+, taking the limit along integer values of \lambda. Write X_{\lambda} = \sum_{i=1}^{\lambda} \xi_i for i.i.d. \xi_i \sim \mathrm{Pois}(1) (a sum of \lambda independent \mathrm{Pois}(1) variables is \mathrm{Pois}(\lambda)), and define

\displaystyle Y_\lambda := X_{\lambda} - \lambda .

Then

\displaystyle \bar Y_{\lambda}  = \frac{X_{\lambda} - \lambda}{\lambda } = \frac 1{\lambda} \cdot \sum_{i=1}^{\lambda} (\xi_i - 1).

Since the \xi_i - 1 are i.i.d. with mean 0 and variance 1, by the central limit theorem,

\displaystyle \lim_{\lambda \to \infty} \mathbb P \left( \frac{X_{\lambda} - \lambda}{ \sqrt{\lambda} }  \leq z \right) = \lim_{\lambda \to \infty} \mathbb P ( \sqrt{ \lambda } \cdot \bar Y_\lambda \leq z) = \Phi(z).
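The analogous SciPy check for the Poisson case; \lambda and z below are again illustrative assumptions.

from scipy.stats import poisson, norm
import numpy as np

# Compare P((X - lam)/sqrt(lam) <= z) with Phi(z) for X ~ Pois(lam).
lam, z = 400.0, 1.0
print(poisson.cdf(np.floor(lam + z * np.sqrt(lam)), lam), norm.cdf(z))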

—Joel Kindiak, 31 Jul 25, 1359H
