We are now ready to ascend the mountain of proving the central limit theorem, taking inspiration from this paper by Calvin Wooyoung Chin.
Lemma 1. Let $X$ be a real-valued random variable with finite expectation. For any $a > 0$,

$$\mathbb{P}(|X| \ge a) \le \frac{\mathbb{E}[|X|]}{a}.$$

This is called Markov's inequality. Furthermore, if $X$ has finite variance, then

$$\mathbb{P}\big(|X - \mathbb{E}[X]| \ge a\big) \le \frac{\mathrm{Var}(X)}{a^2}.$$

This is called Chebyshev's inequality.
Proof. For the first identity, observe that $a \mathbf{1}_{\{|X| \ge a\}} \le |X|$ point-wise, so taking expectations,

$$a\,\mathbb{P}(|X| \ge a) = \mathbb{E}\big[a \mathbf{1}_{\{|X| \ge a\}}\big] \le \mathbb{E}[|X|].$$

Dividing by $a$ yields the desired result. For Chebyshev's inequality, apply Markov's inequality to the random variable $(X - \mathbb{E}[X])^2$ to get

$$\mathbb{P}\big(|X - \mathbb{E}[X]| \ge a\big) = \mathbb{P}\big((X - \mathbb{E}[X])^2 \ge a^2\big) \le \frac{\mathbb{E}\big[(X - \mathbb{E}[X])^2\big]}{a^2} = \frac{\mathrm{Var}(X)}{a^2}. \qquad \blacksquare$$
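To make the two inequalities concrete, here is a quick numerical sanity check. This is a minimal sketch, with the exponential distribution, seed, threshold values, and sample size all chosen by me purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# X ~ Exponential(1), so E[X] = 1 and Var(X) = 1.
X = rng.exponential(scale=1.0, size=1_000_000)

a = 3.0
print("P(|X| >= a)     ~", np.mean(np.abs(X) >= a))  # true value e^{-3} = 0.0498...
print("Markov bound     :", np.mean(np.abs(X)) / a)  # E[|X|]/a = 1/3

t = 2.0
print("P(|X - EX| >= t) ~", np.mean(np.abs(X - X.mean()) >= t))
print("Chebyshev bound  :", X.var() / t**2)          # Var(X)/t^2 = 1/4
```

Both bounds hold with plenty of room to spare; Markov and Chebyshev are deliberately crude, which is precisely what makes them so broadly applicable.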
Chebyshev's inequality is also responsible for our intuition that probabilities arise from repeated experiments.
Corollary 1. Fix $p \in (0, 1)$ and denote $q = 1 - p$. Define the i.i.d. random variables $X_1, X_2, \ldots$ by $\mathbb{P}(X_n = 1) = p$ and $\mathbb{P}(X_n = 0) = q$, so that $\mathbb{E}[X_n] = p$ and $\mathrm{Var}(X_n) = pq$. Define

$$\bar{X}_n = \frac{1}{n} \sum_{k=1}^{n} X_k.$$

Then for any $\varepsilon > 0$,

$$\lim_{n \to \infty} \mathbb{P}\big(|\bar{X}_n - p| \ge \varepsilon\big) = 0.$$

In this case, we say that $\bar{X}_n \to p$ in probability.
Proof. Fix $\varepsilon > 0$. Since $\mathbb{E}[\bar{X}_n] = p$ and

$$\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{k=1}^{n} \mathrm{Var}(X_k) = \frac{pq}{n},$$

by Chebyshev's inequality,

$$\mathbb{P}\big(|\bar{X}_n - p| \ge \varepsilon\big) \le \frac{pq}{n \varepsilon^2}.$$

Taking $n \to \infty$, $\mathbb{P}\big(|\bar{X}_n - p| \ge \varepsilon\big) \to 0$. $\blacksquare$
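In code, the corollary says that the empirical frequency of heads concentrates around $p$. Here is a minimal simulation sketch (the parameters $p$, $\varepsilon$, the seed, and the sample sizes are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
p, eps, reps = 0.5, 0.05, 100_000

for n in (100, 1_000, 10_000):
    # A Binomial(n, p) draw divided by n is the sample mean of n coin flips.
    means = rng.binomial(n, p, size=reps) / n
    print(n,
          np.mean(np.abs(means - p) >= eps),  # empirical P(|mean - p| >= eps)
          p * (1 - p) / (n * eps**2))         # Chebyshev bound pq/(n eps^2)
```

The empirical deviation probability drops well below the Chebyshev bound as $n$ grows, which is consistent with the bound being valid but far from tight.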
Denote by $Z \sim \mathcal{N}(0, 1)$ a standard normal random variable, and denote its c.d.f. by

$$\Phi(z) = \mathbb{P}(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt.$$
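Incidentally, $\Phi$ has no elementary closed form, but it is easy to evaluate numerically via the standard identity $\Phi(z) = \tfrac{1}{2}\big(1 + \operatorname{erf}(z/\sqrt{2})\big)$. A small helper we will implicitly reuse in the simulations below:

```python
from math import erf, sqrt

def Phi(z: float) -> float:
    """Standard normal c.d.f.: Phi(z) = P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(Phi(0.0))   # 0.5, by symmetry
print(Phi(1.96))  # ~0.975, the familiar 95% two-sided quantile
```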
Lemma 2. Let $W_1, W_2, \ldots$ be real-valued random variables. Suppose for any thrice-differentiable $f : \mathbb{R} \to \mathbb{R}$ such that $f, f', f'', f'''$ are bounded,

$$\lim_{n \to \infty} \mathbb{E}[f(W_n)] = \mathbb{E}[f(Z)].$$

Then for any $z \in \mathbb{R}$,

$$\lim_{n \to \infty} \mathbb{P}(W_n \le z) = \Phi(z).$$
Proof. Fix $z \in \mathbb{R}$ and $\varepsilon > 0$. Since the p.d.f. of $Z$ is continuous (and bounded by $\tfrac{1}{\sqrt{2\pi}}$), there exists $\delta > 0$ such that

$$\Phi(z + \delta) - \Phi(z - \delta) < \varepsilon.$$

In particular, $\Phi(z + \delta) < \Phi(z) + \varepsilon$ and $\Phi(z - \delta) > \Phi(z) - \varepsilon$. Define thrice-differentiable bounded functions $f^-, f^+ : \mathbb{R} \to [0, 1]$ with bounded first, second, and third derivatives such that

$$\mathbf{1}_{(-\infty,\, z - \delta]} \le f^- \le \mathbf{1}_{(-\infty,\, z]} \le f^+ \le \mathbf{1}_{(-\infty,\, z + \delta]}.$$

By hypothesis,

$$\lim_{n \to \infty} \mathbb{E}[f^+(W_n)] = \mathbb{E}[f^+(Z)] \le \Phi(z + \delta) < \Phi(z) + \varepsilon.$$

Similarly, $\lim_{n \to \infty} \mathbb{E}[f^-(W_n)] = \mathbb{E}[f^-(Z)] \ge \Phi(z - \delta) > \Phi(z) - \varepsilon$. Since $f^- \le \mathbf{1}_{(-\infty,\, z]} \le f^+$ point-wise, for sufficiently large $n$,

$$\Phi(z) - 2\varepsilon < \mathbb{E}[f^-(W_n)] \le \mathbb{P}(W_n \le z) \le \mathbb{E}[f^+(W_n)] < \Phi(z) + 2\varepsilon,$$

so that $\lim_{n \to \infty} \mathbb{P}(W_n \le z) = \Phi(z)$, as required. $\blacksquare$
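The existence of such cutoffs $f^\pm$ deserves a word. One explicit construction (my own choice; any $C^3$ cutoff works) uses the degree-$7$ smoothstep $s(t) = 35t^4 - 84t^5 + 70t^6 - 20t^7$, which increases from $s(0) = 0$ to $s(1) = 1$ with $s'(t) = 140\, t^3 (1 - t)^3$, so its first three derivatives vanish at both endpoints. Setting

$$f^+(x) = \begin{cases} 1, & x \le z, \\ 1 - s\big(\tfrac{x - z}{\delta}\big), & z < x < z + \delta, \\ 0, & x \ge z + \delta, \end{cases}$$

gives a thrice continuously differentiable function with $\mathbf{1}_{(-\infty,\, z]} \le f^+ \le \mathbf{1}_{(-\infty,\, z + \delta]}$ and bounded derivatives; translating left by $\delta$ gives $f^-$.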
Theorem 1 (Central Limit Theorem). Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and variance $\sigma^2 \in (0, \infty)$. Then for any $z \in \mathbb{R}$,

$$\lim_{n \to \infty} \mathbb{P}\left( \frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} \le z \right) = \Phi(z).$$

In this case, we say that $\frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} \to Z$ in distribution.
Proof. By replacing each $X_k$ with $(X_k - \mu)/\sigma$, we may assume $\mu = 0$ and $\sigma = 1$. Fix i.i.d. $Z_1, Z_2, \ldots \sim \mathcal{N}(0, 1)$ that are independent of $(X_k)_{k \ge 1}$. For each $n$ and $0 \le k \le n$, define

$$W_{n,k} = \frac{X_1 + \cdots + X_k + Z_{k+1} + \cdots + Z_n}{\sqrt{n}}.$$

By direct computation, $W_{n,n} = \frac{1}{\sqrt{n}}(X_1 + \cdots + X_n)$ and $W_{n,0} = \frac{1}{\sqrt{n}}(Z_1 + \cdots + Z_n) \sim \mathcal{N}(0, 1)$, so that $\mathbb{E}[f(W_{n,0})] = \mathbb{E}[f(Z)]$. The key idea is to use Lemma 2 to conclude the proof. Fix any thrice-differentiable $f$ such that $f, f', f'', f'''$ are bounded. We claim that

$$\lim_{n \to \infty} \mathbb{E}[f(W_{n,n})] = \mathbb{E}[f(Z)].$$

Fix $\varepsilon > 0$. We aim to find $N$ such that for $n \ge N$,

$$\big| \mathbb{E}[f(W_{n,n})] - \mathbb{E}[f(W_{n,0})] \big| < \varepsilon.$$

Applying Taylor's theorem on the interval with endpoints $w$ and $w + x$, there exists $\xi$ between them such that

$$f(w + x) = f(w) + f'(w)\, x + \frac{1}{2} f''(\xi)\, x^2.$$

Performing algebra yields

$$h(w, x) := f(w + x) - f(w) - f'(w)\, x - \frac{1}{2} f''(w)\, x^2 = \frac{1}{2} \big( f''(\xi) - f''(w) \big) x^2.$$

Since $f''$ and $f'''$ are bounded, there exists $C > 0$ such that $|f''| \le C$ and $|f'''| \le C$; in particular, $|h(w, x)| \le C x^2$ always. By the mean value theorem, since $|\xi - w| \le |x|$,

$$|f''(\xi) - f''(w)| \le C |x|, \qquad \text{so} \qquad |h(w, x)| \le \frac{C}{2} |x|^3.$$

Write $U_{n,k} = W_{n,k} - \frac{X_k}{\sqrt{n}}$, so that $W_{n,k} = U_{n,k} + \frac{X_k}{\sqrt{n}}$ and $W_{n,k-1} = U_{n,k} + \frac{Z_k}{\sqrt{n}}$, with $U_{n,k}$ independent of both $X_k$ and $Z_k$. Fix $\delta > 0$, to be chosen later. Consider the "good event" $G_k = \{ |X_k| \le \delta \sqrt{n} \}$ and its complement, the "bad event" $G_k^c = \{ |X_k| > \delta \sqrt{n} \}$. By our bound on the second derivative via the mean value theorem, we can bound on the "good event"

$$\Big| h\Big( U_{n,k}, \tfrac{X_k}{\sqrt{n}} \Big) \Big| \mathbf{1}_{G_k} \le \frac{C}{2} \cdot \frac{|X_k|^3}{n^{3/2}} \mathbf{1}_{G_k} \le \frac{C \delta}{2} \cdot \frac{X_k^2}{n}.$$

On the other hand, on the "bad event" $G_k^c$,

$$\Big| h\Big( U_{n,k}, \tfrac{X_k}{\sqrt{n}} \Big) \Big| \mathbf{1}_{G_k^c} \le \frac{C X_k^2}{n} \mathbf{1}_{\{ |X_k| > \delta \sqrt{n} \}}.$$

By Chebyshev's inequality, $\mathbb{P}(|X_k| > \delta \sqrt{n}) \le \frac{1}{\delta^2 n} \to 0$, and since $\mathbb{E}[X_1^2] < \infty$, dominated convergence gives

$$r_X(n) := \mathbb{E}\big[ X_1^2 \mathbf{1}_{\{ |X_1| > \delta \sqrt{n} \}} \big] \to 0 \quad \text{as } n \to \infty.$$

Therefore,

$$\mathbb{E}\Big[ \Big| h\Big( U_{n,k}, \tfrac{X_k}{\sqrt{n}} \Big) \Big| \Big] \le \frac{C \delta}{2n}\, \mathbb{E}[X_1^2] + \frac{C}{n}\, r_X(n) = \frac{C \delta}{2n} + \frac{C}{n}\, r_X(n).$$

The argument remains almost unchanged with $X_k$ replaced with $Z_k$, so that we have the bound

$$\mathbb{E}\Big[ \Big| h\Big( U_{n,k}, \tfrac{Z_k}{\sqrt{n}} \Big) \Big| \Big] \le \frac{C \delta}{2n} + \frac{C}{n}\, r_Z(n), \qquad r_Z(n) := \mathbb{E}\big[ Z_1^2 \mathbf{1}_{\{ |Z_1| > \delta \sqrt{n} \}} \big] \to 0.$$

Now, taking expectations of the Taylor expansion and using the independence of $U_{n,k}$ from $X_k$ and $Z_k$, together with the matching moments $\mathbb{E}[X_k] = \mathbb{E}[Z_k] = 0$ and $\mathbb{E}[X_k^2] = \mathbb{E}[Z_k^2] = 1$, the zeroth-, first-, and second-order terms cancel:

$$\mathbb{E}[f(W_{n,k})] - \mathbb{E}[f(W_{n,k-1})] = \mathbb{E}\Big[ h\Big( U_{n,k}, \tfrac{X_k}{\sqrt{n}} \Big) \Big] - \mathbb{E}\Big[ h\Big( U_{n,k}, \tfrac{Z_k}{\sqrt{n}} \Big) \Big].$$

Applying the triangle inequality,

$$\big| \mathbb{E}[f(W_{n,k})] - \mathbb{E}[f(W_{n,k-1})] \big| \le \frac{C \delta}{n} + \frac{C}{n} \big( r_X(n) + r_Z(n) \big).$$

Finally, applying a telescoping series,

$$\big| \mathbb{E}[f(W_{n,n})] - \mathbb{E}[f(W_{n,0})] \big| \le \sum_{k=1}^{n} \big| \mathbb{E}[f(W_{n,k})] - \mathbb{E}[f(W_{n,k-1})] \big| \le C \delta + C \big( r_X(n) + r_Z(n) \big).$$

Thus, selecting $\delta = \frac{\varepsilon}{2C}$ and then $N$ so large that $C \big( r_X(n) + r_Z(n) \big) < \frac{\varepsilon}{2}$ for all $n \ge N$ yields the claim. By Lemma 2, $\mathbb{P}(W_{n,n} \le z) \to \Phi(z)$ for every $z \in \mathbb{R}$; replacing $X_k$ back with $(X_k - \mu)/\sigma$ completes the argument. $\blacksquare$
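Seeing the theorem in action is straightforward. Here is a minimal simulation sketch; the uniform summands, the sample sizes, and the seed are my arbitrary choices:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(2)
n, reps = 100, 100_000

# X_i ~ Uniform(0, 1): mean 1/2, variance 1/12.
S = rng.random((reps, n)).sum(axis=1)
W = (S - n * 0.5) / sqrt(n / 12)  # standardized sums

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}   empirical {np.mean(W <= z):.4f}   Phi(z) {Phi(z):.4f}")
```

Even at $n = 100$, the empirical c.d.f. of the standardized sum agrees with $\Phi$ to two or three decimal places.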
With that, we are done with probability…for now. There are many directions we can take from this point on. We could have proven this result using techniques involving characteristic functions or even Brownian motion, but those topics would each require entire blog sections of their own to discuss properly. We conclude with the famous normal approximations to the binomial and Poisson distributions.
Corollary 2. For $0 < p < 1$ and $q = 1 - p$, let $S_n \sim \mathrm{Binomial}(n, p)$. Then for any $z \in \mathbb{R}$,

$$\lim_{n \to \infty} \mathbb{P}\left( \frac{S_n - np}{\sqrt{npq}} \le z \right) = \Phi(z).$$

This is the normal approximation to the binomial distribution.
Proof. For each $n$, write

$$S_n = X_1 + \cdots + X_n$$

for i.i.d. $X_k$ with $\mathbb{P}(X_k = 1) = p$ and $\mathbb{P}(X_k = 0) = q$. Then $S_n \sim \mathrm{Binomial}(n, p)$. Since the $X_k$ are i.i.d. with mean $p$ and variance $pq$, by the central limit theorem,

$$\lim_{n \to \infty} \mathbb{P}\left( \frac{S_n - np}{\sqrt{npq}} \le z \right) = \Phi(z). \qquad \blacksquare$$
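Numerically, the agreement is already excellent for moderately large $n$. A quick sketch (the values of $n$, $p$, the seed, and the sample size are illustrative):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(3)
n, p = 10_000, 0.3
S = rng.binomial(n, p, size=200_000)
W = (S - n * p) / sqrt(n * p * (1 - p))  # standardized Binomial(n, p)

for z in (-1.5, 0.0, 1.5):
    print(z, np.mean(W <= z), Phi(z))
```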
Corollary 3. For $n \in \mathbb{N}$, let $Y_n \sim \mathrm{Poisson}(n)$. Then for any $z \in \mathbb{R}$,

$$\lim_{n \to \infty} \mathbb{P}\left( \frac{Y_n - n}{\sqrt{n}} \le z \right) = \Phi(z).$$

This is the normal approximation to the Poisson distribution.
Proof. Assume $n \in \mathbb{N}$. Write

$$Y_n = X_1 + \cdots + X_n$$

for i.i.d. $X_k \sim \mathrm{Poisson}(1)$. Then $Y_n \sim \mathrm{Poisson}(n)$, since a sum of independent Poisson random variables is again Poisson with the rates added. Since the $X_k$ are i.i.d. with mean $1$ and variance $1$, by the central limit theorem,

$$\lim_{n \to \infty} \mathbb{P}\left( \frac{Y_n - n}{\sqrt{n}} \le z \right) = \Phi(z). \qquad \blacksquare$$
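The same sanity check works here; only the sampler changes (again, the rate, seed, and sample size below are my arbitrary choices):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(4)
lam = 10_000                        # a large integer rate, as in the corollary
Y = rng.poisson(lam, size=200_000)
W = (Y - lam) / sqrt(lam)           # standardized Poisson(lam)

for z in (-1.5, 0.0, 1.5):
    print(z, np.mean(W <= z), Phi(z))
```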
—Joel Kindiak, 31 Jul 25, 1359H