A Student’s Nightmare

Let X_1,X_2,\dots be i.i.d. random variables that denote the scores on a recent exam, with mean \mu and standard deviation \sigma. Suppose students have historically averaged \mu_0 marks on this end-year exam. This time, however, students seem rather dejected as they leave the exam hall, lamenting that the exam was a lot more challenging than before.

We can use hypothesis testing to determine, with some reasonably small error \alpha \in (0, 1), whether their claim holds weight; that is, whether the exam was harder, and thus the population mean score \mu has decreased. The default case \mu = \mu_0 is called the null hypothesis, denoted \mathrm H_0, and the proposed change \mu < \mu_0 is called the alternative hypothesis, denoted \mathrm H_1. We usually abbreviate this as follows:

\mathrm H_0 : \mu = \mu_0 \quad \text{vs.} \quad \mathrm H_1 : \mu < \mu_0.

How do we go about testing this hypothesis? We presume the innocence of the defendant \mathrm H_0 until we have sufficient evidence to convict it, at which point we conclude \mathrm H_1. Under \mathrm H_0, which states that \mu = \mu_0, we have \bar X_n \sim \mathcal N(\mu_0, \sigma^2/n) approximately, by the central limit theorem.

We then go and sample n students, obtaining the sample points X_1 = x_1, \dots, X_n = x_n, and we can compute

\displaystyle \bar x := \frac 1n \cdot \sum_{i=1}^n x_i,

since \bar X_n := \frac 1n \sum_{i=1}^n X_i is an unbiased estimator for \mu. But how do we estimate \sigma^2? We use the unbiased estimator S_n^2 := \frac{n}{n-1} \cdot \tilde S_n^2, where

\displaystyle \tilde S_n^2 = \frac 1n \cdot \sum_{i=1}^n (X_i - \bar X_n)^2 \quad \Rightarrow \quad S_n^2 = \frac 1{n-1} \cdot \sum_{i=1}^n (X_i - \bar X_n)^2.
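For concreteness, here is a minimal numerical sketch (Python with NumPy, using made-up score values) showing that the unbiased estimator S_n^2 is what np.var computes when ddof=1 is passed:

```python
import numpy as np

# Hypothetical sample of n = 8 exam scores (illustrative values only).
scores = np.array([61.0, 55.5, 70.0, 48.0, 66.5, 59.0, 52.5, 63.0])

x_bar = scores.mean()              # \bar x_n
s2_biased = scores.var(ddof=0)     # \tilde S_n^2, divides by n
s2_unbiased = scores.var(ddof=1)   # S_n^2, divides by n - 1

n = len(scores)
assert np.isclose(s2_unbiased, n / (n - 1) * s2_biased)

print(x_bar, s2_unbiased)
```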

If each X_i is normally distributed, then so is \bar X_n. What, then, is the distribution of S_n^2?

Theorem 1. For n \in \mathbb N^+ with n \geq 2, if X_1,\dots, X_n \sim \mathcal N(\mu, \sigma^2) are i.i.d., then there exist i.i.d. Z_1,\dots, Z_{n-1} \sim \mathcal N(0, 1) such that

\displaystyle \frac{(n-1) \cdot S_n^2}{\sigma^2} = \sum_{i=1}^{n-1} Z_i^2.

Proof. See Exercise 6 on multivariate normal distributions.

The right-hand side is a sum of n-1 squared independent standard normals; its distribution is called the chi-squared distribution with n-1 degrees of freedom. We formalise this gradually, as follows.

Lemma 1. Suppose X = Z^2, where Z \sim \mathcal N(0, 1). Then, for x > 0, the p.d.f. of X is given by

\displaystyle f_X(x) = \frac 1{2^{1/2} \cdot \Gamma(1/2)} \cdot x^{-1/2} e^{-x/2},

where \Gamma(\cdot) denotes the gamma function. Recall that \Gamma(1/2) = \sqrt{\pi}.

Proof. We first remark that \mathbb P(X < 0) = \mathbb P(Z^2 < 0) = 0. Therefore, we restrict our attention to x \geq 0. Then

\begin{aligned} \mathbb P(X \leq x) &= \mathbb P(Z^2 \leq x) \\ &= \mathbb P(-\sqrt{x} \leq Z \leq \sqrt{x}) \\ &= \mathbb P(Z \leq \sqrt{x}) - \mathbb P(Z \leq -\sqrt{x}). \end{aligned}

Differentiating and applying the p.d.f. of a standard normal distribution,

\begin{aligned} f_X(x) &= \frac{\mathrm d}{\mathrm dx} \mathbb P(X \leq x) \\ &= \frac{\mathrm d}{\mathrm dx} \big(\mathbb P(Z \leq \sqrt{x}) - \mathbb P(Z \leq -\sqrt{x})\big) \\ &= f_Z(\sqrt{x}) \cdot \frac{1}{2\sqrt x} - f_Z(-\sqrt{x}) \cdot \left( -\frac 1{2\sqrt x} \right) \\ &= \frac 1{\sqrt x} \cdot f_Z(\sqrt{x}) = \frac 1{\sqrt x} \cdot \frac{1}{\sqrt{2\pi}} \cdot e^{-(\sqrt x)^2/2} \\ &= \frac 1{2^{1/2} \cdot \Gamma(1/2)} \cdot x^{-1/2} e^{-x/2}. \end{aligned}
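As a sanity check on Lemma 1 (a small sketch assuming NumPy and SciPy are available), the closed-form density above agrees with SciPy's chi-squared density with one degree of freedom:

```python
import numpy as np
from math import gamma, sqrt, pi
from scipy import stats

# Density from Lemma 1: f_X(x) = x^{-1/2} e^{-x/2} / (2^{1/2} * Gamma(1/2)).
def f_X(x):
    return x ** (-0.5) * np.exp(-x / 2) / (2 ** 0.5 * gamma(0.5))

xs = np.linspace(0.05, 10.0, 200)
# Gamma(1/2) = sqrt(pi), and the formula matches the chi-squared pdf with 1 d.o.f.
assert np.allclose(f_X(xs), stats.chi2.pdf(xs, df=1))
assert np.isclose(gamma(0.5), sqrt(pi))
```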

Lemma 2. For any \alpha, \beta > 0,

\displaystyle \int_0^{1} u^{\alpha-1} \cdot (1-u)^{\beta-1} \, \mathrm du = \frac{\Gamma(\alpha) \cdot \Gamma(\beta)}{\Gamma(\alpha+\beta)}.

Proof. Assuming all integrals are finite, we first prove that

\displaystyle \int_{\mathbb R^n} f * g\, \mathrm d\lambda = \left( \int_{\mathbb R^n} f \, \mathrm d\lambda \right) \cdot \left(\int_{\mathbb R^n} g \, \mathrm d\lambda \right),

where \lambda denotes the Lebesgue measure on \mathbb R^n. Writing out the dummy variables for readability, Fubini’s theorem and the change of variables \mathbf w = \mathbf x - \mathbf y reduce the left-hand side to

\begin{aligned} \int_{\mathbb R^n} f * g\, \mathrm d\lambda &= \int_{\mathbb R^n} (f * g)(\mathbf x)\, \mathrm d\lambda(\mathbf x) \\ &= \int_{\mathbb R^n} \int_{\mathbb R^n} f(\mathbf y) g(\mathbf x - \mathbf y)\, \mathrm d\lambda(\mathbf y)\, \mathrm d\lambda(\mathbf x) \\ &= \int_{\mathbb R^n} f(\mathbf y) \int_{\mathbb R^n} g(\mathbf x - \mathbf y)\, \mathrm d\lambda(\mathbf x)\, \mathrm d\lambda(\mathbf y) \\ &= \int_{\mathbb R^n} f(\mathbf y) \int_{\mathbb R^n} g(\mathbf w)\, \mathrm d\lambda(\mathbf w)\, \mathrm d\lambda(\mathbf y) \\ &= \left( \int_{\mathbb R^n} f(\mathbf y)\, \mathrm d\lambda(\mathbf y) \right) \cdot \left( \int_{\mathbb R^n} g(\mathbf w)\, \mathrm d\lambda(\mathbf w) \right) \\ &= \left( \int_{\mathbb R^n} f \, \mathrm d\lambda \right) \cdot \left(\int_{\mathbb R^n} g \, \mathrm d\lambda \right). \end{aligned}

Now define f_\alpha(t) = t^{\alpha-1} e^{-t} \cdot \mathbb I_{(0,\infty)}(t), so that applying this result to the gamma function, defined by

\displaystyle \Gamma(z) = \int_{-\infty}^{\infty} t^{z-1} e^{-t} \cdot \mathbb I_{(0,\infty)}(t)\, \mathrm dt

yields

\begin{aligned} \Gamma(\alpha) \cdot \Gamma(\beta) &= \left( \int_{-\infty}^{\infty} f_\alpha\, \mathrm d\lambda \right) \cdot \left( \int_{-\infty}^{\infty} f_\beta\, \mathrm d\lambda \right) =  \int_{-\infty}^{\infty} (f_\alpha * f_\beta)\, \mathrm d\lambda.\end{aligned}

Evaluating the convolution, and using the change of variables s = ut,

\begin{aligned} (f_\alpha * f_\beta)(t) &= \int_{-\infty}^{\infty} f_\alpha(s) \cdot f_\beta(t-s)\, \mathrm ds \\ &= \int_{0}^{t} s^{\alpha-1} e^{-s} \cdot (t-s)^{\beta-1} e^{-(t-s)} \, \mathrm ds \\ &= e^{-t} \cdot \int_{0}^{t} s^{\alpha-1}  \cdot (t-s)^{\beta-1}  \, \mathrm ds \\ &= e^{-t} \cdot \int_{0}^{1} (ut)^{\alpha-1}  \cdot (t-ut)^{\beta-1}  \cdot t \, \mathrm du \\ &= e^{-t} \cdot \int_{0}^{1} t^{\alpha-1} \cdot u^{\alpha - 1} \cdot t^{\beta - 1} \cdot (1-u)^{\beta-1}  \cdot t \, \mathrm du \\ &=  t^{(\alpha + \beta) - 1}  \cdot e^{-t} \cdot\int_{0}^{1} u^{\alpha - 1} \cdot (1-u)^{\beta-1} \, \mathrm du. \end{aligned}

Integrating both sides over t, and pulling out the inner integral (which does not depend on t),

\begin{aligned}\Gamma(\alpha) \cdot \Gamma(\beta) &= \int_{-\infty}^{\infty} \left( t^{(\alpha + \beta) - 1}  \cdot e^{-t} \cdot  \int_{0}^{1} u^{\alpha - 1} \cdot (1-u)^{\beta-1} \, \mathrm du \right) \, \mathrm dt \\ &= \left( \int_{-\infty}^{\infty} t^{(\alpha + \beta) - 1}  \cdot e^{-t} \, \mathrm dt \right)  \cdot  \left( \int_{0}^{1} u^{\alpha - 1} \cdot (1-u)^{\beta-1} \, \mathrm du \right) \\ &= \Gamma(\alpha + \beta)  \cdot  \left( \int_{0}^{1} u^{\alpha - 1} \cdot (1-u)^{\beta-1} \, \mathrm du \right), \end{aligned}

and the result follows upon dividing both sides by \Gamma(\alpha + \beta).
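Lemma 2 can be spot-checked numerically (a sketch assuming SciPy's quad routine is available; the parameter values are arbitrary):

```python
from math import gamma
from scipy.integrate import quad

def check_beta_identity(alpha, beta):
    # Left-hand side: the integral over (0, 1) computed numerically.
    lhs, _ = quad(lambda u: u ** (alpha - 1) * (1 - u) ** (beta - 1), 0, 1)
    # Right-hand side: Gamma(alpha) * Gamma(beta) / Gamma(alpha + beta).
    rhs = gamma(alpha) * gamma(beta) / gamma(alpha + beta)
    return abs(lhs - rhs) < 1e-7

assert all(check_beta_identity(a, b) for a, b in [(1.5, 2.5), (2.0, 3.0), (4.0, 1.0)])
```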

Definition 1. The random variable W is said to follow a chi-squared distribution with \nu > 0 degrees of freedom, denoted W \sim \chi_\nu^2, if its density function is given, for w > 0, by

\displaystyle f_W(w) = \frac 1{2^{\nu/2} \cdot \Gamma(\nu/2)} \cdot w^{\nu/2-1} \cdot e^{- w/2}.

Lemma 3. For any \mu, \nu > 0, if X \sim \chi_\mu^2 and Y \sim \chi_\nu^2 are independent, then W := X +Y \sim \chi_{\mu + \nu}^2.

Proof. Denoting \alpha := \mu/2 and \beta := \nu/2, taking the convolution, substituting x = uw, and applying Lemma 2,

\begin{aligned} f_W(w) &= \int_0^{w} f_X(x) f_Y(w-x)\, \mathrm dx \\ &= \int_0^{w} \frac 1{2^{\alpha} \cdot \Gamma(\alpha)} \cdot x^{\alpha-1} e^{-x/2} \cdot \frac 1{2^{\beta} \cdot \Gamma(\beta)} \cdot (w-x)^{\beta-1} e^{- (w-x)/2}\, \mathrm dx \\ &= \frac 1{2^{\alpha+\beta} \cdot \Gamma(\alpha+\beta)} \cdot e^{- w/2}  \cdot \frac {\Gamma(\alpha+\beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} \cdot \int_0^{w}  x^{\alpha - 1} \cdot (w-x)^{\beta - 1}\, \mathrm dx \\ &= \frac 1{2^{\alpha +\beta} \cdot \Gamma(\alpha +\beta)} \cdot e^{- w/2}  \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} \int_0^{1}  (uw)^{\alpha-1} \cdot (w-uw)^{\beta-1} \cdot w \, \mathrm du \\ &= \frac 1{2^{\alpha +\beta} \cdot \Gamma(\alpha+\beta)} \cdot w^{\alpha +\beta-1} \cdot e^{- w/2}  \cdot \underbrace{ \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)} \cdot \int_0^{1} u^{\alpha-1} \cdot (1-u)^{\beta-1} \, \mathrm du }_1 \\ &= \frac 1{2^{(\mu + \nu)/2} \cdot \Gamma((\mu + \nu)/2)} \cdot w^{(\mu + \nu)/2-1} \cdot e^{- w/2}. \end{aligned}
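A quick simulation (assuming NumPy and SciPy; the degrees of freedom and sample size are arbitrary choices) illustrates Lemma 3: the sum of independent \chi_3^2 and \chi_5^2 draws is statistically indistinguishable from \chi_8^2 draws.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu_df, nu_df = 3, 5          # degrees of freedom for X and Y (arbitrary)
n_samples = 100_000

x = rng.chisquare(mu_df, size=n_samples)
y = rng.chisquare(nu_df, size=n_samples)

# Kolmogorov-Smirnov test of X + Y against the chi-squared(mu + nu) c.d.f.
ks = stats.kstest(x + y, stats.chi2(df=mu_df + nu_df).cdf)
print(ks.pvalue)  # typically large, i.e. no evidence against Lemma 3
```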

Lemma 4. Fix n \in \mathbb N^+. Then W \sim \chi_n^2 if and only if there exist i.i.d. Z_1, \dots, Z_{n} \sim \mathcal N(0, 1) such that W = \sum_{i=1}^{n} Z_i^2.

Proof. For the direction (\Leftarrow), each Z_i^2 \sim \chi_1^2 by Lemma 1, so induction on Lemma 3 gives \sum_{i=1}^{n} Z_i^2 \sim \chi_n^2. For the direction (\Rightarrow), take any i.i.d. Z_1, \dots, Z_n \sim \mathcal N(0, 1); by the first direction, \sum_{i=1}^{n} Z_i^2 has the same distribution as W.

Example 1. Combining Theorem 1 with Lemma 4, (n-1) \cdot S_n^2 / \sigma^2 \sim \chi_{n-1}^2.
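Example 1 can likewise be checked by simulation (a sketch with hypothetical values of n, \mu and \sigma): draw many normal samples, compute (n-1) S_n^2/\sigma^2 for each, and compare against \chi_{n-1}^2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma = 10, 60.0, 12.0    # hypothetical sample size, mean, s.d.

# Draw many independent samples of size n and compute (n-1) * S_n^2 / sigma^2 for each.
samples = rng.normal(mu, sigma, size=(50_000, n))
w = (n - 1) * samples.var(axis=1, ddof=1) / sigma ** 2

# The empirical distribution should agree with chi-squared with n - 1 degrees of freedom.
print(stats.kstest(w, stats.chi2(df=n - 1).cdf).pvalue)
```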

Let X_1,\dots, X_n be i.i.d. with mean \mu and known variance \sigma^2. The central limit theorem (which we aim to eventually prove) tells us that, regardless of the underlying distribution of X_1,\dots, X_n,

\displaystyle \bar X_n \sim \mathcal N \left( \mu , \frac{\sigma^2}{n} \right)  \quad \text{approximately}.

Equivalently, we have that

\displaystyle \frac{\bar X_n - \mu}{\sigma/\sqrt n} \sim \mathcal N(0, 1)\quad \text{approximately}.

Of course, if the X_i terms are normally distributed, then this statement holds exactly.

If \sigma^2 is unknown, however, then this conclusion is not directly usable, since the standardised quantity (\bar x_n - \mu)/(\sigma / \sqrt n) cannot be computed without \sigma. However, S_n^2 is an unbiased estimator of \sigma^2. Furthermore, if the X_i terms are normally distributed, then Example 1 characterises the distribution of S_n^2 via

\displaystyle \frac{(n-1) S_n^2}{\sigma^2} \sim \chi_{n-1}^2.

What would be the distribution of the modified random variable T_n defined below?

\displaystyle T_n := \frac{\bar X_n - \mu}{S_n/\sqrt n}

By algebraic manipulation,

\displaystyle T_n = \frac{\bar X_n - \mu}{\sigma/\sqrt n} \cdot \frac{\sigma}{S_n} = \frac{Z}{\sqrt{\dfrac{(n-1) \cdot S_n^2/\sigma^2}{n-1}}}, \quad \text{where } Z := \frac{\bar X_n - \mu}{\sigma/\sqrt n}.

Definition 2. A random variable T is said to follow a Student’s t-distribution with \nu degrees of freedom, denoted T \sim t(\nu), if there exist independent Z \sim \mathcal N(0, 1) and W \sim \chi_\nu^2 such that

\displaystyle T := \frac{Z}{\sqrt{W / \nu}}.

In particular, since \bar X_n and S_n^2 are independent for normal samples, \displaystyle \frac{\bar X_n - \mu}{S_n/\sqrt n} \sim t(n-1). We thus recover the classic t-test for statistical hypothesis testing, provided the underlying data follow a normal distribution.
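Before deriving the density, here is a simulation sketch (with hypothetical n, \mu and \sigma) suggesting that the statistic (\bar X_n - \mu)/(S_n/\sqrt n) computed from normal samples does follow t(n-1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu, sigma = 8, 60.0, 12.0     # hypothetical sample size, mean, s.d.

samples = rng.normal(mu, sigma, size=(50_000, n))
x_bar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)
t_stats = (x_bar - mu) / (s / np.sqrt(n))

# The simulated statistics should be indistinguishable from Student's t with n - 1 d.o.f.
print(stats.kstest(t_stats, stats.t(df=n - 1).cdf).pvalue)
```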

To use the t-distribution in practice, we need its p.d.f., so that numerical integration can estimate the crucial probabilities that are typically tabulated.

Theorem 2. If T \sim t(\nu), then T has a p.d.f. f_T given by

\displaystyle f_T(t) = \frac{\Gamma((\nu+1)/2)}{\Gamma (\nu/2) \cdot \sqrt{\nu \pi}} \cdot \left(  1 + \frac{t^2}{\nu} \right)^{-(\nu+1)/2}.

Proof. By definition, find independent Z \sim \mathcal N(0, 1) and W \sim \chi_\nu^2 such that T = Z/\sqrt{W / \nu}. Denote F_T(t) := \mathbb P(T \leq t). By Fubini’s theorem,

\displaystyle \begin{aligned} F_T(t) &= \mathbb P\left( \frac{Z}{\sqrt{W / \nu}} \leq t \right) \\ &= \int_{\mathbb R^2} \mathbb I_{z \leq t\sqrt{w/\nu}}(z,w) \cdot f_Z(z) \cdot f_W(w)\, \mathrm dz\, \mathrm dw \\ &= \int_{0}^\infty \int_{-\infty}^{t\sqrt{w/\nu}}  f_Z(z) \cdot f_W(w)\, \mathrm dz \, \mathrm dw. \end{aligned}

Differentiating under the integral sign,

\displaystyle \begin{aligned} f_T(t) = \frac{\mathrm d}{\mathrm dt} ( F_T(t) ) &= \int_{0}^\infty \frac{\partial}{\partial t} \int_{-\infty}^{t\sqrt{w/\nu}}  f_Z(z) \cdot f_W(w)\, \mathrm dz \, \mathrm dw \\ &= \int_{0}^\infty  f_Z(t\sqrt{w/\nu}) \cdot f_W(w) \cdot \sqrt{\frac w\nu} \, \mathrm dw. \end{aligned}

We leave it as an exercise to verify that

\displaystyle f_Z(t\sqrt{w/\nu}) \cdot f_W(w) \cdot \sqrt{\frac w\nu} = c \cdot w^{(\nu-1)/2} \cdot e^{-\frac w2\left( 1 + \frac{t^2}{\nu} \right)},

where c = (2^{(\nu+1)/2} \cdot \Gamma (\nu/2) \cdot \sqrt{\nu \pi})^{-1}, so that by the substitution u = \frac w2\left( 1 + \frac{t^2}{\nu} \right),

\begin{aligned} f_T(t) &= c \cdot  \int_{0}^\infty \left( \frac 2{ 1 + \frac{t^2}{\nu}} \cdot u \right)^{(\nu-1)/2} \cdot e^{-u} \cdot \frac 2{ 1 + \frac{t^2}{\nu}} \, \mathrm du \\ &= c \cdot \left( \frac 2{ 1 + \frac{t^2}{\nu}} \right)^{(\nu+1)/2} \int_{0}^\infty u^{(\nu+1)/2 - 1} \cdot e^{-u}\, \mathrm du \\ &= c \cdot 2^{(\nu+1)/2} \cdot \left(  1 + \frac{t^2}{\nu} \right)^{-(\nu+1)/2} \cdot \Gamma((\nu+1)/2) \\ &= \frac{\Gamma((\nu+1)/2)}{\Gamma (\nu/2) \cdot \sqrt{\nu \pi}} \cdot \left(  1 + \frac{t^2}{\nu} \right)^{-(\nu+1)/2}. \end{aligned}
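As a check on Theorem 2 (a sketch assuming SciPy is available), the closed-form density agrees with SciPy's implementation of the t-distribution for several values of \nu:

```python
import numpy as np
from math import gamma, sqrt, pi
from scipy import stats

def t_pdf(t, nu):
    # Density from Theorem 2.
    return gamma((nu + 1) / 2) / (gamma(nu / 2) * sqrt(nu * pi)) \
        * (1 + t ** 2 / nu) ** (-(nu + 1) / 2)

ts = np.linspace(-5, 5, 101)
for nu in (1, 2, 5, 30):
    assert np.allclose(t_pdf(ts, nu), stats.t.pdf(ts, df=nu))
```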

Recall our original question: we wanted to analyse if students’ scores have decreased, by testing the hypothesis \mathrm H_0 against \mathrm H_1 given by

\mathrm H_0 : \mu = \mu_0,\quad \text{vs.}\quad \mathrm H_1 : \mu < \mu_0.

Assuming our students’ scores follow a normal distribution (an assumption that is reasonable often enough in practice), we can obtain a random sample X_1,\dots, X_n and compute the t-statistic T, which under \mathrm H_0 follows t(n-1), via

\displaystyle \bar X_n := \frac 1n \cdot \sum_{i=1}^n X_i,\quad S_n^2 := \frac 1{n-1} \cdot \sum_{i=1}^n (X_i - \bar X_n)^2, \quad T:= \frac{\bar X_n - \mu_0 }{ S_n / \sqrt n }.

By collecting data to evaluate \bar X_n = \bar x_n and S_n = s_n, we obtain a t-value of

\displaystyle t := \frac{\bar x_n - \mu_0 }{s_n / \sqrt{n}}.

A value t < 0 nudges us towards our suspicion that marks have indeed decreased. Since T \sim t(n-1) under \mathrm H_0, we can compute the p-value p := \mathbb P(T < t).
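In code, the left-tailed t-test looks as follows (a sketch with hypothetical scores and a hypothetical \mu_0 = 65; the one-call version assumes SciPy 1.6 or newer for the alternative argument):

```python
import numpy as np
from scipy import stats

mu_0 = 65.0                                  # hypothetical historical mean score
scores = np.array([61.0, 55.5, 70.0, 48.0,   # hypothetical sample of exam scores
                   66.5, 59.0, 52.5, 63.0])

n = len(scores)
t_value = (scores.mean() - mu_0) / (scores.std(ddof=1) / np.sqrt(n))
p_value = stats.t.cdf(t_value, df=n - 1)     # left tail: P(T < t)

# The same test in one call (alternative="less" requires SciPy >= 1.6).
result = stats.ttest_1samp(scores, popmean=mu_0, alternative="less")
assert np.isclose(result.statistic, t_value) and np.isclose(result.pvalue, p_value)
print(t_value, p_value)
```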

If instead we do not know that students’ scores follow a normal distribution, but we do know the standard deviation \sigma of their scores, then the central limit theorem tells us that the standardised version of \bar X_n is approximately normally distributed (that is, it converges in distribution to the standard normal distribution), so that we may regard

\displaystyle \frac{ \bar X_n - \mu_0 }{\sigma/ \sqrt n} \approx Z \sim \mathcal N(0, 1).

In this case, our p-value is computed using the z-value:

\displaystyle z := \frac{\bar x_n - \mu_0}{\sigma / \sqrt{n}},\quad p := \mathbb P(Z < z).
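The corresponding z-test computation, under the same hypothetical data and a hypothetical known \sigma = 12, is even shorter:

```python
import numpy as np
from scipy import stats

mu_0, sigma = 65.0, 12.0                     # hypothetical historical mean and known s.d.
scores = np.array([61.0, 55.5, 70.0, 48.0, 66.5, 59.0, 52.5, 63.0])

z_value = (scores.mean() - mu_0) / (sigma / np.sqrt(len(scores)))
p_value = stats.norm.cdf(z_value)            # left tail: P(Z < z)
print(z_value, p_value)
```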

How do we reject \mathrm H_0? Notice that in the t-value and z-value calculations, we assumed that \mathrm H_0 holds, that is, \mu = \mu_0, and substituted accordingly. The further t or z falls below 0, the greater the normalised decrease, and so the smaller the value of p.

How small is sufficiently small for us to reject \mathrm H_0? There is no universal answer. If you require p \leq 0, then there is no way we can ever reject \mathrm H_0. Otherwise, if you require p \leq \alpha for some predetermined \alpha \in (0, 1) of your choice, commonly \alpha = 0.05 (or, in the case of particle physicists, \alpha \approx 5.7 \times 10^{-7}, known as the five-sigma rule), then there is a chance of rejecting \mathrm H_0. We call \alpha the level of significance, and we reject \mathrm H_0 if and only if p \leq \alpha.

This form of statistical hypothesis testing is commonly used in sciences—physical and social—by interpreting collected data and the statistics it suggests about the underlying distribution. Yet, our conclusions may be wrong.

  • If we rejected \mathrm H_0 when we should not have, we call it a type-I error.
  • If we failed to reject \mathrm H_0 when we should have, we call it a type-II error.

We note that \alpha is the maximum probability of unintentionally committing a type-I error that we are willing to accept.

Finally, hypothesis tests concerning a population mean commonly come in three flavours:

  • Left-tailed: \mathrm H_0 : \mu = \mu_0 vs. \mathrm H_1 : \mu < \mu_0,
  • Right-tailed: \mathrm H_0 : \mu = \mu_0 vs. \mathrm H_1 : \mu > \mu_0,
  • Two-tailed: \mathrm H_0 : \mu = \mu_0 vs. \mathrm H_1 : \mu \neq \mu_0.

Now we return to our original quest: proving the central limit theorem. We will need to revisit distribution functions and characteristic functions for a not-too-challenging proof of the central limit theorem, at least for the undergraduate context.

—Joel Kindiak, 26 Jul 25, 2103H
