Adding Random Variables…Revisited

We are now justified in adding random variables. For instance, if X, Y \sim \mathcal N(0, 1), what is the distribution of X + Y? Unfortunately, without further assumptions, the answer can be trivial, as the following extreme case shows.

Lemma 1. Let X \sim \mathcal N(0, 1) and Y = -X. Then Y \sim \mathcal N(0, 1) and X + Y = 0.

Proof. We observe that for any K \in \frak{B}(\mathbb R), since f_X(\cdot) = f_X(-\cdot), by a change of variables,

\begin{aligned} \mathbb P_Y(K) = \mathbb P_X(-K) &= \int_{-K} f_X(x)\, \mathrm dx \\ &= \int_K f_X(-x)\, \mathrm dx \\ &= \int_K f_X(x)\, \mathrm dx = \mathbb P_X(K).\end{aligned}

Therefore, for any K \in \frak{B}(\mathbb R),

\displaystyle \int_K f_Y\, \mathrm d\lambda = \int_K f_X\, \mathrm d\lambda.

Therefore, f_Y = f_X so that Y \sim \mathcal N(0, 1).
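Readers who like to see this numerically can run a quick sanity check. The sketch below (the sample size, seed, and comparison points are arbitrary choices of mine) compares the empirical CDF of a standard normal sample with that of its negation, and confirms that X + Y vanishes identically.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # draws from N(0, 1)

# Lemma 1 says Z and -Z have the same distribution, so the two
# empirical CDFs should agree up to Monte Carlo error.
for t in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(f"t = {t:+.1f}:  P(Z <= t) ~ {np.mean(z <= t):.4f},  "
          f"P(-Z <= t) ~ {np.mean(-z <= t):.4f}")

# And X + Y = Z + (-Z) is identically zero.
print("max |Z + (-Z)| =", np.max(np.abs(z + (-z))))
```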

Here, Y is entirely determined by X (their correlation is exactly -1), so we got a rather useless answer to our question. If instead we swing to the other extreme and take X, Y to be independent, we do not yet have a general formula either. Nevertheless, the independent case proves far more useful in practice: for instance, the exam scores of two students are effectively independent, barring (sufficiently drastic) externalities.

Therefore, let’s discuss what it means for two random variables X,Y : \Omega \to \mathbb R to be independent. We have previously seen that two events K,L are called independent if \mathbb P(K \cap L) = \mathbb P(K) \cdot \mathbb P(L).

Lemma 2. The set \sigma(X) := \{X^{-1}(K) : K \in \frak{B}(\mathbb R)\} \subseteq \mathcal F forms a \sigma-algebra, called the \sigma-algebra generated by X.

Given two \sigma-algebras \mathcal F_1, \mathcal F_2 \subseteq \mathcal F, also known as sub-\sigma-algebras of \mathcal F, we say that \mathcal F_1, \mathcal F_2 are independent if for any K \in \mathcal F_1, L \in \mathcal F_2, K, L are independent.

Definition 1. Two random variables X, Y are said to be independent if \sigma(X), \sigma(Y) are independent.

Suppose \mathbb P_X \ll \mu and \mathbb P_Y \ll \mu, where \mu denotes either the counting measure |\cdot| or the Lebesgue measure \lambda, and let \mu^2 be the product of two copies of \mu. We write (X,Y) \ll \mu^2 to mean that the joint distribution \mathbb P_{(X,Y)} is absolutely continuous with respect to \mu^2, with density f_{X,Y}.

Theorem 1. X, Y are independent and (X,Y) \ll \mu^2 if and only if, for every K \in \frak{B}(\mathbb R^2),

\displaystyle \mathbb P((X, Y) \in K) = \int_K f_X \cdot f_Y\, \mathrm d\mu^2,

where the integrand on the right-hand side is interpreted as the function

(x, y) \mapsto f_X(x) \cdot f_Y(y), \quad x, y \in \mathbb R.

Proof. Consider the cumulative distribution functions

F_X(x) := \mathbb P(X \in (-\infty, x]) \equiv \mathbb P(X \leq x),\quad F_Y(y) := \mathbb P( Y \leq y).

By the Radon-Nikodým theorem,

\displaystyle F_X(x) = \int_{(-\infty, x]} f_X\, \mathrm d\mu,\quad F_Y(y) = \int_{(-\infty, y]} f_Y\, \mathrm d\mu.

In the direction (\Leftarrow), the hypothesis says precisely that (X,Y) \ll \mu^2 with joint density f_{X,Y} = f_X \cdot f_Y. By the Fubini-Tonelli theorem,

\begin{aligned} \mathbb P(X^{-1}((-\infty, x]) \cap Y^{-1}((-\infty, y])) &= \mathbb P((X,Y) \in (-\infty, x] \times (-\infty, y]) \\ &= \int_{(-\infty, x] \times (-\infty, y]} f_{X,Y}\, \mathrm d\mu^2 \\ &= \int_{(-\infty, x]} \int_{(-\infty, y]} f_{X,Y}\, \mathrm d \mu\, \mathrm d \mu \\ &= \int_{(-\infty, x]} \int_{(-\infty, y]} f_X \cdot f_Y\, \mathrm d \mu\, \mathrm d \mu \\ &= \int_{(-\infty, x]} f_X \, \mathrm d \mu \cdot \int_{(-\infty, y]} f_Y\, \mathrm d \mu \\ &= \mathbb P(X^{-1}((-\infty, x])) \cdot \mathbb P(Y^{-1}((-\infty, y])), \end{aligned}

and since sets of the form X^{-1}((-\infty, x]) generate \sigma(X) (and likewise for Y), a standard \pi-system argument extends this factorization to all of \sigma(X) and \sigma(Y), establishing independence.

In the direction (\Rightarrow), independence supplies the first equality below, and the Fubini-Tonelli theorem the rest:

\begin{aligned} \iint_{(-\infty, x] \times (-\infty, y]} f_{X,Y}\, \mathrm d\mu^2 &= \int_{(-\infty, x]} f_X\, \mathrm d\mu \cdot \int_{(-\infty, y]} f_Y\, \mathrm d\mu \\ &= \int_{(-\infty, x]} \int_{(-\infty, y]} f_X \cdot f_Y\, \mathrm d \mu\, \mathrm d \mu \\ &= \iint_{(-\infty, x] \times (-\infty, y]} f_X \cdot f_Y\, \mathrm d\mu^2, \end{aligned}

Since rectangles of the form (-\infty, x] \times (-\infty, y] generate \frak{B}(\mathbb R^2), the measures K \mapsto \int_K f_{X,Y}\, \mathrm d\mu^2 and K \mapsto \int_K f_X \cdot f_Y\, \mathrm d\mu^2 agree on every Borel set, so that f_{X,Y} = f_X \cdot f_Y \mu^2-almost everywhere.

Henceforth, when X, Y are independent with \mathbb P_X \ll \mu and \mathbb P_Y \ll \mu, we assume (X,Y) \ll \mu^2 so that f_{X,Y} is a meaningful quantity.

Corollary 1. Define

\displaystyle F_{X,Y}(x,y) = \mathbb P((X,Y) \in (-\infty, x] \times (-\infty, y])  \equiv \mathbb P(X \leq x, Y \leq y).

Then X, Y are independent and (X,Y) \ll \mu^2 if and only if for any x, y \in \mathbb R,

\displaystyle F_{X,Y}(x,y) = F_X(x)\cdot F_Y(y).
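Corollary 1 is easy to probe by simulation. The following sketch (the choice of marginals \mathcal N(0,1) and \mathrm{Exp}(1), the sample size, and the test points are all mine) compares the empirical joint CDF of independent draws with the product of the empirical marginal CDFs.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
xs = rng.standard_normal(n)              # X ~ N(0, 1)
ys = rng.exponential(scale=1.0, size=n)  # Y ~ Exp(1), drawn independently of X

for x, y in [(-0.5, 0.7), (0.0, 1.0), (1.2, 2.5)]:
    joint = np.mean((xs <= x) & (ys <= y))         # empirical F_{X,Y}(x, y)
    product = np.mean(xs <= x) * np.mean(ys <= y)  # empirical F_X(x) * F_Y(y)
    print(f"F_XY({x}, {y}) ~ {joint:.4f}   F_X*F_Y ~ {product:.4f}")
```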

Finally, let’s add these random variables together.

Theorem 2. If X, Y are independent \mathbb R-valued random variables with density functions f_X, f_Y, then X+Y is a random variable with density function

f_{X+Y} = f_X * f_Y,

where the right-hand side denotes convolution with respect to \mu:

\displaystyle (f * g)(u) := \int_{\mathbb R} f \cdot g(u- \cdot )\, \mathrm d\mu \equiv \int_{\mathbb R} f(x) \cdot g(u- x )\, \mathrm d\mu(x).

If \mu is the counting measure, we get

\displaystyle (f * g)(u) = \sum_{n \in \mathbb Z} f(n) \cdot g(u-n).

If \mu is the Lebesgue measure, we get

\displaystyle (f * g)(u) = \int_{-\infty}^{\infty} f(t) \cdot g(u-t)\, \mathrm dt.

Proof. Denote g(x,y) = x+y, so that for any fixed x and any K \subseteq \mathbb R, we have g(x,y) \in K if and only if y \in K - x := \{k - x : k \in K\}. Then for any K \in \frak{B}(\mathbb R), using the translation invariance of \mu and the Fubini-Tonelli theorem,

\displaystyle \begin{aligned} \int_{K} f_{X+Y}\, \mathrm d\mu &= \mathbb P(X+Y \in K) \\ &= \mathbb P((X,Y) \in g^{-1}(K)) \\ &= \int_{g^{-1}(K)} f_{X,Y}\, \mathrm d\mu^2 \\ &= \int_{g^{-1}(K)} f_{X,Y}(x,y)\, \mathrm d\mu^2(x,y) \\ &= \int_{\mathbb R} \int_{K-x} f_{X,Y}(x,y)\, \mathrm d\mu(y)\, \mathrm d\mu(x) \\ &= \int_{\mathbb R} \int_{K} f_{X,Y}(x,y-x)\, \mathrm d\mu(y)\, \mathrm d\mu(x) \\ &= \int_{K} \int_{\mathbb R} f_{X,Y}(x,y-x)\, \mathrm d\mu(x)\, \mathrm d\mu(y) \\ &= \int_{K} \underbrace{ \int_{\mathbb R} f_{X}(x) \cdot f_Y(y-x)\, \mathrm d\mu(x) }_{(f_X * f_Y)(y)} \, \mathrm d\mu(y) = \int_{K} f_X * f_Y\, \mathrm d\mu. \end{aligned}
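For a concrete counting-measure instance of the theorem (the two-dice example is mine, not from the text above), the PMF of the sum of two fair dice is the discrete convolution of the two individual PMFs, which np.convolve computes directly; a simulation serves as a cross-check.

```python
import numpy as np

# PMF of a fair die on {1, ..., 6}; index i stores the probability of the value i + 1.
die = np.full(6, 1 / 6)

# f_{X+Y} = f_X * f_Y: discrete convolution; index i corresponds to the sum i + 2.
pmf_sum = np.convolve(die, die)

# Cross-check against a Monte Carlo simulation of two independent dice.
rng = np.random.default_rng(2)
rolls = rng.integers(1, 7, size=(1_000_000, 2)).sum(axis=1)
for s in range(2, 13):
    print(f"P(X+Y = {s:2d}):  convolution {pmf_sum[s - 2]:.4f},  "
          f"simulation {np.mean(rolls == s):.4f}")
```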

Finally, let’s concretely add two independent, normally distributed random variables.

Theorem 3. If X \sim \mathcal N(\mu_1, \sigma_1^2) and Y \sim \mathcal N(\mu_2, \sigma_2^2) are independent, then X + Y \sim \mathcal N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2).

Proof. Write X = \mu_1 + \sigma_1 Z_1 and Y = \mu_2 + \sigma_2 Z_2. Then

X + Y = (\mu_1 + \mu_2) + \sigma_1 Z_1 + \sigma_2 Z_2,

where Z_1, Z_2 \sim \mathcal N(0, 1) are independent. It suffices to prove that

\sigma_1 Z_1 + \sigma_2 Z_2 \sim \mathcal N(0, \sigma_1^2 + \sigma_2^2).

By construction, for any \sigma > 0, if U \sim \mathcal N(0, \sigma^2), then

\displaystyle f_U(u) = \frac{1}{ \sigma \sqrt{2\pi} } e^{-\frac{u^2}{2\sigma^2} }.

Defining W := \sigma_1 Z_1 + \sigma_2 Z_2,

\begin{aligned} f_W(w) &= (f_{\sigma_1 Z_1} * f_{\sigma_2 Z_2})(w) \\ &= \int_{-\infty}^{\infty} \frac{1}{ \sigma_1 \sqrt{2\pi} } e^{-\frac{t^2}{2\sigma_1^2} } \cdot \frac{1}{ \sigma_2 \sqrt{2\pi} } e^{-\frac{(w-t)^2}{2\sigma_2^2} }\, \mathrm dt \\ &= \frac{1}{\sigma_1 \sigma_2 \cdot 2\pi}\int_{-\infty}^{\infty} \exp\left( - \frac 12 \left( \frac{t^2}{\sigma_1^2} + \frac{(w-t)^2}{\sigma_2^2} \right)\right)\, \mathrm dt. \end{aligned}

Completing the square,

\displaystyle \frac{t^2}{\sigma_1^2} + \frac{(w-t)^2}{\sigma_2^2} = \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right) t^2 - \frac{2w}{\sigma_2^2} \cdot t + \frac{w^2}{\sigma_2^2} = A(t - h)^2 + k

for carefully calculated constants A, h, k; in particular,

\displaystyle A = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2},\quad h = \frac{w}{A \sigma_2^2},\quad k = -\frac 14 \cdot \frac{4w^2}{\sigma_2^4} \cdot \frac{\sigma_1^2 \cdot \sigma_2^2}{\sigma_1^2 + \sigma_2^2} + \frac{w^2}{\sigma_2^2} = \frac{w^2}{\sigma_1^2 + \sigma_2^2}.
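Since this kind of algebra is easy to get wrong, here is a quick symbolic check of the completed square; the sketch uses sympy, and the symbol names are my own.

```python
import sympy as sp

t, w = sp.symbols('t w', real=True)
s1, s2 = sp.symbols('sigma_1 sigma_2', positive=True)

A = 1 / s1**2 + 1 / s2**2
h = w / (A * s2**2)
k = w**2 / (s1**2 + s2**2)

lhs = t**2 / s1**2 + (w - t)**2 / s2**2
rhs = A * (t - h)**2 + k

# Prints 0, confirming t^2/sigma_1^2 + (w - t)^2/sigma_2^2 = A (t - h)^2 + k.
print(sp.simplify(lhs - rhs))
```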

Denoting \sigma_3^2 := 1/A, the integral then simplifies to

\begin{aligned}\int_{-\infty}^{\infty} e^{-\frac 12 (A(t-h)^2 + k)}\, \mathrm dt &= e^{-\frac 12 k} \cdot \sigma_3 \cdot \sqrt{2\pi} \cdot \underbrace{ \int_{-\infty}^{\infty} \frac{1}{\sigma_3 \sqrt{2\pi} }e^{-\frac {(t-h)^2}{2\sigma_3^2}} \, \mathrm dt }_1 \\ &= e^{-\frac 12 k} \cdot \sigma_3 \cdot \sqrt{2\pi}.\end{aligned}

Therefore, denoting \sigma^2 := \sigma_1^2 + \sigma_2^2, so that k = w^2/\sigma^2 and \sigma_3 = \sigma_1 \sigma_2 / \sigma,

\begin{aligned} f_W(w) &= \frac{\sigma_3}{\sigma_1 \sigma_2} \cdot \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac 12 k} = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{w^2}{2\sigma^2}}. \end{aligned}

Therefore, \sigma_1 Z_1 + \sigma_2 Z_2 = W \sim \mathcal N(0, \sigma^2) = \mathcal N(0, \sigma_1^2 + \sigma_2^2), as required.
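As a quick numerical sanity check of the theorem above (a sketch; the parameters \mu_1 = 1, \sigma_1 = 0.8, \mu_2 = -2, \sigma_2 = 1.5 and the sample size are arbitrary choices of mine), the simulated sum should have mean \mu_1 + \mu_2 and variance \sigma_1^2 + \sigma_2^2.

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, s1 = 1.0, 0.8
mu2, s2 = -2.0, 1.5
n = 1_000_000

x = rng.normal(mu1, s1, size=n)
y = rng.normal(mu2, s2, size=n)  # drawn independently of x
w = x + y

print("mean of X+Y:", w.mean(), "  expected:", mu1 + mu2)
print("var  of X+Y:", w.var(), "  expected:", s1**2 + s2**2)
```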

In practice, to compute probabilities involving normal distributions, we define

\displaystyle \Phi(z) := \mathbb P(Z \leq z) = \frac 12 + \frac{1}{\sqrt{2\pi}} \int_0^z e^{-t^2/2}\, \mathrm dt

and tabulate commonly used approximate values of \Phi(z) for z > 0, known in the statistics community as a z-table. By symmetry, \Phi(-z) = 1- \Phi(z). For any X \sim \mathcal N(\mu, \sigma^2), since X = \mu + \sigma Z, we can reduce the computation to

\displaystyle \mathbb P(X \leq x) = \mathbb P\left( Z \leq \frac{x - \mu}{\sigma} \right) = \Phi \left( \frac{x - \mu}{\sigma}\right).
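In code, one rarely consults a z-table: \Phi can be expressed through the error function as \Phi(z) = \frac 12 \left(1 + \operatorname{erf}(z/\sqrt 2)\right). A minimal sketch follows (the example X \sim \mathcal N(100, 15^2) is a hypothetical illustration of mine).

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), by standardizing to Z = (X - mu) / sigma."""
    return phi((x - mu) / sigma)

print(phi(1.96))                 # ~0.975, the familiar z-table entry
print(normal_cdf(130, 100, 15))  # Phi(2) ~ 0.9772 for a hypothetical X ~ N(100, 15^2)
```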

By induction, together with the fact that scaling a normal random variable by 1/n divides its mean by n and its variance by n^2, we obtain the sampling distribution of the sample mean \bar X_n.

Corollary 2. For independent, identically distributed random variables X_1,\dots, X_n \sim \mathcal N(\mu, \sigma^2),

\displaystyle \bar X_n := \frac{1}{n} \sum_{i=1}^n X_i \sim \mathcal N \left( \mu, \frac{\sigma^2}{n} \right).
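Corollary 2 can also be checked by simulation; in the sketch below, \mu = 5, \sigma = 2, n = 25, and the number of replications are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n = 5.0, 2.0, 25
reps = 200_000

# Each row is one i.i.d. sample of size n; average across each row.
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print("mean of X-bar:", means.mean(), "  expected:", mu)
print("var  of X-bar:", means.var(), "  expected:", sigma**2 / n)
```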

The central limit theorem asserts that even if the X_i are not normally distributed, \bar X_n is approximately \mathcal N(\mu, \sigma^2/n) for large n. Since a limiting distribution cannot depend on n, the precise statement is that (\bar X_n-\mu) / (\sigma / \sqrt{n}) converges in distribution to Z \sim \mathcal N(0, 1).
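Although the proof is deferred, the claim is easy to probe numerically. The sketch below (the \mathrm{Exp}(1) population, n = 50, and the comparison points are choices of mine) standardizes sample means from a decidedly non-normal population and compares their empirical distribution with \Phi.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 200_000

# Exp(1) population: mean 1, variance 1, and far from normal.
samples = rng.exponential(scale=1.0, size=(reps, n))
z_stats = (samples.mean(axis=1) - 1.0) / (1.0 / math.sqrt(n))

phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"P((X-bar - mu)/(sigma/sqrt(n)) <= {z:+.1f}) ~ "
          f"{np.mean(z_stats <= z):.4f}   Phi({z:+.1f}) = {phi(z):.4f}")
```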

We have previously stated the special case (\mu, \sigma) = (0, 1) and aim to prove it properly using techniques in stochastic calculus. With this result in hand, we are emboldened to carry out hypothesis tests, which are commonplace in the STEM fields as well as the data-driven social sciences. We will digress to this application before diving back into our ascent toward the central limit theorem.

—Joel Kindiak, 23 Jul 25, 2257H
