Legitimate Integral Swapping

Given two random variables X, Y, what is the distribution of X + Y? We could work this out by hand for discrete random variables, and perhaps for continuous ones as well. But let’s go back to measure theory and define the joint distribution (X, Y) as rigorously as possible.
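Before going abstract, a quick sanity check in the discrete case: if X and Y are independent with known PMFs, the PMF of X + Y is the convolution of the two. A minimal Python sketch (the helper `pmf_sum` and the toy die are my own, not anything from the text):

```python
from itertools import product

def pmf_sum(pmf_x, pmf_y):
    """PMF of X + Y for *independent* discrete X and Y.

    Each PMF is a dict mapping a value to its probability; the result
    is the convolution of the two PMFs.
    """
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

die = {k: 1 / 6 for k in range(1, 7)}   # a fair six-sided die
two_dice = pmf_sum(die, die)            # distribution of the sum of two dice
```

Note that independence is essential here; for dependent X and Y, the sum's distribution depends on the full joint distribution, which is exactly why we now construct (X, Y) carefully.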

Let (\Omega, \mathcal F, \mu) be a measure space (or a probability space if \mu is a probability measure). Equip \mathbb R^n with the Borel \sigma-algebra \frak{B}(\mathbb R^n), generated by the open balls of the Euclidean metric.

Lemma 1. Let X, Y : \Omega \to \mathbb R be random variables. Then (X,Y) : \Omega \to \mathbb R^2 defined by (X,Y)(\omega) := (X(\omega), Y(\omega)) is a random variable.

Proof. For any K, L \in \frak{B}(\mathbb R), we have (X,Y)^{-1}(K \times L) = X^{-1}(K) \cap Y^{-1}(L) \in \mathcal F. Since every open set in \mathbb R^2 is a countable union of open rectangles, such rectangles generate \frak{B}(\mathbb R^2), and therefore (X,Y) is \mathcal F/\frak{B}(\mathbb R^2)-measurable. (Note that \Omega carries no topology, so we cannot simply argue via continuity.)

What would be a reasonable measure on \mathbb R^2? Intuitively, we should have a measure \lambda_2 on \mathbb R^2 such that \lambda_2(K \times L) = \lambda_1(K) \cdot \lambda_1(L), where \lambda_1 denotes the usual Lebesgue measure that we painstakingly constructed. In fact, more generally, given measure spaces (\Omega_i, \mathcal F_i, \mu_i), we would like to define a reasonable \sigma-algebra \mathcal F on \Omega := \prod_{i=1}^n \Omega_i and a measure \mu on \mathcal F such that

\displaystyle \mu\left( \prod_{i=1}^n K_i \right) = \prod_{i=1}^n \mu_i(K_i),\quad K_i \in \mathcal F_i.

It turns out that with the help of Carathéodory’s extension theorem, this task isn’t as Sisyphean as it seems.

Theorem 1. Given measure spaces (\Omega_i, \mathcal F_i, \mu_i), there exists a \sigma-algebra \mathcal F on \Omega := \prod_{i=1}^n \Omega_i and a measure \mu on \mathcal F such that \prod_{i=1}^n K_i \in \mathcal F and

\displaystyle \mu\left( \prod_{i=1}^n K_i \right) = \prod_{i=1}^n \mu_i(K_i),\quad K_i \in \mathcal F_i.
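Before the proof, here is what the identity says concretely. On a finite set, a measure is determined by point weights, and the product measure weights each pair by the product of the point weights. A toy Python check (all weights and sets are made up for illustration):

```python
from itertools import product

# Point weights defining finite measures mu_1 and mu_2 (toy example).
w1 = {'a': 0.5, 'b': 1.5, 'c': 2.0}
w2 = {0: 2.0, 1: 3.0}

def mu1(A): return sum(w1[x] for x in A)
def mu2(B): return sum(w2[y] for y in B)

# The product measure assigns each point (x, y) the weight w1[x] * w2[y].
def mu(M): return sum(w1[x] * w2[y] for (x, y) in M)

K, L = {'a', 'c'}, {1}
rect = set(product(K, L))
print(mu(rect), mu1(K) * mu2(L))   # both equal 7.5
```

The theorem below is exactly the statement that this rectangle formula survives on arbitrary measure spaces, where no such pointwise weighting exists.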

Proof. We will prove the special case n = 2 for simplicity. Define the algebra of finite unions of measurable rectangles

\mathcal F^0 := \left\{ \bigcup_{i=1}^n K_i \times L_i : n \in \mathbb N,\ K_i \in \mathcal F_1,\ L_i \in \mathcal F_2 \right\}.

For any M \subseteq \Omega_1 \times \Omega_2 and x \in \Omega_1, define the x-section M_x \subseteq \Omega_2 by

M_x := \{ y \in \Omega_2 : (x, y) \in M\}.

Define the y-section M^y \subseteq \Omega_1 similarly. Now given M = \bigcup_{i=1}^n K_i \times L_i \in \mathcal F^0, for any x \in \Omega_1

\displaystyle M_x = \bigcup_{i : x \in K_i} L_i \in \mathcal F_2.

Therefore, the quantity \mu_2(M_x) is well-defined. Similarly, \mu_1(M^y) is well-defined for any y \in \Omega_2. Hence, define the function f_M(x) := \mu_2(M_x), which is non-negative and simple; for instance, in the special case M = (K_1 \times L_1) \cup (K_2 \times L_2),

f_M = \mu_2(L_1 \cup L_2) \cdot \mathbb I_{K_1 \cap K_2} + \mu_2(L_1) \cdot \mathbb I_{K_1 \backslash K_2} + \mu_2(L_2) \cdot \mathbb I_{K_2 \backslash K_1}.
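This decomposition can be checked mechanically. Below is a toy Python check with \mu_2 a weighted counting measure on a three-point set (the sets and weights are arbitrary, chosen so that K_1 and K_2 overlap):

```python
from itertools import product

Omega1, Omega2 = {1, 2, 3, 4}, {'p', 'q', 'r'}
w2 = {'p': 2.0, 'q': 0.5, 'r': 1.0}            # weights defining mu_2
def mu2(B): return sum(w2[y] for y in B)

K1, L1 = {1, 2}, {'p', 'q'}
K2, L2 = {2, 3}, {'q', 'r'}
M = set(product(K1, L1)) | set(product(K2, L2))

def f_M(x):                                    # direct definition: mu_2 of the x-section
    return mu2({y for y in Omega2 if (x, y) in M})

def f_formula(x):                              # the three-indicator decomposition
    if x in K1 and x in K2: return mu2(L1 | L2)
    if x in K1:             return mu2(L1)
    if x in K2:             return mu2(L2)
    return 0.0

print(all(f_M(x) == f_formula(x) for x in Omega1))   # True
```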

We can similarly define g_M(y) := \mu_1(M^y), and define

\displaystyle \mu_0(M) := \int_{\Omega_1} f_M\, \mathrm d\mu_1 = \int_{\Omega_2} g_M\, \mathrm d\mu_2.

To see this in the simplest case when M is a disjoint union (the rest follows by careful bookkeeping),

\displaystyle \int_{\Omega_1} f_M\, \mathrm d\mu_1 = \sum_{i=1}^n \mu_2(L_i) \cdot \mu_1(K_i)  = \int_{\Omega_2} g_M\, \mathrm d\mu_2.
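On finite spaces with weighted counting measures, integrals are weighted sums, so this chain of equalities can be verified directly. A toy Python check (weights and rectangles made up, with the rectangles pairwise disjoint as in the display above):

```python
from itertools import product

w1 = {1: 0.5, 2: 1.0, 3: 2.0}                  # weights defining mu_1
w2 = {'p': 3.0, 'q': 0.25}                     # weights defining mu_2
def mu1(A): return sum(w1[x] for x in A)
def mu2(B): return sum(w2[y] for y in B)

rects = [({1}, {'p'}), ({2, 3}, {'q'})]        # pairwise disjoint rectangles
M = set().union(*(set(product(K, L)) for K, L in rects))

# Integrals against weighted counting measures are weighted sums.
int_f = sum(w1[x] * mu2({y for y in w2 if (x, y) in M}) for x in w1)
int_g = sum(w2[y] * mu1({x for x in w1 if (x, y) in M}) for y in w2)
target = sum(mu1(K) * mu2(L) for K, L in rects)
print(int_f, target, int_g)                    # all three equal 2.25
```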

In particular, \mu_0(K \times L) = \mu_1(K) \cdot \mu_2(L). We claim that \mu_0 is countably additive on \mathcal F^0. Fix M = \bigsqcup_{i=1}^\infty M_i \in \mathcal F^0 with each M_i \in \mathcal F^0. Then for any x \in \Omega_1,

\displaystyle f_M(x) = \mu_2(M_x) = \mu_2 \left( \bigsqcup_{i=1}^\infty (M_i)_x \right) = \sum_{i=1}^\infty \mu_2((M_i)_x) = \sum_{i=1}^\infty f_{M_i}(x).

Therefore, the partial sums \sum_{i=1}^n f_{M_i} converge monotonically to f_M, and by the monotone convergence theorem,

\displaystyle \mu_0(M) = \int_{\Omega_1} f_M\, \mathrm d\mu_1 = \sum_{i=1}^\infty \int_{\Omega_1} f_{M_i}\, \mathrm d\mu_1 = \sum_{i=1}^\infty \mu_0(M_i).
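A concrete numeric instance of this countable additivity: the unit square is the disjoint union of countably many dyadic strips, and the partial sums of their \mu_0-measures converge to the measure of the square. A short Python sketch (truncating the series at 50 terms):

```python
# Countable additivity on a concrete partition: [0,1) x [0,1) is the
# disjoint union of the strips [1 - 2^-(i-1), 1 - 2^-i) x [0,1),
# whose mu_0-measures 2^-i sum to 1.
def mu0(a, b, c, d):
    """Product Lebesgue measure of the rectangle [a, b) x [c, d)."""
    return max(b - a, 0.0) * max(d - c, 0.0)

total = mu0(0.0, 1.0, 0.0, 1.0)
partial = sum(mu0(1 - 2.0**-(i - 1), 1 - 2.0**-i, 0.0, 1.0) for i in range(1, 51))
print(total, partial)   # the partial sums approach 1.0
```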

Now apply Carathéodory’s extension theorem to obtain a \sigma-algebra \mathcal F \supseteq \mathcal F^0 and a measure \mu : \mathcal F \to [0, \infty] such that \mu|_{\mathcal F^0} = \mu_0.

Theoretically, we could just start defining random variables on the product space \Omega_1 \times \Omega_2 and go on our merry way. But we still need to answer a key question: given the distributions \mathbb P_X and \mathbb P_Y, how do we compute \mathbb P_{X+Y}? More abstractly, we need to integrate with respect to our newly minted measure \mu in a way that is computationally consistent with integrals with respect to our old measures \mu_1 and \mu_2. Surprisingly, answering this question leads us to one of the most important theorems in multivariable calculus, Fubini's theorem, as it allows us to rigorously swap integrals, a key tool in any reasonable calculation.

Denote the base measure spaces by \Omega_1 \equiv (\Omega_1,\mathcal F_1,\mu_1) and \Omega_2 \equiv (\Omega_2,\mathcal F_2,\mu_2), and their product space by \Omega \equiv (\Omega, \mathcal F, \mu). By construction, \mathcal F \supseteq \mathcal F_1 \times \mathcal F_2. We remark that for any M \in \mathcal F_1 \times \mathcal F_2 and x \in \Omega_1, y \in \Omega_2, M_x \in \mathcal F_2 and M^y \in \mathcal F_1, since the \sigma-algebra

\{M \in \mathcal F : \forall x \in \Omega_1\ \forall y \in \Omega_2,\ M_x \in \mathcal F_2 \wedge M^y \in \mathcal F_1\}

contains \mathcal F_1 \times \mathcal F_2.

Now we observe that \mathbb R = \bigcup_{n \in \mathbb Z} [n, n+1) and each [n, n+1) has Lebesgue measure 1.

Definition 1. A measure space (\Omega, \mathcal F, \mu) is \sigma-finite if there exist K_1,K_2\dots with \mu(K_i) < \infty such that \Omega = \bigcup_{i=1}^\infty K_i. For instance, \mathbb R^n is \sigma-finite.

Lemma 2. Suppose \Omega_1,\Omega_2 are \sigma-finite. For any M \in \mathcal F_1 \times \mathcal F_2, the non-negative functions f_M(x) := \mu_2(M_x) and g_M(y) := \mu_1(M^y) are measurable. Define the predicate \phi by

\displaystyle \phi(M) := \left( \int_{\Omega_1} f_M\, \mathrm d\mu_1 = \mu(M) = \int_{\Omega_2} g_M\, \mathrm d\mu_2 \right).

Then \phi(M) holds for any M \in \mathcal F_1 \times \mathcal F_2. This extends the computation in the proof of Theorem 1.

Proof. We first prove the case where \mu_1(\Omega_1) < \infty and \mu_2(\Omega_2) < \infty. It is straightforward that \phi(M) holds if M \in \mathcal F^0. If \phi(M_1) and \phi(M_2) with M_2 \subseteq M_1, then \phi(M_1 \backslash M_2), using the finiteness of the measures. Finally, if M_1 \subseteq M_2 \subseteq \dots and \phi(M_i) holds for each i, then defining M := \bigcup_{i=1}^\infty M_i, the function f_M = \lim_{n \to\infty} f_{M_n} is measurable, and by the monotone convergence theorem,

\displaystyle \int_{\Omega_1} f_M\, \mathrm d\mu_1 = \lim_{n \to \infty} \int_{\Omega_1} f_{M_n}\, \mathrm d\mu_1 = \lim_{n \to \infty} \mu(M_n) = \mu(M).

Therefore, \phi(M). Let \mathcal F' \supseteq \mathcal F^0 denote the smallest collection of subsets of \Omega satisfying these closure properties. We can verify that \mathcal F' is a \sigma-algebra, and hence contains \mathcal F_1 \times \mathcal F_2, as required.

We now generalise to the \sigma-finite case. Choose K_1 \subseteq K_2 \subseteq \dots with \mu_1(K_i) < \infty and \bigcup_{i=1}^\infty K_i = \Omega_1, and similarly L_1 \subseteq L_2 \subseteq \dots with \mu_2(L_i) < \infty and \bigcup_{i=1}^\infty L_i = \Omega_2. For each i, define M_i := M \cap (K_i \times L_i) so that M_1 \subseteq M_2 \subseteq \dots and M = \bigcup_{i=1}^\infty M_i. The result follows by the monotone convergence theorem.

Lemma 3. For any \mathcal F_1 \times \mathcal F_2-measurable map f : \Omega_1 \times \Omega_2 \to [0, \infty], all of its sections

f_x := f(x, \cdot) : \Omega_2 \to [0, \infty],\quad f^y := f( \cdot, y) : \Omega_1 \to [0, \infty]

are measurable.

Proof. For each a, the remark above gives f_x^{-1}([a, \infty]) = (f^{-1}([a, \infty]))_x \in \mathcal F_2, and similarly for f^y.

We can now discuss the Fubini-Tonelli theorem. Fubini's theorem is the special case in which all integrals therein are finite. Here, a function f : \Omega \to [-\infty, \infty] is integrable if it is measurable, the set K := f^{-1}(\{-\infty, \infty\}) has measure zero, and the real-valued function f \cdot \mathbb I_{\Omega \backslash K} is integrable in the usual sense.

Theorem 2 (Fubini-Tonelli Theorem). Suppose \Omega_1,\Omega_2 are \sigma-finite. If f:\Omega_1 \times \Omega_2 \to [-\infty, \infty] is non-negative and measurable (resp. integrable), then the functions g, h defined by

\displaystyle g(x) := \int_{\Omega_2} f(x, \cdot)\, \mathrm d\mu_2,\quad h(y) := \int_{\Omega_1} f(\cdot, y)\, \mathrm d\mu_1

are measurable (resp. integrable) and

\displaystyle \int_{\Omega_1} \int_{\Omega_2} f\, \mathrm d\mu_2\, \mathrm d\mu_1 = \int_{\Omega_2} \int_{\Omega_1} f\, \mathrm d\mu_1\, \mathrm d\mu_2 = \int_{\Omega} f\, \mathrm d\mu.

Proof. We return to the usual simple \Rightarrow non-negative measurable \Rightarrow integrable strategy. If f = \mathbb I_M, then we obtain this result by Lemma 2. The result extends by linearity to non-negative simple functions.

If f is non-negative, find a sequence of non-negative simple functions \Phi_n that monotonically converge to f. By the monotone convergence theorem,

\displaystyle \int_{\Omega} f\, \mathrm d\mu = \lim_{n \to \infty} \int_{\Omega} \Phi_n\, \mathrm d\mu.

For each n, define \varphi_n by setting, for each x \in \Omega_1, \varphi_n(x) := \int_{\Omega_2} \Phi_n(x,\cdot)\, \mathrm d\mu_2. Then \varphi_n increases monotonically to g. By the monotone convergence theorem again, since the \Phi_n are all simple functions,

\displaystyle \int_{\Omega_1} g\, \mathrm d\mu_1 = \lim_{n \to \infty} \int_{\Omega_1} \varphi_n\, \mathrm d \mu_1 = \lim_{n \to \infty} \int_{\Omega} \Phi_n\, \mathrm d\mu = \int_{\Omega} f\, \mathrm d\mu.

Finally, in the case f is integrable, write f = f^+ - f^- and perform needful bookkeeping.
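Numerically, the theorem is easy to witness: both iterated Riemann sums of a nice function converge to the same value, which equals the integral against the product measure. A quick Python check with midpoint sums for f(x, y) = x y^2 on [0,1]^2, whose exact integral is (1/2)(1/3) = 1/6 (the grid size N = 400 is arbitrary):

```python
# Iterated midpoint Riemann sums in both orders for f(x, y) = x * y^2
# over [0,1]^2, compared against the exact value 1/6.
N = 400
pts = [(i + 0.5) / N for i in range(N)]
f = lambda x, y: x * y * y

dy_then_dx = sum(sum(f(x, y) for y in pts) / N for x in pts) / N
dx_then_dy = sum(sum(f(x, y) for x in pts) / N for y in pts) / N
print(dy_then_dx, dx_then_dy)   # both close to 1/6
```

Of course, for a non-integrable sign-changing f the two iterated integrals can genuinely disagree, which is why the finiteness hypothesis in Fubini's theorem matters.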

As much as we now feel somewhat justified in adding distributions in general, there is one more piece of measure-theoretic machinery we need to discuss: the technical density function known as the Radon-Nikodým derivative. With it, we can be justified in letting f_X denote the density function of any sufficiently nice random variable X.

—Joel Kindiak, 21 Jul 25, 2313H
