Integrable Sanity Checks

Having developed much technology for Lebesgue integration, let’s do a quick sanity check that we do, in fact, recover the usual Riemann integral.

Theorem 1. Let f : \mathbb R \to \mathbb R be a bounded measurable function. If f|_{[a, b]} is Riemann-integrable, then f|_{[a, b]} is Lebesgue-integrable, and

\displaystyle \int_{[a, b]} f\, \mathrm d\lambda = \int_a^b f \equiv \int_a^b f(x)\, \mathrm dx

where the integral on the left-hand side denotes the Lebesgue integral (here \lambda denotes the Lebesgue measure), and the integral on the right-hand side denotes the usual Riemann integral.

Proof. Suppose f \geq 0 for simplicity. We first note that all step functions \sum_{i=1}^n a_i \cdot \mathbb I_{[x_{i-1},x_i)} are simple functions. By the definition of the lower integral \mathcal L_a^b(f) and upper integral \mathcal R_a^b(f),

\begin{aligned} \mathcal L_a^b(f) &= \sup_P \sum_{i=1}^n m_i(f, P) \Delta x_i \\ &= \sup_P \int_{[a,b]} \sum_{i=1}^n m_i(f, P) \cdot \mathbb I_{[x_{i-1},x_i)}\, \mathrm d\lambda \\ &\leq \sup_{\substack{\varphi\ \text{simple} \\ 0 \leq \varphi \leq f \cdot \mathbb I_{[a, b]}} } \int_{\mathbb R} \varphi \, \mathrm d\lambda \\ &\leq \inf_P \sum_{i=1}^n M_i(f, P) \Delta x_i = \mathcal R_a^b(f). \end{aligned}

Since f is Riemann-integrable, the left-hand and right-hand sides both equal \int_a^b f, so that

\displaystyle \int_{[a, b]} f\, \mathrm d\lambda = \int_{\mathbb R} f \cdot \mathbb I_{[a, b]}\, \mathrm d\lambda = \sup_{\substack{\varphi\ \text{simple} \\ 0 \leq \varphi \leq f \cdot \mathbb I_{[a, b]}} } \int_{\mathbb R} \varphi \, \mathrm d\lambda = \int_a^b f.

Since the right-hand side is finite, so is the left-hand side, so that f is Lebesgue-integrable.

For the general case, suppose \inf_{x \in [a, b]} f(x) < 0 (otherwise the first case applies) and define m := -\inf_{x \in [a, b]} f(x) > 0. Now (f + m)|_{[a,b]} is bounded, measurable, and Riemann-integrable. Since it is non-negative, the first case shows it is Lebesgue-integrable. Therefore f = (f+m) - m, when restricted to [a, b], is also Lebesgue-integrable, and

\begin{aligned} \int_{[a, b]} f\, \mathrm d\lambda &= \int_{[a, b]} ((f + m) - m)\, \mathrm d\lambda \\ &= \int_{[a, b]} (f + m)\, \mathrm d\lambda -  \int_{[a, b]} m\, \mathrm d\lambda\\ &= \int_a^b f + \int_a^b m -\int_a^b m  = \int_a^b f.\end{aligned}
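Though no substitute for the proof, the two pictures of integration can be compared numerically. The sketch below (a minimal illustration, with a "layer-cake" sum standing in for the Lebesgue integral by partitioning the range rather than the domain) checks that both approach \int_0^1 x^2\, \mathrm dx = 1/3:

```python
def riemann_sum(f, a, b, n):
    """Left Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def lebesgue_sum(f, a, b, n, levels):
    """'Layer-cake' Lebesgue-style sum for non-negative f: slice the
    *range* into `levels` horizontal slabs of height h, weighting each
    slab by the (grid-approximated) measure of {x : f(x) >= level}."""
    dx = (b - a) / n
    ys = [f(a + (i + 0.5) * dx) for i in range(n)]   # midpoint samples
    h = max(ys) / levels
    total = 0.0
    for k in range(1, levels + 1):
        measure = sum(1 for y in ys if y >= k * h) * dx
        total += h * measure
    return total

f = lambda x: x * x      # bounded and Riemann-integrable on [0, 1]
exact = 1 / 3            # the common value of both integrals
```

Both estimates converge to the same value as the partitions refine, in line with Theorem 1.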

Theorem 1 therefore tells us that the Lebesgue integral generalises the Riemann integral, at least when f is a bounded function. If f \geq 0 is measurable (but not necessarily integrable), more is true.

Let (\Omega, \mathcal F, \mu) be a measure space.

Lemma 1. Let f : \Omega \to [0, \infty] be measurable. The map \nu : \mathcal F \to [0, \infty] defined by

\displaystyle \nu(K) := \int_K f\, \mathrm d\mu

is a measure. Furthermore, \mu(K) = 0 implies that \nu(K) = 0.

Proof. For the empty set condition,

\displaystyle \nu(\emptyset) = \int_{\emptyset}f\, \mathrm d\mu = \int_{\Omega} f \cdot \mathbb I_\emptyset\, \mathrm d\mu  = \int_{\Omega} 0\, \mathrm d\mu = 0.

For the countable additivity condition, fix a pairwise-disjoint sequence \{K_i\} \subseteq \mathcal F. Define K := \bigsqcup_{i=1}^\infty K_i. The sequence f \cdot \sum_{i=1}^n \mathbb I_{K_i} of measurable functions monotonically increases to the measurable function f \cdot \mathbb I_K. By the monotone convergence theorem,

\begin{aligned} \nu\left(\bigsqcup_{i=1}^\infty K_i \right) &= \nu(K) = \int_K f\, \mathrm d\mu = \int_{\Omega} f \cdot \mathbb I_K \, \mathrm d\mu \\ &= \int_{\Omega} \lim_{n \to \infty} f \cdot \sum_{i=1}^n \mathbb I_{K_i}  \, \mathrm d\mu = \lim_{n \to \infty} \int_{\Omega} f \cdot \sum_{i=1}^n \mathbb I_{K_i} \, \mathrm d\mu \\ &= \lim_{n \to \infty} \sum_{i=1}^n \int_{\Omega} f \cdot \mathbb I_{K_i} \, \mathrm d\mu = \lim_{n \to \infty} \sum_{i=1}^n \int_{K_i} f  \, \mathrm d\mu \\ &= \lim_{n \to \infty} \sum_{i=1}^n \nu(K_i) = \sum_{i=1}^\infty \nu(K_i). \end{aligned}

Therefore, \nu is countably additive. Finally, suppose \mu (K) = 0. Fix any simple function 0 \leq \varphi \leq f, \varphi = \sum_{i=1}^n a_i \cdot \mathbb I_{K_i}, where K = \bigsqcup_{i=1}^n K_i \supseteq K_i. By the monotonicity of \mu, 0 \leq \mu(K_i) \leq \mu(K) = 0. Hence,

\displaystyle \int_{K} \varphi\, \mathrm d\mu = \sum_{i=1}^n a_i \cdot \mu(K_i) = \sum_{i=1}^n a_i \cdot 0 = 0.

Therefore,

\displaystyle \nu(K) = \int_K f\, \mathrm d\mu = \sup_{\varphi} \int_K \varphi\, \mathrm d\mu = \sup_{\varphi} 0 = 0,

as required.
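The finite analogue of Lemma 1 is easy to check in code: on a toy measure space with the counting measure, \nu(K) = \int_K f\, \mathrm d\mu reduces to a finite sum, and additivity with \nu(\emptyset) = 0 is immediate. A minimal Python sketch (the set \Omega and density f below are arbitrary choices for illustration):

```python
from fractions import Fraction

# Toy measure space: Omega = {0,...,9} with the counting measure mu,
# and an arbitrary non-negative "density" f (exact rationals so the
# additivity check is exact, not floating-point).
omega = set(range(10))
f = {w: Fraction(w, 2) for w in omega}

def nu(K):
    """nu(K) = integral of f over K, a finite sum under counting measure."""
    return sum(f[w] for w in K)

K1, K2 = {1, 2, 3}, {7, 9}   # disjoint measurable sets
```

Here nu(K1 | K2) equals nu(K1) + nu(K2) exactly, mirroring the countable-additivity computation above in the finite case.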

Lemma 2. Let f : \Omega \to \mathbb R be a non-negative measurable function. Then \int_{\Omega} f\, \mathrm d\mu = 0 if and only if there exists some K \in \mathcal F with \mu(K) = 0 such that f|_{\Omega \backslash K} = 0. In this case, we say that f = 0 \mu-almost everywhere (abbreviated: \mu-a.e.).

Proof. For the direction (\Leftarrow), Lemma 1 gives \int_K f\, \mathrm d\mu = \nu(K) = 0 since \mu(K) = 0, so

\begin{aligned} \int_{\Omega} f\, \mathrm d\mu = \int_{\Omega\backslash K} f\, \mathrm d\mu + \int_{K} f\, \mathrm d\mu = \int_{\Omega\backslash K} f\, \mathrm d\mu + 0 = \int_{\Omega\backslash K} f\, \mathrm d\mu. \end{aligned}

Since f vanishes on \Omega \backslash K,

\displaystyle \int_{ \Omega } f\, \mathrm d\mu = \int_{ \Omega \backslash K } f\, \mathrm d\mu = \int_{ \Omega \backslash K } 0\, \mathrm d \mu = 0.

For the direction (\Rightarrow), suppose \int_{\Omega} f\, \mathrm d\mu = 0. For each n \in \mathbb N, define K_n := f^{-1}((1/n, \infty)). Since \frac 1n \cdot \mathbb I_{K_n} \leq f,

\displaystyle \frac 1n \cdot \mu(K_n) = \int_{\Omega} \frac 1n \cdot \mathbb I_{K_n}\, \mathrm d\mu \leq \int_{\Omega} f\, \mathrm d\mu = 0,

so \mu(K_n) = 0. Define K := f^{-1}((0, \infty)) = \bigcup_{n=1}^\infty K_n. By countable subadditivity, \mu(K) \leq \sum_{n=1}^\infty \mu(K_n) = 0, and f|_{\Omega \backslash K} = 0 by construction.

Lemma 3. Let f, g : \Omega \to \mathbb R be integrable functions. Then \int_{K} f\, \mathrm d\mu = \int_{K} g\, \mathrm d\mu for every K \in \mathcal F if and only if there exists some L \in \mathcal F with \mu(L) = 0 such that f|_{\Omega \backslash L} = g|_{\Omega \backslash L}. In this case, we say that f = g \mu-a.e..

Proof. By linearity of the Lebesgue integral, it suffices to prove the case g = 0. For the direction (\Leftarrow), since \mu(K \cap L) \leq \mu(L) = 0, the integral over K \cap L vanishes, so for any K \in \mathcal F,

\displaystyle \int_K f\, \mathrm d\mu = \int_{K \backslash L} f\, \mathrm d\mu = \int_{K \backslash L} 0\, \mathrm d\mu = 0.

For the direction (\Rightarrow), define \Omega^+ := f^{-1}(\mathbb R^+) and \Omega^- := f^{-1}(\mathbb R^-). Then f^+, f^- are non-negative functions, and taking K = \Omega^+ in the hypothesis, Lemma 2 gives

\displaystyle \int_{\Omega} f^+\, \mathrm d\mu = \int_{\Omega^+} f\, \mathrm d\mu = 0 \quad \Rightarrow \quad f^+ = 0\quad \mu\text{-a.e.}

Thus, there exists L^+ \in \mathcal F with \mu(L^+) = 0 such that f^+|_{\Omega \backslash L^+} = 0. Similarly, taking K = \Omega^- gives f^- = 0 \mu-a.e., so there exists L^- \in \mathcal F with \mu(L^-) = 0 such that f^-|_{\Omega \backslash L^-} = 0. Observe that

\begin{aligned} \mu(L^+ \cup L^-) &= \mu(L^+ \sqcup (L^- \backslash L^+)) \\ &= \mu(L^+) + \mu(L^- \backslash L^+) \\ &\leq \mu(L^+) + \mu(L^-) = 0 + 0 = 0. \end{aligned}

Hence, writing f = f^+ - f^-,

\begin{aligned} f|_{\Omega \backslash (L^+ \cup L^-)} &= (f^+ - f^-)|_{\Omega \backslash (L^+ \cup L^-)} \\ &= f^+|_{\Omega \backslash (L^+ \cup L^-)} -  f^-|_{\Omega \backslash (L^+ \cup L^-)} \\ &= 0 - 0 = 0.\end{aligned}

Therefore, f = 0 \mu-a.e..

Lemma 4. For any non-negative measurable function f : \Omega \to [0, \infty], there exists a sequence \{f_n\} of simple functions f_n : \Omega \to [0, \infty] such that f_n \to f monotonically.

Proof. For each n \in \mathbb N, define for k = 1,2,\dots,2^{2n}

\displaystyle I_{n,k} := 2^{-n} \cdot [k-1, k).

Furthermore, define I_{n,2^{2n}+1} := [2^n, \infty). Define the non-negative simple functions f_n : \Omega \to [0, \infty] by

\displaystyle f_n := \sum_{k=1}^{2^{2n}+1} \frac{k-1}{2^n} \cdot \mathbb I_{f^{-1}(I_{n,k})}.

Here are the functions for n=0,1 for illustration:

\begin{aligned} f_0 := \sum_{k=1}^2 (k-1) \cdot \mathbb I_{f^{-1}(I_{0,k})} &= \mathbb I_{f^{-1}([1, \infty))}, \\ f_1 := \sum_{k=1}^5 \frac{k-1}{2} \cdot \mathbb I_{f^{-1}(I_{1,k})} &=  \frac{1}{2} \cdot \mathbb I_{f^{-1}([1/2, 1))} + \mathbb I_{f^{-1}([1, 3/2))} \\ &\phantom{--} + \frac{3}{2} \cdot \mathbb I_{f^{-1}([3/2, 2))} + 2 \cdot \mathbb I_{f^{-1}([2, \infty))}. \end{aligned}

Each f_n is made up of 2^{2n}+1 pieces. If f(\omega) < 2^n, then f_n(\omega) = 2^{-n} \lfloor 2^n f(\omega) \rfloor, so 0 \leq f(\omega) - f_n(\omega) \leq 2^{-n}; if f(\omega) = \infty, then f_n(\omega) = 2^n \to \infty. Moreover, halving the grid spacing (and doubling the cutoff) can only increase the value at each point, so f_n \to f monotonically as n \to \infty.
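The construction in Lemma 4 translates directly into code: round values below 2^n down to the nearest multiple of 2^{-n}, and truncate larger values at 2^n. A small Python sketch (using f = exp as an illustrative non-negative function, and an arbitrary sample point):

```python
import math

def dyadic_approx(f, n):
    """The n-th simple approximation from Lemma 4: values below 2^n are
    rounded down to the nearest multiple of 2^(-n); values at or above
    2^n are truncated to 2^n."""
    def fn(x):
        y = f(x)
        if y >= 2 ** n:
            return float(2 ** n)
        return math.floor(y * 2 ** n) / 2 ** n
    return fn

f = math.exp          # non-negative, unbounded as x grows
x = 1.234             # an arbitrary sample point
vals = [dyadic_approx(f, n)(x) for n in range(12)]
```

At any fixed point the sequence of values increases towards f(x), with gap at most 2^{-n} once 2^n exceeds f(x), matching the estimate above.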

Theorem 2 (Change-of-Variables). Let (\Psi, \mathcal G) be a measurable space, and f : \Omega \to \Psi be a measurable function that induces the pushforward measure \nu \equiv \mu(f^{-1}(\cdot)) on \mathcal G. Then for any integrable g : \Psi \to \mathbb R,

\displaystyle \int_{\Omega} g \circ f\, \mathrm d\mu = \int_{\Psi} g\, \mathrm d\nu .

Proof. We first prove the result for measurable g : \Psi \to [0, \infty]. Firstly, suppose g = \sum_{i=1}^n a_i \cdot \mathbb I_{K_i} is a simple function. We observe that

(g \circ f)(\omega) = a_i \quad \iff \quad f(\omega) \in K_i \quad \iff \quad \omega \in f^{-1}(K_i).

Therefore, g \circ f = \sum_{i=1}^n a_i \cdot \mathbb I_{f^{-1}(K_i)}, so that

\displaystyle \int_{\Omega} g \circ f\, \mathrm d\mu= \sum_{i=1}^n a_i \cdot \mu(f^{-1}(K_i)) = \sum_{i=1}^n a_i \cdot \nu(K_i) = \int_{\Psi} g\, \mathrm d\nu.

Now suppose g is nonnegative and measurable. Use Lemma 4 to find a sequence \{g_n\} of non-negative simple functions g_n \leq g that monotonically converge to g. It is obvious that g_n \circ f \to g \circ f monotonically as well. By the monotone convergence theorem,

\displaystyle \int_{\Omega} g \circ f\, \mathrm d\mu = \lim_{n \to \infty} \int_{\Omega} g_n \circ f\, \mathrm d\mu = \lim_{n \to \infty} \int_{\Psi} g_n\, \mathrm d\nu = \int_{\Psi} g\, \mathrm d\nu .

Finally suppose g : \Psi \to \mathbb R is integrable. Write g = g^+ - g^-. Applying relevant linearity properties,

\begin{aligned}\int_{\Omega} g \circ f\, \mathrm d\mu &= \int_{\Omega} (g^+ - g^-) \circ f\, \mathrm d\mu \\ &= \int_{\Omega} (g^+ \circ f - g^- \circ f)\, \mathrm d\mu \\ &= \int_{\Omega} g^+ \circ f\, \mathrm d\mu - \int_{\Omega} g^- \circ f\, \mathrm d\mu \\ &=  \int_{\Psi} g^+\, \mathrm d\nu - \int_{\Psi} g^-\, \mathrm d\nu \\ &=  \int_{\Psi} (g^+\, - g^-)\, \mathrm d\nu = \int_{\Psi} g\, \mathrm d \nu.\end{aligned}
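On a finite measure space both sides of Theorem 2 are plain weighted sums, so the change-of-variables identity can be verified exactly. A minimal Python sketch (the weights, map, and integrand below are hypothetical choices for illustration):

```python
# Toy measure space: Omega = {0,...,5} with weights mu summing to 1,
# a map f into Psi = {'a', 'b'}, and an integrand g on Psi.
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.15, 4: 0.15, 5: 0.1}
f = {0: 'a', 1: 'b', 2: 'a', 3: 'a', 4: 'b', 5: 'b'}
g = {'a': 2.0, 'b': -1.0}

# Left-hand side: integral of g∘f over Omega with respect to mu.
lhs = sum(g[f[w]] * mu[w] for w in mu)

# Pushforward measure nu(B) = mu(f^{-1}(B)) on Psi.
nu = {}
for w, psi in f.items():
    nu[psi] = nu.get(psi, 0.0) + mu[w]

# Right-hand side: integral of g over Psi with respect to nu.
rhs = sum(g[psi] * nu[psi] for psi in nu)
```

The two sums agree, which is exactly the simple-function step of the proof with g = 2·I_{f^{-1}('a')} - I_{f^{-1}('b')} pushed through f.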

Let \Psi \equiv (\Psi, \mathcal G, \nu) denote either of the measure spaces \mathbb Z \equiv (\mathbb Z, \mathcal P(\mathbb Z), |\cdot|) or \mathbb R \equiv (\mathbb R, \frak B(\mathbb R), \lambda).

Definition 2. Let X : \Omega \to \Psi be a measurable map. Suppose there exists a non-negative integrable function f_X : \Psi \to \mathbb R such that its distribution \mathbb P_X satisfies

\displaystyle \mathbb P_X(K) = \int_K f_X\, \mathrm d \nu,\quad K \in \mathcal G.

If \Psi = \mathbb Z, then we call X a discrete random variable. In this case, f_X = \sum_{i \in \mathbb Z} f_X(i) \cdot \mathbb I_{\{ i \}}, and for any K \subseteq \mathbb Z,

\begin{aligned} \mathbb P_X( K ) &= \int_{ K } f_X\, \mathrm d |\cdot| \\ &= \int_{ \mathbb Z} f_X \cdot \mathbb I_{K} \, \mathrm d |\cdot| \\ &= \sum_{x \in K} f_X(x) \cdot |\{x \}| = \sum_{x \in K} f_X(x). \end{aligned}

If \Psi = \mathbb R, then we call X a continuous random variable, and for any a \leq b,

\displaystyle \mathbb P_X([a, b]) = \int_a^b f_X \equiv \int_a^b f_X(x)\, \mathrm dx.

In particular, a random variable X : \Omega \to \mathbb R is continuous if there exists a non-negative integrable function f_X : \mathbb R \to \mathbb R such that its distribution \mathbb P_X satisfies

\displaystyle \mathbb P_X(K) = \int_K f_X\, \mathrm d \lambda,\quad K \in \frak B(\mathbb R) \quad \Rightarrow \quad \mathbb P_X([a, b]) = \int_a^b f_X.

In these special cases, we call f_X the probability density function of X, which is \nu-almost everywhere unique by Lemma 3, and define the cumulative distribution function by F_X(x) := \mathbb P_X((-\infty, x]).

We see therefore the unification that measure theory offers our study of probability theory—in fact we can just take these results for granted when eventually talking about (absolutely) continuous random variables.

Theorem 3. Suppose \mu = \mathbb P is a probability measure. Then for any random variable X : \Omega \to \Psi with density f_X as in Definition 2, and any \mathbb P_X-integrable g : \Psi \to \mathbb R,

\displaystyle \mathbb E[g(X)] = \int_{\Omega} g \circ X\, \mathrm d \mathbb P = \int_{\Psi} g\, \mathrm d \mathbb P_X = \int_{\Psi} g \cdot f_X\, \mathrm d\nu.

Proof. We first claim that for any \mathbb P_X-integrable g : \Psi \to \mathbb R and K \in \mathcal F,

\begin{aligned} \int_{\Psi} g\, \mathrm d \mathbb P_X &= \int_{\Psi} g \cdot f_X\, \mathrm d\nu. \end{aligned}

It suffices to prove the case when g = \sum_{i=1}^n a_i \cdot \mathbb I_{K_i} : \Psi \to [0, \infty] is simple, and the rest follows by the monotone convergence theorem (for non-negative g) and linearity arguments on the decomposition g = g^+ - g^- (for integrable g). To that end,

\begin{aligned} \int_{\Psi} g\, \mathrm d \mathbb P_X &= \sum_{i=1}^n a_i \cdot \mathbb P_X(K_i) \\ &= \sum_{i=1}^n a_i \cdot \int_{K_i} f_X \, \mathrm d\nu \\ &= \sum_{i=1}^n a_i \cdot \int_{\Psi} f_X \cdot \mathbb I_{K_i} \, \mathrm d\nu \\ &= \int_{\Psi} \sum_{i=1}^n a_i \cdot \mathbb I_{K_i}  \cdot f_X \, \mathrm d\nu = \int_{\Psi} g  \cdot f_X \, \mathrm d\nu. \end{aligned}

The first equality in the theorem is the definition of expectation, the second is Theorem 2 applied to X, and the third is the claim just proved.

Theorem 3 helps us recover the usual formula for expectation when we particularise X to discrete or continuous random variables in the spirit of Definition 2:

  • for discrete X, \mathbb E[g(X)] = \sum_{x \in \mathbb Z} g(x) f_X(x),
  • for continuous X, \mathbb E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\, \mathrm dx.
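The continuous formula can be sanity-checked numerically. The sketch below (assuming, for illustration, X ~ Exp(1), so f_X(x) = e^{-x} on [0, \infty) and \mathbb E[X^2] = 2) compares a Monte Carlo estimate of \int_\Omega g \circ X\, \mathrm d\mathbb P against a quadrature estimate of \int g\, f_X\, \mathrm d\lambda:

```python
import math
import random

random.seed(0)
g = lambda x: x * x          # E[g(X)] = E[X^2] = 2 for X ~ Exp(1)

# Monte Carlo estimate of the abstract integral of g∘X over Omega,
# by sampling X directly.
N = 200_000
mc = sum(g(random.expovariate(1.0)) for _ in range(N)) / N

# Midpoint-rule estimate of the density-form integral of g(x) e^{-x},
# truncated at x = 40 (the tail beyond is negligible).
n, hi = 100_000, 40.0
dx = hi / n
quad = sum(g((k + 0.5) * dx) * math.exp(-(k + 0.5) * dx) * dx
           for k in range(n))
```

Both estimates land near 2, as Theorem 3 predicts; the Monte Carlo side never references the density, and the quadrature side never references the sample space.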

Furthermore, by using (\Psi, \mathcal G) rather than their specific implementations, our arguments remain valid when we particularise to higher-dimensional spaces like (\mathbb R^n, \frak{B}(\mathbb R^n), \lambda) so that we can discuss the distributions of combinations of random variables like g(X,Y) or even more specifically X + Y.

We turn to cumulative distribution functions and continuous random variables next time.

—Joel Kindiak, 13 Jul 25, 2349H
