The Lebesgue Integral

We have now properly constructed the Lebesgue measure \lambda on a suitably defined \sigma-algebra \mathcal F on \mathbb R that effectively calculates lengths of intervals (for instance, \lambda([a, b]) = b-a). Just as with the construction of the real numbers, we built this measure formally for the sake of logical consistency; the fun starts when we use these constructions to solve problems.

Continuous random variables X are often modelled using continuous probability density functions f_X that define their distributions \mathbb P_X by

\displaystyle \mathbb P_X(K) := \int_K f_X(x)\, \mathrm dx \equiv \int_{\mathbb R} (f_X \cdot \mathbb I_{K})(x)\, \mathrm dx.

By construction, we require \mathbb P_X(\mathbb R) = 1, so that

\displaystyle \int_{\mathbb R} f_X(x)\, \mathrm dx = 1.
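As a quick illustration, suppose X has the standard Cauchy density f_X(x) = \frac{1}{\pi(1+x^2)} (a choice made here purely for the sake of example). Both requirements can then be checked with an ordinary Riemann integral:

\begin{aligned} \mathbb P_X(\mathbb R) &= \int_{-\infty}^{\infty} \frac{\mathrm dx}{\pi(1+x^2)} = \frac{1}{\pi} \left( \frac{\pi}{2} - \left( -\frac{\pi}{2} \right) \right) = 1, \\ \mathbb P_X([-1, 1]) &= \int_{-1}^{1} \frac{\mathrm dx}{\pi(1+x^2)} = \frac{1}{\pi} \left( \frac{\pi}{4} - \left( -\frac{\pi}{4} \right) \right) = \frac{1}{2}. \end{aligned}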

This construction suggests the need for a robust notion of integration over arbitrary Lebesgue measurable sets K (i.e. K \in \mathcal F), one that is more general than the usual Riemann integration.

Recall the definition of a measurable function.

Definition 1. Let (\Omega, \mathcal F) and (\Psi, \mathcal G) be measurable spaces. We call f : \Omega \to \Psi \mathcal F/\mathcal G-measurable if f^{-1}(K) \in \mathcal F for any K \in \mathcal G, and omit the prefix when there is no ambiguity.

Unless stated otherwise, we abbreviate \mathbb R \equiv (\mathbb R, \frak{B}(\mathbb R)), where \frak{B}(\mathbb R) refers to the Borel \sigma-algebra on \mathbb R. Similarly, we equip [-\infty, \infty] with the Borel \sigma-algebra \frak{B}([-\infty, \infty]). Henceforth, let (\Omega, \mathcal F) be a measurable space.

Lemma 1. For any K \subseteq \Omega, \mathbb I_K : \Omega \to \mathbb R is measurable if and only if K is measurable.
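A sketch of why Lemma 1 holds, written out for convenience: since \mathbb I_K only takes the values 0 and 1, for any B \in \frak{B}(\mathbb R) the preimage is one of four sets,

\displaystyle \mathbb I_K^{-1}(B) = \begin{cases} \Omega, & 0 \in B,\ 1 \in B, \\ K, & 0 \notin B,\ 1 \in B, \\ \Omega \setminus K, & 0 \in B,\ 1 \notin B, \\ \emptyset, & 0 \notin B,\ 1 \notin B. \end{cases}

If K \in \mathcal F, all four sets lie in \mathcal F, so \mathbb I_K is measurable. Conversely, taking B = \{1\} gives K = \mathbb I_K^{-1}(\{1\}) \in \mathcal F.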

Let’s now equip (\Omega, \mathcal F) with a measure \mu. Using this measure, we can define integration properly. Rather intuitively, for any measurable K, we should define

\displaystyle \int_{\Omega} \mathbb I_K\, \mathrm d\mu := \mu(K).
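For example, take \Omega = \mathbb R and \mu = \lambda. The set \mathbb Q of rationals is a countable union of singletons, each of which is a closed interval [q, q] of length 0. Assuming the countable additivity of \lambda from its construction, the definition gives

\displaystyle \int_{\mathbb R} \mathbb I_{\mathbb Q}\, \mathrm d\lambda = \lambda(\mathbb Q) = \sum_{q \in \mathbb Q} \lambda(\{q\}) = 0,

even though \mathbb I_{\mathbb Q} restricted to [0, 1] is the classic example of a function that is not Riemann integrable.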

The idea is to define \int_{\Omega} f\, \mathrm d\mu by extending linearly from these indicator functions, and we will do so step by step.

Definition 2. A measurable function f : \Omega \to [-\infty, \infty] is simple if f(\Omega) is a finite set. Writing f(\Omega) = \{x_1, \dots, x_n\} with the x_i distinct, denote K_i := f^{-1}(\{ x_i \}), so that

\displaystyle f = \sum_{i=1}^n x_i \cdot \mathbb I_{K_i}.

If f is non-negative, then we define by linearity

\displaystyle \int_{\Omega} f\, \mathrm d\mu := \sum_{i=1}^n x_i \cdot \mu(K_i).
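A small worked example, under the usual convention 0 \cdot \infty = 0 (an assumption, since the text above does not state it): take \Omega = \mathbb R, \mu = \lambda, and f = 2 \cdot \mathbb I_{[0,1]} + 5 \cdot \mathbb I_{(1,3]}. Then f(\Omega) = \{0, 2, 5\} with preimages \mathbb R \setminus [0, 3], [0, 1] and (1, 3], so that

\displaystyle \int_{\mathbb R} f\, \mathrm d\lambda = 0 \cdot \lambda(\mathbb R \setminus [0,3]) + 2 \cdot \lambda([0,1]) + 5 \cdot \lambda((1,3]) = 0 \cdot \infty + 2 \cdot 1 + 5 \cdot 2 = 12.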

Lemma 2. Let f , g: \Omega \to [-\infty, \infty] be simple functions. Then f+g is simple, and for any \alpha \in \mathbb R, \alpha f is also simple. Furthermore, if f,g are non-negative, so is f+g, as well as \alpha f whenever \alpha \geq 0. In these cases,

\begin{aligned} \int_{\Omega} (f+g)\, \mathrm d\mu = \int_{\Omega} f\, \mathrm d\mu + \int_{\Omega} g\, \mathrm d\mu,\quad  \int_{\Omega} \alpha f\, \mathrm d\mu = \alpha \cdot \int_{\Omega} f\, \mathrm d\mu . \end{aligned}

Proof. Write

\displaystyle f = \sum_{i=1}^n x_i \cdot \mathbb I_{K_i}, \quad g = \sum_{j=1}^m y_j \cdot \mathbb I_{L_j}.

Defining M_{i,j} := K_i \cap L_j, we observe that

\begin{aligned} \mathbb I_{K_i} &= \mathbb I_{K_i} \cdot \mathbb I_{\Omega} = \mathbb I_{K_i} \cdot \sum_{j=1}^m \mathbb I_{L_j} = \sum_{j=1}^m \mathbb I_{K_i}\mathbb I_{L_j} = \sum_{j=1}^m \mathbb I_{K_i \cap L_j} = \sum_{j=1}^m \mathbb I_{M_{i,j}}. \end{aligned}

Similarly, \mathbb I_{L_j} = \sum_{i=1}^n \mathbb I_{M_{i,j}}. Hence,

\begin{aligned} f + g &= \sum_{i=1}^n x_i \cdot \mathbb I_{K_i}+ \sum_{j=1}^m y_j \cdot \mathbb I_{L_j} \\ &= \sum_{i=1}^n x_i \cdot \sum_{j=1}^m \mathbb I_{M_{i,j}} + \sum_{j=1}^m y_j \cdot \sum_{i=1}^n \mathbb I_{M_{i,j}} \\ &= \sum_{i=1}^n \sum_{j=1}^m (x_i + y_j) \cdot \mathbb I_{M_{i,j}}. \end{aligned}

Since \{x_i + y_j : i, j\} is finite, f+g is a simple function. To compute its integral, we remark that

\displaystyle \bigsqcup_{i=1}^n M_{i,j} = \bigsqcup_{i=1}^n (K_i \cap L_j) = L_j \cap \bigsqcup_{i=1}^n K_i = L_j \cap \Omega = L_j.

Similarly, \bigsqcup_{j=1}^m M_{i,j} = K_i, so that

\displaystyle \sum_{j=1}^m \mu( M_{i,j} ) = \mu(K_i), \quad \sum_{i=1}^n \mu( M_{i,j} ) = \mu(L_j).

Hence,

\begin{aligned} \int_{\Omega} (f+g)\, \mathrm d\mu &=  \sum_{i=1}^n \sum_{j=1}^m (x_i + y_j) \mu(M_{i,j}) \\ &= \sum_{i=1}^n \sum_{j=1}^m x_i \mu(M_{i,j}) + \sum_{i=1}^n \sum_{j=1}^m y_j \mu(M_{i,j}) \\ &= \sum_{i=1}^n x_i \sum_{j=1}^m \mu(M_{i,j}) + \sum_{j=1}^m y_j \sum_{i=1}^n \mu(M_{i,j}) \\ &= \sum_{i=1}^n x_i \mu(K_i) + \sum_{j=1}^m y_j \mu(L_j) \\ &= \int_{\Omega} f\, \mathrm d\mu + \int_{\Omega} g\, \mathrm d\mu. \end{aligned}

The proof of the other result is similar, and simpler (pun intended).
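For completeness, here is a sketch of the scalar case for non-negative simple f and \alpha > 0. The function \alpha f takes the distinct values \alpha x_i on the same sets K_i, so that

\displaystyle \int_{\Omega} \alpha f\, \mathrm d\mu = \sum_{i=1}^n (\alpha x_i) \cdot \mu(K_i) = \alpha \sum_{i=1}^n x_i \cdot \mu(K_i) = \alpha \int_{\Omega} f\, \mathrm d\mu.

For \alpha = 0, the convention 0 \cdot \infty = 0 (assumed here) gives \alpha f = 0, and both sides vanish.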

Having defined (possibly infinite) integrals of non-negative simple functions, we shall extend our ideas a little bit to encompass non-negative functions.

Definition 3. Let f : \Omega \to [0, \infty] be a measurable function. Clearly, \varphi = 0 is a simple function that satisfies the inequality 0 \leq \varphi \leq f. Thus, we define

\displaystyle \int_{\Omega} f\, \mathrm d\mu := \sup \left\{ \int_{\Omega} \varphi\, \mathrm d\mu : \varphi\ \text{simple}, 0\leq \varphi \leq f \right\}.
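One consequence of this definition worth writing out is monotonicity. If f, g : \Omega \to [0, \infty] are measurable with f \leq g, then every simple \varphi with 0 \leq \varphi \leq f also satisfies 0 \leq \varphi \leq g, so the supremum for g is taken over a larger set:

\displaystyle \int_{\Omega} f\, \mathrm d\mu = \sup \left\{ \int_{\Omega} \varphi\, \mathrm d\mu : \varphi\ \text{simple}, 0\leq \varphi \leq f \right\} \leq \sup \left\{ \int_{\Omega} \varphi\, \mathrm d\mu : \varphi\ \text{simple}, 0\leq \varphi \leq g \right\} = \int_{\Omega} g\, \mathrm d\mu.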

Do we recover the same properties in Lemma 2? The answer is yes.

Lemma 3. Let f, g : \Omega \to [0, \infty] be measurable functions. Then f+g : \Omega \to [0, \infty] is measurable, and

\displaystyle \int_{\Omega} (f + g)\, \mathrm d\mu = \int_{\Omega} f\, \mathrm d\mu + \int_{\Omega} g\, \mathrm d\mu.

Likewise, for \alpha \in [0, \infty], \alpha f : \Omega \to [0, \infty] is measurable and

\displaystyle \int_{\Omega} \alpha f \, \mathrm d\mu = \alpha \int_{\Omega} f\, \mathrm d\mu .

Proof. Fix \epsilon > 0. By definition, there exists a simple function \varphi : \Omega \to [0, \infty] such that \varphi \leq f and

\displaystyle \int_{\Omega} f\, \mathrm d\mu - \epsilon \leq \int_{\Omega} \varphi\, \mathrm d\mu \leq \int_{\Omega} f\, \mathrm d\mu.

Similarly, there exists a simple function \psi : \Omega \to [0, \infty] such that \psi \leq g and

\displaystyle \int_{\Omega} g\, \mathrm d\mu - \epsilon \leq \int_{\Omega} \psi\, \mathrm d\mu \leq \int_{\Omega} g\, \mathrm d\mu.

Since \varphi + \psi : \Omega \to [0,\infty] is a simple function,

\displaystyle \left( \int_{\Omega} f\, \mathrm d\mu + \int_{\Omega} g\, \mathrm d\mu \right) - 2\epsilon \leq \int_{\Omega} (\varphi + \psi)\, \mathrm d\mu \leq \int_{\Omega} f\, \mathrm d\mu + \int_{\Omega} g\, \mathrm d\mu.

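Furthermore, \varphi + \psi is a simple function with 0 \leq \varphi + \psi \leq f + g, so Definition 3 gives \int_{\Omega} (\varphi + \psi)\, \mathrm d\mu \leq \int_{\Omega} (f+g)\, \mathrm d\mu, and hence

\displaystyle \left( \int_{\Omega} f\, \mathrm d\mu + \int_{\Omega} g\, \mathrm d\mu \right) - 2\epsilon \leq \int_{\Omega} (f + g)\, \mathrm d\mu.

Taking \epsilon \to 0^+ yields \int_{\Omega} (f + g)\, \mathrm d\mu \geq \int_{\Omega} f\, \mathrm d\mu + \int_{\Omega} g\, \mathrm d\mu. The reverse inequality does not follow from this estimate alone: the standard argument approximates f and g from below by increasing sequences of simple functions and passes to the limit using the monotone convergence theorem.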

Lemma 4. For any measurable K, define

\displaystyle \int_K f\, \mathrm d\mu := \int_{\Omega} f \cdot \mathbb I_K\, \mathrm d\mu.

Then for measurable f \geq 0 and measurable K_1, \dots, K_n, if K= \bigsqcup_{i=1}^n K_i, then

\displaystyle \int_K f\, \mathrm d\mu = \sum_{i=1}^n \int_{K_i} f\, \mathrm d\mu.

Proof. We observe that \mathbb I_K = \sum_{i=1}^n \mathbb I_{K_i}, so that Lemma 3 yields

\begin{aligned} \int_K f \, \mathrm d\mu &= \int_{\Omega} f \cdot \mathbb I_K \, \mathrm d\mu \\ &= \int_{\Omega} f \cdot \sum_{i=1}^n \mathbb I_{K_i} \, \mathrm d\mu = \int_{\Omega} \sum_{i=1}^n f \cdot \mathbb I_{K_i} \, \mathrm d \mu \\ &= \sum_{i=1}^n \int_{\Omega} f \cdot \mathbb I_{K_i} \, \mathrm d \mu = \sum_{i=1}^n \int_{K_i} f \, \mathrm d\mu. \end{aligned}
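As a sanity check, take \Omega = \mathbb R, \mu = \lambda, f = 1 and the decomposition [0, 3] = [0, 1] \sqcup (1, 3] (values chosen only for illustration). Since 1 \cdot \mathbb I_K = \mathbb I_K integrates to \lambda(K),

\displaystyle \int_{[0,3]} 1\, \mathrm d\lambda = \lambda([0,3]) = 3 = 1 + 2 = \lambda([0,1]) + \lambda((1,3]) = \int_{[0,1]} 1\, \mathrm d\lambda + \int_{(1,3]} 1\, \mathrm d\lambda.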

Definition 4. Let f : \Omega \to \mathbb R be measurable. Define the non-negative functions

\displaystyle f^+ := \max\{0, f\}, \quad f^{-} := \max\{0, -f\} : \Omega \to [0,\infty).

We say that f is \mu-integrable if \int_{\Omega} f^+\, \mathrm d\mu and \int_{\Omega} f^-\, \mathrm d\mu are both finite, and define its integral by their difference:

\displaystyle \int_{\Omega} f\, \mathrm d\mu := \int_{\Omega} f^+\, \mathrm d\mu - \int_{\Omega} f^-\, \mathrm d\mu.

We omit the prefix when there is no ambiguity. We say that a function is Lebesgue-integrable if it is \lambda-integrable, where \lambda denotes the Lebesgue measure.
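To illustrate, take the hypothetical example f = 3 \cdot \mathbb I_{[0,1]} - 2 \cdot \mathbb I_{(1,2]} on \mathbb R. Then f^+ = 3 \cdot \mathbb I_{[0,1]} and f^- = 2 \cdot \mathbb I_{(1,2]} are non-negative simple functions, and with the convention 0 \cdot \infty = 0 (assumed here),

\begin{aligned} \int_{\mathbb R} f^+\, \mathrm d\lambda &= 3 \cdot \lambda([0,1]) = 3, \\ \int_{\mathbb R} f^-\, \mathrm d\lambda &= 2 \cdot \lambda((1,2]) = 2, \\ \int_{\mathbb R} f\, \mathrm d\lambda &= 3 - 2 = 1. \end{aligned}

In particular, f is Lebesgue-integrable.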

Using Lemma 3, it should be obvious that the integral is linear.

Theorem 1. If f, g : \Omega \to \mathbb R are integrable, then so is f+g, and

\displaystyle \int_{\Omega} ( f + g ) \, \mathrm d\mu = \int_{\Omega} f \, \mathrm d\mu + \int_{\Omega} g \, \mathrm d\mu.

Furthermore, for any \alpha \in \mathbb R,

\displaystyle \int_{\Omega} \alpha f \, \mathrm d\mu = \alpha \int_{\Omega} f\, \mathrm d\mu.

Proof. We leave the scalar multiplication case as a relatively routine exercise in case-splitting. It turns out that we will need the special case \alpha = -1 for additivity. The idea is to find a useful disjoint union \Omega = \bigsqcup_{i=1}^n K_i, so that

\begin{aligned} \int_{\Omega} (f+g) \, \mathrm d\mu &= \sum_{i=1}^n \int_{K_i} (f+g)\, \mathrm d\mu \\ &= \sum_{i=1}^n \left( \int_{K_i} f\, \mathrm d\mu + \int_{K_i} g\, \mathrm d\mu \right) \\&= \sum_{i=1}^n \int_{K_i} f\, \mathrm d\mu + \sum_{i=1}^n \int_{K_i} g\, \mathrm d\mu \\ &=  \int_{\Omega} f \, \mathrm d\mu + \int_{\Omega} g \, \mathrm d\mu.\end{aligned}

Now consider (f+g)^+ = \max\{0, f+g\} and (f+g)^- = \max\{0, -(f+g)\}. Define K_f^+ := \{ \omega \in \Omega: f(\omega) \geq 0\}. Similarly define K_f^-, K_g^+, K_g^-. By observation,

\begin{aligned} K_f^+ \cap K_g^+ \subseteq K_{f+g}^+,\quad  K_f^- \cap K_g^- \subseteq K_{f+g}^-. \end{aligned}

On the other hand, for any \omega \in K_f^+ \cap K_g^-, we have \omega \in K_{f+g}^+ if and only if

f(\omega)+g(\omega) \geq 0 \iff |f(\omega)| = f (\omega)\geq -g(\omega) = |g(\omega)|.

Hence, define L_{\geq} := \{ \omega \in \Omega : |f(\omega)| \geq |g(\omega)| \} and L_{\leq} similarly. Then

\begin{aligned} K_{f+g}^+ &= ( \underbrace{ K_f^+ \cap K_g^+ }_{K_1} ) \sqcup (\underbrace{ K_f^+ \cap K_g^- \cap L_{\geq} }_{K_2}) \sqcup ( \underbrace{ K_f^- \cap K_g^+ \cap L_{\leq} }_{K_3} ), \\ K_{f+g}^- &= ( \underbrace{ K_f^- \cap K_g^- }_{K_4} ) \sqcup (\underbrace{ K_f^- \cap K_g^+ \cap L_{\geq} }_{K_5}) \sqcup (\underbrace{ K_f^+ \cap K_g^- \cap L_{\leq} }_{K_6}). \end{aligned}

We remark that (f+g)|_{K_2}, -g|_{K_2} are non-negative and hence

\displaystyle \begin{aligned} \int_{K_2} f\, \mathrm d\mu = \int_{K_2} ((f+g) + (-g))\, \mathrm d\mu &= \int_{K_2} (f+g) \, \mathrm d\mu + \int_{K_2} (-g) \, \mathrm d\mu, \end{aligned}

so that

\begin{aligned} \int_{K_2} (f+g) \, \mathrm d\mu &= \int_{K_2} f \, \mathrm d\mu - \int_{K_2} (-g) \, \mathrm d\mu \\ &= \int_{K_2} f \, \mathrm d\mu - (-1)\int_{K_2} g \, \mathrm d\mu  \\ &= \int_{K_2} f \, \mathrm d\mu + \int_{K_2} g \, \mathrm d\mu. \end{aligned}

This result follows similarly for K_3,K_5,K_6. Since \Omega = \bigsqcup_{i=1}^6 K_i, the result follows.
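The special case \alpha = -1 used above can be sketched directly from Definition 4. Since (-f)^+ = \max\{0, -f\} = f^- and (-f)^- = \max\{0, f\} = f^+, the function -f is integrable whenever f is, and

\begin{aligned} \int_{\Omega} (-f)\, \mathrm d\mu &= \int_{\Omega} (-f)^+\, \mathrm d\mu - \int_{\Omega} (-f)^-\, \mathrm d\mu \\ &= \int_{\Omega} f^-\, \mathrm d\mu - \int_{\Omega} f^+\, \mathrm d\mu \\ &= -\int_{\Omega} f\, \mathrm d\mu. \end{aligned}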

Definition 5. Let \mathbb P be a probability measure on (\Omega, \mathcal F). For any random variable X : \Omega \to \mathbb R, the expectation of X is defined by

\displaystyle \mathbb E[X] := \int_{\Omega} X\, \mathrm d\mathbb P,

whenever the integral is finite.
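For instance, if X(\Omega) = \{x_1, \dots, x_n\} is finite with each x_i real, then X = \sum_{i=1}^n x_i \cdot \mathbb I_{\{X = x_i\}} is simple, and applying Definition 2 to X^+ and X^- recovers the familiar formula

\displaystyle \mathbb E[X] = \sum_{i=1}^n x_i \cdot \mathbb P(X = x_i).

As a hypothetical model of a fair die, take \Omega = \{1, \dots, 6\} with \mathbb P(\{\omega\}) = \frac{1}{6} and X(\omega) = \omega, so that

\displaystyle \mathbb E[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{7}{2}.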

An immediate application of linearity is the famous additivity of expectation:

\begin{aligned} \mathbb E[X+Y] &= \int_{\Omega} (X+Y)\, \mathrm d\mathbb P \\ &= \int_{\Omega} X\, \mathrm d\mathbb P + \int_{\Omega} Y\, \mathrm d\mathbb P \\ &= \mathbb E[X] + \mathbb E[Y]. \end{aligned}

In fact, if X(\Omega) is finite, then we have already proven this result. But we need to think bigger, and explore the three crucial convergence theorems in measure theory.

—Joel Kindiak, 12 Jul 25, 1256H
