Banach’s Key Points

Let K be a compact metric space. Previously, we discussed how the metric space (\mathcal C(K), d_\infty) of continuous real-valued functions on K, equipped with the supremum metric d_\infty, forms a complete metric space. We will eventually investigate the compactness of subsets \mathcal F \subseteq \mathcal C(K).

For now, let’s apply these results to solve ordinary differential equations.

Definition 1. Let y_1, \dots , y_n : \mathbb R \to \mathbb R be differentiable functions. Define the function \mathbf y : \mathbb R \to \mathbb R^n by

\displaystyle \mathbf y(t) := \begin{bmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{bmatrix},\quad \mathbf y'(t) := \begin{bmatrix} y_1'(t) \\ \vdots \\ y_n'(t) \end{bmatrix},\quad \int_0^t \mathbf y(s)\, \mathrm ds := \begin{bmatrix} \int_0^t y_1(s)\, \mathrm ds \\ \vdots \\ \int_0^t y_n(s)\, \mathrm ds \end{bmatrix}.

Theorem 1. Let D = [a_0, b_0] \times \prod_{i=1}^n [a_i, b_i] \subseteq \mathbb R \times \mathbb R^n be a closed rectangle. Fix (t_0, \mathbf y_0) \in \mathrm{int}(D). Suppose f : D \to \mathbb R^n is a map that satisfies the following properties:

  • For any \mathbf y \in \prod_{i=1}^n [a_i, b_i] =: R, f(\cdot, \mathbf y) is continuous.
  • There exists L > 0 such that for any t \in [a_0, b_0], for any \mathbf y_1,\mathbf y_2 \in R, \| f(t,\mathbf y_1) - f(t, \mathbf y_2) \|_\infty \leq L \cdot \|\mathbf y_1 - \mathbf y_2 \|_\infty.

Then there exists \epsilon > 0 and a unique function \mathbf y = \mathbf y( \cdot ) that satisfies the initial value problem

\mathbf y'(t) = f(t, \mathbf y(t)),\quad \mathbf y(t_0) = \mathbf y_0

for t \in [t_0 - \epsilon, t_0 + \epsilon].

In layperson's terms, almost every differential equation that appears in undergraduate and even high school exercises has a (locally) unique solution.

To prove Theorem 1, we first make some careful reductions. The key observation is to recast the differential equation as an integral equation:

\displaystyle \mathbf y(t)  = \mathbf y_0 + \int_{t_0}^t f(s, \mathbf y(s))\, \mathrm ds.

In doing so, we may assume without loss of generality that \mathbf y_0 = \mathbf 0. Indeed, if we have solved the case \mathbf y_0 = \mathbf 0, then in the general case the transformed function \mathbf x(t) := \mathbf y(t) - \mathbf y_0 satisfies the initial value problem

\mathbf x'(t) = f(t, \mathbf y_0 + \mathbf x(t)),\quad \mathbf x(t_0) = \mathbf 0.

The transformed map g(\cdot, \cdot) := f(\cdot, \mathbf y_0 + \cdot) still satisfies the hypotheses of Theorem 1, so the problem becomes

\mathbf x'(t) = g(t, \mathbf x(t)),\quad \mathbf x(t_0) = \mathbf 0.

By the solved special case, such a unique \mathbf x exists, so by bookkeeping, the unique function \mathbf y(\cdot) = \mathbf y_0 + \mathbf x(\cdot) solves the original initial value problem.

A similar translation in t reduces to the case t_0 = 0. Therefore, for simplicity, assume (t_0, \mathbf y_0) = (0, \mathbf 0) without loss of generality. Let’s return to the original equation of interest:

\displaystyle \mathbf y(t)  = \int_0^t f(s, \mathbf y(s))\, \mathrm ds.

If, for any continuous \mathbf y : \mathbb R \to \mathbb R^n, we define F(\mathbf y) by setting \displaystyle F(\mathbf y)(t) := \int_0^t f(s, \mathbf y(s))\, \mathrm ds for each t, then we are interested in finding a unique solution to the equation F(\mathbf y) = \mathbf y; that is, a unique fixed point \mathbf y of F. We remark that arguments similar to those for \mathcal C(K) show that the collection \mathcal C(K, \mathbb R^n) of continuous maps from K to \mathbb R^n also forms a complete metric space under the coordinate-wise supremum metric:

\displaystyle d_{\infty} (\mathbf f, \mathbf g) := \sup_{1 \leq i \leq n} d_\infty(f_i, g_i).

This raises a crucial question: given a metric space (K, d) and a map T : K \to K, when does T have a fixed point? Furthermore, is such a fixed point unique?

Definition 2. A map T : K \to K is called a contraction mapping if there exists some \alpha \in [0, 1) such that for any x,y \in K,

d(T(x), T(y)) \leq \alpha \cdot d(x, y).

We call \alpha a contraction factor of T.

Lemma 1. Given the notions defined in Theorem 1 with (t_0,\mathbf y_0) = (0,\mathbf 0), there exists \epsilon > 0 such that the map F : \mathcal C([-\epsilon, \epsilon], \mathbb R^n) \to \mathcal C([-\epsilon, \epsilon], \mathbb R^n) defined by

\displaystyle F(\mathbf y)(t) := \int_0^t f(s, \mathbf y(s))\, \mathrm ds, \quad t \in [-\epsilon, \epsilon]

is a contraction mapping. Here, we equip \mathbb R^n with the supremum norm \| \cdot \|_\infty and the collection \mathcal C([-\epsilon, \epsilon], \mathbb R^n) with the supremum metric:

\displaystyle \|\mathbf f\|_\infty := \sup_{t \in [-\epsilon, \epsilon]} \|\mathbf f(t)\|_\infty,\quad d_{\infty} (\mathbf f, \mathbf g) := \|\mathbf f - \mathbf g\|_\infty.

Proof. Fix any \mathbf y_1,\mathbf y_2 \in \mathcal C([-\epsilon, \epsilon], \mathbb R^n). For any t \in [-\epsilon, \epsilon],

\begin{aligned} \| F(\mathbf y_1)(t) - F(\mathbf y_2)(t) \|_\infty &= \left\| \int_0^t \big( f(s, \mathbf y_1(s)) - f(s, \mathbf y_2(s)) \big)\, \mathrm ds \right\|_\infty \\ &\leq \left| \int_0^t \| f(s, \mathbf y_1(s))- f(s, \mathbf y_2(s)) \|_\infty\, \mathrm ds \right| \\ &\leq L \left| \int_0^t \| \mathbf y_1(s) -\mathbf y_2(s) \|_\infty\, \mathrm ds \right| \\ &\leq L \cdot |t| \cdot \| \mathbf y_1 -\mathbf y_2 \|_\infty \\ & \leq L \cdot\epsilon \cdot d_\infty(\mathbf y_1, \mathbf y_2), \end{aligned}

where the absolute values account for the orientation of the integral when t < 0.

Since the bound on the right-hand side does not depend on t,

d_\infty(F(\mathbf y_1) , F(\mathbf y_2)) \leq L \cdot \epsilon \cdot d_\infty(\mathbf y_1, \mathbf y_2).

Hence, setting \epsilon := 1/(2L) yields a contraction mapping with contraction factor L \cdot \epsilon = 1/2.
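As a quick numerical sanity check of Lemma 1 (my own illustration, not part of the proof), take f(t, y) = \sin(y), which is Lipschitz in y with constant L = 1, so \epsilon = 1/(2L) = 1/2 should give contraction factor 1/2. The grid, the two test functions, and the trapezoidal quadrature below are all my choices:

```python
import numpy as np

L, eps = 1.0, 0.5               # Lipschitz constant of sin, and eps = 1/(2L)
t = np.linspace(-eps, eps, 2001)
dt = t[1] - t[0]
mid = len(t) // 2               # index of t = 0

def picard(yvals):
    """Apply F(y)(t) = integral from 0 to t of sin(y(s)) ds (trapezoidal rule)."""
    g = np.sin(yvals)
    cum = np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) * dt / 2)))
    return cum - cum[mid]       # shift so the integral starts at t = 0

y1, y2 = np.cos(t), t**2 - 0.3  # two arbitrary test functions on [-eps, eps]
num = np.max(np.abs(picard(y1) - picard(y2)))   # d(F(y1), F(y2))
den = np.max(np.abs(y1 - y2))                   # d(y1, y2)
ratio = num / den
print(ratio)                    # at most L * eps = 0.5, up to quadrature error
```

The measured ratio is comfortably below the guaranteed factor 1/2, since the Lipschitz bound is rarely tight.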

Roughly speaking, the map F brings distinct functions \mathbf y_1, \mathbf y_2 closer together, so iterating F should drive its iterates toward a single function. This converging power of contraction maps is precisely the content of Banach’s fixed point theorem.

Theorem 2 (Banach’s Fixed Point Theorem). Let (K, d) be a complete metric space and T : K \to K be a contraction mapping. Then there exists some unique x \in K such that T(x) = x.

Proof. Fix x_0 \in K and iteratively define x_{n+1} := T(x_n). Let \alpha \in [0, 1) be a contraction factor for T. For n \geq 1,

d(x_{n+1}, x_n) = d(T(x_n), T(x_{n-1})) \leq \alpha \cdot d(x_n, x_{n-1}).

By induction, d(x_{n+1},x_n) \leq \alpha^n \cdot d(x_1,x_0). Now, for any m = n + k > n, the triangle inequality and the geometric series yield

\displaystyle d(x_m,x_n) = d(x_{n+k} , x_n) \leq \frac{1}{1 - \alpha} \cdot d(x_{n+1},x_n) \leq \frac{ \alpha^n }{1 - \alpha} \cdot d(x_1,x_0).

Thus, for any \epsilon > 0, choose sufficiently large N \in \mathbb N so that \alpha^N \cdot d(x_1, x_0) / (1-\alpha) < \epsilon to prove that \{x_n\} is Cauchy; since K is complete, the sequence converges to some x \in K. We leave it as an exercise to verify that T is continuous (any contraction is Lipschitz), so that

\displaystyle T(x) = \lim_{n \to \infty} T(x_n) = \lim_{n \to \infty} x_{n+1} = x.

To establish uniqueness, let x, y be two fixed points. By the contraction mapping,

d(x, y) = d(T(x), T(y)) \leq \alpha \cdot d(x,y),

so that

0 \leq (1 - \alpha)\cdot d(x,y) \leq 0.

Since 1 - \alpha > 0, we must have d(x, y) = 0, that is, x = y, as required.
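The proof is constructive: the iterates x_{n+1} := T(x_n) converge to the fixed point from any starting point, with the a priori error bound \alpha^n d(x_1, x_0)/(1-\alpha). As a sketch (my example, not from the text), take T(x) = \cos(x) on [0, 1], a contraction with factor \alpha = \sin(1) \approx 0.84 since |T'(x)| = |\sin(x)| \leq \sin(1) < 1 there:

```python
import math

# T(x) = cos(x) is a contraction on [0, 1] with factor alpha = sin(1),
# since |T'(x)| = |sin(x)| <= sin(1) < 1 for x in [0, 1].
alpha = math.sin(1.0)

xs = [0.5]                    # any starting point x_0 in [0, 1]
for _ in range(100):
    xs.append(math.cos(xs[-1]))

fixed = xs[-1]
print(fixed)                  # the unique fixed point, approx. 0.7390851

# The a priori estimate from the proof:
# d(x_n, x) <= alpha^n / (1 - alpha) * d(x_1, x_0)
d10 = abs(xs[1] - xs[0])
for n in range(10):
    assert abs(xs[n] - fixed) <= alpha**n / (1 - alpha) * d10
```

The limit is the so-called Dottie number, the unique real solution of cos(x) = x.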

Now, we can solve our original differential equation question.

Proof of Theorem 1. Under the given hypotheses (with (t_0, \mathbf y_0) = (0, \mathbf 0) by the earlier reductions), Lemma 1 provides \epsilon > 0 such that the map F : \mathcal C([-\epsilon, \epsilon], \mathbb R^n) \to \mathcal C([-\epsilon, \epsilon], \mathbb R^n) defined by

\displaystyle F(\mathbf y)(t) := \int_0^t f(s, \mathbf y(s))\, \mathrm ds, \quad t \in [-\epsilon, \epsilon]

is a contraction mapping. By Banach’s fixed point theorem, there exists some unique \mathbf y \in \mathcal C([-\epsilon, \epsilon], \mathbb R^n) such that F(\mathbf y) = \mathbf y. Expanding, for any t \in [-\epsilon, \epsilon],

\mathbf y(t) = \displaystyle F(\mathbf y)(t) = \int_0^t f(s, \mathbf y(s))\, \mathrm ds.

Differentiating both sides via the fundamental theorem of calculus, \mathbf y'(t) = f(t, \mathbf y(t)); setting t = 0 also recovers \mathbf y(0) = \mathbf 0, as required.
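The fixed point can be computed by iterating F, exactly as in the proof of Theorem 2. Below is my own numerical illustration (not from the text), using the trapezoidal rule and the initial value problem y' = y, y(0) = 1, whose solution is e^t; the grid size and iteration count are arbitrary choices:

```python
import numpy as np

# Picard iteration for y' = y, y(0) = 1 on [-1/2, 1/2]; the exact solution
# is exp(t), so we can measure how fast the iterates converge to it.
eps = 0.5
t = np.linspace(-eps, eps, 2001)
dt = t[1] - t[0]
mid = len(t) // 2               # index of t = 0

def F(y):
    """F(y)(t) = 1 + integral from 0 to t of y(s) ds, via the trapezoidal rule."""
    cum = np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) * dt / 2)))
    return 1.0 + cum - cum[mid]

y = np.zeros_like(t)            # start the iteration from the zero function
for _ in range(30):
    y = F(y)

err = np.max(np.abs(y - np.exp(t)))
print(err)                      # tiny: the iterates converge to exp(t)
```

In fact, the successive iterates here are exactly the Taylor partial sums of e^t, so the remaining error is dominated by the quadrature rule, not the iteration.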

The implications of Banach’s fixed point theorem are vast and nontrivial. Let’s record a few further consequences for differential equations.

Corollary 1. Let D = [a_0, b_0] \times K \subseteq \mathbb R \times \mathbb C^n, where K is a closed rectangle under the identification \mathbb C^n = \mathbb R^{2n}. Fix (t_0, \mathbf y_0) \in \mathrm{int}(D). Suppose f : D \to \mathbb C^n is a map that satisfies the following properties:

  • For any \mathbf y \in K, f(\cdot, \mathbf y) is continuous.
  • There exists L > 0 such that for any t \in [a_0, b_0], for any \mathbf y_1,\mathbf y_2 \in K, \| f(t,\mathbf y_1) - f(t, \mathbf y_2) \|_\infty \leq L \cdot \|\mathbf y_1 - \mathbf y_2 \|_\infty.

Then there exists \epsilon > 0 and a unique function \mathbf y = \mathbf y( \cdot ) that satisfies the initial value problem

\mathbf y'(t) = f(t, \mathbf y(t)),\quad \mathbf y(t_0) = \mathbf y_0

for t \in [t_0 - \epsilon, t_0 + \epsilon].

Proof. Apply Theorem 1 by regarding \mathbb C = \mathbb R^2 so that \mathbb C^n = \mathbb R^{2n}.

Corollary 2. For 0 \leq i \leq n, let p_i, q : \mathbb R \to \mathbb R be continuous. Assuming p_n \equiv 1 and given t_0 \in \mathbb R and \mathbf y_0 \in \mathbb R^n, there exists a unique n-times differentiable function y such that

\displaystyle \sum_{i=0}^n p_i \cdot y^{(i)} = q,\quad y^{(i-1)}(t_0) = (\mathbf y_0)_i \quad \text{for } 1 \leq i \leq n.

Proof. Define

\mathbf A(t) := \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & 1 \\ -p_0(t) & -p_1(t) & -p_2(t) & \cdots & -p_{n-1}(t) \end{bmatrix},\quad \mathbf q(t) := \begin{bmatrix} 0 \\ \vdots \\ 0 \\ q(t) \end{bmatrix},\quad \mathbf y(t) := \begin{bmatrix} y(t) \\ y'(t) \\ \vdots \\ y^{(n-1)}(t) \end{bmatrix}.

Then the original equation is equivalent to the first-order system \mathbf y' = \mathbf A \mathbf y + \mathbf q, \mathbf y(t_0) = \mathbf y_0. It is not difficult to verify that the map f defined by f(t, \mathbf y) := \mathbf A(t)\mathbf y + \mathbf q(t) satisfies the conditions of Theorem 1 on any rectangle over [t_0 - \delta, t_0 + \delta]: each p_i is continuous, hence bounded there, which supplies the Lipschitz constant.

By Theorem 1, a unique solution \mathbf y(t) of the system exists, and its first component y(t) := (\mathbf y(t))_1 solves the original initial value problem.
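As a sketch of this reduction, take my concrete example y'' + y = 0 (so n = 2, p_0 = 1, p_1 = 0, q = 0) with y(0) = 0, y'(0) = 1, whose solution is \sin(t). One can form the companion matrix and solve the first-order system numerically; the classical Runge–Kutta stepper below is my choice, not part of the corollary:

```python
import numpy as np

# Companion system for y'' + y = 0:
#   [y, y']' = A [y, y'],  A = [[0, 1], [-p_0, -p_1]] = [[0, 1], [-1, 0]].
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
Y = np.array([0.0, 1.0])        # y(0) = 0, y'(0) = 1, so y(t) = sin(t)

steps = 1000
h = 1.0 / steps
for _ in range(steps):          # classical RK4 on the first-order system
    k1 = A @ Y
    k2 = A @ (Y + h / 2 * k1)
    k3 = A @ (Y + h / 2 * k2)
    k4 = A @ (Y + h * k3)
    Y = Y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

print(Y[0])                     # y(1), which should be close to sin(1)
```

The first component of the numerical solution matches sin(1) to high accuracy, exactly as the reduction predicts.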

Corollary 3. For any \alpha \in \mathbb C, the initial value problem y' = \alpha y, y(0) = 1 has a unique solution y \equiv y(\cdot) =: f_\alpha : \mathbb R \to \mathbb C.

Proof. The function g(t, y) := \alpha y satisfies the required conditions of Corollary 1 (with Lipschitz constant L = |\alpha|), so that a unique solution y(\cdot) = f_\alpha exists on [-\epsilon, \epsilon].

To obtain the general result, we extend the solution to all of \mathbb R. To that end, consider the initial value problem y' = \alpha y, y(\epsilon) = f_\alpha(\epsilon). By Corollary 1, we obtain a unique solution f_{\alpha,\epsilon} on [0, 2\epsilon] = [\epsilon - \epsilon, \epsilon + \epsilon]. Indeed, since the Lipschitz constant |\alpha| of g does not depend on the initial point, the same \epsilon works at every centre.

Finally, we claim that f_\alpha|_{[0,\epsilon]} = f_{\alpha,\epsilon}|_{[0,\epsilon]}. Defining g := f_\alpha - f_{\alpha,\epsilon}, we have g' = \alpha g on (0, \epsilon) and g(\epsilon) = 0. By the uniqueness in Corollary 1 (the zero function also solves this initial value problem centred at \epsilon), g \equiv 0 near \epsilon, and repeating the argument along [0, \epsilon] gives g \equiv 0, so that f_\alpha = f_{\alpha,\epsilon} on [0,\epsilon], as required.

Writing \mathbb R = \bigcup_{n=1}^\infty [-n\epsilon, n \epsilon] and inductively extending in both directions yields a function defined on all of \mathbb R that agrees with f_\alpha on [-\epsilon, \epsilon]. We may therefore regard f_\alpha as defined on \mathbb R without ambiguity.

Theorem 3. For any \alpha \in \mathbb C, define e^{\alpha} := f_\alpha(1). For any z, w \in \mathbb C, e^{z+w} = e^z \cdot e^w. In particular, for any z \equiv x + iy \in \mathbb C, e^z = e^x \cdot e^{iy}.

Proof. Define the function g := f_z \cdot f_w. By the uniqueness in Corollary 3, it suffices to prove that g' = (z+w)g and g(0) = f_z(0) \cdot f_w(0) = 1, for then g = f_{z+w}, so that setting t = 1 yields

e^{z+w} = f_{z + w}(1) = g(1) = f_z(1) \cdot f_w(1) = e^z \cdot e^w.

To that end, employ the product rule to obtain

g' = z f_z \cdot f_w + f_z \cdot w f_w = (z+w) g,

as required.
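Theorem 3 can be spot-checked numerically, under the assumption (safe here) that Python's cmath.exp implements the standard complex exponential constructed above; the sample points z, w are arbitrary:

```python
import cmath

# Check e^{z+w} = e^z * e^w at an arbitrary pair of complex points.
z, w = 1 + 2j, -0.5 + 0.3j
lhs = cmath.exp(z + w)
rhs = cmath.exp(z) * cmath.exp(w)
print(abs(lhs - rhs))           # zero up to floating-point rounding
```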

Corollary 4. For any \alpha \in \mathbb C and t \in \mathbb R, e^{\alpha t} = f_{\alpha}(t) .

Proof. Fix \alpha \in \mathbb C and t \in \mathbb R. Define g : \mathbb R \to \mathbb C by g(s) := f_{\alpha}(st). By the chain rule,

g'(s) = t \cdot f_{\alpha}'(st) = \alpha t \cdot f_{\alpha}(st) = \alpha t \cdot g(s).

Therefore, g' = \alpha t \cdot g. Since g(0) = f_{\alpha}(0) = 1, Corollary 3 yields g = f_{\alpha t}. Setting s = 1,

f_{\alpha}(t) = g(1) = f_{\alpha t}(1) = e^{\alpha t}.

Remark 1. If we have previously defined e^t, then the uniqueness of Corollary 3 combined with (e^t)' = e^t yields f_1(t) = f_t(1) = e^t. If instead we have not defined e^t a priori, then Corollary 4 helps us define the exponential rigorously with our desired properties.

In fact, more is true: we can define e^{\alpha t} := f_{\alpha}(t) for any complex number \alpha and real number t. In particular, e^{it} = f_i(t) is well-defined for any t \in \mathbb R, and satisfies the equation (e^{it})' = i e^{it}.

Corollary 5. For any t \in \mathbb R, e^{it} = \cos(t) + i \sin(t). In particular, e^{i\pi} + 1 = 0.

Proof. Define f : \mathbb R \to \mathbb C by f(t) := e^{-it}(\cos(t) + i \sin(t)). Basic differentiation (using (e^{-it})' = -i e^{-it}) shows that f' = 0. Coupled with f(0) = 1, the mean value theorem applied to the real and imaginary parts of f yields f \equiv 1, so that the result follows.

Definition 3. The map \exp : \mathbb C \to \mathbb C defined by \exp(z) := e^z is called the complex exponential. Just for fun, defining the complex trigonometric functions \cos, \sin : \mathbb C \to \mathbb C by

\displaystyle \cos(z) := \frac{e^{iz} + e^{-iz}}{2},\quad \sin(z) := \frac{e^{iz} - e^{-iz}}{2i}

yields functions that agree with the usual \cos and \sin on \mathbb R, as characterized by Corollary 5.
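Both Corollary 5 and these definitions can be spot-checked with Python's cmath, again assuming its exp is the standard complex exponential; the sample points are arbitrary:

```python
import cmath, math

# Corollary 5: e^{it} = cos(t) + i sin(t), checked at a few sample points.
for t in [0.0, 0.5, 1.0, math.pi / 3, 2.0]:
    assert abs(cmath.exp(1j * t) - complex(math.cos(t), math.sin(t))) < 1e-12

# Euler's identity e^{i pi} + 1 = 0, up to floating-point rounding:
residual = abs(cmath.exp(1j * math.pi) + 1)
print(residual)                 # on the order of 1e-16

# Definition 3: (e^{iz} + e^{-iz}) / 2 agrees with cos on the reals.
for t in [0.3, 1.7]:
    assert abs((cmath.exp(1j * t) + cmath.exp(-1j * t)) / 2 - math.cos(t)) < 1e-12
```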

The multiplicative property of the complex exponential in Theorem 3 is arguably among the most important facts in pure and applied mathematics, and deserves its own attention in the field of complex analysis. We will relegate that discussion elsewhere.

Corollary 6. Define the sequence \{x_n\} by x_0 := 3 and x_{n+1} := \sin(x_n) + x_n. Then x_n \to \pi.

Proof. Define the map f : [\pi - \delta, \pi + \delta] \to \mathbb R by f(t) = \sin(t) + t for suitably chosen \delta > 0. We first note that

f(\pi) = \sin(\pi) + \pi = 0 + \pi = \pi

yields \pi being a fixed point of f. Thus, we want to choose \delta \geq \pi - 3 so that 3 \in [\pi-\delta, \pi+\delta] and f is a contraction mapping on that interval.

Taking derivatives, for any \delta \in (0, \pi/2),

f'(t) = \cos(t) + 1 \in [0, 1],\quad t \in [\pi - \delta,\pi+ \delta].

By the mean value theorem,

|f(t) - \pi| = |f(t) - f(\pi)| \leq |t - \pi| \leq \delta.

By algebra,

\displaystyle \pi - \delta \leq f(t) \leq \pi + \delta.

Hence, f([\pi-\delta, \pi+\delta]) \subseteq [\pi-\delta, \pi+\delta]. To prove that f is a contraction mapping, fix x,y \in [\pi-\delta, \pi+\delta] with x < y without loss of generality. We apply the mean value theorem to find c \in (x, y) such that

|f(x) - f(y)| = (\cos(c) + 1) |x-y|.

It suffices to ensure that \cos(c) + 1 \leq 1/2, i.e. \cos(c) \leq -1/2, which holds precisely for c \in [2\pi/3, 4\pi/3] (within [0, 2\pi]). The choice \delta = \pi/3 gives [\pi - \delta, \pi + \delta] = [2\pi/3, 4\pi/3], so f is a contraction with factor 1/2, and 3 \in [2\pi/3, 4\pi/3]. By Banach’s fixed point theorem, the iteration x_{n+1} = f(x_n) converges to the unique fixed point \pi, as required.
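Running the iteration of Corollary 6 directly shows just how quickly it converges; in fact f'(\pi) = 0, so near \pi the convergence is even faster than the geometric rate the contraction factor 1/2 guarantees.

```python
import math

# The iteration of Corollary 6: x_0 = 3, x_{n+1} = sin(x_n) + x_n.
x = 3.0
for _ in range(20):
    x = math.sin(x) + x

print(x)                        # converges to pi, approx. 3.141592653589793
```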

—Joel Kindiak, 3 Apr 2025, 1840H
