Fréchet Derivatives

Recall that \mathbb C is defined not merely as \mathbb R^2, but as a special collection of matrices that, thanks to linear independence, carries a natural vector space isomorphism with \mathbb R^2. Roughly speaking, therefore, we may call \mathbb C a two-dimensional space of numbers.

What sets \mathbb C apart from \mathbb R^2 is that the former is a field, which allows division, while the latter has no natural definition of division. This limitation also explains why complex-differentiability is easy to define, but differentiability on \mathbb R^2 is not.

Nonetheless, we can use complex-differentiability to motivate differentiability in \mathbb R^2. Recall that f : \mathbb C \to \mathbb C is differentiable at z_0 when there exists some unique f'(z_0) \in \mathbb C such that

\displaystyle f'(z_0) = \lim_{w \to 0} \frac{f(z_0 + w) - f(z_0)}{w}.

We can bring the left-hand side over to the right-hand side to obtain the equation

\displaystyle \lim_{w \to 0} \frac{f(z_0 + w) - f(z_0) - f'(z_0) \cdot w}{w} = 0.

In the language of normed spaces, denote \|w \| = |w|, so that (in the context of complex numbers)

\displaystyle \lim_{w \to 0} \frac{\| f(z_0 + w) - f(z_0) - f'(z_0) \cdot w \|}{\| w \|} = 0.

Expanding this result using epsilontics: for any \epsilon > 0, there exists \delta > 0 such that 0 < \|w\| < \delta implies

\| f(z_0 + w) - f(z_0) - f'(z_0) \cdot w \| < \epsilon \cdot \| w \|.

Since \mathbb C and \mathbb R^2 are topologically indistinguishable, and f : \mathbb C \to \mathbb C is related to \mathbf f : \mathbb R^2 \to \mathbb R^2 by the composition

\mathbf f = \iota^{-1} \circ f \circ \iota = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} \equiv \begin{bmatrix} u \\ v \end{bmatrix},\quad u, v : \mathbb R^2 \to \mathbb R,

both vector spaces (over \mathbb R) are complete normed spaces, and so we can use this expression as our definition of multivariable differentiability. Furthermore, the map \mathbf f : \mathbb R^2 \to \mathbb R^2 generalises to \mathbf f : \mathbb R^n \to \mathbb R^m, and our definition still applies there.

Definition 1. The map \mathbf f : \mathbb R^n \to \mathbb R^m is Fréchet-differentiable at \mathbf x_0 \in \mathbb R^n if there exists a (unique) linear transformation \mathbf f'( \mathbf x_0) : \mathbb R^n \to \mathbb R^m, called the Fréchet-derivative of \mathbf f at \mathbf x_0, such that for any \epsilon > 0, there exists \delta > 0 such that if 0 < \|\mathbf v\| < \delta, then

\displaystyle \| \mathbf f(\mathbf x_0 + \mathbf v) - \mathbf f(\mathbf x_0) - (\mathbf f'( \mathbf x_0)) (\mathbf v) \| < \epsilon \cdot \| \mathbf v \|.

Just for this post, we use boldface notation to emphasise the vector-ish nature of the objects discussed (e.g. \mathbf x_0, \mathbf v, \mathbf f(\mathbf x_0) are all real-valued vectors).
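To make Definition 1 concrete, here is a small numerical sanity check (a hypothetical example, not from the post): for the map \mathbf f(x,y) = (x^2 y, x + \sin y) at \mathbf x_0 = (1, 2), take \mathbf A to be the matrix of partial derivatives at \mathbf x_0; the ratio \| \mathbf f(\mathbf x_0 + \mathbf v) - \mathbf f(\mathbf x_0) - \mathbf A \mathbf v \| / \| \mathbf v \| should then shrink as \| \mathbf v \| \to 0.

```python
import math

def f(x, y):
    # a smooth test map R^2 -> R^2 (hypothetical example)
    return (x * x * y, x + math.sin(y))

def A(vx, vy):
    # candidate Fréchet derivative of f at x0 = (1, 2): the matrix of
    # partial derivatives [[2*x0*y0, x0^2], [1, cos(y0)]] applied to v
    x0, y0 = 1.0, 2.0
    return (2 * x0 * y0 * vx + x0 * x0 * vy, vx + math.cos(y0) * vy)

def ratio(t):
    # ||f(x0 + v) - f(x0) - A v|| / ||v|| for the direction v = (t, t)
    x0, y0 = 1.0, 2.0
    fx, fy = f(x0 + t, y0 + t)
    gx, gy = f(x0, y0)
    ax, ay = A(t, t)
    return math.hypot(fx - gx - ax, fy - gy - ay) / math.hypot(t, t)

for t in (1e-1, 1e-3, 1e-5):
    print(t, ratio(t))
```

Running this shows the ratio decaying roughly linearly in \|\mathbf v\|, consistent with the \epsilon-\delta condition.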

Using ideas in topology, it is not hard to establish that the map \mathbf f = (f_1,\dots, f_m)^{\mathrm T} : \mathbb R^n \to \mathbb R^m is continuous if and only if each of its components f_i : \mathbb R^n \to \mathbb R is continuous. Differentiability, on the other hand, yields a more interesting result.

Lemma 1. The map \mathbf f = (f_1,\dots, f_m)^{\mathrm T} : \mathbb R^n \to \mathbb R^m is Fréchet-differentiable at \mathbf x_0 if and only if each f_i : \mathbb R^n \to \mathbb R is differentiable at \mathbf x_0. Furthermore,

\mathbf f'( \mathbf x_0) = (f_1'( \mathbf x_0), \dots, f_m'( \mathbf x_0))^{\mathrm T}.

Proof Sketch. For any linear transformation \mathbf A : \mathbb R^n \to \mathbb R^m with rows \mathbf a_1^{\mathrm T},\dots, \mathbf a_m^{\mathrm T}, apply Definition 1 to the calculation

\displaystyle \| \mathbf f(\mathbf x_0 + \mathbf v) - \mathbf f(\mathbf x_0) - \mathbf A \mathbf v \|^2 = \sum_{i=1}^m | f_i(\mathbf x_0 + \mathbf v) - f_i(\mathbf x_0) - \langle \mathbf a_i ,\mathbf v \rangle |^2,

where \langle \cdot, \cdot \rangle denotes the usual dot product on \mathbb R^n.

Lemma 2. If \mathbf f : \mathbb R^n \to \mathbb R^m is Fréchet-differentiable at \mathbf x_0, then \mathbf f is Lipschitz-continuous relative to \mathbf x_0: there exist C, \delta > 0 such that \|\mathbf f(\mathbf x_0 + \mathbf v) - \mathbf f(\mathbf x_0)\| \leq C \cdot \|\mathbf v\| whenever \|\mathbf v\| < \delta.

Proof. Since \mathbf f'(\mathbf x_0) is a linear transformation, it can be represented by a matrix

\mathbf f'(\mathbf x_0) = \mathbf A := \begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n \end{bmatrix}.

Write \hat{\mathbf v} := \mathbf v / \|\mathbf v\| for nonzero \mathbf v, with components \hat v_1, \dots, \hat v_n. Since each |\hat v_i| \leq 1,

\displaystyle \| \mathbf f'(\mathbf x_0) (\hat{\mathbf v}) \| = \left\| \sum_{i=1}^n \hat v_i \mathbf a_i \right\| \leq \sum_{i=1}^n |\hat v_i| \| \mathbf a_i \| \leq \sum_{i=1}^n \| \mathbf a_i \| =: M.

Therefore, \| \mathbf f'(\mathbf x_0)(\mathbf v) \| \leq M \cdot \| \mathbf v \| for every \mathbf v. By Definition 1 and the triangle inequality, whenever 0 < \|\mathbf v\| < \delta,

\|\mathbf f(\mathbf x_0 + \mathbf v) - \mathbf f(\mathbf x_0)\| < (\epsilon + M) \cdot \|\mathbf v\|.

Lemma 3 (Chain Rule). If \mathbf f : \mathbb R^n \to \mathbb R^m is Fréchet-differentiable at \mathbf x_0 and \mathbf g : \mathbb R^m \to \mathbb R^k is Fréchet-differentiable at \mathbf f(\mathbf x_0), then \mathbf g \circ \mathbf f : \mathbb R^n \to \mathbb R^k is Fréchet-differentiable at \mathbf x_0 and

(\mathbf g \circ \mathbf f)'(\mathbf x_0) = \mathbf g'(\mathbf f(\mathbf x_0)) \circ \mathbf f'(\mathbf x_0).

Proof. Abbreviate

(*) := \|(\mathbf g \circ \mathbf f)( \mathbf x_0 +\mathbf v ) - (\mathbf g \circ \mathbf f)(\mathbf x_0) -(\mathbf g'(\mathbf f(\mathbf x_0)) \circ \mathbf f'(\mathbf x_0))(\mathbf v)\| .

We leave it as an exercise to verify that the linear map \mathbf g'(\mathbf f(\mathbf x_0)) is Lipschitz-continuous on \mathbb R^m, so that there exists M_1 > 0 such that for any \mathbf u_1,\mathbf u_2 \in \mathbb R^m,

(\dagger)_1 := \|(\mathbf g'(\mathbf f(\mathbf x_0))) (\mathbf u_1) - (\mathbf g'(\mathbf f(\mathbf x_0))) (\mathbf u_2)\| \leq M_1 \cdot \| \mathbf u_1 - \mathbf u_2 \|.

Furthermore, \mathbf f is Lipschitz-continuous relative to \mathbf x_0 by Lemma 2, so that there exists M_2 > 0 such that for any sufficiently small \mathbf z,

(\dagger)_2 :=\| \mathbf f(\mathbf x_0 + \mathbf z) - \mathbf f(\mathbf x_0) \| \leq M_2 \cdot \| \mathbf z \|.

Fix \epsilon > 0. Since \mathbf g is Fréchet-differentiable at \mathbf f(\mathbf x_0), for any \epsilon_1 > 0, there exists \delta_1 > 0 such that if \| \mathbf w \| < \delta_1, then

(\dagger)_3 := \|\mathbf g(\mathbf f(\mathbf x_0) +\mathbf w) - \mathbf g(\mathbf f(\mathbf x_0)) - (\mathbf g'(\mathbf f(\mathbf x_0)))(\mathbf w)\| < \epsilon_1 \cdot \|\mathbf w\|.

Since \mathbf f is Fréchet-differentiable at \mathbf x_0, for any \epsilon_2 > 0, there exists \delta_2 > 0 such that if \| \mathbf v \| < \delta_2, then

(\dagger)_4 := \|\mathbf f( \mathbf x_0 +\mathbf v) -\mathbf f(\mathbf x_0) - (\mathbf f' (\mathbf x_0))(\mathbf v)\| < \epsilon_2 \cdot \|\mathbf v\|.

Define \delta := \min\left\{ \delta_1/M_2, \delta_2 \right\}, fix 0 < \| \mathbf v \| < \delta, then make the following declarations:

\mathbf w = \mathbf f( \mathbf x_0 +\mathbf v) -\mathbf f(\mathbf x_0),\quad \mathbf z = \mathbf v, \quad \mathbf u_1 = \mathbf w,\quad \mathbf u_2 = (\mathbf f'(\mathbf x_0))(\mathbf v).

By bookkeeping,

(*) \leq (\dagger)_3 + (\dagger)_1 < \epsilon_1 \cdot (\dagger)_2 + M_1 \cdot (\dagger)_4 < (\epsilon_1 \cdot M_2 + \epsilon_2 \cdot M_1) \cdot \|\mathbf v\|.

Setting \epsilon_1 := \epsilon/(2M_2) and \epsilon_2 := \epsilon/(2M_1), we obtain (*) < \epsilon \cdot \|\mathbf v\|, the desired result.
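Lemma 3 can also be checked numerically. The sketch below (hypothetical test maps, not from the post) estimates the derivative matrices of \mathbf f, \mathbf g, and \mathbf g \circ \mathbf f by central finite differences and confirms that the composite's matrix approximately equals the product of the other two.

```python
import math

def f(x, y):
    # f : R^2 -> R^2, a hypothetical smooth test map
    return (x * y, x + y * y)

def g(u, v):
    # g : R^2 -> R^2, another hypothetical smooth test map
    return (math.sin(u) + v, u * v)

def jacobian(h, x, y, eps=1e-6):
    # central finite differences: columns are dh/dx and dh/dy
    fxp, fxm = h(x + eps, y), h(x - eps, y)
    fyp, fym = h(x, y + eps), h(x, y - eps)
    return [[(fxp[i] - fxm[i]) / (2 * eps), (fyp[i] - fym[i]) / (2 * eps)]
            for i in range(2)]

def matmul(A, B):
    # 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x0, y0 = 0.7, -0.3
Jf = jacobian(f, x0, y0)                          # derivative of f at x0
Jg = jacobian(g, *f(x0, y0))                      # derivative of g at f(x0)
Jgf = jacobian(lambda x, y: g(*f(x, y)), x0, y0)  # derivative of g ∘ f at x0
product = matmul(Jg, Jf)                          # chain rule prediction
```

Up to finite-difference error, the entries of `Jgf` and `product` agree, which is exactly the matrix form of (\mathbf g \circ \mathbf f)'(\mathbf x_0) = \mathbf g'(\mathbf f(\mathbf x_0)) \circ \mathbf f'(\mathbf x_0).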

Lemma 4. If \mathbf f = (f_1,\dots, f_m)^{\mathrm T}: \mathbb R^n \to \mathbb R^m is Fréchet-differentiable at \mathbf x_0, then it is Gâteaux-differentiable at \mathbf x_0 in the following sense: for any nonzero \mathbf v \in \mathbb R^n, the limit

\displaystyle \lim_{t \to 0} \frac{\mathbf f(\mathbf x_0 + t \mathbf v) - \mathbf f(\mathbf x_0)}{t} =: (\partial_{\mathbf v}({\mathbf f}))(\mathbf x_0)

exists. To be absolutely precise, there exists (\partial_{\mathbf v}({\mathbf f}))(\mathbf x_0) \in \mathbb R^m such that for any \epsilon > 0, there exists \delta > 0 such that

\displaystyle 0 < | t | < \delta \quad \Rightarrow \quad \left\| \mathbf f(\mathbf x_0 + t \mathbf v) - \mathbf f(\mathbf x_0)- (\partial_{\mathbf v}({\mathbf f}))(\mathbf x_0) \cdot t \right\| < \epsilon \cdot |t|.

We call (\partial_{\mathbf v}({\mathbf f})) (\mathbf x_0) the directional derivative of \mathbf f at \mathbf x_0 with respect to \mathbf v.

Proof. Fix nonzero \mathbf v \in \mathbb R^n. Since the map \mathbf g : \mathbb R \to \mathbb R^m defined by \mathbf g(t) := \mathbf f(\mathbf x_0 + t \mathbf v) is Fréchet-differentiable at 0 by Lemma 3, we get (\partial_{\mathbf v}({\mathbf f}))(\mathbf x_0) = \mathbf g'(0) = (\mathbf f'(\mathbf x_0))(\mathbf v).
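The identity (\partial_{\mathbf v}(\mathbf f))(\mathbf x_0) = (\mathbf f'(\mathbf x_0))(\mathbf v) from the proof is easy to test with a difference quotient (again a hypothetical example, assuming \mathbf f(x,y) = (e^x y, x - y) and \mathbf x_0 = (0, 1)):

```python
import math

def f(x, y):
    # hypothetical smooth map R^2 -> R^2
    return (math.exp(x) * y, x - y)

def directional(t, vx, vy):
    # difference quotient (f(x0 + t v) - f(x0)) / t at x0 = (0, 1)
    x0, y0 = 0.0, 1.0
    a, b = f(x0 + t * vx, y0 + t * vy)
    c, d = f(x0, y0)
    return ((a - c) / t, (b - d) / t)

def Jv(vx, vy):
    # Fréchet derivative at (0, 1): the matrix [[e^x y, e^x], [1, -1]]
    # evaluated there is [[1, 1], [1, -1]], applied to v
    return (vx + vy, vx - vy)

vx, vy = 2.0, -1.0
approx = directional(1e-6, vx, vy)  # small t in the limit
exact = Jv(vx, vy)                  # (f'(x0))(v)
```

As t shrinks, the quotient converges to (\mathbf f'(\mathbf x_0))(\mathbf v), matching the lemma.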

Definition 2. Suppose f : \mathbb R^n \to \mathbb R is Gâteaux-differentiable at \mathbf x_0 \in \mathbb R^n. Define for i = 1,\dots, n the i-th partial derivative

\displaystyle \frac{\partial f}{\partial x_i} := \partial_{\mathbf e_i}(f) : \mathbb R^n \to \mathbb R,

where \{\mathbf e_1, \dots, \mathbf e_n\} denotes the standard basis for \mathbb R^n. (In the special case n \leq 3, we write (x_1,x_2,x_3) = (x,y,z).) We then define the gradient vector (\nabla f)(\mathbf x_0) \in \mathbb R^n of f at \mathbf x_0 by

\displaystyle (\nabla f)(\mathbf x_0) := \sum_{i=1}^n \mathbf e_i \frac{\partial f}{\partial x_i} (\mathbf x_0) \equiv \begin{bmatrix} \displaystyle \frac{\partial f}{\partial x_1} (\mathbf x_0) & \cdots & \displaystyle \frac{\partial f}{\partial x_n} (\mathbf x_0) \end{bmatrix}^{\mathrm T}.

Let’s go a bit crazier. First, we observe that

\displaystyle \nabla f := \sum_{i=1}^n \mathbf e_i \frac{\partial f}{\partial x_i} : \mathbb R^n \to \mathbb R^n,

so that \nabla f \in \mathcal F(\mathbb R^n, \mathbb R^n). But we can go further and regard \nabla itself as a map that takes in functions like f \in \mathcal F(\mathbb R^n, \mathbb R) and returns functions like \nabla f \in \mathcal F(\mathbb R^n, \mathbb R^n):

\displaystyle \nabla := \sum_{i=1}^n  \mathbf e_i \frac{\partial}{\partial x_i} : \mathcal F(\mathbb R^n, \mathbb R) \to \mathcal F(\mathbb R^n, \mathbb R^n).

Thus, \nabla is known as a differential operator, which has ubiquitous uses in the study of partial differential equations.
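The operator viewpoint translates directly into code: a higher-order function that consumes f and returns (a finite-difference approximation of) \nabla f, mirroring \nabla : \mathcal F(\mathbb R^n, \mathbb R) \to \mathcal F(\mathbb R^n, \mathbb R^n). This is a sketch under the assumption that central differences are accurate enough for the f at hand.

```python
def grad(f, eps=1e-6):
    # nabla as an operator: takes f : R^n -> R (as a function on a list
    # of coordinates) and returns a new function computing the
    # approximate gradient vector of f by central finite differences
    def nabla_f(x):
        g = []
        for i in range(len(x)):
            xp = list(x); xp[i] += eps
            xm = list(x); xm[i] -= eps
            g.append((f(xp) - f(xm)) / (2 * eps))
        return g
    return nabla_f

# example: f(x, y) = x^2 + 3xy has gradient (2x + 3y, 3x)
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
nf = grad(f)        # nf is itself a function R^2 -> R^2
print(nf([1.0, 2.0]))
```

Note that `grad(f)` is a function, just as \nabla f is; applying it at a point only happens afterwards.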

Lemma 5. If f : \mathbb R^n \to \mathbb R is Fréchet-differentiable on \mathbb R^n, then for any \mathbf x_0 \in \mathbb R^n and vector \mathbf v, \displaystyle (\partial_{\mathbf v}(f))(\mathbf x_0) = \langle (\nabla f)(\mathbf x_0) , \mathbf v\rangle.

Proof. Fix \mathbf v = \sum_{i=1}^n v_i \mathbf e_i and \mathbf x_0. Since f'(\mathbf x_0) is linear,

\begin{aligned} (\partial_{\mathbf v}(f) )(\mathbf x_0) &= (f'(\mathbf x_0)) (\mathbf v) \\ &= (f'(\mathbf x_0)) \left( \sum_{i=1}^n v_i \mathbf e_i \right) \\ &= \sum_{i=1}^n v_i (f'(\mathbf x_0)) (\mathbf e_i) \\ &= \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i} (\mathbf x_0) = \langle (\nabla f)(\mathbf x_0) ,\mathbf v \rangle. \end{aligned}

Corollary 1. Let \mathbf f : \mathbb R^n \to \mathbb R^m be Fréchet-differentiable. Then for any \mathbf x_0 \in \mathbb R^n,

\mathbf f'(\mathbf x_0) = \begin{bmatrix} ((\nabla f_1)(\mathbf x_0))^{\mathrm T} \\ \vdots \\ ((\nabla f_m) (\mathbf x_0))^{\mathrm T} \end{bmatrix} = \begin{bmatrix} \displaystyle \frac{\partial f_1}{\partial x_1}(\mathbf x_0) & \cdots & \displaystyle \frac{\partial f_1}{\partial x_n}(\mathbf x_0) \\ \vdots & \ddots & \vdots \\ \displaystyle \frac{\partial f_m}{\partial x_1}(\mathbf x_0) & \cdots & \displaystyle \frac{\partial f_m}{\partial x_n}(\mathbf x_0) \end{bmatrix} =: \mathbf J_{\mathbf f}(\mathbf x_0).

We call \mathbf J_{\mathbf f} \equiv \mathbf  f' the Jacobian matrix of \mathbf f.

Proof. Combine Lemmas 1, 4, and 5.

All of that abstract machinery just to define the derivative of a multivariable function \mathbf f : \mathbb R^n \to \mathbb R^m. What was the goal? To answer the question: given a complex function f = u + i v : \mathbb C \to \mathbb C and a point z_0 \in \mathbb C, is f complex-differentiable at z_0?

The answer is yes, precisely when \mathbf f is Fréchet-differentiable and its component functions u, v satisfy the Cauchy-Riemann equations.

Theorem 1. The function f : \mathbb C \to \mathbb C is complex-differentiable at z_0 = x_0 + iy_0 if and only if \mathbf f is Fréchet-differentiable at (x_0, y_0) and its partial derivatives satisfy the Cauchy-Riemann equations:

\displaystyle u_x = v_y,\quad u_y = -v_x.

Proof. For the direction (\Rightarrow), we have already proven the Cauchy-Riemann claim, and to establish the Fréchet-differentiability claim, we simply take

\mathbf f'(\mathbf x_0) = \begin{bmatrix} u_x & -v_x \\ v_x & u_x \end{bmatrix} \equiv \begin{bmatrix} v_y & u_y \\ -u_y & v_y \end{bmatrix}.

The direction (\Leftarrow) follows with bookkeeping: the Cauchy-Riemann equations make \mathbf f'(\mathbf x_0) act as multiplication by the complex number u_x + i v_x, and the norms agree since

|f(x+iy)|^2 = |u(x,y) + i \cdot v(x,y)|^2 = |u(x,y)|^2 + |v(x,y)|^2 = \| \mathbf f(x,y)\|^2.
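Theorem 1 is easy to probe numerically. For the hypothetical test function f(z) = z^2, with u(x,y) = x^2 - y^2 and v(x,y) = 2xy, the sketch below approximates the partials by central differences and checks both Cauchy-Riemann equations, as well as the standard identity f'(z_0) = u_x + i v_x against the known derivative 2z_0.

```python
def u(x, y):
    # real part of f(z) = z^2
    return x * x - y * y

def v(x, y):
    # imaginary part of f(z) = z^2
    return 2 * x * y

def partial(h, x, y, wrt, eps=1e-6):
    # central finite difference with respect to "x" or "y"
    if wrt == "x":
        return (h(x + eps, y) - h(x - eps, y)) / (2 * eps)
    return (h(x, y + eps) - h(x, y - eps)) / (2 * eps)

x0, y0 = 1.3, -0.4
ux, uy = partial(u, x0, y0, "x"), partial(u, x0, y0, "y")
vx, vy = partial(v, x0, y0, "x"), partial(v, x0, y0, "y")

# Cauchy-Riemann: u_x = v_y and u_y = -v_x
print(ux - vy, uy + vx)
# f'(z) = 2z, so u_x should match 2*x0 and v_x should match 2*y0
print(ux, 2 * x0, vx, 2 * y0)
```

Both differences vanish up to floating-point error, and the recovered derivative matches 2z_0.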

Finally, we ask a simple question: when is a function f : \mathbb R^n \to \mathbb R Fréchet-differentiable at \mathbf x_0? By Lemma 4, f must minimally be Gâteaux-differentiable at \mathbf x_0.

Theorem 2. A Gâteaux-differentiable function f : \mathbb R^n \to \mathbb R is Fréchet-differentiable at \mathbf x_0 \in \mathbb R^n if there exists a neighbourhood U of \mathbf x_0 such that its first-order partial derivatives \displaystyle f_{x_i} are continuous on U.

Proof. We prove the result for the special case n = 2 for simplicity and leave the general case as an exercise. Firstly, find r > 0 and a neighbourhood

U:= \{\mathbf x_0 + \mathbf v : \|\mathbf v\| < r\}

of \mathbf x_0 such that all f_{x_i} are continuous on U. Fix \epsilon > 0. Since each f_{x_i} is continuous at \mathbf x_0, for any k_i > 0, there exists \delta_i \in (0, r/2) such that if \| \mathbf v \| < \delta_i,

| f_{x_i} (\mathbf x_0 + \mathbf v) - f_{x_i} (\mathbf x_0)| < k_i \cdot \epsilon.

Define \delta := \min\{\delta_1,\delta_2\} and fix \mathbf v \in \mathbb R^2 so that 0 < \|\mathbf v \| < \delta < r/2, hence \mathbf x_0 + \mathbf v \in U. Since f_{x_1} exists on U, apply the mean value theorem in the first variable to find \xi_1 between 0 and v_1 such that

\displaystyle f(\mathbf x_0 + \mathbf v) - f(\mathbf x_0 + v_2 \mathbf e_2) = \underbrace{ f_{x_1}(\mathbf x_0 + \xi_1 \mathbf e_1 + v_2 \mathbf e_2) }_{(\dagger)_1} \cdot\, v_1.

Similarly, apply the mean value theorem in the second variable to find \xi_2 between 0 and v_2 such that

\displaystyle f(\mathbf x_0 + v_2 \mathbf e_2) - f(\mathbf x_0 ) = \underbrace{ f_{x_2}(\mathbf x_0 + \xi_2 \mathbf e_2) }_{(\dagger)_2} \cdot\, v_2.

Since |v_i| \leq \| \mathbf v\|,

\begin{aligned} & |f(\mathbf x_0 + \mathbf v) - f(\mathbf x_0 ) - \langle (\nabla f)(\mathbf x_0) ,\mathbf v \rangle| \\ &\leq (| (\dagger)_1 - f_{x_1}(\mathbf x_0) | + | (\dagger)_2 - f_{x_2}(\mathbf x_0) |) \cdot \| \mathbf v \| \\ &< (k_1 \cdot \epsilon + k_2 \cdot \epsilon) \cdot \| \mathbf v \| \\ &= (k_1  + k_2 )\cdot \epsilon \cdot \| \mathbf v \|. \end{aligned}

Setting k_1 = k_2 = 1/2 yields the desired result, so that f is Fréchet-differentiable with Fréchet derivative f'( \mathbf x_0) given by

(f'( \mathbf x_0))(\mathbf v) = \langle (\nabla f)(\mathbf x_0) , \mathbf v\rangle.

To generalise these ideas even further will propel us into the highly abstract discussion of functional analysis, so we shall not ascend that mountain here. We have all the tools we need for basic complex analysis, starting with holomorphic functions.

—Joel Kindiak, 11 Aug 25, 2108H
