The Cayley-Hamilton Theorem

Let $V$ be a vector space over a field $\mathbb K$ and $T : V \to V$ be a linear transformation.

Lemma 1. Suppose $V$ is finite-dimensional with ordered basis $K$ . Then $T$ is similar to $[T]_K$ . In particular, we define $c_T(x) := c_{[T]_K}(x)$ .

Proof. Define the isomorphism $R : V \to \mathbb K^{|K|}$ by $R(\mathbf v) = [\mathbf v]_K$ . Then we leave it as an exercise to check that $T = R^{-1} \circ [T]_K \circ R$ .

The transformation $T$ being diagonalisable is a special property, and it satisfies the rather elegant result below.

Definition 1. For any polynomial $\displaystyle p \in \mathbb K[x]$ defined by $\displaystyle p(x) = \sum_{i=0}^k a_i x^i$ , define the linear transformation $p(T) : V \to V$ by

$\displaystyle p(T) := \sum_{i=0}^k a_i T^i.$

Since $T^i T^j = T^j T^i$ , we have $p(T) \circ q(T) = q(T) \circ p(T)$ for polynomials $p,q \in \mathbb K[x]$ .

Theorem 1. Define the zero transformation $O_V : V \to V$ by $O_V(\mathbf v) = \mathbf 0$ for any $\mathbf v$ . If $V$ is finite-dimensional and $T$ is diagonalisable, then $c_T(T) = O_V$ .

Proof. Let $\{\mathbf v_1,\dots\mathbf v_n\}$ be a basis of eigenvectors of $T$ for $V$ , and $\lambda_i$ denote the eigenvalue corresponding to $\mathbf v_i$ . Then each $(x - \lambda_i)$ is a factor of $c_T$ . Hence, for each $\lambda_i$ , there exists some $q_i \in \mathbb K[x]$ such that

$c_T(x) = q_i(x) (\lambda_i - x).$

In particular,

$c_T(T) = q_i(T) \circ (\lambda_i I_V - T).$

Thus, for any $\mathbf v_i$ ,

$\begin{aligned} c_T(T)(\mathbf v_i) &= q_i(T) ((\lambda_i I_V - T)(\mathbf v_i)) \\ &= q_i(T) (\lambda_i \mathbf v_i - T(\mathbf v_i)) \\ &= q_i(T) (\lambda_i \mathbf v_i - \lambda_i \mathbf v_i) \\ &= q_i(T)(\mathbf 0) = \mathbf 0. \end{aligned}$

Now, for any $\mathbf v \in V$ , find unique constants $a_i \in \mathbb K$ such that

$\displaystyle \mathbf v = \sum_{i=1}^n a_i \mathbf v_i.$

By the linearity of $c_T(T)$ ,

$\displaystyle c_T(T)(\mathbf v) = \sum_{i=1}^n a_i\cdot c_T(T)(\mathbf v_i) = \sum_{i=1}^n a_i \cdot \mathbf 0 = \mathbf 0.$

Corollary 1. Suppose $V$ is finite-dimensional. Let $\displaystyle c_T(x) = \sum_{i=0}^n a_i x^i$ . If $T$ is invertible and $c_T(T) = 0$ , then

$\displaystyle T^{-1} = - a_0^{-1} \sum_{i=0}^{n-1} a_i T^i.$

Proof. Since $T$ is invertible, $0$ is not an eigenvalue of $T$ , so that $a_0 = c_T(0) \neq 0$ :

$\displaystyle O_V = c_T(T) = a_0 + T \circ \sum_{i=0}^{n-1} a_i T^i.$

Therefore, $\displaystyle T^{-1} = - a_0^{-1} \sum_{i=0}^{n-1} a_i T^i$ .

Corollary 2. Suppose the characteristic equation of $\mathbf A : \mathbb K^n \to \mathbb K^n$ is $\displaystyle c_{\mathbf A}(x) = \sum_{i=0}^n a_i x^i$ . If $\mathbf A$ is diagonalisable and invertible, then there exists an invertible matrix $\mathbf P$ and a diagonal matrix $\mathbf D$ such that

$\displaystyle \mathbf A^{-1} = - a_0^{-1} \mathbf P \left( \sum_{i=0}^{n-1} a_i \mathbf D^i \right) \mathbf P^{-1}.$

It turns out that we don’t actually require $T$ to be diagonalisable in Theorem 1. In fact, $T$ just needs to be a linear transformation from $V$ to $V$ , and $V$ must be finite-dimensional, so that $c_T(T) = O_V$ . This is the Cayley-Hamilton theorem which we aim to prove in this post.

Definition 2. We call a subspace $W \subseteq V$ a $T$ –invariant subspace of $V$ if $T(W) \subseteq W$ . Equivalently, $T|_W : W \to W$ is well-defined.

Example 1. For any nonzero $\mathbf v \in V$ , the $T$ –cyclic subspace generated by $\mathbf v$ defined by $\langle \mathbf v \rangle := \mathrm{span}\{T^n(\mathbf v) : n \in \mathbb N_0\}$ is $T$ -invariant. In particular, if $\lambda$ be an eigenvalue of $T$ , then $E_{\lambda}$ is $T$ -invariant.

Example 2. Define the differentiation operator by $D : C^\infty \to C^\infty$ , $D(f) = f'$ . For any $n$ , $\mathbb K_n[x] \subseteq C^\infty$ is $D$ -invariant.

In what follows, let $V$ be finite-dimensional with ordered basis $K = (\mathbf v_1,\dots,\mathbf v_n)$ .

Definition 3. Let $T : V \to W$ be a linear transformation. Suppose $W$ is a finite-dimensional vector space with ordered basis $L := (\mathbf w_1,\dots,\mathbf w_m)$ respectively. For each $\mathbf w \in W$ , define

$\displaystyle [\mathbf w]_L = \begin{bmatrix} c_1 \\ \vdots \\ c_m \end{bmatrix},\quad \mathbf w = \sum_{i=1}^m c_i \mathbf w_i.$

Then define $[T]_{K, L} := \begin{bmatrix} [T(\mathbf v_1)]_L & \cdots & [T(\mathbf v_n)]_L \end{bmatrix}$ . If $T: V \to V$ then define $[T]_K := [T]_{K,K}$ .

Theorem 2. Suppose $W \subseteq V$ is $T$ -invariant. Then $c_{ T|_W}$ is a factor of $c_{T}$ .

Proof. Let $K:= \{\mathbf w_1,\dots,\mathbf w_k\}$ be a basis for $W$ and extend it to a basis $L := K \cup \{\mathbf u_1,\dots,\mathbf u_{n-k}\}$ for $V$ . Then there exist matrices $\mathbf B,\mathbf C$ of appropriate sizes such that

$\displaystyle \begin{aligned} c_{[T]_L}(x) &= |x\mathbf I_n - [T]_L| \\ &= \left| \begin{matrix} x\mathbf I_k - [T|_W]_K & \mathbf B \\ \mathbf 0 & x\mathbf I_{n-k} - \mathbf C \end{matrix} \right| \\ &= |x\mathbf I_k - [T|_W]_K| |\mathbf C| \\ &= |x\mathbf I_k - [T|_W]_K| |x\mathbf I_{n-k} - \mathbf C| \\ &= c_{[T|_W]_K}(x) |x\mathbf I_{n-k} - \mathbf C| \end{aligned}$

Therefore, $c_{T|_W} = c_{[T|_W]_K}$ is a factor of $c_T = c_{[T]_L}$ .

Lemma 2. For any nonzero $\mathbf v \in V$ , there exists some unique integer $1 \leq k \leq n$ such that $\{\mathbf v, T(\mathbf v),\dots,T^{k-1}(\mathbf v)\}$ is a basis for $\langle \mathbf v \rangle$ . Furthermore,

$\displaystyle T^k(\mathbf v) = \sum_{i=0}^{k-1} a_{i} T^{i}(\mathbf v)\quad \Rightarrow \quad \displaystyle c_{ T|_{\langle \mathbf v \rangle} }(x) = -\sum_{i=0}^{k-1} a_{i} x^i + x^k.$

Proof. Since $\langle \mathbf v \rangle \subseteq V$ as a subspace, it is finite-dimensional with dimension $k$ . Now, let $j$ be the smallest integer such that

$T^j(\mathbf v) \in \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}.$

Then for any $i \geq j$ , $T^i(\mathbf v) \in \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}$ . Hence,

$\langle \mathbf v \rangle = \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}.$

Therefore, $k \leq j$ . For scalars $c_0,c_1,\dots,c_j \in \mathbb K$ , consider the equation

$c_0 \mathbf v + c_1 T(\mathbf v) + \cdots + c_{j-1} T^{j-1}(\mathbf v) = \mathbf 0.$

If $c_{j-1} \neq 0$ , then $T^{j-1}(\mathbf v) \in \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\},$ and $j \leq j-1$ , a contradiction. Hence, $c_{j-1} = 0$ . By induction, we can show that $c_{i-1} = 0$ for any $1 \leq i \leq j$ , so that $\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}$ is linearly independent. Therefore, $j \leq k$ . Putting both results together yields the desired conclusion.

Finally, by considering the ordered basis $K := (\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v))$ , we have

$\begin{aligned} c_{[T|_{\langle \mathbf v \rangle}]_K}(x) &= \left|x \mathbf I_k - [T|_{\langle \mathbf v \rangle}]_K \right| \\ &= \left| \begin{matrix} x & 0 & \cdots & 0 & -a_0 \\ -1 & x & \ddots & \vdots & -a_1 \\ 0 & -1 & \ddots & 0 & \vdots \\ \vdots & \ddots & \ddots & x & -a_{k-2} \\ 0 & \cdots & 0 & -1 & -a_{k-1} \end{matrix} \right| = -\sum_{i=0}^{k-1} a_{i} x^i + x^k, \end{aligned}$

by performing cofactor expansion on the first row and applying induction.

When I learned about the Cayley-Hamilton theorem and saw its proof, I’m not kidding when I said I shed a tear.

Theorem 3 (Cayley-Hamilton Theorem). $c_T(T) = O_V$ .

Proof. Fix $\mathbf v \neq \mathbf 0$ for non-triviality. Write

$\langle v \rangle := \mathrm{span}\{\mathbf v, T(\mathbf v),\dots, T^{k-1}(\mathbf v)\}$

for some integer $k \leq \dim(V)$ . Since $T^k(\mathbf v) \in \langle \mathbf v \rangle$ , find coefficients $a_i \in \mathbb K$ such that

$\displaystyle T^k(\mathbf v) = \sum_{i=0}^{k-1} a_{i} T^i(\mathbf v).$

By Example 1, $\langle \mathbf v\rangle$ is $T$ -invariant. By Lemma 2,

$\displaystyle c_{T|_{\langle \mathbf v \rangle}}(x) = -\sum_{i=0}^{k-1} a_{i} x^i + x^k \quad \Rightarrow \quad c_{T|_{\langle \mathbf v \rangle}}(T) = -\sum_{i=0}^{k-1} a_{i} T^i + T^k.$

Therefore,

$\displaystyle \begin{aligned} c_{T|_{\langle \mathbf v \rangle}}(T)(\mathbf v) &= \left(-\sum_{i=0}^{k-1} a_{i} T^i + T^k \right)(\mathbf v) \\ &= -\sum_{i=0}^{k-1} a_{i} T^i(\mathbf v) + T^k(\mathbf v) \\ &= -T^k(\mathbf v) + T^k(\mathbf v) = \mathbf 0.\end{aligned}$

By Theorem 1, there exists some $q \in \mathbb K[x]$ such that $c_T = q \cdot c_{T|_{\langle \mathbf v \rangle}}$ so that $c_T(T) = q(T) \circ c_{T|_{\langle \mathbf v \rangle}}(T)$ . In particular,

$c_T(T)(\mathbf v) = q(T)(c_{T|_{\langle \mathbf v \rangle}}(T)(\mathbf v) ) = q(T)(\mathbf 0) = \mathbf 0.$

Since $\mathbf v \in V$ is arbitrary, we obtain $c_T(T) = O_V$ , as required.

Given $T : V \to V$ , what polynomials $p \in \mathbb K[x]$ would yield $p(T) = O_V$ ? By the Cayley-Hamilton theorem, $c_T$ is one such polynomial. But is $c_T$ the smallest such polynomial? The answer lies in the notion of a minimal polynomial, which intriguingly launches us into discussing generalised eigenstuff. More of such yapping in the next post.

—Joel Kindiak, 8 Mar 2258H

KindiakMath

The Cayley-Hamilton Theorem

Response

Leave a reply to Generalised Eigenstuff – KindiakMath Cancel reply

The Cayley-Hamilton Theorem

Share this:

Response

Leave a reply to Generalised Eigenstuff – KindiakMath Cancel reply