The Cayley-Hamilton Theorem

Let V be a vector space over a field \mathbb K and T : V \to V be a linear transformation.

Lemma 1. Suppose V is finite-dimensional with ordered basis K. Then T is similar to [T]_K. In particular, we define c_T(x) := c_{[T]_K}(x).

Proof. Define the isomorphism R : V \to \mathbb K^{|K|} by R(\mathbf v) = [\mathbf v]_K. Then we leave it as an exercise to check that T = R^{-1} \circ [T]_K \circ R.

The transformation T being diagonalisable is a special property, and it satisfies the rather elegant result below.

Definition 1. For any polynomial \displaystyle p \in \mathbb K[x] defined by \displaystyle p(x) = \sum_{i=0}^k a_i x^i, define the linear transformation p(T) : V \to V by

\displaystyle p(T) := \sum_{i=0}^k a_i T^i.

Since T^i T^j = T^j T^i, we have p(T) \circ q(T) = q(T) \circ p(T) for polynomials p,q \in \mathbb K[x].

Theorem 1. Define the zero transformation O_V : V \to V by O_V(\mathbf v) = \mathbf 0 for any \mathbf v. If V is finite-dimensional and T is diagonalisable, then c_T(T) = O_V.

Proof. Let \{\mathbf v_1,\dots\mathbf v_n\} be a basis of eigenvectors of T for V, and \lambda_i denote the eigenvalue corresponding to \mathbf v_i. Then each (x - \lambda_i) is a factor of c_T. Hence, for each \lambda_i, there exists some q_i \in \mathbb K[x] such that

c_T(x) = q_i(x) (\lambda_i - x).

In particular,

c_T(T) = q_i(T) \circ (\lambda_i I_V - T).

Thus, for any \mathbf v_i,

\begin{aligned} c_T(T)(\mathbf v_i) &= q_i(T) ((\lambda_i I_V - T)(\mathbf v_i)) \\ &= q_i(T) (\lambda_i \mathbf v_i - T(\mathbf v_i))  \\ &= q_i(T) (\lambda_i \mathbf v_i - \lambda_i \mathbf v_i) \\ &= q_i(T)(\mathbf 0) = \mathbf 0. \end{aligned}

Now, for any \mathbf v \in V, find unique constants a_i \in \mathbb K such that

\displaystyle \mathbf v = \sum_{i=1}^n a_i \mathbf v_i.

By the linearity of c_T(T),

\displaystyle c_T(T)(\mathbf v) = \sum_{i=1}^n a_i\cdot c_T(T)(\mathbf v_i) = \sum_{i=1}^n a_i \cdot \mathbf 0 = \mathbf 0.

Corollary 1. Suppose V is finite-dimensional. Let \displaystyle c_T(x) = \sum_{i=0}^n a_i x^i. If T is invertible and c_T(T) = 0, then

\displaystyle T^{-1} = - a_0^{-1} \sum_{i=0}^{n-1} a_i T^i.

Proof. Since T is invertible, 0 is not an eigenvalue of T, so that a_0 = c_T(0) \neq 0:

\displaystyle O_V = c_T(T) = a_0 + T \circ \sum_{i=0}^{n-1} a_i T^i.

Therefore, \displaystyle T^{-1} = - a_0^{-1} \sum_{i=0}^{n-1} a_i T^i.

Corollary 2. Suppose the characteristic equation of \mathbf A : \mathbb K^n \to \mathbb K^n is \displaystyle c_{\mathbf A}(x) = \sum_{i=0}^n a_i x^i. If \mathbf A is diagonalisable and invertible, then there exists an invertible matrix \mathbf P and a diagonal matrix \mathbf D such that

\displaystyle \mathbf A^{-1} = - a_0^{-1} \mathbf P \left( \sum_{i=0}^{n-1} a_i \mathbf D^i \right) \mathbf P^{-1}.

It turns out that we don’t actually require T to be diagonalisable in Theorem 1. In fact, T just needs to be a linear transformation from V to V, and V must be finite-dimensional, so that c_T(T) = O_V. This is the Cayley-Hamilton theorem which we aim to prove in this post.

Definition 2. We call a subspace W \subseteq V a Tinvariant subspace of V if T(W) \subseteq W. Equivalently, T|_W : W \to W is well-defined.

Example 1. For any nonzero \mathbf v \in V, the Tcyclic subspace generated by \mathbf v defined by \langle \mathbf v \rangle := \mathrm{span}\{T^n(\mathbf v) : n \in \mathbb N_0\} is T-invariant. In particular, if \lambda be an eigenvalue of T, then E_{\lambda} is T-invariant.

Example 2. Define the differentiation operator by D : C^\infty \to C^\infty, D(f) = f'. For any n, \mathbb K_n[x] \subseteq C^\infty is D-invariant.

In what follows, let V be finite-dimensional with ordered basis K = (\mathbf v_1,\dots,\mathbf v_n).

Definition 3. Let T : V \to W be a linear transformation. Suppose W is a finite-dimensional vector space with ordered basis L := (\mathbf w_1,\dots,\mathbf w_m) respectively. For each \mathbf w \in W, define

\displaystyle [\mathbf w]_L = \begin{bmatrix} c_1 \\ \vdots \\ c_m \end{bmatrix},\quad \mathbf w = \sum_{i=1}^m c_i \mathbf w_i.

Then define [T]_{K, L} := \begin{bmatrix} [T(\mathbf v_1)]_L & \cdots & [T(\mathbf v_n)]_L \end{bmatrix}. If T: V \to V then define [T]_K := [T]_{K,K}.

Theorem 2. Suppose W \subseteq V is T-invariant. Then c_{ T|_W} is a factor of c_{T}.

Proof. Let K:= \{\mathbf w_1,\dots,\mathbf w_k\} be a basis for W and extend it to a basis L := K \cup \{\mathbf u_1,\dots,\mathbf u_{n-k}\} for V. Then there exist matrices \mathbf B,\mathbf C of appropriate sizes such that

\displaystyle \begin{aligned} c_{[T]_L}(x) &= |x\mathbf I_n - [T]_L| \\ &= \left| \begin{matrix} x\mathbf I_k - [T|_W]_K & \mathbf B \\ \mathbf 0 & x\mathbf I_{n-k} - \mathbf C \end{matrix} \right| \\ &= |x\mathbf I_k - [T|_W]_K| |\mathbf C| \\ &= |x\mathbf I_k - [T|_W]_K| |x\mathbf I_{n-k} - \mathbf C| \\ &=  c_{[T|_W]_K}(x) |x\mathbf I_{n-k} - \mathbf C| \end{aligned}

Therefore, c_{T|_W} =  c_{[T|_W]_K} is a factor of c_T = c_{[T]_L}.

Lemma 2. For any nonzero \mathbf v \in V, there exists some unique integer 1 \leq k \leq n such that \{\mathbf v, T(\mathbf v),\dots,T^{k-1}(\mathbf v)\} is a basis for \langle \mathbf v \rangle. Furthermore,

\displaystyle T^k(\mathbf v) = \sum_{i=0}^{k-1} a_{i} T^{i}(\mathbf v)\quad \Rightarrow \quad \displaystyle c_{ T|_{\langle \mathbf v \rangle} }(x) = -\sum_{i=0}^{k-1} a_{i} x^i + x^k.

Proof. Since \langle \mathbf v \rangle \subseteq V as a subspace, it is finite-dimensional with dimension k. Now, let j be the smallest integer such that

T^j(\mathbf v) \in \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}.

Then for any i \geq j, T^i(\mathbf v) \in \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}. Hence,

\langle \mathbf v \rangle = \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}.

Therefore, k \leq j. For scalars c_0,c_1,\dots,c_j \in \mathbb K, consider the equation

c_0 \mathbf v + c_1 T(\mathbf v) + \cdots + c_{j-1} T^{j-1}(\mathbf v) = \mathbf 0.

If c_{j-1} \neq 0, then T^{j-1}(\mathbf v) \in \mathrm{span}\{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\}, and j \leq j-1, a contradiction. Hence, c_{j-1} = 0. By induction, we can show that c_{i-1} = 0 for any 1 \leq i \leq j, so that \{\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)\} is linearly independent. Therefore, j \leq k. Putting both results together yields the desired conclusion.

Finally, by considering the ordered basis K := (\mathbf v,T(\mathbf v),\dots,T^{j-1}(\mathbf v)), we have

\begin{aligned} c_{[T|_{\langle \mathbf v \rangle}]_K}(x) &= \left|x \mathbf I_k - [T|_{\langle \mathbf v \rangle}]_K \right| \\ &= \left| \begin{matrix} x & 0 & \cdots & 0 & -a_0 \\ -1 & x & \ddots & \vdots & -a_1 \\ 0 & -1 & \ddots & 0 & \vdots \\ \vdots & \ddots & \ddots & x & -a_{k-2} \\ 0 & \cdots & 0 & -1 & -a_{k-1} \end{matrix} \right| = -\sum_{i=0}^{k-1} a_{i} x^i + x^k, \end{aligned}

by performing cofactor expansion on the first row and applying induction.

When I learned about the Cayley-Hamilton theorem and saw its proof, I’m not kidding when I said I shed a tear.

Theorem 3 (Cayley-Hamilton Theorem). c_T(T) = O_V.

Proof. Fix \mathbf v \neq \mathbf 0 for non-triviality. Write

\langle v \rangle := \mathrm{span}\{\mathbf v, T(\mathbf v),\dots, T^{k-1}(\mathbf v)\}

for some integer k \leq \dim(V). Since T^k(\mathbf v) \in \langle \mathbf v \rangle, find coefficients a_i \in \mathbb K such that

\displaystyle T^k(\mathbf v) = \sum_{i=0}^{k-1} a_{i} T^i(\mathbf v).

By Example 1, \langle \mathbf v\rangle is T-invariant. By Lemma 2,

\displaystyle c_{T|_{\langle \mathbf v \rangle}}(x) = -\sum_{i=0}^{k-1} a_{i} x^i + x^k \quad \Rightarrow \quad c_{T|_{\langle \mathbf v \rangle}}(T) = -\sum_{i=0}^{k-1} a_{i} T^i + T^k.

Therefore,

\displaystyle \begin{aligned} c_{T|_{\langle \mathbf v \rangle}}(T)(\mathbf v) &= \left(-\sum_{i=0}^{k-1} a_{i} T^i + T^k \right)(\mathbf v) \\ &= -\sum_{i=0}^{k-1} a_{i} T^i(\mathbf v) + T^k(\mathbf v) \\ &= -T^k(\mathbf v) + T^k(\mathbf v) = \mathbf 0.\end{aligned}

By Theorem 1, there exists some q \in \mathbb K[x] such that c_T = q \cdot c_{T|_{\langle \mathbf v \rangle}} so that c_T(T) = q(T) \circ c_{T|_{\langle \mathbf v \rangle}}(T). In particular,

c_T(T)(\mathbf v) = q(T)(c_{T|_{\langle \mathbf v \rangle}}(T)(\mathbf v) ) = q(T)(\mathbf 0) = \mathbf 0.

Since \mathbf v \in V is arbitrary, we obtain c_T(T) = O_V, as required.

Given T : V \to V, what polynomials p \in \mathbb K[x] would yield p(T) = O_V? By the Cayley-Hamilton theorem, c_T is one such polynomial. But is c_T the smallest such polynomial? The answer lies in the notion of a minimal polynomial, which intriguingly launches us into discussing generalised eigenstuff. More of such yapping in the next post.

—Joel Kindiak, 8 Mar 2258H

,

Published by


Response

  1. Generalised Eigenstuff – KindiakMath

    […] . Then is an eigenvalue of if and only if . We call the characteristic polynomial of . By the Cayley-Hamilton theorem, […]

    Like

Leave a reply to Generalised Eigenstuff – KindiakMath Cancel reply