The Nuts and Bolts of Diagonalisation

Previously, we defined eigenvectors and eigenvalues, and used them to obtain a succinct formula for the n-th Fibonacci number x_n, known as Binet’s formula:

\displaystyle x_n = \frac{\phi^n - (-\phi)^{-n}}{\sqrt 5},\quad n \in \mathbb N,

where \phi = \frac 12 (1+\sqrt 5) denotes the golden ratio. We needed to take advantage of the eigenvectors and eigenvalues of the matrix \mathbf A := \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}. In particular, we needed to find two eigenvectors \mathbf v_1, \mathbf v_2 that can encode any \mathbf v \in \mathbb R^2. That is, we wanted a basis for \mathbb R^2 consisting of the eigenvectors of \mathbf A.
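As a quick sanity check, here is a minimal Python sketch (assuming the conventions x_0 = 0, x_1 = 1) comparing Binet’s formula against the Fibonacci recurrence:

import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

def binet(n: int) -> int:
    # Binet's formula; rounding absorbs floating-point error for moderate n
    return round((PHI ** n - (-PHI) ** (-n)) / math.sqrt(5))

# Compare against the recurrence x_0 = 0, x_1 = 1, x_{n+1} = x_n + x_{n-1}
a, b = 0, 1
for n in range(30):
    assert binet(n) == a
    a, b = b, a + b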

Let \mathbb K be a field and V be a vector space over \mathbb K.

Definition 1. Let U be a vector space over \mathbb K. The linear transformation S : U \to U is similar to T : V \to V if there exists an isomorphism R : V \to U such that S = R \circ T \circ R^{-1}. Equivalently, S \circ R = R \circ T. We remark that the “is similar to” relation is an equivalence relation.

Example 1. If V is finite dimensional with \dim(V) = n, then any linear transformation T : V \to V is similar to some linear transformation S : \mathbb K^n \to \mathbb K^n. This means that computations involving T can effectively be reduced to matrix computations in \mathbb K^n.
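To see why, fix a basis (\mathbf b_1, \dots, \mathbf b_n) of V and let R : V \to \mathbb K^n be the coordinate isomorphism determined by R(\mathbf b_j) = \mathbf e_j. Then

\displaystyle S := R \circ T \circ R^{-1} : \mathbb K^n \to \mathbb K^n

is a linear transformation on \mathbb K^n (hence a matrix) similar to T by construction.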

Lemma 1. Let S : U \to U and T : V \to V be similar linear transformations. Then there exists an isomorphism R : V \to U such that for any n \in \mathbb N,

S^n = R \circ T^n \circ R^{-1},

where T^n denotes n-fold composition of T.
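The proof is a short induction on n; the case n = 2 already exhibits the telescoping pattern, since R^{-1} \circ R is the identity on V:

\displaystyle S^2 = (R \circ T \circ R^{-1}) \circ (R \circ T \circ R^{-1}) = R \circ T \circ (R^{-1} \circ R) \circ T \circ R^{-1} = R \circ T^2 \circ R^{-1}.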

Roughly speaking, Definition 1 gives us a condition under which S and T have essentially the same (i.e. similar) effect on the vectors in their respective vector spaces. The similarity vocabulary empowers us to discuss diagonalisation.

Definition 2. A matrix \mathbf D : \mathbb K^n \to \mathbb K^n is diagonal if each \mathbf e_i is an eigenvector of \mathbf D, i.e. for each \mathbf e_i, there exists \lambda_i \in \mathbb K such that \mathbf D(\mathbf e_i) = \lambda_i \mathbf e_i.

In matrix notation, we have

\begin{aligned} \mathbf D &= \begin{bmatrix} \mathbf D(\mathbf e_1) & \mathbf D(\mathbf e_2) & \cdots & \mathbf D(\mathbf e_n) \end{bmatrix} \\ &= \begin{bmatrix} \lambda_1 \mathbf e_1 & \lambda_2 \mathbf e_2 & \cdots & \lambda_n \mathbf e_n \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots &  \vdots & \ddots &  \vdots \\ 0 &  0 & \cdots &  \lambda_n \end{bmatrix} .\end{aligned}

In particular, for any \mathbf e_i and any k \in \mathbb N, \mathbf D^k \mathbf e_i = \lambda_i^k \mathbf e_i. A matrix \mathbf A : \mathbb K^n \to \mathbb K^n is diagonalisable if it is similar to some diagonal matrix.
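Here is a minimal numpy sketch of this fact; the diagonal entries 2, 3, 5 are illustrative assumptions:

import numpy as np

# Diagonal matrix with entries lambda_1 = 2, lambda_2 = 3, lambda_3 = 5
D = np.diag([2.0, 3.0, 5.0])
k = 4

# Powering D simply raises each diagonal entry to the k-th power
assert np.allclose(np.linalg.matrix_power(D, k),
                   np.diag([2.0 ** k, 3.0 ** k, 5.0 ** k]))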

Theorem 1. A matrix \mathbf A : \mathbb K^n \to \mathbb K^n is diagonalisable if and only if there exists a basis \{\mathbf v_1,\dots,\mathbf v_n\} of \mathbb K^n such that each \mathbf v_i is an eigenvector of \mathbf A.

Proof. We prove this theorem in two directions. Suppose \mathbf A : \mathbb K^n \to \mathbb K^n is diagonalisable. Then there exists a diagonal matrix \mathbf D : \mathbb K^n \to \mathbb K^n and an invertible matrix \mathbf P : \mathbb K^n \to \mathbb K^n such that

\mathbf A = \mathbf P \mathbf D \mathbf P^{-1} \quad \iff \quad \mathbf A \mathbf P = \mathbf P \mathbf D.

Writing \mathbf P = \begin{bmatrix} \mathbf v_1 &\cdots & \mathbf v_n\end{bmatrix} and employing the recipe-ingredient analogy for matrix multiplication,

\begin{aligned}\mathbf A\begin{bmatrix} \mathbf v_1 &\cdots & \mathbf v_n\end{bmatrix} &=\mathbf P \begin{bmatrix}\lambda_1 \mathbf e_1 & \cdots & \lambda_n \mathbf e_n\end{bmatrix} \\ \begin{bmatrix}  \mathbf A \mathbf v_1 &\cdots & \mathbf A \mathbf v_n\end{bmatrix} &= \begin{bmatrix}\lambda_1 \mathbf P\mathbf e_1 & \cdots & \lambda_n \mathbf P \mathbf e_n\end{bmatrix} \\ &= \begin{bmatrix}\lambda_1 \mathbf v_1 & \cdots & \lambda_n \mathbf v_n\end{bmatrix}.\end{aligned}

Since \mathbf A\mathbf v_i = \lambda_i\mathbf v_i for each i, \{\mathbf v_1,\dots,\mathbf v_n\} forms the desired basis, proving the direction (\Rightarrow). Conversely, given a basis of eigenvectors \{\mathbf v_1,\dots,\mathbf v_n\} with \mathbf A \mathbf v_i = \lambda_i \mathbf v_i, the matrix \mathbf P := \begin{bmatrix} \mathbf v_1 &\cdots & \mathbf v_n\end{bmatrix} is invertible, and running the same computation backwards gives \mathbf A \mathbf P = \mathbf P \mathbf D with \mathbf D := \operatorname{diag}(\lambda_1,\dots,\lambda_n), establishing the direction (\Leftarrow).
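Numerically, numpy’s eig returns exactly the data in this proof: a vector of eigenvalues and a matrix \mathbf P whose columns are eigenvectors. A minimal sketch, with an illustrative symmetric (hence diagonalisable) matrix:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric, hence diagonalisable
lam, P = np.linalg.eig(A)    # columns of P are eigenvectors of A
D = np.diag(lam)

# The key identity from the proof: A P = P D, i.e. A v_i = lambda_i v_i columnwise
assert np.allclose(A @ P, P @ D)
assert np.allclose(A, P @ D @ np.linalg.inv(P))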

One theoretical subtlety is that the notion of a diagonal matrix only makes sense for vector spaces with a convenient “standard basis”, such as \mathbb K^n with the standard ordered basis (\mathbf e_1,\dots,\mathbf e_n). Nevertheless, the discussion of eigenvectors and eigenvalues still works in broader contexts, especially when the vector space is finite-dimensional.

Corollary 1. Suppose \dim(V) = n, and say that a linear transformation T : V \to V is diagonalisable if it is similar to a diagonalisable matrix \mathbf A : \mathbb K^n \to \mathbb K^n. Then T is diagonalisable if and only if there exists a basis \{\mathbf v_1,\dots,\mathbf v_n\} of V such that each \mathbf v_i is an eigenvector of T.

Definition 3. A linear transformation T : V \to V is diagonalisable if there exists a basis \{\mathbf v_1,\dots,\mathbf v_n\} of V such that each \mathbf v_i is an eigenvector of T.

Therefore, we can reduce our study of general finite-dimensional vector spaces to the concrete setting of \mathbb K^n. In this setting, what simple tests do we have for deciding whether a matrix \mathbf A is diagonalisable?

Lemma 2. Any matrix \mathbf A : \mathbb K^n \to \mathbb K^n has at most n distinct eigenvalues. In particular, if \mathbf A has n distinct eigenvalues, then \mathbf A is diagonalisable.

Proof. Let \lambda_1,\dots,\lambda_k denote the distinct eigenvalues of \mathbf A. Each eigenvalue \lambda_i yields a nonzero eigenvector \mathbf v_i, and since eigenvectors corresponding to distinct eigenvalues are linearly independent, the set \{\mathbf v_1,\dots,\mathbf v_k\} \subseteq \mathbb K^n is linearly independent, so that k \leq n. In particular, if k = n, the set forms a basis for \mathbb K^n, and Theorem 1 applies.

Example 2. Since \mathbf A := \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} : \mathbb R^2 \to \mathbb R^2 has two distinct eigenvalues \lambda_1 = \phi and \lambda_2 = -\phi^{-1}, it is diagonalisable by Lemma 2, and by Theorem 1, we could extract two key eigenvectors \mathbf v_1,\mathbf v_2 such that \mathbf A^n \mathbf v_i = \lambda_i^n \mathbf v_i. This property was key in helping us derive the formula for the Fibonacci numbers.
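A minimal numpy sketch of that computation (again assuming x_0 = 0, x_1 = 1):

import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])
lam, P = np.linalg.eig(A)    # lam holds phi and -1/phi, in some order
P_inv = np.linalg.inv(P)

def fib(n: int) -> int:
    # A^n = P D^n P^{-1}, and D^n just powers the eigenvalues
    A_n = P @ np.diag(lam ** n) @ P_inv
    # A^n maps (x_0, x_1) = (0, 1) to (x_n, x_{n+1})
    return round((A_n @ np.array([0.0, 1.0]))[0])

print([fib(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]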

Now, what if we had fewer than n distinct eigenvalues? Is all hope lost? Not quite. Theorem 1 only requires us to have n linearly independent eigenvectors, and some of them may share an eigenvalue; that is alright. Let \lambda_1,\dots,\lambda_k denote the k \leq n distinct eigenvalues of \mathbf A. Each eigenvalue \lambda_i yields at least one nonzero eigenvector \mathbf v_i such that

\begin{aligned} \mathbf A\mathbf v_i = \lambda_i \mathbf v_i \quad &\iff \quad (\lambda_i \mathbf I_n - \mathbf A)\mathbf v_i = \mathbf 0 \\ &\iff \quad \mathbf v_i \in \ker(\lambda_i \mathbf I_n - \mathbf A) =: E_{\lambda_i}. \end{aligned}

Since this kernel contains the nonzero vector \mathbf v_i, it follows that \det(\lambda_i \mathbf I_n - \mathbf A) = 0.
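For instance, for the Fibonacci matrix \mathbf A from Example 2 and \lambda_1 = \phi, a direct computation (using \phi^2 = \phi + 1) gives

\displaystyle E_\phi = \ker \begin{bmatrix} \phi & -1 \\ -1 & \phi - 1 \end{bmatrix} = \operatorname{span} \left\{ \begin{bmatrix} 1 \\ \phi \end{bmatrix} \right\},

and indeed \mathbf A \begin{bmatrix} 1 \\ \phi \end{bmatrix} = \begin{bmatrix} \phi \\ 1 + \phi \end{bmatrix} = \phi \begin{bmatrix} 1 \\ \phi \end{bmatrix}.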

Definition 4. We call the function c_{\mathbf A}(x) := \det(x\mathbf I_n - \mathbf A) the characteristic polynomial of \mathbf A. In particular, \lambda is an eigenvalue of \mathbf A if and only if c_{\mathbf A}(\lambda) = 0.
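In numpy, np.poly computes the coefficients of the characteristic polynomial of a square matrix, and np.roots recovers the eigenvalues. A minimal sketch with the Fibonacci matrix:

import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])

coeffs = np.poly(A)          # coefficients of det(x I - A), highest degree first
print(coeffs)                # [ 1. -1. -1.], i.e. c_A(x) = x^2 - x - 1
print(np.roots(coeffs))      # the roots phi and -1/phi
print(np.linalg.eigvals(A))  # the eigenvalues, matching the roots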

Lemma 3. If \mathbf A,\mathbf B : \mathbb K^n \to \mathbb K^n are similar, then c_{\mathbf A}(x) = c_{\mathbf B}(x).

Proof. Let \mathbf P be a matrix such that \mathbf A = \mathbf P \mathbf B \mathbf P^{-1}. Then

\begin{aligned} c_{\mathbf A}(x) &= |x\mathbf I_n - \mathbf A| \\ &= |x \mathbf P \mathbf P^{-1} - \mathbf P \mathbf B \mathbf P^{-1}| \\ &= |\mathbf P (x \mathbf I_n - \mathbf B) \mathbf P^{-1}| \\ &= |\mathbf P| |x \mathbf I_n - \mathbf B| |\mathbf P^{-1}| \\ &= |x \mathbf I_n - \mathbf B| = c_{\mathbf B}(x). \end{aligned}
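A quick numerical illustration of Lemma 3; the change-of-basis matrix \mathbf P below is an arbitrary assumed choice:

import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])
P = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # invertible, since det(P) = 1
B = P @ A @ np.linalg.inv(P) # B is similar to A

# Similar matrices share the same characteristic polynomial
assert np.allclose(np.poly(A), np.poly(B))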

By the factor theorem, there exist unique positive integers \alpha_i and a polynomial p \in \mathbb K[x] with p(\lambda_i) \neq 0 for each i, such that

c_{\mathbf A}(x) = (x-\lambda_1)^{\alpha_1} \cdots (x-\lambda_k)^{\alpha_k} p(x).

We call \alpha_i the algebraic multiplicity of \lambda_i. If \mathbb K = \mathbb C, then by the fundamental theorem of algebra, we can take p(x)=1, so that

\alpha_1 + \cdots + \alpha_k = n.
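For a quick concrete instance of these multiplicities, consider the shear matrix

\displaystyle \mathbf M := \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad c_{\mathbf M}(x) = \det \begin{bmatrix} x - 1 & -1 \\ 0 & x - 1 \end{bmatrix} = (x-1)^2,

so \mathbf M has the single eigenvalue \lambda = 1 with algebraic multiplicity \alpha_1 = 2.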

In any case, the point of the equivalence above is that E_{\lambda_i} captures all of the eigenvectors of \mathbf A with eigenvalue \lambda_i, together with the zero vector. Denote \gamma_i := \dim(E_{\lambda_i}), called the geometric multiplicity of \lambda_i. We have a fascinating result.

Theorem 2. For any eigenvalue \lambda_i of \mathbf A for 1 \leq i \leq k, \gamma_i \leq \alpha_i. Hence, if \mathbb K = \mathbb C and \gamma_i = \alpha_i for all 1 \leq i \leq k, then \mathbf A is diagonalisable.

Proof. Let \{\mathbf v_1,\dots,\mathbf v_{\gamma_i}\} be a basis of E_{\lambda_i}. Extend it to a basis \{\mathbf v_1,\dots,\mathbf v_{\gamma_i}, \mathbf u_1,\dots, \mathbf u_{n -\gamma_i}\} of \mathbb K^n, with corresponding invertible matrix \mathbf P. Then \mathbf A is similar to the matrix

\mathbf J := \begin{bmatrix} \lambda_i \mathbf I_{\gamma_i} & \mathbf B \\ \mathbf 0 & \mathbf C \end{bmatrix},

not necessarily diagonal. It follows that

\begin{aligned} c_{\mathbf A}(x) = c_{\mathbf J}(x) &= | x \mathbf I_n - \mathbf J| \\ &=\left| \begin{matrix} (x - \lambda_i) \mathbf I_{\gamma_i} & \mathbf B \\ \mathbf 0 & x \mathbf I_{n-\gamma_i} - \mathbf C \end{matrix} \right| \\ &= |(x - \lambda_i) \mathbf I_{\gamma_i}| |x \mathbf I_{n-\gamma_i} - \mathbf C| \\ &= (x -\lambda_i)^{\gamma_i} |x \mathbf I_{n-\gamma_i} - \mathbf C|. \end{aligned}

Hence, (x-\lambda_i)^{\gamma_i} divides c_{\mathbf A}(x), and since \alpha_i is the exact multiplicity of the factor (x-\lambda_i) in c_{\mathbf A}(x), we conclude \gamma_i \leq \alpha_i. For the second claim, suppose \mathbb K = \mathbb C and \gamma_i = \alpha_i for all i. Concatenating bases of the eigenspaces yields \gamma_1 + \cdots + \gamma_k = \alpha_1 + \cdots + \alpha_k = n eigenvectors, and since eigenvectors drawn from distinct eigenspaces are linearly independent, they form a basis of \mathbb C^n; Theorem 1 then shows \mathbf A is diagonalisable. (In fact, over \mathbb C, \mathbf A is diagonalisable if and only if \gamma_i = \alpha_i for each eigenvalue \lambda_i.)
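Returning to the shear matrix \mathbf M above, a minimal numpy sketch confirms \gamma_1 < \alpha_1, so \mathbf M is not diagonalisable:

import numpy as np

M = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # c_M(x) = (x - 1)^2, so alpha_1 = 2 for lambda = 1

# Geometric multiplicity: gamma_1 = dim ker(1*I - M) = 2 - rank(I - M)
gamma = 2 - np.linalg.matrix_rank(np.eye(2) - M)
print(gamma)                 # 1, strictly less than alpha_1 = 2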

Far from just determining eigenvalues, the characteristic polynomial boasts remarkable algebraic properties, encapsulated by the Cayley-Hamilton theorem. In particular, rather surprisingly, we can use characteristic polynomials to compute inverse matrices. We will explore these ideas in the next post.

—Joel Kindiak, 7 Mar 25, 1707H
