Diagonalization for Legends

Introductory mathematics undergraduates do not yet know what diagonalisation is. Those introduced to linear algebra learn that not all matrices can be diagonalised, not even all square matrices. Those who learn a bit more recognise that square matrices over \mathbb C can be diagonalised in a generalised sense, into the Jordan normal form.

Legends suggest that any matrix over \mathbb K = \mathbb R or \mathbb C can be diagonalised. In a modified sense, of course. This is called the singular value decomposition.

Theorem 1 (Singular Value Decomposition). For any matrix \mathbf A \in \mathcal M_{m \times n}(\mathbb K), there exist unitary matrices \mathbf U \in \mathrm{GL}_m(\mathbb K) and \mathbf V \in \mathrm{GL}_n(\mathbb K) and a diagonal matrix \mathbf{\Sigma} \in \mathcal M_{m \times n}(\mathbb K) with nonnegative real entries such that

\mathbf A = \mathbf U \mathbf{\Sigma} \mathbf V^*.

Here, we say that \mathbf{\Sigma} \in \mathcal M_{m \times n}(\mathbb K) is diagonal to mean that \mathbf{\Sigma} _{ij} = 0 whenever i \neq j.
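Before diving into the proof, here is a minimal numerical sanity check of the theorem, a sketch of ours using NumPy's np.linalg.svd (the test matrix and all variable names below are illustrative, not part of the argument):

```python
# Minimal numerical sanity check of Theorem 1 via NumPy's np.linalg.svd.
# The test matrix and all variable names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# full_matrices=True returns unitary U (m x m) and Vh = V^* (n x n);
# s holds the diagonal entries of Sigma.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Embed s into the m x n "diagonal" matrix Sigma of the theorem.
Sigma = np.zeros((m, n), dtype=complex)
Sigma[:n, :n] = np.diag(s)

assert np.allclose(A, U @ Sigma @ Vh)            # A = U Sigma V^*
assert np.allclose(U.conj().T @ U, np.eye(m))    # U unitary
assert np.allclose(Vh @ Vh.conj().T, np.eye(n))  # V unitary
```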

Lemma 1. Let \mathbf A \in \mathcal M_{m \times n}(\mathbb K). Then the matrix \mathbf A^* \mathbf A \in \mathcal M_{n \times n}(\mathbb K) is unitarily diagonalisable with nonnegative eigenvalues.

Proof. By conjugate-transpose properties,

(\mathbf A^* \mathbf A)^* = \mathbf A^* (\mathbf A^*)^* = \mathbf A^* \mathbf A.

Therefore, the matrix \mathbf A^* \mathbf A is self-adjoint, and thus normal. By the spectral theorems, it is unitarily diagonalisable, i.e. there exists an orthonormal basis of \mathbb K^n consisting of eigenvectors of \mathbf A^* \mathbf A.

Now suppose (\mathbf A^* \mathbf A) \mathbf v = \lambda \mathbf v. Then

\begin{aligned} \langle \mathbf A \mathbf v , \mathbf A \mathbf v \rangle &= \langle (\mathbf A^* \mathbf A) \mathbf v , \mathbf v \rangle = \langle \lambda \mathbf v, \mathbf v \rangle = \lambda \langle \mathbf v , \mathbf v \rangle. \end{aligned}

Since eigenvectors are nonzero by definition, the positive-definiteness of \langle \cdot , \cdot \rangle gives \langle \mathbf v , \mathbf v \rangle > 0, whence

\displaystyle \lambda = \frac{ \langle \mathbf A \mathbf v , \mathbf A \mathbf v \rangle }{\langle \mathbf v , \mathbf v \rangle} \geq \frac{ 0 }{ \langle \mathbf v , \mathbf v \rangle } = 0.
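As a quick illustration of Lemma 1 (again a sketch of ours, not part of the proof), one can verify numerically that the Gram matrix \mathbf A^* \mathbf A is self-adjoint with nonnegative eigenvalues:

```python
# Numerical illustration of Lemma 1: A^* A is self-adjoint and its
# eigenvalues are nonnegative (up to floating-point round-off).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
G = A.conj().T @ A  # the Gram matrix A^* A

assert np.allclose(G, G.conj().T)     # self-adjoint
eigenvalues = np.linalg.eigvalsh(G)   # real, in ascending order
assert np.all(eigenvalues >= -1e-12)  # nonnegative up to round-off
```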

Lemma 2. Let \mathbf A \in \mathcal M_{k \times m}(\mathbb K) and \mathbf B \in \mathcal M_{m \times n}(\mathbb K) be matrices.

  • \mathrm{rank}(\mathbf A) = \mathrm{rank}(\mathbf A^*).
  • \mathrm{rank}(\mathbf A) \leq \min\{k, m\}.
  • \mathrm{rank}(\mathbf A \mathbf B) \leq \min\{\mathrm{rank}(\mathbf A), \mathrm{rank}(\mathbf B)\}.
  • \mathrm{rank}(\mathbf A^* \mathbf A) = \mathrm{rank}(\mathbf A) = \mathrm{rank}( \mathbf A \mathbf A^*).

Proof. For the first claim, we leave it as an exercise to verify that \mathrm{ker}(\mathbf A^*) = \mathbf A(\mathbb K^m)^{\perp}. By orthogonal decomposition,

\mathbb K^k = \mathbf A(\mathbb K^m) \oplus \mathbf A(\mathbb K^m)^{\perp}.

Hence,

\begin{aligned} \mathrm{nullity}(\mathbf A^*) &= \mathrm{dim}(\mathbf A(\mathbb K^m)^{\perp}) \\ &= k - \mathrm{dim}(\mathbf A(\mathbb K^m)) \\ &= k - \mathrm{rank}(\mathbf A). \end{aligned}

By the rank-nullity theorem,

\begin{aligned} \mathrm{rank}(\mathbf A) &= k - \mathrm{nullity}(\mathbf A^*) = \mathrm{rank}(\mathbf A^*). \end{aligned}

The second result follows from the inequality \mathrm{rank}(\mathbf A) \leq m, which holds since the image of \mathbf A is spanned by its m columns, together with the first claim and \mathrm{rank}(\mathbf A^*) \leq k. The third result follows from the containment \mathbf A\mathbf B(\mathbb K^n) \subseteq \mathbf A(\mathbb K^m) and

\mathrm{rank}(\mathbf A \mathbf B) = \mathrm{rank}(\mathbf B^* \mathbf A^*) \leq \mathrm{rank}(\mathbf B^*) = \mathrm{rank}(\mathbf B).

The direction \leq in the fourth result is an immediate corollary of the first and third results. For the direction \geq, we leave it as an exercise to verify that \mathrm{ker}(\mathbf A^* \mathbf A) \subseteq \mathrm{ker}(\mathbf A), from which the result follows by the rank-nullity theorem.
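The four rank identities are easy to spot-check numerically; the following sketch of ours works over \mathbb R, with illustrative random matrices, via NumPy's np.linalg.matrix_rank:

```python
# Spot-checking the four rank identities of Lemma 2 over the reals,
# where A^* is simply the transpose A.T.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 5))
rank = np.linalg.matrix_rank

assert rank(A) == rank(A.T)                       # rank(A) = rank(A^*)
assert rank(A) <= min(A.shape)                    # rank(A) <= min{k, m}
assert rank(A @ B) <= min(rank(A), rank(B))       # product bound
assert rank(A.T @ A) == rank(A) == rank(A @ A.T)  # Gram-matrix ranks
```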

Proof of Theorem 1. We first note that, since \mathbf V^* = \mathbf V^{-1} for unitary \mathbf V, it suffices to establish the equation

\mathbf A \mathbf V = \mathbf U \mathbf{\Sigma}.

Fix 1 \leq r \leq \min\{m , n \} and let \sigma_i := \mathbf{\Sigma}_{ii} for 1 \leq i \leq r, with every other entry of \mathbf{\Sigma} equal to zero. If \{\mathbf u_1,\dots,\mathbf u_m\} is any orthonormal basis for \mathbb K^m and \{\mathbf v_1,\dots,\mathbf v_n\} is any orthonormal basis for \mathbb K^n, then defining

\mathbf U := \begin{bmatrix} \mathbf u_1 & \cdots & \mathbf u_m \end{bmatrix},\quad \mathbf V := \begin{bmatrix} \mathbf v_1 & \cdots & \mathbf v_n \end{bmatrix},

we obtain

\begin{aligned} \mathbf A \mathbf V &= \mathbf A \begin{bmatrix} \mathbf v_1 & \cdots & \mathbf v_n \end{bmatrix} \\ &= \begin{bmatrix} \mathbf A \mathbf v_1 & \cdots & \mathbf A \mathbf v_r & \mathbf A \mathbf v_{r+1} & \cdots & \mathbf A \mathbf v_n \end{bmatrix} \end{aligned}

and

\begin{aligned} \mathbf U \mathbf{\Sigma} &= \begin{bmatrix} \mathbf u_1 & \cdots & \mathbf u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots & & \vdots \\ \vdots & \ddots & \sigma_r & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix} \\ &= \begin{bmatrix} \sigma_1 \mathbf u_1 & \cdots & \sigma_r \mathbf u_r & \mathbf 0 & \cdots & \mathbf 0 \end{bmatrix}. \end{aligned}

Comparing columns, we need to find suitable orthonormal vectors \mathbf u_1,\dots,\mathbf u_m and \mathbf v_1,\dots,\mathbf v_n, as well as suitably defined numbers \sigma_1, \dots, \sigma_r, such that

\mathbf A \mathbf v_i = \sigma_i \mathbf u_i \quad \text{for } 1 \leq i \leq r, \qquad \text{and} \qquad \mathbf A \mathbf v_i = \mathbf 0 \quad \text{for } r < i \leq n.

In particular, whenever \sigma_i \neq 0, the vector \mathbf u_i is completely determined:

\displaystyle \mathbf u_i = \frac{1}{\sigma_i} \mathbf A \mathbf v_i .

Now let r := \mathrm{rank}(\mathbf A), so that r \leq \min\{m, n\} by Lemma 2. By Lemma 1, the matrix \mathbf A^* \mathbf A \in \mathcal M_{n \times n}(\mathbb K) is unitarily diagonalisable with nonnegative eigenvalues, and since \mathrm{rank}(\mathbf A^* \mathbf A) = \mathrm{rank}(\mathbf A) = r by Lemma 2, exactly r of these eigenvalues are positive. Order them as

\lambda_1 \geq \dots \geq \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_n,

some possibly repeated, with a corresponding orthonormal basis \{\mathbf v_1,\dots,\mathbf v_n\} of eigenvectors. Furthermore, for all i, j,

\langle \mathbf A \mathbf v_i, \mathbf A \mathbf v_j \rangle =\langle \mathbf v_i, \mathbf A^* \mathbf A \mathbf v_j \rangle = \langle \mathbf v_i, \lambda_j \mathbf v_j \rangle = \lambda_j \langle \mathbf v_i, \mathbf v_j \rangle.

In particular, \{\mathbf A \mathbf v_i : 1 \leq i \leq r\} is orthogonal, \mathbf A \mathbf v_i = \mathbf 0 for r < i \leq n (since \| \mathbf A \mathbf v_i \|^2 = \lambda_i = 0 there), and \| \mathbf A \mathbf v_i \| = \sqrt{\lambda_i} =: \sigma_i for 1 \leq i \leq r. Define the corresponding unit vectors

\displaystyle \mathbf u_i := \frac{ \mathbf A \mathbf v_i }{\| \mathbf A \mathbf v_i \|} = \frac{1}{\sigma_i} \mathbf A \mathbf v_i,\quad 1 \leq i \leq r.

Then \{\mathbf u_1,\dots,\mathbf u_r\} is clearly orthonormal. Extend it to an orthonormal basis \{\mathbf u_1,\dots,\mathbf u_m\} for \mathbb K^m, for instance by completing it to a basis of \mathbb K^m and applying the Gram-Schmidt process. With \mathbf U, \mathbf V, and \mathbf{\Sigma} as above, we have \mathbf A \mathbf V = \mathbf U \mathbf{\Sigma}, which yields the desired result.
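For the computationally inclined, the construction in the proof can be retraced numerically. The following sketch of ours works over \mathbb K = \mathbb R with an illustrative rank-deficient matrix (all names are ours): it eigendecomposes \mathbf A^* \mathbf A, builds the \sigma_i and \mathbf u_i as above, extends to an orthonormal basis by Gram-Schmidt, and checks \mathbf A \mathbf V = \mathbf U \mathbf{\Sigma}:

```python
# Retracing the proof's construction over K = R (all names are ours):
# eigendecompose A^T A, set sigma_i = sqrt(lambda_i), u_i = A v_i / sigma_i,
# then extend the u_i to an orthonormal basis by Gram-Schmidt.
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
# A deliberately rank-deficient matrix (rank 2 < n) to exercise r < n.
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))
r = np.linalg.matrix_rank(A)

# Orthonormal eigenvectors of A^T A, eigenvalues sorted in descending order.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam[:r])
U_cols = [A @ V[:, i] / sigma[i] for i in range(r)]  # u_i = A v_i / sigma_i

# Gram-Schmidt extension: sweep the standard basis, keeping what is new.
for e in np.eye(m):
    w = e - sum(np.dot(u, e) * u for u in U_cols)
    if np.linalg.norm(w) > 1e-10:
        U_cols.append(w / np.linalg.norm(w))
U = np.column_stack(U_cols[:m])

Sigma = np.zeros((m, n))
Sigma[:r, :r] = np.diag(sigma)
assert np.allclose(A @ V, U @ Sigma, atol=1e-6)    # the reduction A V = U Sigma
assert np.allclose(A, U @ Sigma @ V.T, atol=1e-6)  # hence A = U Sigma V^*
```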

Remark 1. The constants \sigma_1 \geq \cdots \geq \sigma_r > 0 are called the singular values of \mathbf A.
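Consistent with the proof, the singular values reported by a library routine agree with the square roots of the eigenvalues of \mathbf A^* \mathbf A; a quick check of ours:

```python
# The singular values reported by np.linalg.svd agree with the square
# roots of the eigenvalues of A^* A, as in the proof above.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

s = np.linalg.svd(A, compute_uv=False)            # descending order
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # descending order
assert np.allclose(s, np.sqrt(np.maximum(lam, 0.0)))
```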

Therefore, any matrix can, with some effort, be diagonalised, albeit rather creatively, using our arsenal of linear algebraic tools.

—Joel Kindiak, 26 Mar 25, 1920H
