We Finally Discuss Transposes

Previously, we wanted to find a best approximation to the usually unsolvable equation \mathbf A\mathbf x =\mathbf b. Using the idea of projections, we know that \mathbf u will give us the best approximation if and only if

\mathbf A\mathbf u = \mathrm{proj}_{\mathbf A(\mathbb K^n)} (\mathbf b).

With a bit more work, we show that this equation holds if and only if

\langle \mathbf A \mathbf e_j, \mathbf A \mathbf u \rangle = \langle \mathbf A \mathbf e_j, \mathbf b \rangle, \quad j = 1, \dots, n.

But where do we go from here?

Now, let’s recall the usual inner product on \mathbb K^n: given \mathbf u, \mathbf v \in \mathbb K^n, it is defined by

\displaystyle \langle \mathbf u, \mathbf v \rangle := \sum_{i=1}^n u_i \bar{v}_i = \begin{bmatrix} \bar{v}_1 & \cdots & \bar{v}_n \end{bmatrix} \mathbf u.

Thus, if we define \mathbf v^* :=\begin{bmatrix} \bar{v}_1 & \cdots & \bar{v}_n \end{bmatrix}, then, viewed as a linear map, this row vector is exactly the functional

\mathbf v^* = \langle \cdot ,\mathbf v \rangle : \mathbb K^n \to \mathbb K.

In this case, \langle \mathbf u, \mathbf v \rangle = \mathbf v^* \mathbf u. Since the functional description survives in more general settings, we adopt it as the definition, called the adjoint.
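For a concrete check, take \mathbf v = (1, i) \in \mathbb C^2. Then

\mathbf v^* = \begin{bmatrix} 1 & -i \end{bmatrix}, \qquad \mathbf v^* \mathbf u = u_1 - i u_2 = \langle \mathbf u, \mathbf v \rangle \quad \text{for all } \mathbf u \in \mathbb C^2.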

Henceforth, let V be an inner product space over \mathbb K, where \mathbb K is \mathbb R or \mathbb C.

Definition 1. For any \mathbf v \in V, define its adjoint by \mathbf v^* := \langle \cdot, \mathbf v\rangle \in \mathcal L(V, \mathbb K) =: V^*.

Given \mathbf A : \mathbb K^n \to \mathbb K^m, a reasonable definition for \mathbf A^* would be

\mathbf A^* := \begin{bmatrix} (\mathbf A \mathbf e_1)^* \\ \vdots \\ (\mathbf A \mathbf e_n)^* \end{bmatrix} : \mathbb K^m \to \mathbb K^n.

In particular, writing this out in terms of the standard basis,

\displaystyle \mathbf A^* = \sum_{j=1}^n \langle \cdot , \mathbf A \mathbf e_j \rangle \mathbf e_j : \mathbb K^m \to \mathbb K^n,

where we adopt the inner product notation for aesthetic purposes. When \mathbb K = \mathbb R, we write \mathbf A^{\mathrm T} := \mathbf A^*.

Theorem 1. We have [a_{ij}]^* = [\bar{a}_{ji}]. If \mathbb K = \mathbb R, then [a_{ij}]^* = [a_{ji}]. Thus, we call \mathbf A^* the conjugate transpose of \mathbf A, and \mathbf A^{\mathrm T} the transpose of \mathbf A.

Proof. Write \mathbf A = \begin{bmatrix} \mathbf A \mathbf e_1 & \cdots & \mathbf A \mathbf e_n \end{bmatrix}, so that the j-th column is \mathbf A \mathbf e_j = [a_{ij}]_{i=1}^m. By definition,

\displaystyle (\mathbf A^*(\mathbf e_i))(j) = \sum_{k=1}^n \langle \mathbf e_i , \mathbf A \mathbf e_k \rangle \mathbf e_k(j) = \langle \mathbf e_i , \mathbf A\mathbf e_j \rangle = \overline{(\mathbf A \mathbf e_j)_i} = \bar{a}_{ij}.

That is, the (j, i) entry of \mathbf A^* is \bar{a}_{ij}; relabelling the indices, this is precisely [a_{ij}]^* = [\bar{a}_{ji}].
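For a concrete illustration of Theorem 1, take

\displaystyle \mathbf A = \begin{bmatrix} 1 & i \\ 2 & 3 - i \end{bmatrix} \quad \Rightarrow \quad \mathbf A^* = \begin{bmatrix} 1 & 2 \\ -i & 3 + i \end{bmatrix},

so each entry is conjugated while rows and columns swap; over \mathbb R the conjugation does nothing, leaving the ordinary transpose.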

Generalising this idea to a linear transformation T : V \to W between two inner product spaces takes considerably more effort, but there are instances where we can make a sufficiently reasonable definition.

In particular, suppose B := \{\mathbf e_\alpha : \alpha \in I\} and C := \{\mathbf f_\beta : \beta \in J\} form bases for V and W respectively, and assume that B is finite (as is certainly the case in the finite-dimensional setting). Then we can make the definition

\displaystyle T^* := \sum_{\alpha \in I} \langle \cdot , T(\mathbf e_\alpha) \rangle \mathbf e_\alpha : W \to V,

where the sum on the right-hand side is finite and thus well-defined. In particular, for each \beta \in J,

\displaystyle T^*(\mathbf f_\beta) := \sum_{\alpha \in I} \langle \mathbf f_\beta , T(\mathbf e_\alpha) \rangle \mathbf e_\alpha.

If, furthermore, B is orthonormal, then applying \langle \mathbf e_\gamma, \cdot \rangle to both sides,

\displaystyle \begin{aligned} \langle \mathbf e_\gamma, T^*(\mathbf f_\beta) \rangle &= \left\langle \mathbf e_\gamma,  \sum_{\alpha \in I} \langle \mathbf f_\beta , T(\mathbf e_\alpha) \rangle \mathbf e_\alpha \right\rangle \\ &= \sum_{\alpha \in I} \overline{\langle \mathbf f_\beta , T(\mathbf e_\alpha) \rangle} \langle \mathbf e_\gamma,  \mathbf e_\alpha \rangle \\ &= \sum_{\alpha \in I} \langle T(\mathbf e_\alpha), \mathbf f_\beta \rangle \langle \mathbf e_\gamma,  \mathbf e_\alpha \rangle \\ &= \langle T(\mathbf e_\gamma), \mathbf f_\beta \rangle + \sum_{\alpha \neq \gamma} \langle T(\mathbf e_\alpha), \mathbf f_\beta \rangle \cdot 0 \\ &= \langle T(\mathbf e_\gamma), \mathbf f_\beta \rangle. \end{aligned}

Extending by linearity, we obtain the crucial identity

\langle T(\mathbf v), \mathbf w \rangle = \langle \mathbf v, T^*(\mathbf w)\rangle

that serves as a working definition of T^*, which we have properly constructed at least in the case where V is finite-dimensional (and hence admits a finite orthonormal basis).

Let V and W be inner product spaces over \mathbb K.

Lemma 1. Let T : V \to W be a linear transformation such that there exists a linear transformation T^* : W \to V that satisfies the crucial property

\langle T(\mathbf v), \mathbf w \rangle = \langle \mathbf v, T^*(\mathbf w)\rangle,\quad \mathbf v \in V,\quad \mathbf w \in W.

Then T^* is unique. We call T^* the adjoint of T.

Proof. We first remark that for any \mathbf v \in V, if \langle \cdot ,\mathbf v \rangle = O_V, the zero functional on V, then

\langle \mathbf v ,\mathbf v \rangle = O_V(\mathbf v) = 0 \quad \Rightarrow \quad \mathbf v = \mathbf 0.

Now suppose there exist two such transformations T_1, T_2 that satisfy the crucial property. Fix \mathbf w \in W. Then for every \mathbf v \in V, \langle \mathbf v, T_1(\mathbf w) \rangle = \langle T(\mathbf v), \mathbf w \rangle = \langle \mathbf v, T_2(\mathbf w) \rangle, so that linearity yields

\langle \cdot, T_1(\mathbf w) - T_2(\mathbf w) \rangle = O_V \quad \Rightarrow \quad T_1(\mathbf w) = T_2(\mathbf w)

by the remark above.

Since \mathbf w \in W is arbitrary, we have T_1 = T_2, as required.

Corollary 1. For any \mathbf v \in \mathbb K^n, \mathbf w \in \mathbb K^m, and \mathbf A : \mathbb K^n \to \mathbb K^m,

\langle \mathbf A\mathbf v, \mathbf w\rangle = \langle \mathbf v, \mathbf A^* \mathbf w \rangle.

Furthermore, the map \mathcal M_{m \times n}(\mathbb K) \to \mathcal M_{n \times m}(\mathbb K), \mathbf A \mapsto \mathbf A^* is a bijection; it is linear when \mathbb K = \mathbb R and conjugate-linear when \mathbb K = \mathbb C.
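As a quick numerical sanity check of Corollary 1, here is a minimal NumPy sketch. Note that our inner product \langle \mathbf u, \mathbf v \rangle corresponds to np.vdot(v, u), since np.vdot conjugates its first argument.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))  # A : C^2 -> C^3
    v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    A_star = A.conj().T                  # the conjugate transpose A*
    lhs = np.vdot(w, A @ v)              # <Av, w>
    rhs = np.vdot(A_star @ w, v)         # <v, A*w>
    print(np.isclose(lhs, rhs))          # True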

Corollary 2. Let U, V, W be inner product spaces over \mathbb K, and let S : U \to V and T : V \to W be linear transformations whose adjoints exist. Then (T \circ S)^* = S^* \circ T^*.

Proof. Fix \mathbf u \in U, \mathbf w \in W. Then

\begin{aligned}\langle  (T \circ S)(\mathbf u), \mathbf w \rangle &= \langle  T (S(\mathbf u)), \mathbf w \rangle \\ &= \langle  S(\mathbf u), T^*(\mathbf w) \rangle \\ &= \langle  \mathbf u, S^*(T^*(\mathbf w)) \rangle \\ &= \langle  \mathbf u, (S^* \circ T^*)(\mathbf w) \rangle. \end{aligned}

By uniqueness, (T \circ S)^* = S^* \circ T^*.

Corollary 3. Let \mathbf A : \mathbb K^m \to \mathbb K^k and \mathbf B : \mathbb K^n \to \mathbb K^m. Then (\mathbf A \mathbf B)^* = \mathbf B^* \mathbf A^*. If \mathbb K = \mathbb R, then (\mathbf A \mathbf B)^{\mathrm T} = \mathbf B^{\mathrm T} \mathbf A^{\mathrm T}.
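Continuing the NumPy sketch above, Corollary 3 can be checked just as directly:

    B = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))  # B : C^4 -> C^2
    print(np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T))       # True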

Corollary 4. Define \overline{[a_{ij}]} := [\bar{a}_{ij}] for appropriately sized matrices. For \mathbf A : \mathbb K^m \to \mathbb K^k, define \mathbf A^{\mathrm T} := {\overline{\mathbf A}}^*, which agrees with the earlier definition when \mathbb K = \mathbb R and makes sense even when \mathbb K = \mathbb C. Hence, for \mathbf B : \mathbb K^n \to \mathbb K^m, (\mathbf A \mathbf B)^{\mathrm T} = \mathbf B^{\mathrm T} \mathbf A^{\mathrm T}.
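Indeed, since entrywise conjugation is multiplicative, \overline{\mathbf A \mathbf B} = \overline{\mathbf A}\, \overline{\mathbf B}, so Corollary 3 gives

(\mathbf A \mathbf B)^{\mathrm T} = \left(\overline{\mathbf A}\, \overline{\mathbf B}\right)^* = {\overline{\mathbf B}}^* {\overline{\mathbf A}}^* = \mathbf B^{\mathrm T} \mathbf A^{\mathrm T}.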

Finally, we can return to our original problem in \mathbb R^n. We have derived the following identity using best approximations:

\langle \mathbf A \mathbf e_j, \mathbf A \mathbf u \rangle = \langle \mathbf A \mathbf e_j, \mathbf b \rangle, \quad j = 1, \dots, n.

In the context \mathbb K = \mathbb R, {\mathbf A}^* = {\mathbf A}^{\mathrm T}, so moving \mathbf A across the inner product (Corollary 1) yields

\langle \mathbf e_j, {\mathbf A}^{\mathrm T} \mathbf A \mathbf u \rangle = \langle \mathbf e_j, {\mathbf A}^{\mathrm T} \mathbf b \rangle.

Since this holds for every basis vector \mathbf e_j, linearity and the injectivity of \mathbf v \mapsto \langle \cdot ,\mathbf v \rangle give {\mathbf A}^{\mathrm T} \mathbf A \mathbf u = {\mathbf A}^{\mathrm T} \mathbf b, the familiar normal equations. Therefore, we can solve our best approximation problem. In fact, we can solve this problem even if \mathbb K = \mathbb C.

Theorem 2. Let V, W be inner product spaces over \mathbb K, T : V \to W a linear transformation.

  • If W is finite-dimensional, then for any \mathbf y \in W, the equation T(\mathbf x) = \mathbf y has a unique best approximation.
  • If the equation T(\mathbf x) = \mathbf y has a unique best approximation, then T(\mathbf u) is the best approximation if and only if T(\mathbf u) = \mathrm{proj}_{T(V)}(\mathbf y).
  • Furthermore, if T^* exists, then T(\mathbf u) is the best approximation if and only if (T^* \circ T)(\mathbf u)=T^*(\mathbf y).

Corollary 5. Fix \mathbf A : \mathbb K^n \to \mathbb K^m and \mathbf b \in \mathbb K^m. The equation \mathbf A \mathbf x = \mathbf b has a unique best approximation. Furthermore, \mathbf A \mathbf u is the best approximation if and only if {\mathbf A}^* \mathbf A \mathbf u = {\mathbf A}^* \mathbf b. In particular, if \mathbb K =\mathbb R, \mathbf u must satisfy the equation {\mathbf A}^{\mathrm T} \mathbf A \mathbf u = {\mathbf A}^{\mathrm T} \mathbf b.
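To make Corollary 5 concrete, here is a minimal NumPy sketch that fits a best-fit line by solving the normal equations; the data points are made up purely for illustration.

    import numpy as np

    # Made-up data points (x_i, y_i) for illustration.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

    # Model y ~ c0 + c1 x: each row of A is [1, x_i].
    A = np.column_stack([np.ones_like(x), x])

    # Solve the normal equations A^T A u = A^T y.
    u = np.linalg.solve(A.T @ A, A.T @ y)
    print(u)  # [intercept, slope] of the best-fit line

In practice, np.linalg.lstsq(A, y, rcond=None) solves the same problem more stably; the normal equations are formed explicitly here only to mirror the corollary.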

And so I used my graphing calculator to solve the equation {\mathbf A}^{\mathrm T} \mathbf A \mathbf u = {\mathbf A}^{\mathrm T} \mathbf b, and with 5 minutes of somewhat tedious typing, computed the theoretically perfect best-fit line (up to a precision of 3 significant figures). Then 3 months later, I learned that the calculator had an in-built function to compute this line in 3 seconds. Bummer.

In our next post, we will examine the adjoint transformation T^* a bit more, and discover something peculiar about matrices that satisfy the innocent-looking equality \mathbf A^* = \mathbf A. These tools will help us explore the spectral theorems, which will empower us with one very useful concrete application—singular value decomposition.

—Joel Kindiak, 13 Mar 25, 1716H
