We Finally Discuss Transposes

Previously, we wanted to find a best approximation to the usually unsolvable equation $\mathbf A\mathbf x =\mathbf b$ . Using the idea of projections, we know that $\mathbf u$ will give us the best approximation if and only if

$\mathbf A\mathbf u = \mathrm{proj}_{\mathbf A(\mathbb K^n)} (\mathbf b).$

With a bit more work, we showed that this equation holds if and only if, for every $j = 1, \dots, n$,

$\langle \mathbf A \mathbf e_j, \mathbf A \mathbf u \rangle = \langle \mathbf A \mathbf e_j, \mathbf b \rangle.$

But where do we go from here?

Now, let’s recall the usual inner product on $\mathbb K^n$ . Given $\mathbf u, \mathbf v \in \mathbb K^n$ , the usual inner product is defined by

$\displaystyle \langle \mathbf u, \mathbf v \rangle := \sum_{i=1}^n u_i \bar{v}_i = \begin{bmatrix} \bar{v}_1 & \cdots & \bar{v}_n \end{bmatrix} \mathbf u.$

Thus, we can regard the row vector $\mathbf v^* :=\begin{bmatrix} \bar{v}_1 & \cdots & \bar{v}_n \end{bmatrix}$ equivalently as the linear functional

$\mathbf v^* = \langle \cdot ,\mathbf v \rangle : \mathbb K^n \to \mathbb K.$

In this case, $\langle \mathbf u, \mathbf v \rangle = \mathbf v^* \mathbf u$. To adapt to more general settings, we adopt the second description, as a linear functional, and call it an adjoint.
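As a quick sanity check, here is a minimal Python sketch of the identity $\langle \mathbf u, \mathbf v \rangle = \mathbf v^* \mathbf u$ on $\mathbb C^2$. The helper names `inner`, `adjoint_vector` and `apply_row` are our own, chosen for illustration, and vectors are stored as plain lists.

```python
def inner(u, v):
    """Usual inner product on K^n: linear in u, conjugate-linear in v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(ui * complex(vi).conjugate() for ui, vi in zip(u, v))


def adjoint_vector(v):
    """Return v* as a row (list) of conjugated entries."""
    return [complex(vi).conjugate() for vi in v]


def apply_row(row, u):
    """Multiply a 1 x n row by an n x 1 column."""
    return sum(r * ui for r, ui in zip(row, u))


u = [1 + 2j, 3j]
v = [2 - 1j, 1 + 1j]

print(inner(u, v))                         # (3+8j)
print(apply_row(adjoint_vector(v), u))     # (3+8j)
```

Both lines compute the same number, once via the sum and once via the row-times-column product.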

Henceforth, let $V$ be an inner product space over a field $\mathbb K$ .

Definition 1. For any $\mathbf v \in V$ , define its adjoint by $\mathbf v^* := \langle \cdot, \mathbf v\rangle \in \mathcal L(V, \mathbb K) =: V^*$ .

Given $\mathbf A : \mathbb K^n \to \mathbb K^m$ , a reasonable definition for $\mathbf A^*$ would be

$\mathbf A^* := \begin{bmatrix} (\mathbf A \mathbf e_1)^* \\ \vdots \\ (\mathbf A \mathbf e_n)^* \end{bmatrix} : \mathbb K^m \to \mathbb K^n.$

In particular, by writing in terms of the usual standard basis,

$\displaystyle \mathbf A^* = \sum_{j=1}^n \langle \cdot , \mathbf A \mathbf e_j \rangle \mathbf e_j : \mathbb K^m \to \mathbb K^n,$

where we adopted the inner product notation for aesthetic purposes. When $\mathbb K = \mathbb R$ , we denote $\mathbf A^{\mathrm T} = \mathbf A^*$ .

Theorem 1. We have $[a_{ij}]^* = [\bar{a}_{ji}]$ . If $\mathbb K = \mathbb R$ , then $[a_{ij}]^* = [a_{ji}]$ . Thus, we call $\mathbf A^*$ the conjugate transpose of $\mathbf A$ , and $\mathbf A^{\mathrm T}$ the transpose of $\mathbf A$ .

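Proof. Write $\mathbf A = [a_{ij}]$, so that the $j$-th column of $\mathbf A$ is $\mathbf A \mathbf e_j = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{bmatrix}$. By definition, the $(j, i)$ entry of $\mathbf A^*$ is

$\displaystyle (\mathbf A^*(\mathbf e_i))(j) = \sum_{k=1}^n \langle \mathbf e_i , \mathbf A \mathbf e_k \rangle \mathbf e_k(j) = \langle \mathbf e_i , \mathbf A\mathbf e_j \rangle = \overline{(\mathbf A \mathbf e_j)(i)} = \bar{a}_{ij}.$

Swapping the roles of $i$ and $j$, the $(i, j)$ entry of $\mathbf A^*$ is $\bar{a}_{ji}$. When $\mathbb K = \mathbb R$, conjugation does nothing, so the entry is $a_{ji}$.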
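To see Theorem 1 in action, the following Python sketch builds $\mathbf A^*$ directly from the defining formula $\mathbf A^* = \sum_j \langle \cdot, \mathbf A \mathbf e_j \rangle \mathbf e_j$ and compares it with the conjugate transpose. The function names and the sample matrix are our own illustrative choices; matrices are stored as lists of rows.

```python
def inner(u, v):
    """Usual inner product: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def column(A, j):
    """The j-th column of A, i.e. the vector A e_j."""
    return [row[j] for row in A]


def basis_vector(size, i):
    """The standard basis vector e_i of K^size (0-indexed)."""
    return [1 if k == i else 0 for k in range(size)]


def adjoint_from_definition(A):
    """Entry (j, i) of A* is <e_i, A e_j>, following the defining sum."""
    m, n = len(A), len(A[0])
    return [[inner(basis_vector(m, i), column(A, j)) for i in range(m)]
            for j in range(n)]


def conjugate_transpose(A):
    """Entry (j, i) is the conjugate of a_ij."""
    m, n = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(m)]
            for j in range(n)]


# A maps C^3 to C^2, so A* maps C^2 to C^3.
A = [[1 + 1j, 2, 3j],
     [4, 5 - 2j, 6]]

print(adjoint_from_definition(A) == conjugate_transpose(A))  # True
```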

Generalising this idea to a linear transformation $T : V \to W$ between two inner product spaces takes considerably more effort, but there are instances where we can make a reasonable definition.

In particular, if $B := \{\mathbf e_\alpha : \alpha \in I\}$ and $C := \{\mathbf f_\beta : \beta \in J\}$ form bases for $V$ and $W$ respectively (this is certainly true in the finite-dimensional setting), then assuming that $B$ is finite, we can make the definition

$\displaystyle T^* := \sum_{\alpha \in I} \langle \cdot , T(\mathbf e_\alpha) \rangle \mathbf e_\alpha : W \to V,$

where the sum of the right-hand side is finite and thus well-defined. In particular, for each $\beta \in J$ ,

$\displaystyle T^*(\mathbf f_\beta) := \sum_{\alpha \in I} \langle \mathbf f_\beta , T(\mathbf e_\alpha) \rangle \mathbf e_\alpha.$

If furthermore $B$ is orthonormal, then applying $\langle \mathbf e_\gamma, \cdot \rangle$ to both sides,

$\displaystyle \begin{aligned} \langle \mathbf e_\gamma, T^*(\mathbf f_\beta) \rangle &= \left\langle \mathbf e_\gamma, \sum_{\alpha \in I} \langle \mathbf f_\beta , T(\mathbf e_\alpha) \rangle \mathbf e_\alpha \right\rangle \\ &= \sum_{\alpha \in I} \overline{\langle \mathbf f_\beta , T(\mathbf e_\alpha) \rangle} \langle \mathbf e_\gamma, \mathbf e_\alpha \rangle \\ &= \sum_{\alpha \in I} \langle T(\mathbf e_\alpha), \mathbf f_\beta \rangle \langle \mathbf e_\gamma, \mathbf e_\alpha \rangle \\ &= \langle T(\mathbf e_\gamma), \mathbf f_\beta \rangle + \sum_{\alpha \neq \gamma} \langle T(\mathbf e_\alpha), \mathbf f_\beta \rangle \cdot 0 \\ &= \langle T(\mathbf e_\gamma), \mathbf f_\beta \rangle. \end{aligned}$

Extending by linearity, we obtain the crucial identity

$\langle T(\mathbf v), \mathbf w \rangle = \langle \mathbf v, T^*(\mathbf w)\rangle$

that serves as a working definition of $T^*$ , which we have properly constructed at least in the case $V$ is finite-dimensional.
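To spell out the extension by linearity, here is a sketch of the computation. Write $\mathbf v = \sum_{\gamma} c_\gamma \mathbf e_\gamma$ and $\mathbf w = \sum_{\beta} d_\beta \mathbf f_\beta$ as finite sums, where the coefficients $c_\gamma, d_\beta \in \mathbb K$ are our own notation. Then

$\displaystyle \begin{aligned} \langle T(\mathbf v), \mathbf w \rangle &= \sum_{\gamma} \sum_{\beta} c_\gamma \bar{d}_\beta \langle T(\mathbf e_\gamma), \mathbf f_\beta \rangle \\ &= \sum_{\gamma} \sum_{\beta} c_\gamma \bar{d}_\beta \langle \mathbf e_\gamma, T^*(\mathbf f_\beta) \rangle \\ &= \langle \mathbf v, T^*(\mathbf w) \rangle. \end{aligned}$

The first and last equalities use the linearity of $T$ and $T^*$, together with the fact that the inner product is linear in its first argument and conjugate-linear in its second. The middle equality is the basis-vector identity derived above.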

Let $V, W$ be inner product spaces over $\mathbb K$ .

Lemma 1. Let $T : V \to W$ be a linear transformation such that there exists a linear transformation $T^* : W \to V$ that satisfies the crucial property

$\langle T(\mathbf v), \mathbf w \rangle = \langle \mathbf v, T^*(\mathbf w)\rangle,\quad \mathbf v \in V,\quad \mathbf w \in W.$

Then $T^*$ is unique. We call $T^*$ the adjoint of $T$ .

Proof. We first remark that for any $\mathbf v \in V$ , $\langle \cdot ,\mathbf v \rangle = O_V$ implies

$\langle \mathbf v ,\mathbf v \rangle = O_V(\mathbf v) = 0 \quad \Rightarrow \quad \mathbf v = \mathbf 0.$

Now suppose there exist two such transformations $T_1, T_2$ that satisfy the crucial property. Fix $\mathbf w \in W$ . Then linearity yields

$\langle \cdot, T_1(\mathbf w) - T_2(\mathbf w) \rangle = O_V \quad \Rightarrow \quad T_1(\mathbf w) = T_2(\mathbf w).$

Since $\mathbf w \in W$ is arbitrary, we have $T_1 = T_2$ , as required.
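For completeness, the linearity step in the proof can be written out as follows. For every $\mathbf v \in V$, applying the crucial property to both $T_1$ and $T_2$ gives

$\displaystyle \begin{aligned} \langle \mathbf v, T_1(\mathbf w) - T_2(\mathbf w) \rangle &= \langle \mathbf v, T_1(\mathbf w) \rangle - \langle \mathbf v, T_2(\mathbf w) \rangle \\ &= \langle T(\mathbf v), \mathbf w \rangle - \langle T(\mathbf v), \mathbf w \rangle \\ &= 0. \end{aligned}$

Hence $\langle \cdot, T_1(\mathbf w) - T_2(\mathbf w) \rangle$ is the zero functional, and the opening remark forces $T_1(\mathbf w) - T_2(\mathbf w) = \mathbf 0$.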

Corollary 1. For any $\mathbf v \in \mathbb K^n$ , $\mathbf w \in \mathbb K^m$ , and $\mathbf A : \mathbb K^n \to \mathbb K^m$ ,

$\langle \mathbf A\mathbf v, \mathbf w\rangle = \langle \mathbf v, \mathbf A^* \mathbf w \rangle.$

Furthermore, the map $\mathcal M_{m \times n}(\mathbb K) \to \mathcal M_{n \times m}(\mathbb K), \mathbf A \mapsto \mathbf A^*$ is a bijection; it is linear when $\mathbb K = \mathbb R$ and conjugate-linear when $\mathbb K = \mathbb C$ .
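The identity in Corollary 1 is easy to test numerically. The sketch below checks it for one hand-picked complex matrix and pair of vectors. The helper names and the sample data are ours, and a single example is of course an illustration rather than a proof.

```python
def inner(u, v):
    """Usual inner product: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def matvec(A, x):
    """Matrix-vector product, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def conjugate_transpose(A):
    """Swap rows and columns, conjugating every entry."""
    return [[complex(a).conjugate() for a in col] for col in zip(*A)]


A = [[1 + 1j, 2, 3j],
     [4, 5 - 2j, 6]]       # A : C^3 -> C^2
v = [1, 1j, 2 - 1j]        # v in C^3
w = [3 + 1j, -2j]          # w in C^2

lhs = inner(matvec(A, v), w)                        # <Av, w>
rhs = inner(v, matvec(conjugate_transpose(A), w))   # <v, A*w>

print(lhs, rhs)  # (23+59j) (23+59j)
```

The entries are small Gaussian integers, so the floating-point arithmetic is exact and the two sides agree on the nose.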

Corollary 2. Let $U , V , W$ be inner product spaces over $\mathbb K$ and let $S : U \to V$ , $T : V \to W$ be linear transformations whose adjoints exist. Then $(T \circ S)^* = S^* \circ T^*$ .

Proof. Fix $\mathbf u \in U, \mathbf w \in W$ . Then

$\begin{aligned}\langle (T \circ S)(\mathbf u), \mathbf w \rangle &= \langle T (S(\mathbf u)), \mathbf w \rangle \\ &= \langle S(\mathbf u), T^*(\mathbf w) \rangle \\ &= \langle \mathbf u, S^*(T^*(\mathbf w)) \rangle \\ &= \langle \mathbf u, (S^* \circ T^*)(\mathbf w) \rangle. \end{aligned}$

By uniqueness, $(T \circ S)^* = S^* \circ T^*$ .

Corollary 3. Let $\mathbf A : \mathbb K^m \to \mathbb K^k$ and $\mathbf B : \mathbb K^n \to \mathbb K^m$ . Then $(\mathbf A \mathbf B)^* = \mathbf B^* \mathbf A^*$ . If $\mathbb K = \mathbb R$ , then $(\mathbf A \mathbf B)^{\mathrm T} = \mathbf B^{\mathrm T} \mathbf A^{\mathrm T}$ .

Corollary 4. Define $\overline{[a_{ij}]} := [\bar{a}_{ij}]$ for appropriately sized matrices. For $\mathbf A : \mathbb K^m \to \mathbb K^k$ , $\mathbf A^{\mathrm T} = {\overline{\mathbf A}}^*$ , even when $\mathbb K = \mathbb C$ . Hence, for $\mathbf B : \mathbb K^n \to \mathbb K^m$ , $(\mathbf A \mathbf B)^{\mathrm T} = \mathbf B^{\mathrm T} \mathbf A^{\mathrm T}$ .
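The reversal rules in Corollaries 3 and 4 can likewise be checked on a small example. In the sketch below, the helper names and the two sample matrices are our own; we verify $(\mathbf A \mathbf B)^* = \mathbf B^* \mathbf A^*$, the relation $\mathbf A^{\mathrm T} = \overline{\mathbf A}^*$, and $(\mathbf A \mathbf B)^{\mathrm T} = \mathbf B^{\mathrm T} \mathbf A^{\mathrm T}$ over $\mathbb C$.

```python
def matmul(A, B):
    """Matrix product, with matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Swap rows and columns without conjugating."""
    return [list(col) for col in zip(*A)]


def conjugate(A):
    """Conjugate every entry, keeping the shape."""
    return [[complex(a).conjugate() for a in row] for row in A]


def adjoint(A):
    """Conjugate transpose."""
    return transpose(conjugate(A))


A = [[1 + 1j, 2],
     [0, 3j]]              # A : C^2 -> C^2
B = [[1, 1j, 0],
     [2 - 1j, 1, 4]]       # B : C^3 -> C^2

print(adjoint(matmul(A, B)) == matmul(adjoint(B), adjoint(A)))        # True
print(transpose(A) == adjoint(conjugate(A)))                          # True
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
```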

Finally, we can return to our original problem in $\mathbb R^n$ . We have derived the following identity using best approximations:

$\langle \mathbf A \mathbf e_j, \mathbf A \mathbf u \rangle = \langle \mathbf A \mathbf e_j, \mathbf b \rangle.$

In the context $\mathbb K = \mathbb R$ , ${\mathbf A}^* = {\mathbf A}^{\mathrm T}$ , so that taking adjoints,

$\langle \mathbf e_j, {\mathbf A}^{\mathrm T} \mathbf A \mathbf u \rangle = \langle \mathbf e_j, {\mathbf A}^{\mathrm T} \mathbf b \rangle.$

Since $\mathbf e_j$ is arbitrary, by the injectivity of $\mathbf v \mapsto \langle \cdot ,\mathbf v \rangle$ , ${\mathbf A}^{\mathrm T} \mathbf A \mathbf u = {\mathbf A}^{\mathrm T} \mathbf b$ . Therefore, we can solve our best approximation problem. In fact, we can solve this problem even if $\mathbb K = \mathbb C$ .

Theorem 2. Let $V, W$ be inner product spaces over $\mathbb K$ , $T : V \to W$ a linear transformation.

If $W$ is finite-dimensional, then for any $\mathbf y \in W$ , the equation $T(\mathbf x) = \mathbf y$ has a unique best approximation.
If the equation $T(\mathbf x) = \mathbf y$ has a unique best approximation, then $T(\mathbf u)$ is the best approximation if and only if $T(\mathbf u) = \mathrm{proj}_{T(V)}(\mathbf y)$ .
Furthermore, if $T^*$ exists, then $T(\mathbf u)$ is the best approximation if and only if $(T^* \circ T)(\mathbf u)=T^*(\mathbf y)$ .

Corollary 5. Fix $\mathbf A : \mathbb K^n \to \mathbb K^m$ and $\mathbf b \in \mathbb K^m$ . The equation $\mathbf A \mathbf x = \mathbf b$ has a unique best approximation. Furthermore, $\mathbf A \mathbf u$ is the best approximation if and only if ${\mathbf A}^* \mathbf A \mathbf u = {\mathbf A}^* \mathbf b$ . In particular, if $\mathbb K =\mathbb R$ , $\mathbf u$ must satisfy the equation ${\mathbf A}^{\mathrm T} \mathbf A \mathbf u = {\mathbf A}^{\mathrm T} \mathbf b$ .

And so I used my graphing calculator to solve the equation ${\mathbf A}^{\mathrm T} \mathbf A \mathbf u = {\mathbf A}^{\mathrm T} \mathbf b$ , and with 5 minutes of somewhat tedious typing, computed the theoretically perfect best-fit line (up to a precision of 3 significant figures). Then 3 months later, I learned that the calculator had an in-built function to compute this line in 3 seconds. Bummer.
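For readers without a graphing calculator, the computation described above can be sketched in a few lines of Python. The data points here are made up purely for illustration, and the tiny solver `solve_2x2` (Cramer's rule) is our own helper that only handles the two-column case of a best-fit line $y = c_0 + c_1 x$. We use exact fractions so that no rounding obscures the result.

```python
from fractions import Fraction


def transpose(A):
    """Swap rows and columns (the real-case adjoint)."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product, with matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def solve_2x2(M, rhs):
    """Solve M u = rhs by Cramer's rule; M must be invertible."""
    (p, q), (r, s) = M
    det = p * s - q * r
    if det == 0:
        raise ValueError("A^T A is singular: the columns of A are dependent")
    return [(s * rhs[0] - q * rhs[1]) / det, (p * rhs[1] - r * rhs[0]) / det]


# Hypothetical data points (x, y), not taken from the post.
xs = [0, 1, 2, 3]
ys = [1, 2, 2, 4]

# Each row of A is [1, x], so that A u = [c0 + c1 * x for each x].
A = [[Fraction(1), Fraction(x)] for x in xs]
b = [[Fraction(y)] for y in ys]

At = transpose(A)
AtA = matmul(At, A)                       # the 2 x 2 matrix A^T A
Atb = [row[0] for row in matmul(At, b)]   # the vector A^T b

intercept, slope = solve_2x2(AtA, Atb)
print(intercept, slope)  # 9/10 9/10

# The residual b - A u should be orthogonal to every column of A.
residual = [[y - (intercept + slope * x)] for x, y in zip(xs, ys)]
print(matmul(At, residual))
```

The final line prints a column of zeros, which is exactly the statement ${\mathbf A}^{\mathrm T}(\mathbf b - \mathbf A \mathbf u) = \mathbf 0$.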

In our next post, we will examine the adjoint transformation $T^*$ a bit more, and discover something peculiar about matrices that satisfy the innocent-looking equality $\mathbf A^* = \mathbf A$ . These tools will help us explore the spectral theorems, which will empower us with one very useful concrete application—singular value decomposition.

—Joel Kindiak, 13 Mar 25, 1716H

KindiakMath
