Entering the Matrix

Having discussed vector spaces and linear transformations in sufficient generality, we are finally ready to discuss matrices: the beefed-up brother to vectors.

Let \mathbb K denote a field and E_n := (\mathbf e_1,\dots,\mathbf e_n) denote the standard (ordered) basis for \mathbb K^n. Here, we will distinguish the ordered bases (\mathbf e_1, \mathbf e_2) and (\mathbf e_2, \mathbf e_1) since the elements are sequenced in different orders.
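For instance, in \mathbb K^2 the standard ordered basis E_2 = (\mathbf e_1, \mathbf e_2) consists of

\mathbf e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix},

and swapping them gives the different ordered basis (\mathbf e_2, \mathbf e_1).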

Definition 1. For positive integers m, n, an m \times n matrix over \mathbb K is a rectangular array of elements of \mathbb K with m rows and n columns. We use capital letters \mathbf A, \mathbf B, \mathbf C to denote matrices and lowercase letters a_{ij}, b_{ij}, c_{ij} to denote the entry in the i-th row and j-th column:

\mathbf A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn}  \end{bmatrix} = \begin{bmatrix}\mathbf a_1 & \dots & \mathbf a_n \end{bmatrix} = [a_{ij}].

We will let \mathcal M_{m \times n}(\mathbb K) denote the collection of such matrices. Rather obviously, we write \mathbf A = \mathbf B to mean a_{ij} = b_{ij} for all i, j.

This is where our discussion of matrices will diverge from the usual progression of defining the various operations and then computing several examples. We will not adopt that approach. Instead, we will interpret matrices, fundamentally, as linear transformations, and then use established ideas about linear transformations to recover the natural definitions of the matrix operations.

For vector spaces V, W over \mathbb K, let \mathcal L(V, W) denote the collection of linear transformations from V \to W.

Theorem 1. Let m, n be positive integers. For each T \in \mathcal L(\mathbb K^n, \mathbb K^m), define

\mathbf A_T := \begin{bmatrix}T(\mathbf e_1) & T(\mathbf e_2) & \cdots & T(\mathbf e_n)\end{bmatrix},\quad T(\mathbf e_j) = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{bmatrix} \in \mathbb K^m.

Then the map \mathcal S : \mathcal L(\mathbb K^n, \mathbb K^m) \to \mathcal M_{m \times n}(\mathbb K) defined by \mathcal S(T) = \mathbf A_T is a well-defined bijection.
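Before the proof, here is a small illustration, with the transformation chosen purely for concreteness: let T : \mathbb K^2 \to \mathbb K^3 be the linear map T(x, y) = (x, y, x + y). Then

T(\mathbf e_1) = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad T(\mathbf e_2) = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \text{so} \quad \mathbf A_T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} \in \mathcal M_{3 \times 2}(\mathbb K).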

Proof. That \mathcal S is well-defined follows since each T determines the vectors T(\mathbf e_1), \dots, T(\mathbf e_n), and hence the matrix \mathbf A_T, uniquely. For surjectivity, fix \mathbf A = \begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n \end{bmatrix}. Define T(\mathbf e_j) = \mathbf a_j and extend it to a linear transformation T : \mathbb K^n \to \mathbb K^m by linearity. Then

\begin{aligned} \mathcal S(T) = \mathbf A_T &= \begin{bmatrix}T(\mathbf e_1) & T(\mathbf e_2) & \cdots & T(\mathbf e_n)\end{bmatrix} \\ &=  \begin{bmatrix}\mathbf a_1 & \dots & \mathbf a_n \end{bmatrix} = \mathbf A. \end{aligned}

As of now, we have not shown that \mathcal M_{m \times n}(\mathbb K) forms a vector space over \mathbb K. Thus, we cannot use the property \mathcal S(T) = \mathbf 0 \Rightarrow T = 0 to prove that \mathcal S is injective. (Indeed, what do we mean by \mathbf 0 on the right-hand side?) Thankfully, we can still use the vanilla definition of injectivity with relative ease.

For injectivity, fix T, T' \in \mathcal L(\mathbb K^n, \mathbb K^m) and suppose

\mathbf A_{T} = \mathcal S(T) = \mathcal S(T') = \mathbf A_{T'}.

Then for each j, T(\mathbf e_j) = \mathbf a_j = T'(\mathbf e_j). Therefore T|_{E_n} = T'|_{E_n}. Since the extension to \mathbb K^n = \mathrm{span}(E_n) is unique, we have T = T'.

Since \mathcal S is bijective, \mathcal S^{-1} : \mathcal M_{m \times n}(\mathbb K) \to \mathcal L(\mathbb K^n, \mathbb K^m) is also a bijection. This means that we can induce the vector space structure on \mathcal M_{m \times n}(\mathbb K) from the vector space structure on \mathcal L(\mathbb K^n, \mathbb K^m).

Corollary 1. Let \mathbf A = [a_{ij}], \mathbf B = [b_{ij}] \in \mathcal M_{m \times n}(\mathbb K) be matrices and c \in \mathbb K be a scalar. Using the bijection \mathcal S from Theorem 1, addition and scalar multiplication on \mathcal M_{m \times n}(\mathbb K) are given by

[a_{ij}] +  [b_{ij}] = [a_{ij} + b_{ij}],\quad c \cdot  [a_{ij}] = [c \cdot a_{ij}].

In particular, we have the additive identity \mathbf 0 = 0 \cdot [a_{ij}] = [0] and for any [a_{ij}], the additive inverse -[a_{ij}] = (-1)[a_{ij}] = [-a_{ij}].

Proof. Define T_{\mathbf A} := \mathcal S^{-1}(\mathbf A) and T_{\mathbf B} := \mathcal S^{-1}(\mathbf B). Then by an exercise, matrix addition is defined by

\mathbf A + \mathbf B := \mathcal S(T_{\mathbf A} + T_{\mathbf B}).

For any j, we use the vector space structure of \mathbb K^m to compute

\begin{aligned} (T_{\mathbf A} + T_{\mathbf B})(\mathbf e_j) &= T_{\mathbf A} (\mathbf e_j) + T_{\mathbf B}(\mathbf e_j) \\ &= \mathbf a_j + \mathbf b_j \\ &= \begin{bmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{bmatrix} + \begin{bmatrix} b_{1j} \\ \vdots \\ b_{mj}  \end{bmatrix} \\ &= \begin{bmatrix} a_{1j} + b_{1j} \\ \vdots \\ a_{mj} + b_{mj} \end{bmatrix}. \end{aligned}

Therefore, the (i,j)-th entry of \mathbf A + \mathbf B is a_{ij} + b_{ij}. In more concise terms,

[a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}].

Scalar multiplication is derived similarly using the definition c\mathbf A := \mathcal S(cT_{\mathbf A}).
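For concreteness, here is a small numerical instance of these operations, with the entries chosen purely for illustration (say over \mathbb K = \mathbb R):

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 4 & 4 \end{bmatrix},\quad 2 \cdot \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}.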

This construction also strengthens \mathcal S into a vector space isomorphism.

Corollary 2. \mathcal M_{m \times n}(\mathbb K) \cong \mathcal L(\mathbb K^n, \mathbb K^m).

The key point worth highlighting from Corollary 2 is this:

Matrices are effectively the same as linear transformations from \mathbb K^n to \mathbb K^m with respect to the standard ordered basis.

In layperson's terms, the key point is to think of matrices not as arbitrary tables of numbers, but as encodings of linear transformations. In this regard, the definition of matrix multiplication and its properties become trivial consequences of composing linear transformations.

Definition 2. Let \mathbf A \in \mathcal M_{k \times m}(\mathbb K) and \mathbf B \in \mathcal M_{m \times n}(\mathbb K). Define matrix multiplication by

\mathbf A \mathbf B := \mathcal S(T_{\mathbf A} \circ T_{\mathbf B})\quad \iff \quad T_{\mathbf A \mathbf B} := T_{\mathbf A} \circ T_{\mathbf B}.

Note that T_{\mathbf B} : \mathbb K^n \to \mathbb K^m and T_{\mathbf A} : \mathbb K^m \to \mathbb K^k, so the composite T_{\mathbf A} \circ T_{\mathbf B} : \mathbb K^n \to \mathbb K^k is well-defined and \mathbf A \mathbf B \in \mathcal M_{k \times n}(\mathbb K).

Corollary 3. Let \mathbf A \in \mathcal M_{k \times m}, \mathbf B \in \mathcal M_{m \times n}, \mathbf C \in \mathcal M_{n \times p}. Then

\mathbf A (\mathbf B \mathbf C) = (\mathbf A \mathbf B)\mathbf C.

Proof. By the associativity of function composition,

\begin{aligned} T_{\mathbf A (\mathbf B \mathbf C)} &= T_{\mathbf A} \circ T_{\mathbf B \mathbf C} \\ &= T_{\mathbf A} \circ (T_{\mathbf B } \circ T_{\mathbf C}) \\ &= (T_{\mathbf A} \circ T_{\mathbf B }) \circ T_{\mathbf C} \\ &= (T_{\mathbf A\mathbf B }) \circ T_{\mathbf C} = T_{ (\mathbf A\mathbf B) \mathbf C}. \end{aligned}

Therefore,

\mathbf A (\mathbf B \mathbf C) = \mathcal S(T_{\mathbf A (\mathbf B \mathbf C)}) = \mathcal S(T_{ (\mathbf A\mathbf B) \mathbf C}) = (\mathbf A \mathbf B)\mathbf C.

Lemma 1. Let \mathbf A \in \mathcal M_{m \times n}. For any \mathbf v \in \mathbb K^n \cong \mathcal M_{n \times 1}(\mathbb K), \mathbf A\mathbf v = [T_{\mathbf A}(\mathbf v)], the m \times 1 matrix whose single column is T_{\mathbf A}(\mathbf v). In particular,

\mathbf A\mathbf v = \begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = v_1 \mathbf a_1 + \cdots + v_n \mathbf a_n.

Proof. To prove \mathbf A\mathbf v = [T_{\mathbf A}(\mathbf v)], we first note that T_{\mathbf v} : \mathbb K \to \mathbb K^n is defined by T_{\mathbf v}(1) = \mathbf v, then extended by linearity since \mathbb K = \mathrm{span}\{1\}. By definition of matrix multiplication,

\begin{aligned} \mathbf A \mathbf v &= \mathcal S(T_{\mathbf A} \circ T_{\mathbf v}) \\ &= [(T_{\mathbf A} \circ T_{\mathbf v})(1)] \\ &= [T_{\mathbf A} (T_{\mathbf v}(1)) ] = [T_{\mathbf A} (\mathbf v) ]. \end{aligned}

For the matrix representation, find unique scalars v_1, \dots, v_n \in \mathbb K such that \mathbf v = v_1 \mathbf e_1 + \dots + v_n \mathbf e_n. Then

\begin{aligned} \mathbf A\mathbf v = T_{\mathbf A}(\mathbf v) &= T_{\mathbf A}(v_1 \mathbf e_1 + \dots + v_n \mathbf e_n) \\ &= v_1 T_{\mathbf A}(\mathbf e_1) + \dots + v_n T_{\mathbf A}(\mathbf e_n) \\ &= v_1 \mathbf a_1 + \dots + v_n \mathbf a_n.\end{aligned}

As a side note, Lemma 1 gives us a rather comical yet useful analogy for left-multiplying the vector \mathbf v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} by the matrix \mathbf A = \begin{bmatrix} \mathbf a_1 & \dots & \mathbf a_n \end{bmatrix}. We can think of each \mathbf a_j as an “ingredient”, and \mathbf v as a “recipe”, which tells us to add up v_j “units” of each “ingredient” \mathbf a_j. Then

\mathbf A \mathbf v =\begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = v_1 \mathbf a_1 + \cdots + v_n \mathbf a_n

gives us the “dish” \mathbf A\mathbf v that we created by combining the recipe with the ingredients.
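To make the analogy concrete, take the ingredient columns and the recipe below (numbers chosen purely for illustration, say over \mathbb R):

\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 5 \\ 6 \end{bmatrix} = 5 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 6 \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 23 \\ 34 \end{bmatrix}.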

Corollary 4. Let \mathbf A \in \mathcal M_{k \times m} and \mathbf B \in \mathcal M_{m \times n}. Then for any vector \mathbf v \in \mathbb K^n,

(\mathbf A \mathbf B)\mathbf v = \mathbf A(\mathbf B\mathbf v).

In particular,

\mathbf A \begin{bmatrix} \mathbf b_1 & \cdots & \mathbf b_n \end{bmatrix} = \begin{bmatrix} \mathbf A \mathbf b_1 & \cdots & \mathbf A \mathbf b_n \end{bmatrix}.

Furthermore, for 1 \leq i \leq k and 1 \leq j \leq n,

\mathbf A \mathbf B = [a_{i1} b_{1j} + \cdots + a_{im} b_{mj}].

Proof. The first identity follows from Corollary 3 by viewing \mathbf v \in \mathbb K^n \cong \mathcal M_{n \times 1}(\mathbb K) as an n \times 1 matrix. The (i,j)-th entry of \mathbf A\mathbf B is the i-th coordinate of (\mathbf A \mathbf B)\mathbf e_j, which we evaluate using the first part:

\begin{aligned}  (\mathbf A \mathbf B)\mathbf e_j = \mathbf A (\mathbf B \mathbf e_j) = \mathbf A \mathbf b_j  &= \begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_m \end{bmatrix} \begin{bmatrix} b_{1j} \\ \vdots \\ b_{mj} \end{bmatrix} \\ &= b_{1j} \mathbf a_1 + \cdots + b_{mj} \mathbf a_m \\ &= b_{1j} \begin{bmatrix} a_{11} \\ \vdots \\ a_{k1} \end{bmatrix} + \cdots + b_{mj} \begin{bmatrix} a_{1m} \\ \vdots \\ a_{km} \end{bmatrix} \\ &= \begin{bmatrix} a_{11}b_{1j} + \cdots + a_{1m}b_{mj} \\ \vdots \\ a_{k1}b_{1j} + \cdots + a_{km}b_{mj} \end{bmatrix},\end{aligned}

where the expansion in the fourth “=” works thanks to Lemma 1. In particular, the i-th coordinate is a_{i1}b_{1j} + \cdots + a_{im}b_{mj}, as required.

The final result is the conventional definition for matrix multiplication, but the intermediate result tends to be more intuitive and useful, since it can be interpreted using our recipe-ingredient analogy in the following manner:

The matrix \mathbf B = \begin{bmatrix} \mathbf b_1 & \cdots & \mathbf b_n \end{bmatrix} encodes the n different recipes, with \mathbf b_j denoting the j-th recipe. Since we have n recipes, we want to cook n dishes. For the j-th dish, we use the j-th recipe with the ingredients in \mathbf A to obtain the dish \mathbf A\mathbf b_j, which is the j-th column in \mathbf A \mathbf B. Therefore, \mathbf A \mathbf B encodes the n different dishes:

\mathbf A \begin{bmatrix} \mathbf b_1 & \cdots & \mathbf b_n \end{bmatrix} = \begin{bmatrix} \mathbf A \mathbf b_1 & \cdots & \mathbf A \mathbf b_n \end{bmatrix}.
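Continuing the numbers from the earlier illustration: with the same ingredient matrix \mathbf A and two recipes \mathbf b_1, \mathbf b_2 (chosen arbitrarily),

\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 6 & 1 \end{bmatrix} = \begin{bmatrix} \mathbf A \mathbf b_1 & \mathbf A \mathbf b_2 \end{bmatrix} = \begin{bmatrix} 23 & 3 \\ 34 & 4 \end{bmatrix},

and each (i,j)-th entry indeed matches a_{i1} b_{1j} + a_{i2} b_{2j}.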

Therefore, many of our desired matrix operations arise from regarding matrices as linear transformations, including matrix multiplication, which encodes the composition of linear transformations. As such, it should make intuitive sense that, more often than not, \mathbf A \mathbf B \neq \mathbf B \mathbf A for matrices \mathbf A, \mathbf B (even when both products are defined), since function composition is likewise not commutative in general: f \circ g = g \circ f is a very rare sighting.
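For instance, over \mathbb R,

\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \neq \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.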

We could wonder to ourselves: are there infinite-dimensional matrices? Once again, if we interpret matrices as linear transformations, and infinite-dimensional vector spaces like \mathbb K[x] do exist, it wouldn’t be too far-fetched to rigorously formulate infinite-dimensional matrices, albeit with more care than the finite-dimensional case.

But let us at least posit a more immediate question. If bijective linear transformations T have inverses T^{-1}, could matrices \mathbf A = \mathcal S(T) have inverses too? Intuitively, we would like

\mathbf A^{-1} = \mathcal S(T^{-1}).

This idea turns out to work. In fact, since \mathbf A = \mathcal S(T), it gives us a nice compatibility between \mathcal S and taking inverses:

\mathcal S(T)^{-1} = \mathcal S(T^{-1}).

But we will explore this idea in more detail next time. In particular, invertible matrices will help us formalise the otherwise mundane algorithm known as Gaussian elimination.

—Joel Kindiak, 28 Feb 25, 1825H
