Theoretic Gaussian Elimination

Let \mathbb K be a field. Previously, we discussed at length how matrices with m rows and n columns of entries in \mathbb K are basically the same as linear transformations from \mathbb K^n to \mathbb K^m.

In this post, we explore the idea of invertible transformations and how we can use these ideas to help us solve simultaneous linear equations. It turns out that solving this problem is key to helping us understand dimensions in vector spaces.

Let V, W be vector spaces over \mathbb K. Recall that a linear transformation T : V \to W is a vector space isomorphism if T is bijective. The collection of such invertible maps is a fascinating subject of study.

Lemma 1. For any nonempty set K, let \mathrm{Sym}(K) denote the set of bijections (more technically, symmetries) from K to itself.

  • For any f, g \in \mathrm{Sym}(K), f \circ g \in \mathrm{Sym}(K).
  • For any f, g, h \in \mathrm{Sym}(K), f \circ (g \circ h) = (f \circ g) \circ h.
  • The identity map \mathrm{id}_K \in \mathrm{Sym}(K) and for any f \in \mathrm{Sym}(K), f \circ \mathrm{id}_K = \mathrm{id}_K \circ f = f.
  • For any f \in \mathrm{Sym}(K), f^{-1} \in \mathrm{Sym}(K) and f \circ f^{-1} = f^{-1} \circ f = \mathrm{id}_K.
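Lemma 1 can be sanity-checked on a small finite set. The following sketch (with our own helper names compose and inverse, not notation from this series) enumerates \mathrm{Sym}(K) for K = \{0, 1, 2\} and verifies all four properties by brute force.

```python
# Finite sanity check of Lemma 1 for K = {0, 1, 2}.
# A bijection f is stored as a tuple where f[i] is the image of i.
from itertools import permutations

K = (0, 1, 2)
sym_K = list(permutations(K))  # all 6 bijections of K

def compose(f, g):
    """Return f o g, the map x -> f(g(x))."""
    return tuple(f[g[x]] for x in K)

def inverse(f):
    """Return the inverse of f, well-defined because f is a bijection."""
    inv = [0] * len(K)
    for x in K:
        inv[f[x]] = x
    return tuple(inv)

identity = tuple(K)

for f in sym_K:
    # Identity and inverse axioms:
    assert compose(f, identity) == compose(identity, f) == f
    assert compose(f, inverse(f)) == compose(inverse(f), f) == identity
    for g in sym_K:
        # Closure under composition:
        assert compose(f, g) in sym_K
        for h in sym_K:
            # Associativity:
            assert compose(f, compose(g, h)) == compose(compose(f, g), h)
```

A finite check is of course no substitute for the (routine) general proof, but it makes the group axioms concrete.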

Together, we call \mathrm{Sym}(K) a group under function composition, called the symmetric group over K. Furthermore, if V is a vector space over \mathbb K, then the general linear group of V is defined by

\mathrm{GL}(V) := \mathcal L(V, V) \cap \mathrm{Sym}(V),

and is the group of vector space isomorphisms (under function composition) from V to itself. In the special case V = \mathbb K^n, we call \mathrm{GL}_n(\mathbb K) := \mathrm{GL}(\mathbb K^n) the general linear group of dimension n over \mathbb K.

It turns out that if there exists an isomorphism T : \mathbb K^m \to \mathbb K^n, then we must have m = n. This will be the key result we aim to prove in this blogpost. But first, a motivating question:

What are the solutions (in \mathbb K) to x_1 - 4x_2 - 5x_3 = 0?

Intuitively, we would think that x_1 = 4x_2 + 5x_3. This means that x_2 can be any free parameter s \in \mathbb K, and x_3 can be any free parameter t  \in \mathbb K. In that sense, we have the solution

x_1 = 4s + 5t,\quad x_2 = s,\quad x_3 = t.

Now we could represent this information in terms of vectors. Denote the vector

\mathbf x = x_1 \mathbf e_1 + x_2 \mathbf e_2 + x_3 \mathbf e_3 = \begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix}.

The equation tells us that we need to combine the recipe \mathbf x with the ingredients \mathbf A:= \begin{bmatrix}1 & -4 & -5\end{bmatrix} to obtain the dish

\mathbf A\mathbf x = \begin{bmatrix}1 & -4 & -5\end{bmatrix}\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = x_1 - 4x_2 - 5x_3 = 0.

Indeed, any \mathbf x that solves the given equation \mathbf A \mathbf x = 0 is simply an element of \mathrm{ker}(T_{\mathbf A}) and vice versa. From our exploration, we can actually compute this set, since

\mathbf x = \begin{bmatrix}4s + 5t \\ s \\ t\end{bmatrix} = s \begin{bmatrix} 4 \\ 1 \\ 0\end{bmatrix} + t \begin{bmatrix} 5 \\ 0 \\ 1\end{bmatrix},\quad s  \in \mathbb K,\quad t \in \mathbb K.

Therefore, \mathrm{ker}(T_{\mathbf A}) = \mathrm{span} \left\{\begin{bmatrix} 4 \\ 1 \\ 0\end{bmatrix}, \begin{bmatrix} 5 \\ 0 \\ 1\end{bmatrix} \right\}. With 1 equation in 3 variables, we get a kernel with basis containing 2 elements. What if we had 2 equations?
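We can double-check this kernel computation with a quick script (the helper name A_times is ours): substitute \mathbf x = s(4,1,0) + t(5,0,1) for a range of integers s, t and confirm that \mathbf A\mathbf x = 0.

```python
# Verify that every vector in the claimed span solves x1 - 4*x2 - 5*x3 = 0.
def A_times(x):
    # The 1x3 matrix A = [1 -4 -5] applied to x = (x1, x2, x3).
    x1, x2, x3 = x
    return x1 - 4 * x2 - 5 * x3

for s in range(-3, 4):
    for t in range(-3, 4):
        x = (4 * s + 5 * t, s, t)  # s*(4,1,0) + t*(5,0,1)
        assert A_times(x) == 0
```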

What are the solutions (in \mathbb K) to the following system?

\left\{ \begin{aligned} x_1 - 4x_2 - 5x_3 &= 0 \\ x_1 + 2x_2 + 3x_3 &= 0 \end{aligned} \right.

Denoting \mathbf A = \begin{bmatrix} 1 & -4 & -5 \\ 1 & 2 & 3\end{bmatrix} =: \begin{bmatrix} \mathbf a_1^{\mathrm T} \\ \mathbf a_2^{\mathrm T} \end{bmatrix} and \mathbf x = \begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix}, we now want to calculate \mathrm{ker}(T_{\mathbf A}). A careful analysis yields

\mathrm{ker}(T_{\mathbf A}) = \mathrm{ker}(T_{\mathbf a_1^{\mathrm T}}) \cap \mathrm{ker}(T_{\mathbf a_2^{\mathrm T}}).

Unfortunately, evaluating the right-hand side will inevitably lead us back to where we started. We need some clever tricks. Well, recalling the method of elimination, if we subtracted the first equation from the second, we get

\left\{ \begin{aligned} x_1 - 4x_2 - 5x_3 &= 0 \\ 6x_2 + 8x_3 &= 0. \end{aligned} \right.

Multiplying the second equation by 1/6 then makes the coefficient of x_2 equal to 1, yielding

\left\{ \begin{aligned} x_1 - 4x_2 - 5x_3 &= 0 \\ x_2 + \textstyle \frac 43 x_3 &= 0. \end{aligned} \right.

Finally, adding 4 times the second equation to the first yields

\left\{ \begin{aligned} \textstyle x_1 \phantom{+ 0x_2} + \frac 13 x_3 &= 0 \\ x_2 + \textstyle \frac 43 x_3 &= 0. \end{aligned} \right.

Denoting \mathbf R := \begin{bmatrix} 1 & 0 & 1/3 \\ 0 & 1 & 4/3\end{bmatrix}, we conclude that

\mathrm{ker}(T_{\mathbf A}) = \mathrm{ker}(T_{\mathbf R}) = \mathrm{span} \left\{ \begin{bmatrix} -1 \\ -4 \\ 3 \end{bmatrix} \right\}.
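The three elimination steps above can be replayed in exact rational arithmetic as a quick sanity check (an illustrative sketch, not part of any proof):

```python
# Replay the elimination on A = [[1,-4,-5],[1,2,3]] with exact fractions,
# then confirm the resulting R and the claimed kernel generator (-1,-4,3).
from fractions import Fraction as F

A = [[F(1), F(-4), F(-5)],
     [F(1), F(2), F(3)]]

# Step 1: subtract row 1 from row 2.
A[1] = [a2 - a1 for a1, a2 in zip(A[0], A[1])]
# Step 2: multiply row 2 by 1/6.
A[1] = [F(1, 6) * a for a in A[1]]
# Step 3: add 4 times row 2 to row 1.
A[0] = [a1 + 4 * a2 for a1, a2 in zip(A[0], A[1])]

R = [[F(1), F(0), F(1, 3)],
     [F(0), F(1), F(4, 3)]]
assert A == R

# The claimed kernel generator is killed by every row of R:
v = (F(-1), F(-4), F(3))
assert all(sum(r * x for r, x in zip(row, v)) == 0 for row in R)
```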

The key idea is that in each step, we could “undo” the process (e.g. multiplying by 6 “undoes” multiplying by 1/6). At each step, we are transforming each equation, which corresponds to transforming the rows of \mathbf A. The final matrix \mathbf R is known as the reduced row-echelon form where the columns are arranged as follows:

  • The columns \mathbf e_1,\mathbf e_2,\dots appear in that order from left to right. These are called pivot columns; all remaining columns are non-pivot columns.
  • Every column before the column \mathbf e_1 is \mathbf 0.
  • Any column between \mathbf e_j and \mathbf e_{j+1} belongs to \mathrm{span}(\mathbf e_1,\dots,\mathbf e_j). Here, we use round brackets to emphasise the order of the basis.

Using induction, we can prove that every matrix \mathbf A has a unique reduced row-echelon form.
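For concreteness, here is a minimal sketch of such a reduction over the rationals (the helper name rref is ours; this Gauss-Jordan-style loop is one standard way to compute the form):

```python
# Gauss-Jordan reduction to reduced row-echelon form with exact fractions.
from fractions import Fraction as F

def rref(rows):
    M = [[F(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivot_row = 0
    for col in range(n):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, m) if M[r][col] != 0), None)
        if pivot is None:
            continue  # non-pivot column: nothing to do
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]               # swap rows
        M[pivot_row] = [x / M[pivot_row][col] for x in M[pivot_row]]  # scale pivot to 1
        for r in range(m):  # eliminate this column everywhere else
            if r != pivot_row and M[r][col] != 0:
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

# Reproduces the worked example above:
assert rref([[1, -4, -5], [1, 2, 3]]) == [[1, 0, F(1, 3)], [0, 1, F(4, 3)]]
```

The result satisfies the three structural rules above: pivots appear left to right, and each non-pivot column is a combination of the pivots before it.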

Therefore, given \mathbf A \in \mathcal M_{m \times n}(\mathbb K), we want to prove that \mathrm{ker}(T_{\mathbf A}) = \mathrm{ker}(T_{\mathbf R}). In fact, this result will hold if there exists an invertible (i.e. bijective) linear transformation S \in \mathrm{GL}_m(\mathbb K) such that

T_{\mathbf R} = S \circ T_{\mathbf A}.

If we can prove that our suggested moves do indeed correspond with bijective linear transformations, we can prove the result. We will use 2 \times 3 matrices to illustrate the technique.

Lemma 2. Suppose T : \mathbb K^n \to \mathbb K^n is invertible (i.e. bijective), and recall that \mathcal S(T) denotes the matrix representing T. Then \mathcal S(T) has an inverse, i.e. a matrix \mathcal S(T)^{-1} such that

\mathcal S(T) \mathcal S(T)^{-1} = \mathcal S(T)^{-1} \mathcal S(T) = \mathbf I_n.

Furthermore, \mathcal S(T)^{-1} = \mathcal S(T^{-1}).

Proof. By invertibility,

T \circ T^{-1} = T^{-1} \circ T = I_{\mathbb K^n} \equiv \mathrm{id}_{\mathbb K^n}.

By the definition of \mathcal S,

\mathcal S(T \circ T^{-1}) = \mathcal S(T^{-1} \circ T) = \mathcal S(I_{\mathbb K^n}).

Since \mathcal S converts composition into matrix multiplication, i.e. \mathcal S(F \circ G) = \mathcal S(F)\,\mathcal S(G), we can check that the map \mathcal S induces a group structure on

\mathrm{GL}_n(\mathbb K) \cong \mathcal S(\mathrm{GL}_n(\mathbb K)) \subseteq \mathcal M_{n \times n}(\mathbb K)

with identity \mathbf I_n := \begin{bmatrix}\mathbf e_1 & \cdots & \mathbf e_n \end{bmatrix}. For the inverse of \mathcal S(T),

\mathcal S(T) \mathcal S(T^{-1}) = \mathcal S(T^{-1}) \mathcal S(T) = \mathbf I_n,

yielding \mathcal S(T)^{-1} = \mathcal S(T^{-1}).

Henceforth, we will make no distinction between \mathcal M_{m \times n}(\mathbb K) and \mathcal L(\mathbb K^n, \mathbb K^m) (with the standard ordered bases), and in particular, regard T_{\mathbf A} = \mathbf A for matrices \mathbf A.

Corollary 1. A matrix \mathbf A \in \mathcal M_{n \times n}(\mathbb K) is invertible if and only if it belongs to \mathrm{GL}_n(\mathbb K).

Let \mathbf A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} for discussion purposes.

Lemma 3. Multiplying row 1 by a nonzero constant k is defined by the invertible matrix

\begin{bmatrix} k & 0 \\ 0 & 1\end{bmatrix}.

Proof. To see that this matrix multiplies row 1 by k, we carry out the matrix multiplication:

\begin{aligned} \begin{bmatrix} k & 0 \\ 0 & 1\end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} &=  \begin{bmatrix} ka_{11} & ka_{12} & k a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}.\end{aligned}

We can similarly verify that the inverse of \begin{bmatrix} k & 0 \\ 0 & 1\end{bmatrix} is \begin{bmatrix} 1/k & 0 \\ 0 & 1\end{bmatrix} (this is where k \neq 0 is needed); the verification is left as an exercise.

Lemma 4. Adding k times row 1 to row 2 is defined by the invertible matrix

\begin{bmatrix} 1 & 0 \\ k & 1\end{bmatrix}.

Swapping row 1 with row 2 is defined by the invertible matrix

\begin{bmatrix} 0 & 1 \\ 1 & 0\end{bmatrix}.

Proof. Exercise. Check that

\begin{bmatrix} 1 & 0 \\ k & 1\end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -k & 1\end{bmatrix},\quad \begin{bmatrix} 0 & 1 \\ 1 & 0\end{bmatrix}^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0\end{bmatrix}.

The matrices of the forms in Lemmas 3 and 4 are called elementary matrices; they can be defined similarly for n \times n matrices and for the analogous operations on any rows.
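The elementary matrices above and their stated inverses can also be verified directly; the sketch below uses a hand-rolled matmul helper (our name, not notation from this series) and the sample value k = 5.

```python
# Check that each 2x2 elementary matrix composed with its claimed inverse
# gives the identity, as in Lemmas 3 and 4.
from fractions import Fraction as F

def matmul(A, B):
    """Multiply matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I2 = [[F(1), F(0)], [F(0), F(1)]]
k = F(5)  # any nonzero constant works

scale = [[k, F(0)], [F(0), F(1)]]      # multiply row 1 by k
add   = [[F(1), F(0)], [k, F(1)]]      # add k times row 1 to row 2
swap  = [[F(0), F(1)], [F(1), F(0)]]   # swap rows 1 and 2

scale_inv = [[1 / k, F(0)], [F(0), F(1)]]
add_inv   = [[F(1), F(0)], [-k, F(1)]]

assert matmul(scale, scale_inv) == matmul(scale_inv, scale) == I2
assert matmul(add, add_inv) == matmul(add_inv, add) == I2
assert matmul(swap, swap) == I2  # a swap is its own inverse
```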

Lemma 5. For any m \times n matrix \mathbf A with reduced row-echelon form \mathbf R, there exists an invertible matrix \mathbf E such that \mathbf E \mathbf A = \mathbf R. In particular, for vectors \mathbf x \in \mathbb K^n, \mathbf b \in \mathbb K^m,

\mathbf A \mathbf x = \mathbf b \quad \iff \quad \mathbf R \mathbf x = \mathbf E\mathbf b.

This procedure of converting a matrix into its reduced row-echelon form is called Gaussian elimination (or more precisely, Gauss-Jordan elimination).

Proof. Thanks to the matrices in Lemmas 3 and 4, there exists a sequence of elementary matrices \mathbf E_1,\dots,\mathbf E_k such that

\mathbf E_k\cdots\mathbf E_2\mathbf E_1\mathbf A = \mathbf R.

Denoting \mathbf E := \mathbf E_k\cdots\mathbf E_2\mathbf E_1, which is invertible, \mathbf E \mathbf A = \mathbf R.
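For the worked 2 \times 3 example earlier, we can build \mathbf E explicitly as the product of the three elementary matrices used and confirm \mathbf E\mathbf A = \mathbf R (a sketch; matmul is our helper name):

```python
# Build E as the product of the three elementary matrices from the worked
# example and confirm E A = R.
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

E1 = [[F(1), F(0)], [F(-1), F(1)]]    # subtract row 1 from row 2
E2 = [[F(1), F(0)], [F(0), F(1, 6)]]  # multiply row 2 by 1/6
E3 = [[F(1), F(4)], [F(0), F(1)]]     # add 4 times row 2 to row 1

E = matmul(E3, matmul(E2, E1))  # note the order: E1 acts first

A = [[F(1), F(-4), F(-5)], [F(1), F(2), F(3)]]
R = [[F(1), F(0), F(1, 3)], [F(0), F(1), F(4, 3)]]
assert matmul(E, A) == R
```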

We have come to the fundamental theorem of invertible matrices.

Theorem 1. Let \mathbf A be an n \times n matrix. The following are equivalent.

  • \mathbf A is invertible.
  • The only solution to \mathbf A\mathbf x = \mathbf 0 is \mathbf x = \mathbf 0.
  • The reduced row-echelon form of \mathbf A is \mathbf I_n.
  • \mathbf A can be expressed as a product of elementary matrices.

Proof. If \mathbf A is invertible, then \mathbf A is injective; since \mathbf A\mathbf 0 = \mathbf 0, the only solution to \mathbf A\mathbf x = \mathbf 0 is \mathbf x = \mathbf 0, and the second proposition holds.

Suppose the only solution to \mathbf A\mathbf x = \mathbf 0 is \mathbf x = \mathbf 0. We claim that the reduced row-echelon form \mathbf R of \mathbf A has no zero rows and no non-pivot columns.

If \mathbf R has a zero row, then \mathbf R has fewer than n non-zero rows. Since each non-zero row contains exactly one pivot, \mathbf R then has fewer than n pivot columns, and hence at least one non-pivot column. It therefore suffices to rule out non-pivot columns.

Suppose \mathbf R has a non-pivot column \mathbf v. If \mathbf v is the first column, it must be \mathbf 0, and so \mathbf e_1 will be a non-trivial solution to \mathbf R \mathbf e_1 = \mathbf 0 \Rightarrow \mathbf A \mathbf e_1 = \mathbf 0, a contradiction.

If instead \mathbf v is not the first column, then the first column is \mathbf e_1. For illustration, suppose \mathbf v is the second column, so that \mathbf v = c \mathbf e_1 for some c \in \mathbb K (the general case is similar). If c = 0, then \mathbf R\mathbf e_2 = \mathbf v = \mathbf 0 gives a non-trivial solution immediately, so suppose c \neq 0. Then row 1 of \mathbf R will resemble

\begin{bmatrix} 1 & c & r_{13} & \cdots & r_{1n} \end{bmatrix}.

This implies that x_1 = -c x_2 + (\text{terms}), and x_2 can be regarded as a free parameter t \in \mathbb K so that

x_1 = -c t + (\text{terms}).

In particular, choosing t \neq 0 (and all other free parameters 0) yields a nonzero \mathbf x, since x_2 = t \neq 0, such that \mathbf R \mathbf x = \mathbf 0, which yields \mathbf A \mathbf x = \mathbf 0, a contradiction.

Since \mathbf R has no zero rows or non-pivot columns, all n columns are pivot columns. Therefore,

\mathbf R = \begin{bmatrix} \mathbf e_1 & \cdots & \mathbf e_n \end{bmatrix} = \mathbf I_n.

Suppose \mathbf R = \mathbf I_n, which is itself an elementary matrix. By the proof of Lemma 5, there exists a sequence of elementary matrices \mathbf E_1,\dots,\mathbf E_k such that

\mathbf E_k\cdots\mathbf E_2\mathbf E_1\mathbf A = \mathbf R.

Since inverses of elementary matrices are themselves elementary matrices,

\mathbf A = \mathbf E_1^{-1} \mathbf E_2^{-1} \cdots \mathbf E_k^{-1} \mathbf R

is a product of elementary matrices. Since each elementary matrix is invertible, so is \mathbf A, as required.

Analysing pivot columns is itself interesting, since we can generalise the arguments above to deduce the necessary sizes of the matrices.

Corollary 2. Let \mathbf A \in \mathcal M_{m \times n}(\mathbb K).

  • If \mathbf A is injective, then m \geq n.
  • If \mathbf A is surjective, then m \leq n.

Proof. Let \mathbf R denote the reduced row-echelon form of \mathbf A. Write \mathbf E \mathbf A = \mathbf R. We have the following relationships

\begin{gathered} \#(\text{zero rows}) + \#(\text{non-zero rows})=m, \\ \#(\text{pivot columns}) + \#(\text{non-pivot columns})=n, \\ \#(\text{pivot columns}) = \#(\text{non-zero rows}). \end{gathered}

If \mathbf A is injective, then \mathbf R has no non-pivot columns. On the other hand, we claim that if \mathbf A is surjective, then \mathbf R has no zero rows. We will prove this claim by contrapositive.

Suppose \mathbf R has at least one zero row. Since the zero rows of a reduced row-echelon form sit at the bottom, row m of \mathbf R is zero, so the equation \mathbf R\mathbf x = \mathbf e_m has no solution: row m reads 0 = 1 for any \mathbf x. Therefore, the equation \mathbf A \mathbf x = \mathbf E^{-1}\mathbf e_m has no solution. In other words, there exists a vector, namely \mathbf b := \mathbf E^{-1} \mathbf e_m \in \mathbb K^m, such that for any \mathbf x \in \mathbb K^n, \mathbf A \mathbf x \neq \mathbf b. Therefore, \mathbf A is not surjective.
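Corollary 2 suggests a computation: count the pivot columns of the reduced row-echelon form, then compare against m and n. The sketch below (helper name pivot_count is ours) does the counting via forward elimination.

```python
# Count pivot columns via forward elimination with exact fractions,
# then read off (non-)injectivity and surjectivity for the 2x3 example.
from fractions import Fraction as F

def pivot_count(rows):
    M = [[F(x) for x in row] for row in rows]
    m, n, pivots = len(M), len(M[0]), 0
    for col in range(n):
        # Find a row at or below the current pivot row with a nonzero entry.
        r = next((i for i in range(pivots, m) if M[i][col] != 0), None)
        if r is None:
            continue  # non-pivot column
        M[pivots], M[r] = M[r], M[pivots]
        for i in range(pivots + 1, m):
            if M[i][col] != 0:
                factor = M[i][col] / M[pivots][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[pivots])]
        pivots += 1
    return pivots

A = [[1, -4, -5], [1, 2, 3]]  # the 2x3 example from earlier
m, n = 2, 3
p = pivot_count(A)
assert p < n   # a non-pivot column exists, so A is not injective
assert p == m  # no zero rows, so A is surjective
```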

Now that we have established the relative sizes of a matrix given its injectivity or surjectivity status, we are finally ready to discuss dimensions. Roughly speaking, a linearly independent set corresponds to an injective matrix, while a spanning set corresponds to a surjective matrix. The dimension counts will then follow accordingly. But this post is already as lengthy as it needs to be, and so we delay this discussion to next time.

—Joel Kindiak, 1 Mar 25, 1431H
