Manipulating Determinants

Previously, we undertook the heavy duty of defining determinants. Let \mathbb K be a field and V := \mathbb K^n.

Definition 1. Define the determinant \det : \mathcal M_{n \times n}(\mathbb K) \to \mathbb K of an n \times n matrix by

\begin{aligned} \left| \begin{matrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{matrix} \right| &\equiv \det \left( \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \right) \\ &\equiv \det\left(\begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \begin{bmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{bmatrix}, \cdots, \begin{bmatrix} a_{n1} \\ a_{n2} \\ \vdots \\ a_{nn} \end{bmatrix}\right) \\ &:= \sum_{f\in \mathrm{Sym}(\{1,\dots,n\})} \sigma(f)\, a_{1, f(1)} a_{2, f(2)} \cdots a_{n, f(n)}. \end{aligned}

By this definition, the determinant is multilinear, alternating, and normalised. By our investigations in previous posts, any function V^n \to \mathbb K satisfying these three properties must be given by the formula in Definition 1. Why then describe the determinant in terms of these three properties? Because they make manipulating determinants far easier.
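To make the formula concrete, here is a minimal sketch in Python (names like leibniz_det and sign are this sketch's own, not from any earlier post), evaluating Definition 1 exactly over \mathbb Q:

```python
import math
from fractions import Fraction
from itertools import permutations

def sign(perm: tuple[int, ...]) -> int:
    """Sign sigma(f) of a permutation of (0, ..., n-1), via inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def leibniz_det(a: list[list[Fraction]]) -> Fraction:
    """Definition 1: the sum over all f of sigma(f) * a_{1,f(1)} ... a_{n,f(n)},
    written with 0-based indices."""
    n = len(a)
    return sum(
        sign(f) * math.prod(a[i][f[i]] for i in range(n))
        for f in permutations(range(n))
    )

# Sanity check: the 2x2 case gives ad - bc.
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
assert leibniz_det(A) == 1 * 4 - 2 * 3
```

The n! terms make this hopeless beyond small n, which is exactly why the manipulations below matter.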

Lemma 1. Let \mathbf A \in \mathcal M_{n \times n}(\mathbb K). For any elementary matrix \mathbf E \in \mathcal M_{n \times n}(\mathbb K),

\det(\mathbf E \mathbf A) = \det(\mathbf E)\det(\mathbf A).

Proof. We just need to verify this fact for each of the three row operations. For the row operation that multiplies one row by a nonzero scalar k, suppose without loss of generality that it is the first row of \mathbf A that gets multiplied by k, and let \mathbf E denote the corresponding elementary matrix. By multilinearity,

\begin{aligned} \det(\mathbf E\mathbf A) &= \det\left(k\begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \begin{bmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{bmatrix}, \cdots, \begin{bmatrix} a_{n1} \\ a_{n2} \\ \vdots \\ a_{nn} \end{bmatrix}\right) \\ &= k \det\left(\begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \begin{bmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{bmatrix}, \cdots, \begin{bmatrix} a_{n1} \\ a_{n2} \\ \vdots \\ a_{nn} \end{bmatrix}\right) = k\det(\mathbf A). \end{aligned}

Since \det is normalised,

\det(\mathbf E) = \det(\mathbf E\mathbf I) = k \cdot \det(\mathbf I) = k.

Therefore,

\det(\mathbf E \mathbf A) = k \det(\mathbf A) = \det(\mathbf E)\det(\mathbf A).

For the row operation that adds k times one row to another, suppose without loss of generality that we are adding k times row 1 to row 2. Then multilinearity and the alternating property yield \det(\mathbf E\mathbf A) equal to

\begin{aligned} &\det\left(\begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \begin{bmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{bmatrix} + k \begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \cdots, \begin{bmatrix} a_{n1} \\ a_{n2} \\ \vdots \\ a_{nn} \end{bmatrix}\right) \\ &= \det\left(\begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \begin{bmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{bmatrix}, \cdots, \begin{bmatrix} a_{n1} \\ a_{n2} \\ \vdots \\ a_{nn} \end{bmatrix}\right) + k\det\left(\begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \begin{bmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{bmatrix}, \cdots, \begin{bmatrix} a_{n1} \\ a_{n2} \\ \vdots \\ a_{nn} \end{bmatrix}\right)  \\ &= \det(\mathbf A) + k \cdot 0 =\det(\mathbf A). \end{aligned}

Therefore,

\det(\mathbf E) = \det(\mathbf E \mathbf I) = \det(\mathbf I) = 1, \quad \text{and hence} \quad \det(\mathbf E\mathbf A) = \det(\mathbf A) = \det(\mathbf E)\det(\mathbf A).

For row-swapping, the result is immediate since \det is alternating: swapping two rows negates the determinant, so \det(\mathbf E\mathbf A) = -\det(\mathbf A) and \det(\mathbf E) = -\det(\mathbf I) = -1.
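A quick numerical sanity check of Lemma 1 for the scaling operation; this is a sketch assuming leibniz_det from the snippet above is in scope, and matmul is another illustrative helper of my own:

```python
from fractions import Fraction

def matmul(E, A):
    """Naive matrix product; enough for a 3x3 spot check."""
    n = len(E)
    return [[sum(E[i][k] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# E scales row 0 of the 3x3 identity by k = 5.
E = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
E[0][0] = Fraction(5)
A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
assert leibniz_det(E) == 5  # det(E) = k, as computed above
assert leibniz_det(matmul(E, A)) == leibniz_det(E) * leibniz_det(A)
```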

In particular, we have one of the most important uses of determinants: characterising invertible matrices.

Theorem 1. For any square matrix \mathbf A \in \mathcal M_{n \times n}(\mathbb K), \mathbf A is invertible if and only if \det(\mathbf A) \neq 0.

Proof. Let \mathbf R denote the reduced row-echelon form of \mathbf A and \mathbf E_1,\dots,\mathbf E_k be elementary matrices that transform \mathbf A into \mathbf R:

\mathbf A = \mathbf E_1^{-1} \cdots \mathbf E_k^{-1} \mathbf R.

Recall that inverses of elementary matrices are themselves elementary matrices. Applying Lemma 1 inductively,

| \mathbf A | = | \mathbf E_1^{-1}| \cdots | \mathbf E_k^{-1} |\, | \mathbf R |,

where we denote |\cdot| \equiv \det(\cdot) for brevity. Recall that \mathbf A is invertible if and only if \mathbf R = \mathbf I. If \mathbf R = \mathbf I, then |\mathbf A| \neq 0 as a product of nonzero scalars. If \mathbf R \neq \mathbf I, then \mathbf R has fewer than n pivots, and so has at least one zero row. Hence,

|\mathbf R| = \det(* , *, \cdots , \mathbf 0 ) = 0 \cdot \det(* , *, \cdots , \mathbf e_n )= 0,

so that |\mathbf A| = 0.
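The proof is effectively an algorithm: row-reduce \mathbf A, track how each elementary “action” scales the determinant, and read the answer off the pivots. A minimal sketch (illustrative names of my own, exact arithmetic over \mathbb Q):

```python
from fractions import Fraction

def det_by_elimination(a: list[list[Fraction]]) -> Fraction:
    """Determinant via Gaussian elimination, multiplying up the
    contributions of the row operations along the way."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a pivot; a pivotless column means A is singular (Theorem 1).
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]  # the triangular form's det is the pivot product
        for r in range(col + 1, n):
            # Adding a multiple of one row to another leaves det unchanged.
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
assert det_by_elimination(A) == 18  # nonzero, so A is invertible
```

This takes O(n^3) field operations, against the n! terms of Definition 1.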

Furthermore, we obtain the crucial multiplicativity property of determinants.

Theorem 2. For \mathbf A, \mathbf B \in \mathcal M_{n \times n}(\mathbb K), \det(\mathbf A\mathbf B) = \det(\mathbf A)\det(\mathbf B).

Proof. If \mathbf A and \mathbf B are both invertible, then each can be expressed as a product of elementary matrices, and induction on Lemma 1 yields the desired result. If either \mathbf A or \mathbf B is not invertible (i.e. singular), then by Theorem 1 it suffices to prove that \mathbf A \mathbf B is singular.

If \mathbf B is singular, then \mathbf B is not injective and there exists some nonzero \mathbf v \in \mathbb K^n such that \mathbf B\mathbf v = \mathbf 0. Hence,

(\mathbf A \mathbf B)\mathbf v = \mathbf A(\mathbf B\mathbf v) = \mathbf A\mathbf 0 = \mathbf 0.

Thus, \mathbf A \mathbf B is not injective, hence singular, as required.

Suppose otherwise that \mathbf B is invertible. If \mathbf A \mathbf B is invertible, then \mathbf A = (\mathbf A \mathbf B) \mathbf B^{-1} is invertible too. By contrapositive, if \mathbf A is singular, then \mathbf A \mathbf B is singular, as required.
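A spot check of Theorem 2, assuming the leibniz_det and matmul sketches above are in scope:

```python
from fractions import Fraction

A = [[Fraction(x) for x in row] for row in [[2, 1], [0, 3]]]
B = [[Fraction(x) for x in row] for row in [[1, 4], [2, 5]]]
# det(A) = 6 and det(B) = -3, so det(AB) should be -18.
assert leibniz_det(matmul(A, B)) == leibniz_det(A) * leibniz_det(B) == -18
```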

What these investigations tell us is that to compute the determinant of a square matrix \mathbf A, we can perform Gauss-Jordan elimination and multiply the determinants of the corresponding “actions” along the way (precisely what the elimination sketch above does). While this certainly gives us some computational grasp on the determinant, we are instant-gratification addicts—can we look at a matrix and compute its determinant immediately and quickly? For instance, in the 2 \times 2 case, we don’t have any issues, since

\left| \begin{matrix} a& b \\ c & d \end{matrix}\right| = ad - bc

as per prior investigations. Denote \mathcal S_n := \mathrm{Sym}(\{1,\dots,n\}) for brevity. For the 3 \times 3 case, the idea is to build it up from the 2 \times 2 case. Recall the computational definition of |\mathbf A|:

\begin{aligned} \left| \begin{matrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{matrix} \right| &= \sum_{f\in \mathcal S_3} \sigma(f) a_{1, f(1)} a_{2, f(2)} a_{3, f(3)}. \end{aligned}

Now, for f \in \mathcal S_3, it is obvious that f(1) \in \{1,2,3\}. Therefore, the right-hand side simplifies to

\displaystyle |\mathbf A| = \sum_{j=1}^3 a_{1, j} \sum_{ \substack{f\in \mathcal S_3 \\ f(1) = j} } \sigma(f) a_{2, f(2)} a_{3, f(3)}.

For each j, since f is bijective (and thus injective), j \notin \{f(2), f(3)\}. Therefore, for any f \in \mathcal S_3 with f(1) = j > 1, define the permutation g by

f = g \begin{pmatrix} 1 & 2 \end{pmatrix} \cdots \begin{pmatrix} 1 & j\end{pmatrix} \iff g = f \begin{pmatrix} 1 & j\end{pmatrix} \cdots \begin{pmatrix} 1 & 2 \end{pmatrix}.

In general, f(i+1) = g(i) for 1 \leq i \leq j-1, f(i) = g(i) for i > j, and g(j) = f(1) = j (for j = 1, simply take g := f). Define K_j := \{1,2,3\} \backslash \{j\} = \{k_j(1),k_j(2)\} = \{g(m) : m \neq j\}, where k_j is increasing. Since g is bijective and fixes j, g|_{K_j} is a permutation of K_j, which corresponds to a permutation \tilde g \in \mathcal S_2 with \sigma(g) = \sigma(\tilde g) via the relation g(k_j(m)) = k_j(\tilde g(m)). In particular,

(f(2), f(3)) = (g(k_j(1)), g(k_j(2))) = (k_j ( \tilde g(1) ) , k_j ( \tilde g(2) ) ).

Since f differs from g by j-1 transpositions, \sigma(f) = (-1)^{j-1} \sigma(\tilde g) and

\displaystyle \begin{aligned} |\mathbf A| &= \sum_{j=1}^3 (-1)^{j-1} a_{1, j} \sum_{ \tilde g \in \mathcal S_2 } \sigma(\tilde g) a_{2, k_j ( \tilde g(1) )} a_{3, k_j ( \tilde g(2) )} \\ &= \sum_{j=1}^3 (-1)^{j+1} a_{1, j} \left| \begin{matrix} a_{2,k_j(1)} & a_{2,k_j(2)} \\ a_{3,k_j(1)} & a_{3,k_j(2)} \end{matrix} \right|, \end{aligned}

where we write (-1)^{j-1} =(-1)^{j+1} for convenience. Thus, we can use the 2 \times 2 determinant to define the 3 \times 3 determinant. In fact, the same relation works in the sense of using the 1 \times 1 determinant (i.e. the entry itself) to define the 2 \times 2 determinant. This yields the Laplace expansion of the determinant, also known as cofactor expansion, which is computationally much more convenient than previous ideas.

Theorem 3 (Laplace Expansion). For any \mathbf A \in \mathcal M_{(n+1) \times (n+1)}(\mathbb K), define the (i,j)-minor matrix \mathbf A_{ij} \in \mathcal M_{n \times n}(\mathbb K) by

\mathbf A_{ij} := \begin{bmatrix} a_{k_1(i) ,k_1(j)} & a_{k_1(i) ,k_2(j)} & \cdots & a_{k_1(i) ,k_n(j)} \\ a_{k_2(i) ,k_1(j)} & a_{k_2(i) ,k_2(j)} & \cdots & a_{k_2(i) ,k_n(j)} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k_n(i) ,k_1(j)} & a_{k_n(i) ,k_2(j)} & \cdots & a_{k_n(i) ,k_n(j)}\end{bmatrix},

where \{ k_1(t),\dots, k_{n}(t) \} = \{ 1,\dots, n+1 \} \backslash \{ t \} and k_m(t) is strictly increasing in m. Then

\displaystyle |\mathbf A| = \sum_{j=1}^{n+1} {(-1)}^{i+j} a_{ij}  |\mathbf A_{ij}|,

regardless of the choice of i. The term {(-1)}^{i+j} |\mathbf A_{ij}| is called the (i,j)-cofactor of \mathbf A.

Proof. We have proven the case i = 1 in the preceding discussion (we used n = 3 for concreteness and simplicity, but the argument generalises). For general i, perform i-1 successive row swaps to bring row i to the top, each contributing a factor of -1, and apply the case i = 1 (with the necessary relabelings) to obtain

\begin{aligned} |\mathbf A| &= (-1)^{i-1} \sum_{j=1}^{n+1} {(-1)}^{j+1} a_{ij} |\mathbf A_{ij}| =\sum_{j=1}^{n+1} {(-1)}^{i+j} a_{ij} |\mathbf A_{ij}|. \end{aligned}
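Theorem 3 translates directly into a recursive procedure. A sketch with names of its own choosing (minor, laplace_det); it still takes exponential time, so it is illustrative rather than practical:

```python
from fractions import Fraction

def minor(a: list[list[Fraction]], i: int, j: int) -> list[list[Fraction]]:
    """The (i, j)-minor matrix: delete row i and column j (0-indexed)."""
    return [[a[r][c] for c in range(len(a)) if c != j]
            for r in range(len(a)) if r != i]

def laplace_det(a: list[list[Fraction]]) -> Fraction:
    """Cofactor expansion along row i = 0, recursing to the 1x1 base case."""
    n = len(a)
    if n == 1:
        return a[0][0]  # a 1x1 determinant is the entry itself
    return sum((-1) ** j * a[0][j] * laplace_det(minor(a, 0, j))
               for j in range(n))  # (-1) ** (0 + j) along the top row

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
assert laplace_det(A) == 18  # agrees with the elimination sketch above
```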

Lemma 2. Cofactor expansion along a column instead of a row yields the same result: for \mathbf A as in Theorem 3 and any fixed j,

\displaystyle |\mathbf A| = \sum_{i=1}^{n+1} {(-1)}^{i+j} a_{ij} |\mathbf A_{ij}|.

Proof. It suffices to prove that |[a_{ij}]| = |[a_{ji}]|, i.e. that a matrix and its transpose have the same determinant, since expanding \mathbf A along column j amounts to expanding its transpose along row j. We will use Definition 1 to achieve this goal:

\begin{aligned} \left| \begin{matrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{matrix} \right| &= \sum_{f\in \mathrm{Sym}(\{1,\dots,n\})} \sigma(f)\, a_{1, f(1)} \cdots a_{n, f(n)} \\ &= \sum_{f\in \mathrm{Sym}(\{1,\dots,n\})} \sigma(f)\, a_{f^{-1}(f(1)), f(1)} \cdots a_{f^{-1}(f(n)), f(n)} \\ &= \sum_{f^{-1} \in \mathrm{Sym}(\{1,\dots,n\})} \sigma(f^{-1})\, a_{f^{-1}(1), 1} \cdots a_{f^{-1}(n), n} \\ &= \left| \begin{matrix} a_{11} & a_{21} & \dots & a_{n1} \\ a_{12} & a_{22} & \dots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \dots & a_{nn} \end{matrix} \right|, \end{aligned}

where the third equality reorders each product by the value of f, reindexes the sum over f^{-1}, and uses \sigma(f^{-1}) = \sigma(f).
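As a spot check of Lemma 2 (assuming the laplace_det and minor sketches above are in scope), expanding the same 3 \times 3 matrix along its second column agrees with the row expansion:

```python
from fractions import Fraction

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
j = 1  # the second column, 0-indexed
col_expansion = sum((-1) ** (i + j) * A[i][j] * laplace_det(minor(A, i, j))
                    for i in range(3))
assert col_expansion == laplace_det(A) == 18
```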

Corollary 1. If \det(\mathbf A) \neq 0, then

\displaystyle \mathbf A^{-1} = \frac 1{\det(\mathbf A)} \mathrm{adj}(\mathbf A), \quad \mathrm{adj}(\mathbf A) = [{(-1)}^{i+j} |\mathbf A_{ji}|].

Proof. Let a_{ij}' = {(-1)}^{i+j} |\mathbf A_{ji}| denote the (i, j)-entry of \mathrm{adj}(\mathbf A). By cofactor-expanding along the i-th row according to Theorem 3,

\begin{aligned} \det(\mathbf A) &= \sum_{k=1}^n {(-1)}^{i+k} a_{ik} |\mathbf A_{ik}| = \sum_{k=1}^n a_{ik} a_{ki}'. \end{aligned}

The (i,j)-entry of \mathbf A\, \mathrm{adj}(\mathbf A) is, by matrix multiplication,

\displaystyle \sum_{k=1}^n a_{ik}a_{kj}' = \begin{cases} \det(\mathbf A), & i = j, \\ 0, & i \neq j,\end{cases}

since for i \neq j, the sum is the cofactor expansion along row j of the matrix obtained from \mathbf A by replacing row j with row i; this matrix has two identical rows, so its determinant is 0. Therefore,

\mathbf A\, \mathrm{adj}(\mathbf A) = \det(\mathbf A) \mathbf I.

Dividing both sides by \det(\mathbf A) \neq 0 yields the desired result.
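Corollary 1 as code, reusing the hypothetical minor and laplace_det helpers from above: build \mathrm{adj}(\mathbf A) entry by entry from cofactors and divide by the determinant.

```python
from fractions import Fraction

def adjugate(a: list[list[Fraction]]) -> list[list[Fraction]]:
    """adj(A)[i][j] is the (j, i)-cofactor; note the transposed indices."""
    n = len(a)
    return [[(-1) ** (i + j) * laplace_det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(a: list[list[Fraction]]) -> list[list[Fraction]]:
    """A^{-1} = adj(A) / det(A), valid whenever det(A) != 0."""
    d = laplace_det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[entry / d for entry in row] for row in adjugate(a)]

A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
assert inverse(A) == [[Fraction(-2), Fraction(1)],
                      [Fraction(3, 2), Fraction(-1, 2)]]
```

For anything beyond small matrices, elimination remains the saner route; the adjugate formula is chiefly of theoretical interest.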

Corollary 2. For matrices \mathbf A, \mathbf B, \mathbf C of relevant sizes,

\displaystyle \left| \begin{matrix} \mathbf A & \mathbf B \\ \mathbf 0 & \mathbf C \end{matrix} \right| = |\mathbf A| |\mathbf C|.

Proof Sketch. Apply induction on the size of \mathbf A: cofactor-expand along the first column, where every entry below the \mathbf A block is zero, and note that each resulting minor has the same block-triangular shape with \mathbf A one size smaller.

Corollary 3. Applying Theorem 3 along row 1, the determinant of a 3 \times 3 matrix is given by

\left| \begin{matrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{matrix} \right| = a_{11} \left| \begin{matrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{matrix} \right| - a_{12} \left| \begin{matrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{matrix} \right| + a_{13} \left| \begin{matrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{matrix} \right|.

Finally, with all that hard work, we recover our usual formulas for the determinant that are otherwise spawned from thin air in understandably time- and resource-constrained introductory linear algebra courses.

Back to the main fun of linear algebra: eigenvectors and eigenvalues a.k.a. eigenstuff.

—Joel Kindiak, 6 Mar 25, 1400H
