Unraveling the Determinant

We are trying to answer a simple yet very, very nontrivial question: what is a determinant?

Let \mathbb K be a field and \mathbf A \in \mathcal M_{2 \times 2}(\mathbb K). Let V = \mathbb K^2. Recall that

\mathbf A = \begin{bmatrix} a & b \\ c & d\end{bmatrix} \quad \Rightarrow \quad \det(\mathbf A) := ad - bc.

Hence, we can think of \det as a function that takes in the two rows of \mathbf A, written as the vectors \begin{bmatrix} a \\ b\end{bmatrix}, \begin{bmatrix} c \\ d\end{bmatrix}, and returns the scalar ad - bc.

Definition 1. The determinant of a 2 \times 2 matrix is a function \det : V^2 \to \mathbb K defined by

\det\left(\begin{bmatrix} a \\ b\end{bmatrix}, \begin{bmatrix} c \\ d\end{bmatrix}\right) = ad - bc.
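For concreteness, here is this two-argument view of Definition 1 in Python (a minimal sketch; the name det2 is my own):

```python
# Definition 1, transcribed directly: det as a function of two vectors
# in K^2 rather than of a matrix.
def det2(v, w):
    """det((a, b), (c, d)) = ad - bc."""
    a, b = v
    c, d = w
    return a * d - b * c

print(det2((1, 0), (0, 1)))  # 1 = det(e1, e2)
print(det2((2, 3), (4, 5)))  # 2*5 - 3*4 = -2
```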

From this definition, we obtain several interesting properties of the determinant.

Theorem 1. The function \det : V^2 \to \mathbb K satisfies the following properties:

  • For any \mathbf v_1,\mathbf v_2 \in V, the functions \det(\cdot,\mathbf v_2) : V \to \mathbb K and \det(\mathbf v_1,\cdot) : V \to \mathbb K are linear. We call \det a multilinear function.
  • For any \mathbf v_1,\mathbf v_2 \in V, \det(\mathbf v_1,\mathbf v_2) = -\det(\mathbf v_2,\mathbf v_1). We call \det an alternating function.
  • \det(\mathbf e_1, \mathbf e_2) = 1. We call \det a normalised function.
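Before proving anything, a quick numerical spot-check of these three properties over \mathbb K = \mathbb Q (a sketch; the test vectors are arbitrary):

```python
from fractions import Fraction as F

det2 = lambda v, w: v[0] * w[1] - v[1] * w[0]  # as in Definition 1

v1, v2, u = (F(2), F(3)), (F(5), F(7)), (F(-1), F(4))
s = F(6)

# Multilinear in the first slot: det(s*v1 + u, v2) = s*det(v1, v2) + det(u, v2).
combo = (s * v1[0] + u[0], s * v1[1] + u[1])
assert det2(combo, v2) == s * det2(v1, v2) + det2(u, v2)

# Alternating: swapping the arguments flips the sign.
assert det2(v1, v2) == -det2(v2, v1)

# Normalised: det(e1, e2) = 1.
assert det2((F(1), F(0)), (F(0), F(1))) == F(1)
```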

It turns out that \det is the only function satisfying all three of these properties.

Theorem 2. If T : V^2 \to \mathbb K is a multilinear, alternating, and normalised function, then T = \det.

Proof. We observe that for any \begin{bmatrix} a \\ b\end{bmatrix}, \begin{bmatrix} c \\ d\end{bmatrix} \in V, since T is multilinear,

\begin{aligned} T \left( \begin{bmatrix} a \\ b\end{bmatrix}, \begin{bmatrix} c \\ d\end{bmatrix} \right) &= T(a\mathbf e_1 + b \mathbf e_2, c\mathbf e_1 + d\mathbf e_2) \\ &= ac T(\mathbf e_1,\mathbf e_1) + ad T(\mathbf e_1,\mathbf e_2) + bc T(\mathbf e_2,\mathbf e_1) + bd T(\mathbf e_2,\mathbf e_2).\end{aligned}

Since T is alternating, T(\mathbf e_1, \mathbf e_1) = -T(\mathbf e_1,\mathbf e_1), so 2T(\mathbf e_1,\mathbf e_1) = 0 and hence T(\mathbf e_1,\mathbf e_1) = T(\mathbf e_2,\mathbf e_2) = 0 (assuming \mathrm{char}\,\mathbb K \neq 2). Furthermore, since T is alternating and normalised,

T(\mathbf e_2,\mathbf e_1) = -T(\mathbf e_1,\mathbf e_2) = -1.

Therefore,

\begin{aligned} T \left( \begin{bmatrix} a \\ b\end{bmatrix}, \begin{bmatrix} c \\ d\end{bmatrix} \right) &= ac \cdot 0 + ad \cdot 1 + bc \cdot (-1) + bd \cdot 0 \\ &= ad-bc = \det\left( \begin{bmatrix} a \\ b\end{bmatrix}, \begin{bmatrix} c \\ d\end{bmatrix} \right).\end{aligned}
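The proof is short enough to mirror in code: a bilinear T is pinned down by its four values on the basis pairs, and the alternating and normalised properties force those values (a sketch; the variable names are my own):

```python
from itertools import product

# Values T(e_i, e_j) forced by the proof of Theorem 2 (0-indexed):
t = {(0, 0): 0, (1, 1): 0,   # alternating forces T(e_i, e_i) = 0
     (0, 1): 1, (1, 0): -1}  # normalised, plus T(e2, e1) = -T(e1, e2)

def T(v, w):
    # Bilinear expansion: T(v, w) = sum_{i,j} v_i * w_j * T(e_i, e_j).
    return sum(v[i] * w[j] * t[(i, j)] for i, j in product(range(2), repeat=2))

a, b, c, d = 2, 3, 5, 7
assert T((a, b), (c, d)) == a * d - b * c  # T agrees with ad - bc
```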

Now for the more crucial question: does this characterisation carry over to the general case V = \mathbb K^n? Well, if we can construct such a function, then we can use it. We are going to use these three properties to construct the determinant function, and the construction will coincide with the usual computational definitions. Henceforth, let V = \mathbb K^n with the standard ordered basis (\mathbf e_1, \dots, \mathbf e_n).

Definition 2. Let T : V^n \to \mathbb K be a function. T is multilinear if, for any i and any fixed vectors \mathbf v_1,\dots,\mathbf v_n \in V, the function

T(\mathbf v_1,\dots,\mathbf v_{i-1},\cdot,\mathbf v_{i+1},\dots,\mathbf v_n) : V \to \mathbb K

is linear. T is alternating if for any i < j,

T(\mathbf v_1,\dots,\mathbf v_{i},\dots,\mathbf v_{j},\dots,\mathbf v_n) = -T(\mathbf v_1,\dots,\mathbf v_{j},\dots,\mathbf v_{i},\dots,\mathbf v_n).

T is normalised if T(\mathbf e_1,\dots,\mathbf e_n) = 1.

Denote K := \{1,\dots,n\}, and let \mathcal F(K,K) denote the set of all functions from K to K. For each i, there exist unique scalars v_{i,j} \in \mathbb K such that

\displaystyle \mathbf v_i = v_{i,1} \mathbf e_1 + \cdots + v_{i,n} \mathbf e_n = \sum_{j=1}^n v_{i,j} \mathbf e_j.

Lemma 1. If T : V^n \to \mathbb K is multilinear, then

\displaystyle T(\mathbf v_1,\dots,\mathbf v_n) = \sum_{f \in \mathcal F(K,K)} v_{1,f(1)} \cdots v_{n,f(n)} T(\mathbf e_{f(1)},\dots,\mathbf e_{f(n)}).

Proof. The first key insight is that summing over all possible ordered choices (j_1,\dots,j_n) corresponds to summing over all possible functions f : K \to K, not necessarily just the bijective ones. With this in mind, a direct though tedious calculation yields

\displaystyle \begin{aligned} T(\mathbf v_1,\dots,\mathbf v_n) &= T\left( \sum_{j_1=1}^n v_{1,j_1} \mathbf e_{j_1},\dots, \sum_{j_n=1}^n v_{n,j_n} \mathbf e_{j_n} \right) \\ &= \sum_{j_1=1}^n \cdots \sum_{j_n=1}^n v_{1,j_1} \cdots v_{n,j_n} T( \mathbf e_{j_1} ,\dots, \mathbf e_{j_n} ) \\ &= \sum_{f \in \mathcal F(K,K)} v_{1,f(1)} \cdots v_{n,f(n)} T( \mathbf e_{f(1)} ,\dots, \mathbf e_{f(n)} ). \end{aligned}
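To see Lemma 1 in action, here is a check for a concrete multilinear T on \mathbb K^3, namely T(\mathbf v_1, \mathbf v_2, \mathbf v_3) = \prod_i \langle \mathbf v_i, \mathbf w_i \rangle for fixed vectors \mathbf w_i (a sketch; the choice of T and of the weights is mine):

```python
from itertools import product

n = 3
w = [(1, 2, 3), (4, 5, 6), (7, 8, 10)]  # fixed weight vectors (arbitrary)

def dot(u, x):
    return sum(a * b for a, b in zip(u, x))

def T(*vs):
    # T(v1, ..., vn) = prod_i <v_i, w_i> is linear in each slot.
    out = 1
    for vi, wi in zip(vs, w):
        out *= dot(vi, wi)
    return out

vs = [(2, 0, 1), (3, 1, 4), (1, 5, 9)]
# Lemma 1's right-hand side: one term per function f : K -> K (n^n terms).
# For this T, the basis value is T(e_{f(1)}, ..., e_{f(n)}) = prod_i w_i[f(i)].
rhs = 0
for f in product(range(n), repeat=n):     # f[i - 1] plays the role of f(i)
    term = 1
    for i in range(n):
        term *= vs[i][f[i]] * w[i][f[i]]  # v_{i, f(i)} times w_i[f(i)]
    rhs += term
assert rhs == T(*vs)
```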

This now reduces our problem to evaluating

T( \mathbf e_{f(1)} ,\dots, \mathbf e_{f(n)} ),

given any function f : K \to K. If T is alternating, however, then we can whittle the calculation down by one more layer.

Lemma 2. Suppose T is alternating, and let f : K \to K. If f \notin \mathrm{Sym}(K), then

T( \mathbf e_{f(1)} ,\dots, \mathbf e_{f(n)} ) = 0.

Recall that \mathrm{Sym}(K) refers to the set (group, actually) of bijections f : K \to K, also known as permutations.

Proof. Since K is finite, if f is not bijective, then it is not injective. Hence, there exist indices 1 \leq i < j \leq n such that f(i) = f(j). Since T is alternating,

\begin{aligned} T( \dots, \mathbf e_{f(i)},\dots,\mathbf e_{f(j)}, \dots) &= -T( \dots, \mathbf e_{f(j)},\dots,\mathbf e_{f(i)}, \dots) \\ &= -T( \dots, \mathbf e_{f(i)},\dots,\mathbf e_{f(j)}, \dots). \end{aligned}

where the second equality uses f(i) = f(j). Hence 2T( \dots, \mathbf e_{f(i)},\dots,\mathbf e_{f(j)}, \dots) = 0, and so T( \dots, \mathbf e_{f(i)},\dots,\mathbf e_{f(j)}, \dots)=0 (again assuming \mathrm{char}\,\mathbb K \neq 2).
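In code, Lemma 2 is a statement about which index functions survive: for n = 3, of the 3^3 = 27 functions f : K \to K, only the 6 injective ones escape the argument above (a small sketch):

```python
from itertools import product, permutations
from math import factorial

n = 3
fs = list(product(range(n), repeat=n))           # all functions K -> K
injective = [f for f in fs if len(set(f)) == n]  # no repeated value

# Every non-injective f repeats a value, so (e_{f(1)}, ..., e_{f(n)})
# contains a duplicate vector and any alternating T vanishes there.
assert len(fs) == n**n and len(injective) == factorial(n)
assert sorted(injective) == sorted(permutations(range(n)))
```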

Corollary 1. If T is multilinear and alternating, then

\displaystyle T(\mathbf v_1,\dots,\mathbf v_n) = \sum_{f \in \mathrm{Sym}(K)} v_{1,f(1)} \cdots v_{n,f(n)} T(\mathbf e_{f(1)},\dots,\mathbf e_{f(n)}).

Let’s not underestimate the magnitude of what we did here. The key trick is that multilinearity reduces the computation over all possible index choices to a single sum over f \in \mathcal F(K,K). This set is massive: there are n^n such functions.

To put this computation into perspective: a 4 \times 4 determinant originally needs to account for 4^4 = 256 summands! Thankfully, the alternating property comes in clutch, because it cuts the sum down to f \in \mathrm{Sym}(K). In the n=4 case, this means 4! = 24 summands. Still tedious, but way less tedious than 256 terms.
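A quick look at how fast that saving grows (the n = 4 row is the 256-versus-24 comparison above):

```python
from math import factorial

# How much work the alternating property saves: n^n summands down to n!.
for n in range(2, 7):
    print(n, n**n, factorial(n))
# 2 4 2 / 3 27 6 / 4 256 24 / 5 3125 120 / 6 46656 720
```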

So now we reduce the problem one more layer: how do we evaluate

T( \mathbf e_{f(1)} ,\dots, \mathbf e_{f(n)} ),

for f \in \mathrm{Sym}(K)? Let’s first consider the simplest case: f = \mathrm{id}_K. In this case, we get the expression

T(\mathbf e_{1} ,\dots, \mathbf e_{n} ).

If T is normalised, then T(\mathbf e_{1} ,\dots, \mathbf e_{n} )=1. How then do we deal with other kinds of permutations f \in \mathrm{Sym}(K)?

Let’s leverage the alternating property of T: swapping two arguments flips the sign of the output. While we could handle each permutation by brute force, a special kind of permutation known as a transposition will simplify the process for us.

Definition 3. A bijection \phi \in \mathrm{Sym}(\{1,\dots,n\}) is called a transposition if there exist two distinct elements i, j \in \{1,\dots,n\} such that for any x \in \{1,\dots,n\},

\phi(x) = \begin{cases} j, & x = i,\\ i, & x = j, \\ x, & \text{otherwise.} \end{cases}

In this case, we denote \phi = \begin{pmatrix} i & j \end{pmatrix}.
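As code, a transposition is a one-liner (a sketch, written 0-indexed as the tuple (\phi(0), \dots, \phi(n-1))):

```python
# The transposition (i j) in Sym({0, ..., n-1}): swap i and j, fix the rest.
def transposition(i, j, n):
    return tuple(j if x == i else i if x == j else x for x in range(n))

phi = transposition(1, 3, 5)  # the transposition (1 3) for n = 5
print(phi)                    # (0, 3, 2, 1, 4)
```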

Lemma 3. If T is alternating, then

T(\mathbf e_{\phi(1)}, \dots, \mathbf e_{\phi(n)}) = -T(\mathbf e_{1}, \dots, \mathbf e_{n})

for any transposition \phi. Furthermore, for any composition \Phi = \phi_j \circ \cdots \circ \phi_1 of transpositions,

T(\mathbf e_{\Phi(1)}, \dots, \mathbf e_{\Phi(n)}) = (-1)^jT(\mathbf e_{1}, \dots, \mathbf e_{n}).
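Lemma 3 is pure bookkeeping, which a short sketch makes visible: start from the arrangement (\mathbf e_1, \dots, \mathbf e_n), apply transpositions one at a time, and collect one sign flip per swap (the particular composition below is arbitrary):

```python
# Track the arrangement of basis indices and the sign factor that the
# alternating property contributes at each swap (0-indexed, n = 4).
arrangement, sign = list(range(4)), 1
for i, j in [(0, 2), (1, 3), (0, 1)]:  # apply j = 3 transpositions in sequence
    arrangement[i], arrangement[j] = arrangement[j], arrangement[i]
    sign = -sign                       # one swap, one sign flip
print(arrangement, sign)               # sign = (-1)**3 = -1
```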

Now here’s the golden ticket to the problem: if f can be written as a product of transpositions, i.e. f = \Phi= \phi_j \circ \cdots \circ \phi_1, then

T(\mathbf e_{f(1)}, \dots, \mathbf e_{f(n)}) = (-1)^jT(\mathbf e_{1}, \dots, \mathbf e_{n}).

If T is normalised, then the right-hand side is simply (-1)^j. But does every permutation decompose into transpositions, and does every decomposition of f use the same parity of j? Tune in to the next post to find out.
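In the meantime, here is one way to decompose a given permutation into transpositions, by swapping each value back to its home position and counting the j swaps used (a sketch; that the parity of j does not depend on the chosen decomposition is precisely what the next post must establish):

```python
def sign_via_transpositions(f):
    # f is a permutation of {0, ..., n-1} given as the tuple (f(0), ..., f(n-1)).
    f, j = list(f), 0
    for i in range(len(f)):
        while f[i] != i:
            k = f[i]
            f[i], f[k] = f[k], f[i]  # apply the transposition (i k)
            j += 1
    return (-1) ** j

print(sign_via_transpositions((0, 1, 2)))  #  1: the identity needs no swaps
print(sign_via_transpositions((1, 0, 2)))  # -1: a single transposition
print(sign_via_transpositions((1, 2, 0)))  #  1: a 3-cycle splits into 2 swaps
```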

—Joel Kindiak, 4 Mar 25, 2310H
