When I took my A-Level examinations, I had to sit a Chemistry practical assessment. In this assessment, I collected data and needed to draw a line of best fit. Around that time, I was learning linear algebra, and thought I’d apply what I learned to compute the perfect best-fit line. It helped me quite a bit; I got an ‘A’ for Chemistry in the end.
Given the dataset $\{(x_i, y_i)\}_{i=1}^{n}$, we want to find a line of best fit, usually denoted $y = mx + c$, that minimises the total squared error $\sum_{i=1}^{n} \big(y_i - (mx_i + c)\big)^2$. How do we achieve this goal? Let’s try to solve the equations $y_i = mx_i + c$ directly, which gives the matrix equation
$$\begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.$$
The chances that we can even solve this exactly are slim, since we have too much data! For instance, any data set of three non-collinear points will be enough to confound this equation. With this question at hand, let’s ask a more general question: what would be a “best-fit” solution to the usually-unsolvable equation
$$A\mathbf{x} = \mathbf{b}?$$
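To make this concrete, here’s a tiny numerical check (the three data points below are made up for illustration): with three non-collinear points, the augmented matrix $[A \mid \mathbf{b}]$ has a larger rank than $A$, so no exact solution exists.

```python
import numpy as np

# Hypothetical data: three non-collinear points (x_i, y_i).
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 1.5])

# The matrix equation A [m, c]^T = y from the text.
A = np.column_stack([x, np.ones_like(x)])

# An exact solution exists iff rank(A) == rank([A | y]).
rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(np.column_stack([A, y]))

print(rank_A, rank_aug)  # 2 vs 3: no exact solution exists
```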
Even more generally, if $V$ is an inner product space over $\mathbb{R}$, and $T : U \to V$ is a linear transformation, what would be a “best-fit” solution to the equation $T(u) = v$? By definition, a solution exists if and only if $v \in \operatorname{im}(T)$. What if $v \notin \operatorname{im}(T)$?
Lemma 1. For any subspace $W \subseteq V$, the orthogonal complement
$$W^\perp = \{\, u \in V : \langle u, w \rangle = 0 \text{ for all } w \in W \,\}$$
is also a subspace of $V$. Furthermore, if $V$ is finite-dimensional, then
$$V = W \oplus W^\perp.$$
Proof. That $W^\perp$ is a subspace follows directly from the linearity of the inner product. If $V$ is finite-dimensional, then $W$ is finite-dimensional and has an orthonormal basis $\{w_1, \dots, w_k\}$, and we may define
$$\operatorname{proj}_W(v) = \sum_{i=1}^{k} \langle v, w_i \rangle\, w_i \in W.$$
The decomposition $v = \operatorname{proj}_W(v) + \big(v - \operatorname{proj}_W(v)\big)$, with the property $v - \operatorname{proj}_W(v) \in W^\perp$, shows that $V = W + W^\perp$. Since any $w \in W \cap W^\perp$ satisfies $\langle w, w \rangle = 0$ and hence $w = 0$, the sum is direct, which yields the desired result.
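Here’s a small NumPy sketch of the projection formula from the proof (the subspace and vector below are arbitrary picks, and a QR decomposition is just a convenient way to get an orthonormal basis):

```python
import numpy as np

# W = span of the columns of B, an arbitrary 2-dimensional subspace of R^3.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
v = np.array([3.0, -1.0, 2.0])

# Orthonormal basis {w_1, w_2} for W via a reduced QR decomposition.
Q, _ = np.linalg.qr(B)

# proj_W(v) = sum_i <v, w_i> w_i, as in the proof of Lemma 1.
proj = sum(np.dot(v, Q[:, i]) * Q[:, i] for i in range(Q.shape[1]))

# The residual should lie in W^perp: orthogonal to every column of B.
residual = v - proj
print(B.T @ residual)  # ~ [0, 0], up to floating-point error
```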
Theorem 1. Let $V$ be finite-dimensional and $W \subseteq V$ a subspace. Then for any $v \in V$, there exist unique vectors $w \in W$, $w^\perp \in W^\perp$, such that
$$v = w + w^\perp.$$
We call this expression the orthogonal decomposition of $v$. In fact, by Lemma 1, we must have
$$w = \operatorname{proj}_W(v), \qquad w^\perp = v - \operatorname{proj}_W(v).$$
Proof. We have established existence by Lemma 1, and it remains to prove uniqueness. Consider two possible decompositions
$$v = w_1 + w_1^\perp = w_2 + w_2^\perp.$$
Then
$$w_1 - w_2 = w_2^\perp - w_1^\perp \in W \cap W^\perp = \{0\},$$
yielding $w_1 = w_2$ and $w_1^\perp = w_2^\perp$.
So the projection $\operatorname{proj}_W(v)$ is indeed a powerful tool, since we can always write $v$ in terms of its projection. Furthermore, the orthogonal component $v - \operatorname{proj}_W(v)$ will definitely be orthogonal to $W$. But this raises the question: is $\operatorname{proj}_W(v)$ really our best-fit solution?
Definition 1. For any $v \in V$ and subspace $W \subseteq V$, we call $w^* \in W$ a best approximation of $v$ onto $W$ if
$$\|v - w^*\| \le \|v - w\| \quad \text{for all } w \in W.$$
Theorem 2. For any $v \in V$ and subspace $W \subseteq V$, $\operatorname{proj}_W(v)$ is the best approximation of $v$ onto $W$.
Proof. Firstly, for vectors $u, w \in V$,
$$\|u + w\|^2 = \langle u + w, u + w \rangle = \|u\|^2 + 2\langle u, w \rangle + \|w\|^2.$$
Thus, we have Pythagoras’ theorem for inner product spaces: if $\langle u, w \rangle = 0$, then
$$\|u + w\|^2 = \|u\|^2 + \|w\|^2.$$
Therefore, for any $w \in W$, since $v - \operatorname{proj}_W(v) \in W^\perp$ and $\operatorname{proj}_W(v) - w \in W$,
$$\|v - w\|^2 = \|v - \operatorname{proj}_W(v)\|^2 + \|\operatorname{proj}_W(v) - w\|^2 \ge \|v - \operatorname{proj}_W(v)\|^2.$$
For uniqueness, suppose $w_1, w_2 \in W$ are best approximations. Define $d = \|v - w_1\|$; since both are best approximations, $\|v - w_2\| = d$ as well, and $d \le \|v - w\|$ for every $w \in W$. Then, since $\frac{w_1 + w_2}{2} \in W$,
$$d \le \left\| v - \frac{w_1 + w_2}{2} \right\| = \left\| \frac{(v - w_1) + (v - w_2)}{2} \right\| \le \frac{\|v - w_1\| + \|v - w_2\|}{2} = d.$$
Hence, $\frac{w_1 + w_2}{2}$ is also a best approximation. Defining $a = v - w_1$ and $b = v - w_2$, we have the equality
$$\|a\| = \|b\| = \left\| \frac{a + b}{2} \right\| = d.$$
We can prove the parallelogram identity by expanding the inner products:
$$\|a + b\|^2 + \|a - b\|^2 = 2\|a\|^2 + 2\|b\|^2.$$
Substituting the various values, and noting that $a - b = w_2 - w_1$,
$$4d^2 + \|w_2 - w_1\|^2 = 4d^2.$$
Hence, $\|w_1 - w_2\| = 0$ yields $w_1 = w_2$. Hence $w_1$ and $w_2$ are one and the same best approximation.
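As a purely numerical sanity check of Theorem 2 (the subspace and vector below are randomly generated, not anything special): no vector in $W$ gets closer to $v$ than $\operatorname{proj}_W(v)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary 2-dimensional subspace W of R^4 and an arbitrary v.
B = rng.standard_normal((4, 2))
v = rng.standard_normal(4)

# proj_W(v) computed from an orthonormal basis of W.
Q, _ = np.linalg.qr(B)
proj = Q @ (Q.T @ v)
best = np.linalg.norm(v - proj)

# Compare against many random vectors w in W.
worst_violation = 0.0
for _ in range(1000):
    w = B @ rng.standard_normal(2)
    worst_violation = max(worst_violation, best - np.linalg.norm(v - w))

print(worst_violation <= 1e-12)  # True: nothing in W beats the projection
```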
The key take-away is this: the best approximation of $v$ onto $W$ is its projection $\operatorname{proj}_W(v)$. But how do we answer our semi-original question: finding a best-fit solution to $T(u) = v$?
Corollary 1. Let $U, V$ be finite-dimensional vector spaces, with $V$ an inner product space. Let $T : U \to V$ be a linear transformation and $v \in V$. Then the equation $T(u) = v$ has a best approximation given by $\hat{u} \in U$ such that
$$T(\hat{u}) = \operatorname{proj}_{\operatorname{im}(T)}(v).$$
Proof. Apply Theorem 2 to the subspace $\operatorname{im}(T) \subseteq V$.
This result helps us partly solve our original equation. Let $W = \operatorname{im}(T)$. By Theorem 2, to obtain a best-approximation solution to the equation $T(u) = v$, we need to find $\hat{u} \in U$ such that $T(\hat{u}) = \operatorname{proj}_W(v)$. But how do we actually compute such a $\hat{u}$?
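We haven’t built the machinery to derive it yet, but numerically we can already peek at such a $\hat{u}$: NumPy’s least-squares solver returns one, and we can check that $T(\hat{u})$ matches the projection of $v$ onto $\operatorname{im}(T)$ computed independently. (The matrix and vector below are arbitrary illustrations.)

```python
import numpy as np

# T is represented by an arbitrary 4x2 matrix A, so im(T) = col(A).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
v = np.array([0.0, 1.0, 1.0, 3.0])

# A best-approximation (least-squares) solution u_hat.
u_hat, *_ = np.linalg.lstsq(A, v, rcond=None)

# Check: A @ u_hat equals proj_{im(T)}(v), computed independently via QR.
Q, _ = np.linalg.qr(A)
proj = Q @ (Q.T @ v)
print(np.allclose(A @ u_hat, proj))  # True
```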
The key property that we want to exploit is that $v - T(\hat{u}) \in W^\perp$. This means that for any $u \in U$,
$$\langle T(u),\, v - T(\hat{u}) \rangle = 0.$$
It turns out that in solving this problem, we are finally forced to introduce the idea of the matrix transpose, which we will properly discuss in the next post. Remember, we haven’t technically answered our best-fit line question: find meaningful constants $m$ and $c$ such that
$$\sum_{i=1}^{n} \big(y_i - (mx_i + c)\big)^2$$
is minimised.
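For what it’s worth, here is what the answer looks like numerically on a small made-up dataset: `np.polyfit` (or equivalently a least-squares solve of the matrix equation from the start of this post) produces the $m$ and $c$ we are after; the transpose machinery is what will justify it.

```python
import numpy as np

# Hypothetical measurements (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

# Least-squares line y = m x + c.
m, c = np.polyfit(x, y, 1)

# Equivalent formulation as the matrix equation A [m, c]^T ~ y.
A = np.column_stack([x, np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

print(m, c)
print(np.allclose(coeffs, [m, c]))  # True: same best-fit line
```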
As a side-note, we can generalise this problem to a more realistic setting involving machine learning datasets $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ with $\mathbf{x}_i \in \mathbb{R}^d$, where we seek sufficiently meaningful parameters $\mathbf{w} \in \mathbb{R}^d$, $b \in \mathbb{R}$ such that
$$y_i \approx \mathbf{w}^\top \mathbf{x}_i + b \quad \text{for each } i.$$
More on these ideas in future posts.
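Here is a minimal sketch of that generalised setting (all data below is synthetic): stack the feature vectors into a design matrix, append a column of ones for the intercept, and solve the same least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: n samples, each with d features.
n, d = 50, 3
X = rng.standard_normal((n, d))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.3 + 0.01 * rng.standard_normal(n)  # noisy labels

# Design matrix with an intercept column; parameters are (w, b).
A = np.column_stack([X, np.ones(n)])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:d], params[d]

print(w, b)  # close to true_w and 0.3
```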
—Joel Kindiak, 12 Mar 25, 2102H