Introductory Eigenstuff

Let C^\infty denote the vector space of infinitely differentiable functions. Given a real constant \lambda \in \mathbb R, what nonzero function f satisfies the property that

\displaystyle \frac{\mathrm d}{\mathrm dx} : C^\infty \to C^\infty \quad \Rightarrow \quad \frac{\mathrm d}{\mathrm dx}(f) = \lambda f?

If we allowed f = 0, then clearly f = 0 works. That’s very uninteresting. For the f \neq 0 case, with some prodding, we will notice that f(x) = e^{\lambda x} fits the bill (and doing some calculus, show that only functions of this type work).

In fact, we have a whole family of functions that work. Since \displaystyle \frac{\mathrm d}{\mathrm dx} is linear, for any g \in \mathcal D,

\displaystyle g \in \mathrm{span}\{f\} \quad \Rightarrow \quad \frac{\mathrm d}{\mathrm dx}(g) = \lambda g.

There are many directions we can take from this observation, but since we are writing in the context of linear algebra, we will take the eigenvector approach. To that end, let’s use the differentiation operator to illustrate: for any g \in \mathcal D,

\begin{aligned} \frac{\mathrm d}{\mathrm dx}(g) = \lambda g \quad &\iff \quad \left(\frac{\mathrm d}{\mathrm dx} - \lambda I\right)(g) = 0 \\  &\iff \quad g \in \mathrm{ker}\left(\frac{\mathrm d}{\mathrm dx} - \lambda I\right). \end{aligned}

Since kernels are vector spaces, the set of solutions

\displaystyle \left\{ g \in \mathcal D : \frac{\mathrm d}{\mathrm dx}(g) = \lambda g \right\}

forms a vector space.

Let \mathbb K be a field, V be a vector space over \mathbb K, and T : V \to V be a linear transformation.

Definition 1. We call a nonzero vector \mathbf v \in V an eigenvector of T if there exists a scalar \lambda \in \mathbb K, called an eigenvalue, such that

T(\mathbf v) = \lambda \mathbf v.

For instance, the function f \in C^\infty defined by f(x) = e^{\lambda x} is an eigenvector of the transformation \displaystyle \frac{\mathrm d}{\mathrm dx} : C^\infty \to C^\infty. Note that we do not assume that V is finite-dimensional, at least not yet in our discussion.

Right away, we can make several useful deductions from Definition 1. Let T^n := T \circ \cdots \circ T denote n-fold composition of T.

Lemma 1. The scalar \lambda is an eigenvalue of T if and only if the linear transformation \lambda I_V - T is not injective. Equivalently, the eigenspace E_{\lambda} := \ker(\lambda I_V - T) contains at least one nonzero element. In particular, T is injective if and only if 0 is not an eigenvalue of T.

Lemma 2. Let \mathbf v be an eigenvector of T with eigenvalue \lambda.

  • For any n \in \mathbb N, T^n(\mathbf v) = \lambda^n \mathbf v.
  • If T is bijective, then T^{-1}(\mathbf v) = \lambda^{-1}\mathbf v.
  • For any linear transformation S : V \to V, if \mathbf v is also an eigenvector of S with eigenvalue \mu, then \mathbf v is an eigenvector of S \circ T and T \circ S with eigenvalue \mu \cdot \lambda .
  • If \mathbf u is an eigenvector of T with a different eigenvalue \mu \neq \lambda, then \{\mathbf u,\mathbf v\} is linearly independent.

All of the results in Lemma 1 and Lemma 2 can be proven by applying Definition 1.

Perhaps before yapping more about eigenstuff, why should we care, really? We could discuss real-world applications in operations research and even some applied-mathematical uses like Markov chains, but since we are discussing mathematics in general, let’s use a pure mathematical example.

Consider the following sequence of Fibonacci numbers:

x_1 = 1, \quad x_2 = 1,\quad x_{n+2} = x_n + x_{n+1}, \quad n \in \mathbb N.

The sequence of terms in this case are 1, 1, 2, 3, 5, \cdots. The question is a simple one: can you derive a formula x_n that yields the n-th Fibonacci number? For instance, x_3 = 2 and x_5 = 5. In fact, 5 is the only number n such that x_n = n.

Since this discussion is set in the context of linear algebra, let’s use linear algebra to tackle this problem. Let’s first convert our problem into vector notation:

\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} x_{n+1} \\ x_{n+2} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_{n} \\ x_{n+1} \end{bmatrix},\quad n \in \mathbb N.

Something interesting that happens is that by letting \mathbf x_n := \begin{bmatrix} x_n \\ x_{n+1} \end{bmatrix}, we have the equation

\mathbf x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf x_{n+1} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \mathbf x_n,\quad n \in \mathbb N.

If we applied the formula on the right-hand side n-1 more times, we get

\begin{bmatrix} x_{n+1} \\ x_{n+2} \end{bmatrix} = \mathbf x_{n+1} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^n \mathbf x_1 =  \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^n \begin{bmatrix} 1 \\ 1 \end{bmatrix}.

Hence, if we can find a simple expression for \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^n, we can use our recipe-ingredient analogy to obtain \mathbf x_{n+1}. Then, the (n+1)-th Fibonacci number would correspond to the first entry in \mathbf x_{n+1}. Re-index to \mathbf x_n and we are done.

Now, this is where we run into problems. How do we evaluate \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^n? For instance, do we compute 10,000,000 matrix multiplications to compute \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^{10,000,000}? Clearly, we will need a better strategy.

Well, treating \mathbf A := \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} : \mathbb R^2 \to \mathbb R^2 as a linear transformation, Lemma 2 hints at a plausible strategy. If we can find an eigenvector \lambda with eigenvalue \lambda, then

\mathbf A^n \mathbf v = \lambda^n \mathbf v.

Ideally, if we can convert the vector \begin{bmatrix} 1 \\ 1 \end{bmatrix} into some variant of \mathbf v, we can take advantage of the special property

\mathbf A^n \mathbf v = \lambda^n \mathbf v,

then appropriately “re-convert” back into \begin{bmatrix} 1 \\ 1 \end{bmatrix}.

So herein still lies the question: How do we find the eigenvector(s) of \mathbf A? Well, Lemma 1 suggests that we obtain the eigenvalues first. We require \lambda \mathbf I_2 - \mathbf A to not be injective (i.e. be singular), and hence, be singular. This is where our detour into determinants will serve our purposes well. A matrix is singular if and only if its determinant is 0, and thus we need to solve the equation

\left| \begin{matrix} \lambda  &  - 1 \\  - 1 & \lambda - 1 \end{matrix} \right| = \det(\lambda \mathbf I_2 - \mathbf A) = 0.

We can compute the determinant of a 2 \times 2 matrix, so that we need to solve the polynomial:

\lambda^2 - \lambda - 1 = \lambda(\lambda - 1) - (-1)(-1) = 0.

Now, we need to solve this equation, and we should take a pause here: not all polynomials in \mathbb K can be solved by elements in \mathbb K. For instance, the equation \lambda^2 - \lambda - 1 = 0 does not have roots in \mathbb Q, even though its coefficients (1, -1, -1) \in \mathbb Q^3 are all rational. Sometimes, we get solutions. Sometimes, we don’t. In this equation, if we use the field \mathbb R, the quadratic equation yields the roots

\displaystyle \lambda = \frac{-(-1) \pm \sqrt{ (-1)^2 - 4(1)(-1) } }{ 2(1) } = \frac{1 \pm \sqrt{5} }{2} \in \mathbb R.

If instead we had the equation \lambda^2 + 1 = 0, then we would need to seek our solutions in the complex numbers \mathbb C. It turns out that the complex numbers are all that we need, known as the fundamental theorem of algebra. However, we would need to develop complex analysis to actually prove that result properly, and so we will omit its proof. We will accept this result whenever we discuss \mathbb K = \mathbb C.

Anyway, we don’t get one eigenvalue, but two: \lambda = \frac 12 (1 \pm \sqrt 5). The positive variant \phi := \frac 12(1+\sqrt 5) has a name called the golden ratio, and its negative variant can be shown to equal \frac 12 (1-\sqrt 5) = -(-1) - \phi = -1/\phi. Why did we care about our eigenvalues in the first place? To find eigenvectors \mathbf v, then take advantage of the property \mathbf A^n \mathbf v = \phi^n \mathbf v or \mathbf A^n \mathbf v = (-1)^n \phi^{-n} \mathbf v. To do that, we need to find an eigenvector \mathbf v such that

(\lambda \mathbf I_2 - \mathbf A)\mathbf v = \mathbf 0.

In other words, we want to compute E_\lambda = \ker(\lambda \mathbf I_2 - \mathbf A). We had a whole other detour into Gaussian elimination just so that we can compute such kernels! Perform Gaussian elimination (with relevant row operations that I’m too lazy to list out in detail) to reduce \lambda \mathbf I_2 - \mathbf A into

\lambda \mathbf I_2 - \mathbf A = \begin{bmatrix} \lambda  &  - 1 \\  - 1 & \lambda - 1 \end{bmatrix} \longrightarrow \begin{bmatrix} \lambda  &  - 1 \\  0 & \lambda^2 - \lambda-1 \end{bmatrix}.

For both of the roots \lambda that satisfy the equation \lambda^2 - \lambda - 1 = 0, we have

E_{\lambda} = \text{span}\left\{\begin{bmatrix} 1 \\ \lambda \end{bmatrix} \right\}.

We call E_{\lambda} the eigenspace associated with the eigenvalue \lambda. In this example, we have the eigenspaces

E_{\phi} =  \text{span}\left\{\begin{bmatrix} 1 \\ \phi \end{bmatrix} \right\},\quad E_{-\phi^{-1}} =  \text{span}\left\{\begin{bmatrix} 1 \\ -\phi^{-1} \end{bmatrix} \right\}.

This is the key step: using the eigenvector property, we have

\mathbf A^n \begin{bmatrix} 1 \\ \lambda \end{bmatrix} = \lambda^n \begin{bmatrix} 1 \\ \lambda \end{bmatrix},\quad \lambda = \phi, -\phi^{-1}.

Can we find constants c_1,c_2 \in \mathbb R such that

\mathbf x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = c_1 \begin{bmatrix} 1 \\ \phi \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ -\phi^{-1} \end{bmatrix} ?

If so, then using the linearity of \mathbf A^n,

\begin{aligned} \mathbf x_{n+1} = \mathbf A^n \mathbf x_1 &= \mathbf A^n \left( c_1 \begin{bmatrix} 1 \\ \phi \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ -\phi^{-1} \end{bmatrix} \right) \\ &= c_1 \mathbf A^n \begin{bmatrix} 1 \\ \phi \end{bmatrix} + c_2 \mathbf A^n \begin{bmatrix} 1 \\ -\phi^{-1} \end{bmatrix} \\ &= c_1 \phi^n \begin{bmatrix} 1 \\ \phi \end{bmatrix} + c_2 (-\phi^{-1})^n \begin{bmatrix} 1 \\ -\phi^{-1} \end{bmatrix}. \end{aligned}

Therefore, the choice of c_1, c_2 will directly let us obtain the solution

x_{n+1} = c_1 \phi^n + c_2 (-1)^n \phi^{-n}.

Re-indexing then yields the desired answer

x_{n} = c_1 \phi^{n-1} + c_2 (-1)^{n-1} \phi^{-(n-1)}.

So it behooves us to compute c_1, c_2. Interestingly though, what we need to do, effectively, is solve the matrix equation

\begin{bmatrix} 1 & 1 \\ \phi & -\phi^{-1} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.

This means we need to do another round of equation-solving, either using Gaussian elimination or computing the inverse matrix. Since the matrix is 2 \times 2 and thus sufficiently small, let’s just compute the inverse matrix using the standard formula:

\displaystyle \begin{bmatrix} 1 & 1 \\ \phi & -\phi^{-1} \end{bmatrix}^{-1} = \frac{1}{-\phi^{-1} - \phi} \begin{bmatrix} -\phi^{-1} & -1 \\ -\phi & 1 \end{bmatrix}.

Then performing usual matrix multiplication,

\begin{aligned} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ \phi & -\phi^{-1} \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\ &= \frac{1}{-\phi^{-1} - \phi} \begin{bmatrix} -\phi^{-1} & -1 \\ -\phi & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\ &= \frac{1}{-\phi^{-1} - \phi} \left( \begin{bmatrix} -\phi^{-1}  \\ -\phi \end{bmatrix} + \begin{bmatrix}  -1 \\  1 \end{bmatrix}  \right) \\ &= \frac{1}{-(\phi + \phi^{-1})} \begin{bmatrix} -(1 + \phi^{-1})  \\ -\phi+1 \end{bmatrix} \end{aligned}

Therefore, though complicated, the constants c_1,c_2 are given by

\displaystyle c_1 = \frac{ 1 + \phi^{-1} }{ \phi + \phi^{-1} },\quad c_2 = \frac{\phi-1}{\phi + \phi^{-1}}.

Therefore, the n-th Fibonacci number is given by

\displaystyle x_n = \frac{ 1 + \phi^{-1} }{ \phi + \phi^{-1} } \cdot \phi^{n-1} +  (-1)^{n-1} \cdot \frac{\phi-1}{\phi + \phi^{-1}} \cdot\phi^{-(n-1)}.

Oh, just for completeness, let’s actually simplify these constants. By expressing \phi, -\phi^{-1} in terms of \sqrt 5, we observe that

\displaystyle \frac{\phi-1}{\phi + \phi^{-1}} = \frac{-(1-\phi)}{\frac 12 (1+\sqrt 5) - \frac 12 (1 -\sqrt 5)} = \frac {-\phi^{-1}}{\sqrt 5}.

Similarly,

\displaystyle \frac{ 1 + \phi^{-1} }{ \phi + \phi^{-1} } = \frac 1{\phi} \cdot \frac{\phi+1}{\phi + \phi^{-1}} = \frac 1{\phi} \cdot \frac {\phi^2}{\sqrt 5} = \frac {\phi}{\sqrt 5}.

Therefore, we have the following result.

Theorem 1. The n-th Fibonacci number is given by Binet’s formula:

\displaystyle x_n = \frac {\phi^n -(-\phi)^{-n}}{\sqrt 5} .

Perhaps the rather surprising fact is that in order to describe the integers x_n, we had to use an irrational number \phi. Before this discussion takes yet another tangent, let’s raise more more theoretical linear-algebraic questions: under what circumstances can we actually perform such calculations? This is commonly known as diagonalisation, and it is worth wondering where this name arises from. We will explore these ideas in the next post.

—Joel Kindiak, 6 Mar 25, 1953H

,

Published by


Leave a comment