Category: O-Level Math

Partial Fraction Integrals

Problem 1. Given a real number $\alpha$ , evaluate the integrals

$\displaystyle \int \frac{1}{x-\alpha}\, \mathrm dx \quad \text{and} \quad \int \frac{1}{(x-\alpha)^2}\, \mathrm dx.$

Solution. By the chain rule,

$\begin{aligned} \frac{\mathrm d}{\mathrm dx}\ln|x-\alpha| &= \frac 1{x-\alpha} \\ \frac{\mathrm d}{\mathrm dx} \left(\frac 1{x-\alpha}\right) &= \frac{\mathrm d}{\mathrm dx} \left( (x-\alpha)^{-1} \right) \\ &= (-1)(x-\alpha)^{-2} \\ &= -\frac 1{(x-\alpha)^2}.\end{aligned}$

By linearity,

$\displaystyle \frac{\mathrm d}{\mathrm dx} \left(-\frac 1{x-\alpha}\right) = - \frac{\mathrm d}{\mathrm dx} \left(\frac 1{x-\alpha}\right) = \frac 1{(x-\alpha)^2}.$

Therefore,

$\begin{aligned} \int \frac{1}{x-\alpha}\, \mathrm dx &= \ln|x-\alpha| + C_1, \\\int \frac{1}{(x-\alpha)^2}\, \mathrm dx &= -\frac 1{x-\alpha} + C_2. \end{aligned}$

Problem 2. Given distinct real numbers $\alpha,\beta$ , determine constants $A, B$ such that

$\displaystyle \frac{ 1 }{(x-\alpha)(x-\beta)} = \frac A{x-\alpha} + \frac B{x-\beta}.$

Hence, evaluate $\displaystyle \int \frac{ 1 }{ (x-\alpha)(x-\beta) } \, \mathrm dx$ .

Solution. Firstly, multiply both sides by $x - \alpha$ :

$\displaystyle \frac 1{x-\beta} = A + \frac{B}{x-\beta} \cdot (x-\alpha).$

Setting $x = \alpha$ ,

$\displaystyle \frac 1{\alpha-\beta} = A + \frac{B}{\alpha-\beta} \cdot 0.$

Therefore, $A = 1/(\alpha - \beta)$ .

Similarly, multiply both sides by $x - \beta$ :

$\displaystyle \frac 1{x-\alpha} = \frac{ A }{ x - \alpha } \cdot (x-\beta) + B.$

Setting $x = \beta$ ,

$\displaystyle \frac 1{\beta-\alpha} = \frac{ A }{ \beta - \alpha } \cdot 0 + B.$

Therefore, $B = 1/(\beta-\alpha) = -A$ .

Consolidating,

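$\begin{aligned} \frac{ 1 }{(x-\alpha)(x-\beta)} &= \frac {\frac 1{\alpha - \beta}}{x-\alpha} - \frac {\frac 1{\alpha - \beta}}{x-\beta} \\ &= \frac 1{\alpha - \beta}\left( \frac {1}{x-\alpha} - \frac {1}{x-\beta} \right). \end{aligned}$

By Problem 1 and the linearity of integration,

$\begin{aligned} \int \frac{ 1 }{(x-\alpha)(x-\beta)}\, \mathrm dx &= \frac 1{\alpha - \beta}\left( \int \frac {1}{x-\alpha}\, \mathrm dx - \int \frac {1}{x-\beta}\, \mathrm dx \right) \\ &= \frac 1{\alpha - \beta}\left( \ln|x-\alpha| - \ln|x-\beta| \right) + C. \end{aligned}$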

Problem 3. Use Problems 1 and 2 to evaluate the integral

$\displaystyle \int \frac{ x^2 + x + 1 }{(x-2)(x-3)^2} \, \mathrm dx.$

Solution. Write

$\displaystyle \frac{ x^2 + x + 1 }{(x-2)(x-3)^2} = \frac{x^2 + x+ 1}{x-2} \cdot \frac 1{(x-3)^2}.$

Write $x = (x-2)+2$ and expand the fraction therein:

$\begin{aligned} \frac{ x^2 + x + 1 }{x-2} &= \frac{ ((x-2)+2)^2 + ((x-2)+2) + 1}{x-2} \\ &= \frac{(x-2)^2 + 2 \cdot (x-2) \cdot 2 + 2^2 + (x-2) + 2 + 1}{x-2} \\ &= \frac{ (x-2)^2 + 5(x-2) + 7 }{x-2} \\ &= x-2 + 5 + \frac 7{x-2} \\ &= x+3 + \frac{7}{x-2} \\ &= (x-3)+6 + \frac{7}{x-2}. \end{aligned}$

Therefore, multiplying through by $1/(x-3)^2$ ,

$\begin{aligned} \frac{ x^2 + x + 1 }{(x-2)(x-3)^2} &= \left( (x-3)+6 + \frac{7}{x-2} \right) \cdot \frac 1{(x-3)^2} \\ &= \frac 1{x-3} + \frac{6}{(x-3)^2} + \frac 7{(x-2)(x-3)^2}. \end{aligned}$

By Problem 2,

$\begin{aligned} \frac 1{(x-2)(x-3)} &= \frac 1{2-3} \left(\frac 1{x-2} - \frac 1{x-3}\right) \\ &= \frac 1{x-3} - \frac 1{x-2}. \end{aligned}$

Therefore,

$\begin{aligned} \frac 1{(x-2)(x-3)^2} &= \frac 1{x-3} \cdot \frac 1{(x-2)(x-3)} \\ &= \frac 1{x-3} \cdot \left(\frac 1{x-3} - \frac 1{x-2}\right) \\ &= \frac 1{(x-3)^2} - \frac 1{(x-2)(x-3)} \\ &= \frac 1{(x-3)^2} - \left(\frac 1{x-3} - \frac 1{x-2}\right) \\ &= \frac 1{x-2} - \frac 1{x-3} + \frac 1{(x-3)^2}. \end{aligned}$

Combining the displays,

$\begin{aligned} \frac{ x^2 + x + 1 }{(x-2)(x-3)^2} &= \frac 1{x-3} + \frac{6}{(x-3)^2} + \frac 7{(x-2)(x-3)^2} \\ &= \frac 1{x-3} + \frac{6}{(x-3)^2} + 7 \cdot \left( \frac 1{x-2} - \frac 1{x-3} + \frac 1{(x-3)^2} \right) \\ &= \frac{7}{x-2} - \frac 6{x-3} + \frac{13}{(x-3)^2}.\end{aligned}$

By the linearity of integration and Problem 1,

$\begin{aligned} &\int \frac{ x^2 + x + 1 }{(x-2)(x-3)^2}\, \mathrm dx \\ &= 7 \cdot \int \frac 1{x-2}\, \mathrm dx - 6 \cdot \int \frac 1{x-3}\, \mathrm dx + 13 \cdot \int \frac 1{(x-3)^2}\, \mathrm dx \\ &= 7 \ln|x-2| - 6 \ln |x-3| - \frac{13}{x-3} + C. \end{aligned}$
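
As a sanity check on the result above, here is a minimal numerical sketch. It compares the original fraction with its partial fraction decomposition at a few sample points, and compares a finite-difference derivative of the antiderivative with the integrand. The function names, sample points, and tolerances are illustrative assumptions, not part of the original solution.

```python
import math


def integrand(x):
    """Original rational function from Problem 3."""
    return (x**2 + x + 1) / ((x - 2) * (x - 3) ** 2)


def partial_fractions(x):
    """Decomposition obtained in the solution."""
    return 7 / (x - 2) - 6 / (x - 3) + 13 / (x - 3) ** 2


def antiderivative(x):
    """Antiderivative from the solution, with the constant C set to 0."""
    return 7 * math.log(abs(x - 2)) - 6 * math.log(abs(x - 3)) - 13 / (x - 3)


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) numerically with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


def check(points):
    """Return True if both identities hold (up to rounding) at every point."""
    return all(
        math.isclose(integrand(x), partial_fractions(x), rel_tol=1e-9)
        and math.isclose(
            central_difference(antiderivative, x), integrand(x), rel_tol=1e-5
        )
        for x in points
    )


# Sample points chosen away from the poles x = 2 and x = 3 (an assumption).
print(check([-1.5, 0.0, 2.5, 5.0, 10.0]))
```

A numerical check like this does not prove the identity, but it catches sign and coefficient slips quickly.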

Remark 1. In the process of evaluating the integral, we have uncovered the partial fraction decomposition for the following expression with distinct $\alpha,\beta$ :

$\displaystyle \frac{px^2 + qx + r}{(x-\alpha)(x-\beta)^2} = \frac A{x-\alpha} + \frac {B}{x-\beta} + \frac C{(x-\beta)^2}.$

We leave it as an exercise to check that

$\begin{aligned} A &= \frac{ p\alpha^2 + q\alpha + r }{(\alpha - \beta)^2}, \\ B &= p -A = p - \frac{ p\alpha^2 + q\alpha + r }{(\alpha - \beta)^2}, \\ C &= \frac{p\beta^2 + q\beta + r }{\beta - \alpha}. \end{aligned}$
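
The formulas in Remark 1 can be tried out with exact rational arithmetic. The sketch below is only an illustration: the function name `repeated_root_coefficients` is hypothetical, and the inputs are assumed to be integers or fractions.

```python
from fractions import Fraction


def repeated_root_coefficients(p, q, r, alpha, beta):
    """Return (A, B, C) for (p x^2 + q x + r) / ((x - alpha)(x - beta)^2)."""
    p, q, r, alpha, beta = map(Fraction, (p, q, r, alpha, beta))
    if alpha == beta:
        raise ValueError("alpha and beta must be distinct")
    A = (p * alpha**2 + q * alpha + r) / (alpha - beta) ** 2
    B = p - A
    C = (p * beta**2 + q * beta + r) / (beta - alpha)
    return A, B, C


# Problem 3 corresponds to p = q = r = 1, alpha = 2, beta = 3.
print(*repeated_root_coefficients(1, 1, 1, 2, 3))  # 7 -6 13
```

The output agrees with the decomposition found in Problem 3.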

Problem 4. Similarly, evaluate

$\displaystyle \int \frac{ x^2 + x + 1 }{(x-2)(x-3) (x-4)} \, \mathrm dx.$

Solution. Using the simplification in Problem 3,

$\begin{aligned} \frac{ x^2 + x + 1 }{(x-2)(x-3) (x-4)} &= \frac{ x^2 + x + 1 }{x-2} \cdot \frac 1{(x-3)(x-4)} \\ &= \left((x-3)+6 + \frac{7}{x-2}\right) \cdot \frac 1{(x-3)(x-4)} \\ &= \frac{1}{x-4} + \frac 6{(x-3)(x-4)} + \frac 7{(x-2)(x-3)(x-4)}. \end{aligned}$

By Problem 2 applied twice,

$\begin{aligned} \frac 1{(x-3)(x-4)} &= \frac 1{3-4} \left( \frac 1{x-3} - \frac 1{x-4} \right) \\ &= \frac 1{x-4} - \frac{1}{x-3} \end{aligned}$

so that

$\begin{aligned} \frac 1{(x-2)(x-3)(x-4)} &= \frac 1{x-2} \cdot \frac 1{(x-3)(x-4)} \\ &= \frac 1{x-2} \left( \frac 1{x-4} - \frac{1}{x-3} \right) \\ &= \frac 1{(x-2)(x-4)} - \frac{1}{(x-2)(x-3)} \\ &= \frac 1{2-4} \left(\frac 1{x-2} - \frac 1{x-4}\right) - \frac 1{2-3} \left(\frac 1{x-2} - \frac 1{x-3}\right) \\ &= \frac 12\left(\frac 1{x-4} - \frac 1{x-2}\right) + \left(\frac 1{x-2} - \frac 1{x-3}\right) \\ &= \frac{\frac 12}{x-2} - \frac 1{x-3} + \frac {\frac 12}{x-4}. \end{aligned}$

Combining the displays,

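$\begin{aligned} \frac{ x^2 + x + 1 }{(x-2)(x-3) (x-4)} &= \frac{1}{x-4} + \frac 6{(x-3)(x-4)} + \frac 7{(x-2)(x-3)(x-4)} \\ &= \frac 1{x-4} + 6\left(\frac 1{x-4} - \frac 1{x-3}\right) + 7\left(\frac{\frac 12}{x-2} - \frac 1{x-3} + \frac{\frac 12}{x-4}\right) \\ &= \frac{\frac 72}{x-2} - \frac {13}{x-3}+ \frac{\frac{21}2}{x-4} .\end{aligned}$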

By the linearity of integration and Problem 1,

$\begin{aligned} \int \frac{ x^2 + x + 1 }{(x-2)(x-3) (x-4)}\, \mathrm dx &= \frac 72 \ln|x-2| - 13 \ln|x-3| + \frac{21}2 \ln|x-4| + C.\end{aligned}$

Remark 2. The corresponding partial fraction decomposition with distinct $\alpha,\beta,\gamma$ is as follows:

$\displaystyle \frac{px^2 + qx + r}{(x-\alpha)(x-\beta)(x-\gamma)} = \frac A{x-\alpha} + \frac {B}{x-\beta} + \frac C{x-\gamma}.$

We leave it as an exercise to check that

$\begin{aligned} A &= \frac{ p\alpha^2 + q\alpha + r }{(\alpha - \beta)(\alpha - \gamma)}, \\ B &= \frac{ p\beta^2 + q\beta + r }{(\beta - \alpha)(\beta - \gamma)}, \\ C &= \frac{ p\gamma^2 + q\gamma + r }{(\gamma - \alpha)(\gamma - \beta)}. \end{aligned}$
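
The three formulas in Remark 2 share one pattern: evaluate the numerator at a root, then divide by the product of differences with the other two roots. Here is a small sketch of that pattern using exact fractions. The function name `distinct_root_coefficients` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction


def distinct_root_coefficients(p, q, r, roots):
    """Return (A, B, C) for (p x^2 + q x + r) / ((x - a)(x - b)(x - c))."""
    p, q, r = map(Fraction, (p, q, r))
    roots = [Fraction(root) for root in roots]
    if len(roots) != 3 or len(set(roots)) != 3:
        raise ValueError("exactly three distinct roots are required")
    coefficients = []
    for i, root in enumerate(roots):
        # Product of (root - other) over the two remaining roots.
        denominator = Fraction(1)
        for j, other in enumerate(roots):
            if j != i:
                denominator *= root - other
        coefficients.append((p * root**2 + q * root + r) / denominator)
    return tuple(coefficients)


# Problem 4 corresponds to p = q = r = 1 with roots 2, 3, 4.
print(*distinct_root_coefficients(1, 1, 1, (2, 3, 4)))  # 7/2 -13 21/2
```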

Problem 5. Given distinct real numbers $\alpha,\beta$ , determine constants $A,B,C$ such that

$\displaystyle \frac{ \alpha^2+\beta^2 }{(x-\alpha)(x^2+\beta^2) } = \frac A{x-\alpha} + \frac {Bx+C}{x^2 + \beta^2}.$

Hence, evaluate

$\displaystyle \int \frac{ x^2 + x + 1 }{(x-2)(x^2+9) } \, \mathrm dx.$

Solution. By observing that $0 = x^2 -x^2$ ,

$\begin{aligned} \frac{\alpha^2 + \beta^2}{(x-\alpha)(x^2 +\beta^2)} &= \frac{x^2 + \beta^2 - (x^2 - \alpha^2) }{(x-\alpha)(x^2+\beta^2)} \\ &= \frac{x^2 + \beta^2}{(x-\alpha)(x^2+\beta^2)} - \frac{x^2 -\alpha^2}{(x-\alpha)(x^2+\beta^2)} \\ &= \frac 1{x-\alpha} - \frac{(x-\alpha)(x+\alpha)}{(x-\alpha)(x^2+\beta^2)} \\ &= \frac 1{x-\alpha} - \frac{x+\alpha}{x^2+\beta^2}, \end{aligned}$

so that $A = 1, B = -1, C = -\alpha$ . Dividing both sides by $\alpha^2 +\beta^2$ for the next part,

$\displaystyle \frac 1{(x-\alpha)(x^2+\beta^2)} = \frac 1{\alpha^2+\beta^2} \left( \frac 1{x-\alpha} - \frac{x+\alpha}{x^2+\beta^2} \right).$

Using the solution in Problem 3,

$\begin{aligned} \frac{ x^2 + x + 1 }{(x-2)(x^2+9) } &= \frac{ x^2 + x + 1 }{x-2 } \cdot \frac 1{x^2+9} \\ &= \left(x+3 + \frac 7{x-2}\right)\cdot \frac 1{x^2+9} \\ &= \frac{x+3}{x^2+9} + \frac 7{(x-2)(x^2+9)}. \end{aligned}$

Using the first part with $\alpha = 2$ and $\beta = 3$ ,

$\begin{aligned} \frac 1{(x-2)(x^2+9)} &= \frac 1{2^2 + 3^2} \left(\frac 1{x-2} - \frac{x + 2}{x^2 + 3^2}\right) \\ &= \frac 1{13} \left(\frac 1{x-2} - \frac{x+2}{x^2+9}\right). \end{aligned}$

Combining the displays,

$\begin{aligned} \frac{ x^2 + x + 1 }{(x-2)(x^2+9) } &= \frac{x+3}{x^2+9} + \frac 7{(x-2)(x^2+9)} \\ &= \frac{x+3}{x^2+9} + \frac 7{13} \left(\frac 1{x-2} - \frac{x+2}{x^2+9}\right) \\ &= \frac 1{13} \left( \frac{7}{x-2} + \frac{13(x+3) - 7(x+2)}{x^2+9} \right) \\ &= \frac 1{13} \left( \frac{7}{x-2} + \frac{ 6x+ 25}{x^2+9} \right). \end{aligned}$

Using the chain rule,

$\begin{aligned} \frac{\mathrm d}{\mathrm dx} \ln(x^2 + 9) &= \frac{2x}{x^2+9}, \\ \frac{\mathrm d}{\mathrm dx} \tan^{-1}(x/3) &= \frac{1}{1 + (x/3)^2} \cdot \frac 13 = \frac{3}{x^2+9}. \end{aligned}$

Therefore,

$\begin{aligned} \int \frac{x}{x^2+9}\, \mathrm dx &= \frac 12 \ln(x^2+9) + C_1, \\ \int \frac{1}{x^2+9}\, \mathrm dx &= \frac 13 \tan^{-1}(x/3) + C_2. \end{aligned}$

By the linearity of integration,

$\begin{aligned} \int \frac{x^2+x+1}{(x-2)(x^2+9)}\, \mathrm dx &= \frac 1{13} \int \left( \frac{7}{x-2} + \frac{ 6x}{x^2+9}+ \frac{ 25}{x^2+9} \right)\, \mathrm dx \\ &= \frac 1{13}\left( 7\ln|x-2| + \frac 62 \ln(x^2 + 9) + \frac{25}{3} \tan^{-1}(x/3) \right) + C \\ &= \frac 7{13} \ln|x-2| + \frac{3}{13} \ln(x^2+9) + \frac{25}{39} \tan^{-1}(x/3) + C. \end{aligned}$

Remark 3. The corresponding partial fraction decomposition with distinct $\alpha,\beta$ is as follows:

$\displaystyle \frac{px^2 + qx + r}{(x-\alpha)(x^2+\beta^2)} = \frac A{x-\alpha} + \frac {Bx+C}{x^2 + \beta^2}.$
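
Remark 3 does not state formulas for $A, B, C$ . The sketch below uses formulas that are not in the original: they come from multiplying both sides by $(x-\alpha)(x^2+\beta^2)$ , setting $x = \alpha$ to get $A$ , and comparing the coefficients of $x^2$ and $x$ to get $B = p - A$ and $C = q + \alpha B$ . The function name is hypothetical too.

```python
from fractions import Fraction


def quadratic_factor_coefficients(p, q, r, alpha, beta):
    """Return (A, B, C) for (p x^2 + q x + r) / ((x - alpha)(x^2 + beta^2)).

    Assumed formulas, derived by comparing coefficients:
    A = (p alpha^2 + q alpha + r) / (alpha^2 + beta^2), B = p - A, C = q + alpha B.
    """
    p, q, r, alpha, beta = map(Fraction, (p, q, r, alpha, beta))
    if alpha == 0 and beta == 0:
        raise ValueError("alpha^2 + beta^2 must be nonzero")
    A = (p * alpha**2 + q * alpha + r) / (alpha**2 + beta**2)
    B = p - A
    C = q + alpha * B
    return A, B, C


# Second part of Problem 5: p = q = r = 1, alpha = 2, beta = 3.
print(*quadratic_factor_coefficients(1, 1, 1, 2, 3))  # 7/13 6/13 25/13
```

With numerator $\alpha^2 + \beta^2$ (that is, $p = q = 0$ and $r = \alpha^2+\beta^2$ ), the same function returns $A = 1$ , $B = -1$ , $C = -\alpha$ , matching the first part of Problem 5.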

—Joel Kindiak, 29 Mar 26, 1546H

June 2, 2026
Imaginary Numbers

Define the 2 × 2 matrices $\mathbf I$ and $\mathbf J$ as follows:

$\mathbf I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf J = \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}.$

Problem 1. Evaluate $\mathbf J^2$ .

Solution. Using matrix multiplication,

$\begin{aligned} \mathbf J^2 &= \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix} \\ &= \begin{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} & \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix} \begin{bmatrix} -1 \\ 0 \end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 0\begin{bmatrix} 0 \\ 1\end{bmatrix} + 1\begin{bmatrix} -1 \\ 0\end{bmatrix} & (-1)\begin{bmatrix} 0 \\ 1\end{bmatrix} + 0\begin{bmatrix} -1 \\ 0\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = (-1) \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} \\ & = (-1)\mathbf I = -\mathbf I. \end{aligned}$

Let $a,b,c,d$ be real numbers.

Problem 2. Show that

$a \mathbf I + b \mathbf J = c \mathbf I + d \mathbf J$

implies that $a = c$ and $b = d$ .

Solution. Using the definition of $\mathbf I$ and $\mathbf J$ ,

$\begin{aligned} a\begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} + b \begin{bmatrix}0 & -1 \\ 1 & 0 \end{bmatrix} &= c \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} + d \begin{bmatrix}0 & -1 \\ 1 & 0 \end{bmatrix} \\ \begin{bmatrix}a & 0 \\ 0 & a \end{bmatrix} + \begin{bmatrix}0 & -b \\ b & 0 \end{bmatrix} &= \begin{bmatrix}c & 0 \\ 0 & c \end{bmatrix} + \begin{bmatrix}0 & -d \\ d & 0 \end{bmatrix} \\ \begin{bmatrix}a & -b \\ b & a \end{bmatrix} &= \begin{bmatrix}c & -d \\ d & c \end{bmatrix}.\end{aligned}$

Comparing the entries of the matrices on both sides yields $a = c$ and $b = d$ .

Define $\mathbf Z := a \mathbf I + b \mathbf J$ and $\mathbf W := c \mathbf I + d \mathbf J$ .

Problem 3. Show that $\mathbf Z \mathbf W = \mathbf W \mathbf Z$ .

Solution. We observe that $\mathbf I \mathbf J = \mathbf J = \mathbf J \mathbf I$ . By Problem 1,

$\begin{aligned}\mathbf Z \mathbf W &= (a \mathbf I + b \mathbf J)(c \mathbf I + d \mathbf J) \\ &= (ac) \mathbf I^2 + (ad) \mathbf I \mathbf J + (bc) \mathbf J \mathbf I + (bd) \mathbf J^2 \\ &= (ac) \mathbf I +(ad)\mathbf J + (bc)\mathbf J + (bd)(-1)\mathbf I \\ &= (ac - bd) \mathbf I + (ad + bc) \mathbf J.\end{aligned}$

Similarly, $\mathbf W \mathbf Z = (ca - db) \mathbf I + (da + cb) \mathbf J$ . Therefore, $\mathbf Z \mathbf W = \mathbf W \mathbf Z$ .

Define $\mathbf Z^* := a \mathbf I - b \mathbf J$ .

Problem 4. Evaluate $\mathbf Z \mathbf Z^*$ . Hence, if $\mathbf Z \neq \mathbf 0$ , construct a matrix $\mathbf Z^{-1}$ with the property that

$\mathbf Z \mathbf Z^{-1} = \mathbf Z^{-1} \mathbf Z = \mathbf I.$

Solution. Using the multiplication in Problem 3 but setting $c = a$ and $d = -b$ ,

$\mathbf Z \mathbf Z^* = (a^2 - b(-b))\mathbf I + (a(-b) + ba)\mathbf J = (a^2 + b^2) \mathbf I.$

If $\mathbf Z \neq \mathbf 0$ , then either $a \neq 0$ or $b \neq 0$ , so that $a^2 + b^2 \neq 0$ . Therefore,

$\begin{aligned} \mathbf Z\left( \frac 1{a^2 + b^2} \mathbf Z^* \right) &= \frac 1{a^2 + b^2} \mathbf Z \mathbf Z^* \\ &= \frac 1{a^2 + b^2} (a^2 + b^2) \mathbf I \\ &= \mathbf I. \end{aligned}$

Denoting $\alpha := (a^2 + b^2)^{-1}$ , define $\mathbf Z^{-1} := \alpha \mathbf Z^*$ , so that

$\mathbf Z \mathbf Z^{-1} = \mathbf Z (\alpha \mathbf Z^*) = \mathbf I.$

By Problem 3, $\mathbf Z^{-1} \mathbf Z = \mathbf Z \mathbf Z^{-1} = \mathbf I$ .
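
The construction in Problems 1 to 4 can be checked on a concrete matrix. The following is a minimal sketch with 2 × 2 matrices stored as tuples of rows. The helper names `combine`, `multiply`, and `inverse` are illustrative assumptions, and the inputs are assumed to be integers or fractions.

```python
from fractions import Fraction

I = ((1, 0), (0, 1))
J = ((0, -1), (1, 0))


def combine(a, b):
    """Return the matrix a*I + b*J as a tuple of rows."""
    return ((a, -b), (b, a))


def multiply(X, Y):
    """Multiply two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse(a, b):
    """Return (a^2 + b^2)^(-1) * (a*I - b*J), assuming a and b are not both 0."""
    norm = a * a + b * b
    if norm == 0:
        raise ZeroDivisionError("the zero matrix has no inverse")
    scale = Fraction(1) / norm
    return combine(scale * a, -scale * b)


Z = combine(3, 4)
print(multiply(J, J) == ((-1, 0), (0, -1)))  # True: J^2 = -I
print(multiply(Z, inverse(3, 4)) == I)       # True
print(multiply(inverse(3, 4), Z) == I)       # True
```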

Problem 5. Determine the two possible matrices $\mathbf Z = a\mathbf I + b \mathbf J$ such that

$\mathbf Z^2 - 4\mathbf Z + 13\mathbf I = \mathbf 0.$

Solution. Write $\mathbf Z = a\mathbf I + b \mathbf J$ . Using the multiplication in Problem 3 with $c = a$ and $d = b$ ,

$\mathbf Z^2 = (a^2 - b^2) \mathbf I + (2ab) \mathbf J.$

Therefore,

$((a^2 - b^2) \mathbf I + (2ab) \mathbf J) - 4(a\mathbf I + b\mathbf J) + 13\mathbf I = \mathbf 0.$

Grouping the terms together,

$(a^2 - 4a - b^2 + 13) \mathbf I + 2b (a - 2) \mathbf J = \mathbf 0 = 0 \mathbf I + 0\mathbf J.$

Using Problem 2,

$a^2 - 4a - b^2 + 13 = 0\quad \text{and} \quad 2b (a - 2) = 0.$

Using the second equation, either $b = 0$ or $a = 2$ . If $b = 0$ , then substituting into the first equation,

$a^2 - 4a+ 13 = 0.$

However, this equation has discriminant $(-4)^2 - 4 \cdot 1 \cdot 13 = - 36 < 0$ , and so there are no real roots to the equation, a contradiction.

Therefore, we must have $a = 2$ . Substituting into the first equation again,

$2^2 - 4 \cdot 2 - b^2 + 13 = 0 \quad \Rightarrow \quad b^2 = 9.$

Therefore, $b = \pm 3$ . Hence, the two possible matrices for $\mathbf Z$ are

$\mathbf Z = 2\mathbf I +3\mathbf J \quad \text{or} \quad \mathbf Z = 2\mathbf I - 3 \mathbf J.$

We can condense them to the expression $\mathbf Z = 2\mathbf I \pm 3 \mathbf J$ .
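
The answer can be confirmed by brute force. The sketch below substitutes $\mathbf Z = a\mathbf I + b\mathbf J$ into the matrix equation for every pair of integers in a small window and keeps the pairs that give the zero matrix. The search window and helper names are assumptions, and the search only sees integer candidates, so it complements the argument above rather than replacing it.

```python
def z_matrix(a, b):
    """Return a*I + b*J as a tuple of rows."""
    return ((a, -b), (b, a))


def multiply(X, Y):
    """Multiply two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def quadratic(Z):
    """Return Z^2 - 4Z + 13I."""
    identity = ((1, 0), (0, 1))
    Z2 = multiply(Z, Z)
    return tuple(
        tuple(Z2[i][j] - 4 * Z[i][j] + 13 * identity[i][j] for j in range(2))
        for i in range(2)
    )


ZERO = ((0, 0), (0, 0))
solutions = [
    (a, b)
    for a in range(-10, 11)
    for b in range(-10, 11)
    if quadratic(z_matrix(a, b)) == ZERO
]
print(solutions)  # [(2, -3), (2, 3)]
```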

Remark 1. By denoting $a \mathbf I \equiv a$ and $b\mathbf J \equiv bi$ , we have created a model for the complex numbers $a + bi$ , where $i^2 = -1$ by Problem 1. In particular, numbers of the form $bi$ are called purely imaginary. The solution to Problem 5 would then look like

$z^2 - 4z + 13 = 0 \quad \iff \quad z = 2 \pm 3i.$

The letter $z$ is used to denote a complex number by convention. Furthermore, the calculation $i^2 = -1$ motivates the (somewhat debatable) notation $\sqrt{-1} := i$ . For more information, see this post.
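
The identification $a\mathbf I + b\mathbf J \equiv a + bi$ can be tested against Python's built-in `complex` type: multiplying two complex numbers should match multiplying the corresponding matrices. This is a small sketch. The helper `to_matrix` and the sample values are illustrative assumptions.

```python
def to_matrix(z):
    """Map a + bi to the matrix a*I + b*J, stored as a tuple of rows."""
    return ((z.real, -z.imag), (z.imag, z.real))


def multiply(X, Y):
    """Multiply two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


z = complex(2, 3)
w = complex(-1, 4)

# The matrix of a product equals the product of the matrices.
print(multiply(to_matrix(z), to_matrix(w)) == to_matrix(z * w))  # True

# i^2 = -1 corresponds to J^2 = -I.
print(to_matrix(1j * 1j) == ((-1, 0), (0, -1)))  # True
```

Small integer parts are used so that the floating-point arithmetic is exact. With arbitrary floats, a tolerance-based comparison would be safer.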

—Joel Kindiak, 25 Mar 26, 0056H

June 1, 2026
Milk Tea Matrices
Let’s drink some milk tea! Consider the three milk tea chains in Singapore: Chagee, Koi, and LiHo. (There are many others, so please experiment with these other chains if you wish.)

Let $c_t, k_t, \ell_t$ denote the proportion of the population that drinks Chagee, Koi, and LiHo respectively at time $t$ , measured in months. For simplicity, assume that customers are exclusive and loyal—Chagee drinkers do not drink from Koi and vice versa.

Before Chagee came on the scene, the milk tea scene was mostly split between Koi and LiHo, so that $c_0 = 0$ . Let’s suppose that 50% of the population drank Koi and 50% of the population drank LiHo at time $t = 0$ , so that $k_0 = 0.5$ and $\ell_0 = 0.5$ . Since LiHo is newer than Koi, suppose the following changes happen after each month:
- 2% of Koi drinkers switch to LiHo,
- 1% of LiHo drinkers switch to Koi.
Example 1. At the end of the first month, what proportion of the population would be Koi drinkers?

Solution. We can represent these changes using the following diagram. Arrows once again represent similar ideas as they do in probability tree diagrams: the arrow from Koi to LiHo with label 0.02 means that 0.02 of Koi drinkers switch to LiHo.

Recall that $c_1, k_1, \ell_1$ denote the proportions of the population that drink Chagee, Koi, and LiHo respectively at the end of month $1$ . Since Chagee has not yet entered the Singapore market, $c_1 = 0$ . Now $k_1$ is the sum of two contributions:
- the 98% of Koi customers who remained loyal to Koi,
- the 1% of LiHo customers who switched to Koi.
Therefore,

$k_1 = 0.98 \cdot k_0 + 0.01 \cdot \ell_0.$

Similarly,

$\ell_1 = 0.02\cdot k_0 + 0.99\cdot \ell_0.$

Since $k_0 = \ell_0 = 0.5$ , we can substitute them into both equations and obtain $k_1$ as our desired answer:

$\begin{aligned} k_1 &= 0.98 \cdot 0.5 + 0.01 \cdot 0.5 = 0.495, \\ \ell_1 &= 0.02 \cdot 0.5 + 0.99 \cdot 0.5 = 0.505.\end{aligned}$
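
The same one-month update can be written as a few lines of code. This is a minimal sketch using exact fractions to avoid rounding. The variable names are illustrative assumptions.

```python
from fractions import Fraction

# Monthly switching rates from the setup.
koi_to_liho = Fraction(2, 100)
liho_to_koi = Fraction(1, 100)

# Market shares at time 0.
k0 = Fraction(1, 2)
l0 = Fraction(1, 2)

# Koi keeps 98% of its drinkers and gains 1% of LiHo's drinkers.
k1 = (1 - koi_to_liho) * k0 + liho_to_koi * l0
# LiHo gains 2% of Koi's drinkers and keeps 99% of its own.
l1 = koi_to_liho * k0 + (1 - liho_to_koi) * l0

print(float(k1), float(l1))  # 0.495 0.505
```

The two shares still sum to 1, since every drinker either stays or switches.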

Indeed, Koi lost a small amount of its market share, as predicted.

Now suppose the proportions of drinkers at the end of the first month are as in Example 1. For reasons that will become apparent later, let’s denote the market shares as vectors:

$\mathbf x_t = \begin{bmatrix} c_t \\ k_t \\ \ell_t \end{bmatrix}\quad \Rightarrow \quad \mathbf x_1 = \begin{bmatrix} c_1 \\ k_1 \\ \ell_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0.495 \\ 0.505 \end{bmatrix}.$

Due to aggressive social media marketing by Chagee’s youthful team, suppose that, in addition to the switches between Koi and LiHo described earlier, the following changes happen at the end of every month:
- 4% of Koi drinkers switch to Chagee,
- 3% of LiHo drinkers switch to Chagee.
- All Chagee drinkers keep drinking Chagee.
Question 1. At the end of the second month, what proportion of the population would be Chagee drinkers?

We can represent these changes using the following diagram, now including Chagee in our calculations.

Using similar analysis to Example 1, we write out the following three equations:

$\begin{aligned} c_2 &= 1 \cdot c_1 + 0.04 \cdot k_1 + 0.03 \cdot \ell_1, \\ k_2 &= 0 \cdot c_1 + 0.94 \cdot k_1 + 0.01 \cdot \ell_1, \\ \ell_2 &= 0 \cdot c_1 + 0.02 \cdot k_1 + 0.96 \cdot \ell_1.\end{aligned}$

Use vector notation to simplify our work:

$\begin{aligned} \begin{bmatrix} c_2 \\ k_2 \\ \ell_2 \end{bmatrix} &= \begin{bmatrix} 1 \cdot c_1 + 0.04 \cdot k_1 + 0.03 \cdot \ell_1 \\ 0 \cdot c_1 + 0.94 \cdot k_1 + 0.01 \cdot \ell_1 \\ 0 \cdot c_1 + 0.02 \cdot k_1 + 0.96 \cdot \ell_1 \end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot c_1 \\ 0 \cdot c_1 \\ 0 \cdot c_1 \end{bmatrix} + \begin{bmatrix} 0.04 \cdot k_1 \\ 0.94 \cdot k_1 \\ 0.02 \cdot k_1 \end{bmatrix} + \begin{bmatrix} 0.03 \cdot \ell_1 \\ 0.01 \cdot \ell_1 \\ 0.96 \cdot \ell_1 \end{bmatrix} \\ &= c_1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + k_1 \begin{bmatrix} 0.04 \\ 0.94 \\ 0.02 \end{bmatrix} + \ell_1 \begin{bmatrix} 0.03 \\ 0.01 \\ 0.96 \end{bmatrix}.\end{aligned}$

At this point in time, all we need to do is to substitute $c_1 = 0$ , $k_1 = 0.495$ , and $\ell_1 = 0.505$ to obtain our answer (as an exercise, check that $c_2 = 0.03495$ ). But what if we wanted to re-use this information to answer more sophisticated questions? Wouldn’t it be nice to condense this information even further?

Remark 1. The setups we gave are specific kinds of Markov chains and, more generally, stochastic processes: models for describing sequences of random variables that satisfy sufficiently nice probability properties.

We can think of the three vectors as the “ingredients” needed that produce our result, while the numbers $c_1, k_1, \ell_1$ denote a “recipe” in combining these ingredients. To that end, mathematicians and statisticians stack the ingredients side-by-side as a matrix, and place the “recipe” vector on the right-hand side:

$\displaystyle c_1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + k_1 \begin{bmatrix} 0.04 \\ 0.94 \\ 0.02 \end{bmatrix} + \ell_1 \begin{bmatrix} 0.03 \\ 0.01 \\ 0.96 \end{bmatrix} = \begin{bmatrix} 1 & 0.04 & 0.03 \\ 0 & 0.94 & 0.01 \\ 0 & 0.02 & 0.96 \end{bmatrix} \begin{bmatrix} c_1 \\ k_1 \\ \ell_1 \end{bmatrix}.$
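
To see the “ingredients and recipe” picture at work, here is a small sketch that stores the matrix as a list of its columns and applies it to $\mathbf x_1$ . The names `A_columns`, `percent`, and `step` are hypothetical helpers, and exact fractions are used so that the shares are free of rounding error.

```python
from fractions import Fraction


def percent(n):
    """Convert a whole-number percentage into an exact fraction."""
    return Fraction(n, 100)


# Each column says where one chain's drinkers go: (to Chagee, to Koi, to LiHo).
A_columns = [
    [percent(100), percent(0), percent(0)],  # Chagee drinkers
    [percent(4), percent(94), percent(2)],   # Koi drinkers
    [percent(3), percent(1), percent(96)],   # LiHo drinkers
]


def step(x):
    """Return A x as a weighted sum of the columns of A."""
    result = [Fraction(0)] * 3
    for weight, column in zip(x, A_columns):
        result = [total + weight * entry for total, entry in zip(result, column)]
    return result


x1 = [Fraction(0), Fraction(495, 1000), Fraction(505, 1000)]
x2 = step(x1)
print([float(share) for share in x2])  # [0.03495, 0.47035, 0.4947]
```

Calling `step` again on `x2` gives the shares one month later, under the assumption that the switching rates stay the same.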

This is, in fact, the essential origin story of the matrix. It wasn’t defined a priori as a table of numbers equipped with some out-of-the-blue calculations; it was simply a summary of changing data!

Remark 2. Notice that the notion of a 3-dimensional vector just arose from our problem setup. In the physical world, we can still visualise 3 dimensions, but more complicated setups would require the use of $n$ -dimensional vectors, where $n > 3$ . In these settings, physical intuition fails, but we can still reason about them through formalised mathematical definitions.

Definition 1. An $m \times 1$ matrix is an $m$ -dimensional vector. For example, we have the following $3 \times 1$ matrix $\mathbf x_1$ :

$\mathbf x_1 = \begin{bmatrix} c_1 \\ k_1 \\ \ell_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0.495 \\ 0.505\end{bmatrix}.$

An $m \times n$ matrix is a collection of $n$ vectors, each of them $m$ -dimensional, placed beside each other:

$\mathbf A = \begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n \end{bmatrix}$

For example, we have the following $3 \times 3$ matrix $\mathbf A$ and $2 \times 3$ matrix $\mathbf B$ :

$\mathbf A = \begin{bmatrix} 1 & 0.04 & 0.03 \\ 0 & 0.94 & 0.01 \\ 0 & 0.02 & 0.96 \end{bmatrix}, \quad \mathbf B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}.$

Furthermore, given an $n$ -dimensional vector $\mathbf v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}$ , we define the expression $\mathbf A \mathbf v$ to mean

$\mathbf A \mathbf v \equiv \begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n \end{bmatrix}\begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} := v_1 \mathbf a_1 + \cdots + v_n \mathbf a_n.$

Define the zero matrix by $\mathbf 0 := \begin{bmatrix} \mathbf 0 & \cdots & \mathbf 0 \end{bmatrix}$ .

Example 2. Evaluate the expression

$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix},$

giving your answer in terms of a 2-dimensional vector.

Solution. Using Definition 1,

$\begin{aligned} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} &= 7 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8 \begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9 \begin{bmatrix} 3 \\ 6 \end{bmatrix} \\ &= \begin{bmatrix} 7 \\ 28 \end{bmatrix} + \begin{bmatrix} 16 \\ 40 \end{bmatrix} + \begin{bmatrix} 27 \\ 54 \end{bmatrix} \\ &= \begin{bmatrix} 7 + 16 + 27 \\ 28 + 40 + 54 \end{bmatrix} \\ &= \begin{bmatrix} 50 \\ 122 \end{bmatrix}. \end{aligned}$

Remark 3. The expression in Example 2, being contrived, doesn’t represent any particular real-world example. However, its calculations are identical to those of the milk tea example (which is, itself, not entirely faithful to reality, but simplified for illustrative purposes).

Example 3. Given an $m \times n$ matrix $\mathbf A$ , how many dimensions must the vector $\mathbf v$ have so that $\mathbf A\mathbf v$ makes sense?

Solution. Write

$\mathbf A = \begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n \end{bmatrix},$

where each of $\mathbf a_1,\dots,\mathbf a_n$ is an $m$ -dimensional vector. The matrix $\mathbf A$ has $n$ “ingredients”, and the “recipe” vector $\mathbf v$ must supply one weight per ingredient to produce the “dish” $\mathbf A\mathbf v$ . Therefore, $\mathbf v$ should have $n$ components. That is, $\mathbf v$ should be $n$ -dimensional.
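
The dimension requirement in Example 3 shows up naturally in code. The sketch below stores a matrix as a list of its columns and refuses to compute $\mathbf A \mathbf v$ when the number of columns and the number of components of $\mathbf v$ differ. The function name `mat_vec` and the error message are illustrative assumptions.

```python
def mat_vec(columns, v):
    """Return A v, where A is given as a list of its columns a_1, ..., a_n."""
    if len(columns) != len(v):
        raise ValueError(
            f"A has {len(columns)} columns but v has {len(v)} components"
        )
    result = [0] * len(columns[0])
    # Weighted sum v_1 a_1 + ... + v_n a_n, accumulated entry by entry.
    for weight, column in zip(v, columns):
        for i, entry in enumerate(column):
            result[i] += weight * entry
    return result


# The 2 x 3 matrix B from Definition 1, stored column by column.
B_columns = [[1, 4], [2, 5], [3, 6]]

print(mat_vec(B_columns, [1, 1, 1]))  # [6, 15]

try:
    mat_vec(B_columns, [1, 1])  # a 2-dimensional recipe for 3 ingredients
except ValueError as error:
    print("undefined:", error)
```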

If we could add vectors, could we add matrices? The real question is, why not? Consider the two expressions below:

$\begin{aligned} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix},\quad \begin{bmatrix} 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} \end{aligned}.$

Let’s expand both terms using the recipe-ingredient analogy:

$\begin{aligned} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} &= 7 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8 \begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9 \begin{bmatrix} 3 \\ 6 \end{bmatrix},\\ \begin{bmatrix} 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} &= 7 \begin{bmatrix} 10 \\ 13 \end{bmatrix} + 8 \begin{bmatrix} 11 \\ 14 \end{bmatrix} + 9 \begin{bmatrix} 12 \\ 15 \end{bmatrix}. \end{aligned}$

Now let’s add both sides of the equation together:

$\begin{aligned} &\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} + \begin{bmatrix} 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} \\ &= 7 \left( \begin{bmatrix} 1 \\ 4 \end{bmatrix} + \begin{bmatrix} 10 \\ 13 \end{bmatrix} \right) + 8 \left(\begin{bmatrix} 2 \\ 5 \end{bmatrix} + \begin{bmatrix} 11 \\ 14 \end{bmatrix}\right) + 9 \left(\begin{bmatrix} 3 \\ 6 \end{bmatrix} + \begin{bmatrix} 12 \\ 15 \end{bmatrix}\right) \\ &= 7 \begin{bmatrix} 1 + 10 \\ 4 + 13 \end{bmatrix} + 8 \begin{bmatrix} 2 + 11 \\ 5 + 14 \end{bmatrix} + 9 \begin{bmatrix} 3 + 12 \\ 6 + 15 \end{bmatrix} \\ &= \begin{bmatrix} 1 + 10 & 2 + 11 & 3 + 12 \\ 4 + 13 & 5 + 14 & 6 + 15\end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}, \end{aligned}$

where we consolidated our calculations using the recipe-ingredient analogy again. Therefore, it is reasonable to define

$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix} := \begin{bmatrix} 1 + 10 & 2 + 11 & 3 + 12 \\ 4 + 13 & 5 + 14 & 6 + 15\end{bmatrix},$

so that for any 3-dimensional vector $\mathbf u = \begin{bmatrix}u_1 \\ u_2 \\ u_3\end{bmatrix}$ ,

$\left( \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix} \right)\mathbf u = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \mathbf u + \begin{bmatrix} 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix} \mathbf u.$

A similar thought process works for multiplying a matrix by a number (i.e. scalar multiplication) and matrix subtraction.

Definition 2. Let $\mathbf A =\begin{bmatrix} \mathbf a_1 & \cdots & \mathbf a_n\end{bmatrix}$ and $\mathbf B =\begin{bmatrix} \mathbf b_1 & \cdots & \mathbf b_n\end{bmatrix}$ be $m \times n$ matrices. Define matrix addition “ingredient-wise” by

$\mathbf A + \mathbf B :=\begin{bmatrix} \mathbf a_1 + \mathbf b_1 & \cdots & \mathbf a_n + \mathbf b_n\end{bmatrix},$

so that for any $n$ -dimensional vector $\mathbf v$ , $(\mathbf A + \mathbf B) \mathbf v = \mathbf A \mathbf v + \mathbf B \mathbf v$ .

Similarly, given any real number $c$ , define scalar multiplication “ingredient-wise” by

$c\mathbf A :=\begin{bmatrix} c\mathbf a_1 & \cdots & c\mathbf a_n\end{bmatrix},$

so that for any $n$ -dimensional vector $\mathbf v$ , $(c\mathbf A) \mathbf v = \mathbf A(c\mathbf v) = c(\mathbf A \mathbf v)$ .

In particular, define $-\mathbf A := (-1)\mathbf A$ , and $\mathbf A - \mathbf B := \mathbf A + (-\mathbf B)$ .

Example 4. Show that $(\mathbf A -\mathbf B)\mathbf v = \mathbf A \mathbf v - \mathbf B \mathbf v$ .

Solution. By Definition 2 and its implications,

$\begin{aligned} (\mathbf A -\mathbf B)\mathbf v &= (\mathbf A + (-\mathbf B))\mathbf v \\ &= \mathbf A\mathbf v + (-\mathbf B)\mathbf v \\ &= \mathbf A\mathbf v + ((-1)\mathbf B)\mathbf v \\ &= \mathbf A\mathbf v + (-1)(\mathbf B \mathbf v) \\ &= \mathbf A \mathbf v - \mathbf B \mathbf v. \end{aligned}$

Example 5. Evaluate the following expressions:

$\begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} + \begin{bmatrix} -10 & 11 \\ 12 & 13 \\ -14 & -15 \end{bmatrix},\quad 7\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix},\quad \begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} - \begin{bmatrix} -10 & 11 \\ 12 & 13 \\ -14 & -15 \end{bmatrix}.$

Solution. Using Definition 2,

$\begin{aligned} \begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} + \begin{bmatrix} -10 & 11 \\ 12 & 13 \\ -14 & -15 \end{bmatrix} &= \begin{bmatrix} -9 & 9 \\ 15 & 17 \\ -9 & -21\end{bmatrix},\\ 7\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} &= \begin{bmatrix} 7 & 14 & 21 \\ 28 & 35 & 42 \end{bmatrix},\\ \begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} - \begin{bmatrix} -10 & 11 \\ 12 & 13 \\ -14 & -15 \end{bmatrix} &= \begin{bmatrix} 11 & -13 \\ -9 & -9 \\ 19 & 9 \end{bmatrix}.\end{aligned}$
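
The operations in Definition 2 translate directly into code. Here is a minimal sketch with matrices stored as lists of rows. Since addition and scalar multiplication act entry by entry, the result is the same as working column by column. The function names are illustrative assumptions.

```python
def add(A, B):
    """Return A + B for two matrices of the same size."""
    same_size = len(A) == len(B) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(A, B)
    )
    if not same_size:
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def scale(c, A):
    """Return the scalar multiple c A."""
    return [[c * a for a in row] for row in A]


def subtract(A, B):
    """Return A - B, defined as A + (-1) B."""
    return add(A, scale(-1, B))


P = [[1, -2], [3, 4], [5, -6]]
Q = [[-10, 11], [12, 13], [-14, -15]]
R = [[1, 2, 3], [4, 5, 6]]

print(add(P, Q))       # [[-9, 9], [15, 17], [-9, -21]]
print(scale(7, R))     # [[7, 14, 21], [28, 35, 42]]
print(subtract(P, Q))  # [[11, -13], [-9, -9], [19, 9]]
```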

Theorem 1. Given $m\times n$ matrices $\mathbf A,\mathbf B,\mathbf C$ and scalars $c, d$ , the following matrix properties hold:
- $\mathbf A + \mathbf B = \mathbf B + \mathbf A$ ,
- $(\mathbf A + \mathbf B) + \mathbf C = \mathbf A + (\mathbf B + \mathbf C)$ ,
- $\mathbf A + \mathbf 0 = \mathbf 0 + \mathbf A = \mathbf A$ ,
- $\mathbf A + (-\mathbf A) = (-\mathbf A) + \mathbf A = \mathbf 0$ ,
- $c(d\mathbf A) = (cd)\mathbf A$ ,
- $1 \mathbf A = \mathbf A$ ,
- $c(\mathbf A +\mathbf B) = c\mathbf A + c\mathbf B$ ,
- $(c + d)\mathbf A = c\mathbf A + d\mathbf A$ .
Proof. Left as a tedious (but ultimately meaningful) exercise.

How might we multiply two matrices together? Consider the expression

$\mathbf A \begin{bmatrix} \mathbf v_1 & \mathbf v_2 \end{bmatrix}.$

Using the ingredient-recipe analogy, the matrix on the right-hand side has two recipes, not one. Therefore, we can think of the expression as cooking two dishes; this is our definition of matrix multiplication:

$\mathbf A \begin{bmatrix} \mathbf v_1 & \mathbf v_2 \end{bmatrix} := \begin{bmatrix} \mathbf A \mathbf v_1 & \mathbf A \mathbf v_2 \end{bmatrix}.$

And we know how to compute the “dishes” $\mathbf A \mathbf v_1$ and $\mathbf A \mathbf v_2$ using Definition 1.

Example 6. Given an $m \times n$ matrix $\mathbf A$ , what must the size of the matrix $\mathbf B$ be in order for the expression $\mathbf A\mathbf B$ to make sense?

Solution. Write

$\mathbf B = \begin{bmatrix} \mathbf b_1 & \cdots & \mathbf b_k \end{bmatrix}.$

By definition,

$\mathbf A \mathbf B = \mathbf A \begin{bmatrix} \mathbf b_1 & \cdots & \mathbf b_k \end{bmatrix} = \begin{bmatrix} \mathbf A \mathbf b_1 & \cdots & \mathbf A \mathbf b_k \end{bmatrix}.$

In order for each of $\mathbf A \mathbf b_1,\dots, \mathbf A \mathbf b_k$ to make sense, by Example 3, each of $\mathbf b_1, \dots, \mathbf b_k$ should be $n$ -dimensional. Since there are $k$ columns in $\mathbf B$ , the size of $\mathbf B$ must be $n \times k$ , where $k$ can be any positive integer.

Example 7. Evaluate the expression

$\begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} \begin{bmatrix} -10 & 12 & -14 \\ 11 & 13 & -15 \end{bmatrix}.$

Solution. Applying Definition 1 to each column,

$\begin{aligned} \begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} \begin{bmatrix} -10 \\ 11 \end{bmatrix} &= -10 \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + 11 \begin{bmatrix} -2 \\ 4 \\ -6 \end{bmatrix} = \begin{bmatrix} -32 \\ 14 \\ -116 \end{bmatrix}, \\ \begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} \begin{bmatrix} 12 \\ 13 \end{bmatrix} &= 12 \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + 13 \begin{bmatrix} -2 \\ 4 \\ -6 \end{bmatrix} = \begin{bmatrix} -14 \\ 88 \\ -18 \end{bmatrix}, \\ \begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} \begin{bmatrix} -14 \\ -15 \end{bmatrix} &= -14 \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + (-15) \begin{bmatrix} -2 \\ 4 \\ -6 \end{bmatrix} = \begin{bmatrix} 16 \\ -102 \\ 20 \end{bmatrix}. \end{aligned}$

Combining the results,

$\begin{bmatrix} 1 & -2 \\ 3 & 4 \\ 5 & -6 \end{bmatrix} \begin{bmatrix} -10 & 12 & -14 \\ 11 & 13 & -15 \end{bmatrix} = \begin{bmatrix} -32 & -14 & 16 \\ 14 & 88 & -102 \\ -116 & -18 & 20 \end{bmatrix}.$

Example 8. Given $\mathbf A = \begin{bmatrix} 1 &2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}$ , evaluate $\mathbf A^2 := \mathbf A \mathbf A$ .

Solution. By definition,

$\mathbf A^2 = \begin{bmatrix} 1 &2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}\begin{bmatrix} 1 &2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}.$

Applying Definition 1 to each column,

$\begin{aligned}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} &= 1\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + 4\begin{bmatrix} 2 \\ 5 \\ 9 \end{bmatrix} + 7\begin{bmatrix} 3 \\ 6 \\ 10 \end{bmatrix}= \begin{bmatrix} 30 \\ 66 \\ 113 \end{bmatrix}, \\ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}\begin{bmatrix} 2 \\ 5 \\ 9 \end{bmatrix} &= 2\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + 5\begin{bmatrix} 2 \\ 5 \\ 9 \end{bmatrix} + 9\begin{bmatrix} 3 \\ 6 \\ 10 \end{bmatrix}= \begin{bmatrix} 39 \\ 87 \\ 149 \end{bmatrix}, \\ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}\begin{bmatrix} 3 \\ 6 \\ 10 \end{bmatrix} &= 3\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + 6\begin{bmatrix} 2 \\ 5 \\ 9 \end{bmatrix} + 10\begin{bmatrix} 3 \\ 6 \\ 10 \end{bmatrix}= \begin{bmatrix} 45 \\ 102 \\ 175 \end{bmatrix}.\end{aligned}$

Combining the results,

$\mathbf A^2 = \begin{bmatrix} 1 &2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}\begin{bmatrix} 1 &2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix} = \begin{bmatrix} 30 & 39 & 45 \\ 66 & 87 & 102 \\ 113 & 149 & 175 \end{bmatrix}.$

Example 9. Show that

$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1k} \\ c_{21} & c_{22} & \cdots & c_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mk} \end{bmatrix}$

where for any $i,j$ ,

$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$

Solution. By the ingredient-recipe analogy, the $j$ -th column would be given by

$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} = \begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{mj}\end{bmatrix}.$

Expanding the left-hand side,

$\begin{aligned} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix} &= b_{1j} \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + b_{2j} \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + b_{nj} \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \\ &= \begin{bmatrix} a_{11} b_{1j} \\ a_{21} b_{1j} \\ \vdots \\ a_{m1} b_{1j} \end{bmatrix} + \begin{bmatrix} a_{12} b_{2j} \\ a_{22} b_{2j} \\ \vdots \\ a_{m2} b_{2j} \end{bmatrix} + \cdots + \begin{bmatrix} a_{1n} b_{nj} \\ a_{2n} b_{nj} \\ \vdots \\ a_{mn} b_{nj} \end{bmatrix} \\ &= \begin{bmatrix} a_{11} b_{1j} + a_{12} b_{2j} + \cdots + a_{1n} b_{nj} \\ a_{21} b_{1j} + a_{22} b_{2j} + \cdots + a_{2n} b_{nj} \\ \vdots \\ a_{m1} b_{1j} + a_{m2} b_{2j} + \cdots + a_{mn} b_{nj} \end{bmatrix}. \end{aligned}$

Therefore,

$\begin{bmatrix} a_{11} b_{1j} + a_{12} b_{2j} + \cdots + a_{1n} b_{nj} \\ a_{21} b_{1j} + a_{22} b_{2j} + \cdots + a_{2n} b_{nj} \\ \vdots \\ a_{m1} b_{1j} + a_{m2} b_{2j} + \cdots + a_{mn} b_{nj} \end{bmatrix} = \begin{bmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{mj}\end{bmatrix}.$

In particular, by comparing the $i$ -th row,

$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}.$
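The entry formula for $c_{ij}$ translates almost word for word into code. The sketch below is only an illustration (the name `mat_mul_entrywise` is hypothetical, and matrices are assumed to be lists of rows); it recomputes $\mathbf A^2$ from Example 8 as a sanity check.

```python
def mat_mul_entrywise(A, B):
    """Compute A B using c_ij = a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj."""
    m, n, k = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "A needs as many columns as B has rows"
    return [
        [sum(A[i][r] * B[r][j] for r in range(n)) for j in range(k)]
        for i in range(m)
    ]


A = [[1, 2, 3], [4, 5, 6], [7, 9, 10]]
A_squared = mat_mul_entrywise(A, A)
print(A_squared)  # [[30, 39, 45], [66, 87, 102], [113, 149, 175]]
```

The output matches the matrix found column by column in Example 8, as the derivation above says it must.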

Remark 3. Example 9 is the conventional definition of matrix multiplication. I have avoided stating it as a definition a priori, since on its own it seems contrived and arbitrary. The current presentation instead shows how matrix multiplication is a necessary consequence of the information-preserving properties that we aimed to achieve.

And that’s all for matrices! Matrices, at the O-level, are simply a tool to summarise systems of linear equations. All of their arithmetical properties simply arise from preserving said information. When put together with vectors, we get the all-encompassing study of linear algebra. Here are some further questions you might think about involving matrices and vectors.

Consider the matrix $\mathbf A = \begin{bmatrix} 1 &2 & 3 \\ 4 & 5 & 6 \\ 7 & 9 & 10 \end{bmatrix}$ and the vector $\mathbf b = \begin{bmatrix}11 \\ 12 \\ 13\end{bmatrix}$ .
- What vector $\mathbf x$ would satisfy the equation $\mathbf A \mathbf x = \mathbf b$ ?
- Is it possible to divide by $\mathbf A$ ?
- Are there numbers $\lambda$ and vectors $\mathbf v$ such that $\mathbf A \mathbf v = \lambda \mathbf v$ ?
- Is it possible to compute $\mathbf A^{2026}$ in an efficient manner?
- Does the equation $\mathbf A \mathbf x = \mathbf b$ have at least some best approximate solution?
These questions turn out to be basic problems in undergraduate linear algebra, and are used all the time in applied STEM, like physics, engineering, economics, and finance.

But for now, that is it for O-level mathematics!

—Joel Kindiak, 18 Mar 26, 1612H
May 29, 2026
Solving Cubic Equations

Remark 1. This writeup is a fresh writeup because somehow, and tragically, the original post got permanently deleted.

Our goal is to answer a simple question: how do we solve the cubic equation? Namely, given constants $a,b,c,d$ with $a \neq 0$ , if we know that the real number $x$ satisfies the equation

$ax^3 + bx^2 + cx + d = 0,$

how might we determine the possible values of $x$ ? Firstly, we had better be sure that this equation can be solved, and the power of $3$ is the ticket to why that holds.

Theorem 1. The cubic equation $ax^3 + bx^2 + cx + d = 0$ has at least one real solution.

Proof. Omitted; we can achieve this result by adapting the proof of Theorem 2 in this post.

There is a general “cubic” formula, but we shall work with special cases in which the root in Theorem 1 could be obtained with trial-and-error.

We abbreviate the left-hand side by $f(x) = ax^3 + bx^2 + cx + d$ . Then, we call $\alpha$ a root of the polynomial $f(x)$ if and only if $f(\alpha) = 0$ . In particular,

$a \alpha^3 + b \alpha^2 + c \alpha + d = f(\alpha) = 0.$

Lemma 1. For any real number $\alpha$ ,

$x^3 \pm \alpha^3 = (x \pm \alpha) (x^2 \mp \alpha x + \alpha^2).$

Proof. Firstly, we expand the right-hand side to obtain

$\begin{aligned} (x -\alpha) (x^2 +\alpha x + \alpha^2) &= x^3 + (\alpha - \alpha)x^2 + (\alpha^2 - \alpha^2)x + (-\alpha^3) \\ &= x^3 - \alpha^3. \end{aligned}$

Then we replace $\alpha$ with $-\alpha$ to obtain

$\begin{aligned} x^3 + \alpha^3 &= x^3 - (-\alpha^3) \\ &= x^3 - (-\alpha)^3 \\ &= (x -(-\alpha)) (x^2 +(-\alpha) x + (-\alpha)^2) \\ &= (x+\alpha)(x^2-\alpha x + \alpha^2). \end{aligned}$

Lemma 2. For any real number $\alpha$ , there exist unique real constants $k,m$ such that

$f(x) - f(\alpha) = a (x-\alpha) (x^2 + kx + m) .$

Proof. Making the substitutions and applying Lemma 1,

$\begin{aligned} f(x) - f(\alpha) &= (a x^3 + b x^2 + c x + d) - (a \alpha^3 + b \alpha^2 + c \alpha + d) \\ &= a(x^3 - \alpha^3) + b(x^2-\alpha^2) + c(x-\alpha) + d(1-1) \\ &= a(x-\alpha)(x^2 + \alpha x + \alpha^2) + b(x-\alpha)(x+\alpha) + c(x-\alpha) \\ &= (x-\alpha)( a (x^2 + \alpha x + \alpha^2) + b (x+\alpha) + c ) \\ &= a(x-\alpha)\left[ (x^2 + \alpha x + \alpha^2) + \frac ba (x+ \alpha) + \frac ca \right] \\ &= a(x-\alpha)\left[ x^2 + \alpha x + \alpha^2 + \frac ba x+ \frac ba \alpha + \frac ca \right] \\ &= a(x-\alpha)\left[ x^2 + \left( \alpha + \frac ba \right) x + \left( \alpha^2 + \frac ba \alpha + \frac ca \right) \right] \\ &= a(x-\alpha) ( x^2 + k x + m ), \end{aligned}$

where we set $k := \alpha+ b/a$ and $m := \alpha^2 + b\alpha/a + c/a$ .

Lemma 2 paves the way for the special case of the remainder and factor theorems—the general case follows a factorisation of $x^n - \alpha^n$ by generalising the argument in Lemma 1.

Theorem 2. Given any real number $\alpha$ , there exist a unique polynomial $q(x)$ and a unique real number $r$ , called the remainder, such that

$f(x) = (x-\alpha)q(x) + r.$

Furthermore, $r = f(\alpha)$ —this result is known as the remainder theorem. Finally, this remainder equals zero if and only if $\alpha$ is a root of $f(x)$ —this result is known as the factor theorem. In this case, we say that $x-\alpha$ is a factor of the polynomial $f(x)$ .

Proof. By Lemma 2,

$f(x) = a(x-\alpha) ( x^2 + k x + m ) + f(\alpha),$

yielding $q(x) = a(x^2 + kx + m)$ and $r = f(\alpha)$ .

For uniqueness, set $x = \alpha$ to obtain $f(\alpha) = r$ . Then

$\begin{aligned} (x-\alpha)q(x) &= f(x) - r \\ &= f(x)-f(\alpha) \\ &= a(x-\alpha) ( x^2 + k x + m ). \end{aligned}$

For $x \neq \alpha$ , dividing by $x-\alpha$ yields $q(x) = a(x^2 + k x + m)$ . For a more complete argument regarding uniqueness, see this post.
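To see the remainder theorem in action, here is a small Python sketch that builds $q(x)$ and $r$ from the constants $k$ and $m$ of Lemma 2 and then checks the identity $f(x) = (x-\alpha)q(x) + r$ at a few sample points. The helper names `divide_cubic_by_linear` and `evaluate` are hypothetical, and exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction


def divide_cubic_by_linear(a, b, c, d, alpha):
    """Return the coefficients of q(x) and the remainder r for division by x - alpha."""
    a, b, c, d, alpha = map(Fraction, (a, b, c, d, alpha))
    k = alpha + b / a
    m = alpha**2 + b * alpha / a + c / a
    quotient = (a, a * k, a * m)  # q(x) = a(x^2 + kx + m)
    remainder = a * alpha**3 + b * alpha**2 + c * alpha + d  # r = f(alpha)
    return quotient, remainder


def evaluate(coefficients, x):
    """Evaluate a polynomial whose coefficients are listed from the highest power down."""
    value = Fraction(0)
    for coefficient in coefficients:
        value = value * x + coefficient
    return value


# f(x) = x^3 - 2x^2 - 3x + 4 divided by x - 2.
cubic = (1, -2, -3, 4)
quotient, remainder = divide_cubic_by_linear(*cubic, 2)

for x in range(-3, 4):
    left = evaluate(cubic, Fraction(x))
    right = (x - 2) * evaluate(quotient, Fraction(x)) + remainder
    assert left == right

print([str(coefficient) for coefficient in quotient], str(remainder))
```

Here $q(x) = x^2 - 3$ and $r = f(2) = -2$ , and the loop confirms that both sides agree at every integer from $-3$ to $3$ .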

While we have discussed the result in an overly abstract manner, perhaps it is worth elucidating the theory with an example.

Example 1. Solve the cubic equation $x^3 - 2x^2 - 3x + 4 = 0$ .

Solution. Define $f(x) := x^3 - 2x^2 - 3x + 4$ . By almost-obvious observation,

$f(1) = 1^3 - 2\cdot 1^2 - 3\cdot 1 + 4 = 1 - 2 - 3 + 4 = 0.$

Therefore, by the factor theorem, $x-1$ is a factor of $f(x)$ . Hence, there exist constants $a,b,c$ such that

$\begin{aligned} f(x) &= (x-1)(ax^2 +bx + c) \\ x^3 - 2x^2 - 3x + 4 &= ax^3 + (b-a)x^2 +(c-b)x - c. \end{aligned}$

Comparing the coefficients of $x^3$ and the constant term respectively, $a = 1$ and $c = -4$ . Comparing the coefficient of $x^2$ ,

$b-a = -2 \quad \Rightarrow \quad b = a-2 = 1-2 = -1.$

Therefore,

$x^3 - 2x^2 - 3x + 4 = (x-1)(x^2 - x - 4).$

Since $x^3 - 2x^2 - 3x + 4 = 0$ , we must have either

$x - 1 = 0\quad \text{or} \quad x^2 - x - 4 = 0.$

In the former, $x = 1$ . In the latter, we solve using the quadratic formula: first compute the discriminant $\Delta = (-1)^2 - 4 \cdot 1 \cdot (-4) = 17$ . Then

$\displaystyle x = \frac{-(-1) \pm \sqrt{17}}{2 \cdot 1} = \frac 12 \pm \frac 12 \sqrt{17}.$

Therefore, $x = 1$ or $x = \frac 12 \pm \frac 12 \sqrt{17}$ .
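As a quick numerical sanity check (a sketch only, using floating-point arithmetic rather than exact surds), we can substitute the three claimed solutions back into the cubic and confirm that each one gives a value that is zero up to rounding.

```python
import math


def f(x):
    """The cubic from Example 1."""
    return x**3 - 2 * x**2 - 3 * x + 4


# Discriminant of the quadratic factor x^2 - x - 4.
discriminant = (-1) ** 2 - 4 * 1 * (-4)

roots = [
    1.0,
    (1 + math.sqrt(discriminant)) / 2,
    (1 - math.sqrt(discriminant)) / 2,
]

for root in roots:
    # Each f(root) should vanish up to floating-point rounding.
    print(f"x = {root:.6f}, f(x) = {f(root):.2e}")
```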

Remark 2. Solutions can take the form of surds.

Example 2. Solve the cubic equation $4x^3 - 3x^2 - 2x + 1 = 0$ .

Solution. We leave a solution similar to that of Example 1 as an exercise to the reader. Instead, we take advantage of the similar numbers. Since $x = 0$ does not satisfy the equation (the left-hand side would equal $1$ ), we may divide the equation by $x^3$ :

$\displaystyle 4 - \frac 3x - \frac 2{x^2} + \frac 1{x^3} = 0.$

Now make the substitution $u = 1/x$ to obtain a rather suspect equation:

$\displaystyle 4 - 3u - 2u^2 + u^3 = 0.$

By re-arranging the terms, we obtain the exact same equation as that in Example 1, just with different letters:

$u^3 - 2u^2 - 3u + 4 = 0.$

Therefore, by Example 1, we have $u = 1$ or $u = \frac 12(1 \pm \sqrt{17})$ . Since $u = 1/x \Rightarrow x = 1/u$ , the former yields $x = 1/1 = 1$ , and the latter takes a bit more work by rationalising the denominator:

$\begin{aligned} x = \frac 1u = \frac 2{1 \pm \sqrt{17}} &= \frac 2{1 \pm \sqrt{17}} \cdot \frac{1 \mp \sqrt{17}}{1 \mp \sqrt{17}} \\ &= \frac 2{1^2 - 17} \cdot (1 \mp \sqrt{17})\\ &= -\frac 1{8} \cdot (1 \mp \sqrt{17}) \\ &= \frac 1{8} \cdot (-1 \pm \sqrt{17}). \end{aligned}$
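The reciprocal trick can also be checked numerically. The following sketch (floating-point only, with my own variable names) takes the roots $u$ from Example 1, confirms that each $1/u$ matches the closed form derived above, and confirms that each one solves the equation in Example 2.

```python
import math


def g(x):
    """The cubic from Example 2, whose coefficients are those of Example 1 reversed."""
    return 4 * x**3 - 3 * x**2 - 2 * x + 1


sqrt17 = math.sqrt(17)
u_roots = [1.0, (1 + sqrt17) / 2, (1 - sqrt17) / 2]            # roots from Example 1
x_closed_form = [1.0, (-1 + sqrt17) / 8, (-1 - sqrt17) / 8]    # rationalised reciprocals

for u, x in zip(u_roots, x_closed_form):
    # 1/u should agree with the closed form, and g(x) should be zero up to rounding.
    print(f"1/u = {1 / u:.6f}, closed form = {x:.6f}, g(x) = {g(x):.2e}")
```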

As mentioned, there is a cubic formula, and it also turns out there is a quartic formula, i.e. a general formula to solve the equation

$a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 = 0,$

where $a_4 \neq 0$ . How about a quintic formula, i.e. a general formula to solve the equation

$a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 = 0,$

where $a_5 \neq 0$ ?

The answer turns out to be no, and requires a study of generalised algebra to prove.

For now, we deal with more contained higher powers, of the form $(a+b)^n$ , called binomials.

—Joel Kindiak, 28 May 26, 1358H

May 28, 2026
The Gambler’s Ruin
Using our basic ideas of probability, let’s discuss a simple yet deep problem in probability theory, which laid the foundational ideas for quantitative finance: the gambler’s ruin.

Disclaimer. This page does not promote or encourage online gambling in any form. The information presented here is for educational purposes only.

The gambler’s ruin can be formulated simply. Let $L < M < W$ be pre-determined integers and $X_t$ denote your wealth, in dollars, at time $t$ :
- You start with $X_0 = M$ .
- On each turn $t$ , you toss a fair coin.
  - If the coin lands ‘Head’, you win $\$1$ , so that $X_t = X_{t-1} + 1$ .
  - If the coin lands ‘Tail’, you lose $\$1$ , so that $X_t = X_{t-1} - 1$ .
- The game ends at the first time $\tau$ when $X_{\tau} \in \{L, W\}$ .
Suppose the simplest case $L = 0$ , $M = 1$ , and $W = 2$ .

Example 1. What is the probability that $X_1 = 2$ ? What about the result $X_1 = 0$ ?

Solution. By definition, $X_0 = M = 1$ . Since the coin is fair,

$\begin{aligned} \mathbb P(X_1 = 2) &= \mathbb P(\text{Head}) = 1/2, \\ \mathbb P(X_1 = 0) &= \mathbb P(\text{Tail}) = 1/2. \end{aligned}$

In Example 1, getting a ‘Head’ yields $X_1 = 2 = W$ , and getting a ‘Tail’ yields $X_1 = 0 = L$ . No matter the outcome, we have $X_1 \in \{L, W\}$ . Furthermore, $X_0 = M \notin \{L, W\}$ . Therefore, rather trivially, the game ends at time $\tau = 1$ .

Remark 1. The fun really begins when we vary our setup. In fancy quantitative financial language, we call $\tau$ a stopping time for the time series process. The quantity $L$ models our stop loss, $M$ models our capital, and $W$ models our take profit. The coin being fair is a simplistic starting point for discussion—in the real world the price movements are far less predictable than we would hope for.

Now let’s set $L = 0$ , $M = 2$ , and $W = 4$ .

Example 2. What is the probability that $X_2 = 4$ ? How about $X_2 = 2$ ?

Solution. Since we are dealing with multiple coin tosses, it gets exponentially difficult to intuit the solution. Since this process involves consecutive time steps, we can visualise its process using a probability tree diagram. As its name suggests, it is a diagram that resembles a tree that is described using probabilities.

The number $1/2$ on each line denotes the probability that the wealth increases or decreases by $\$1$ respectively. There are four possible paths that the wealth can take, and they all occur with equal probability. Denote the sample space by the wealth evolutions:

$\Omega = \{ ( 2,3,4 ), ( 2,3,2 ), ( 2,1,2 ), ( 2,1,0 ) \}$

We note that $X_2 = 4$ corresponds to wealth evolutions of the form $(*, *, 4)$ . Since the only outcome when this result takes place is $(2,3,4)$ , we write

$\{X_2 = 4\} = \{ (2, 3, 4) \}$

Therefore,

$\displaystyle \mathbb P(X_2 = 4) = \frac{|\{ (2,3,4) \}|}{|\Omega|} = \frac 14.$

Similarly, $X_2 = 2$ corresponds to wealth evolutions of the form $(*, *, 2)$ . There are now two such wealth evolutions, so that

$\{X_2 = 2\} = \{ (2,3,2), (2,1,2) \}$ ,

the required probability is

$\displaystyle \mathbb P(X_2 = 2) = \frac{|\{ (2,3,2), (2,1,2) \}|}{|\Omega|} = \frac 24 = \frac 12.$

Now suppose $L = 0$ , $M = 2$ , $W = 5$ .

Example 3. Evaluate $\mathbb P(X_3 = 3)$ .

Solution. We extend our probability tree diagram as follows.

This time, the “total number of possibilities” argument fails. Why? Because the game ended for wealth trajectory $(2,1,0)$ .

Nevertheless, we can still solve the problem. Each coin toss doesn’t impact the next one, so that the sample space

$\{ (\mathrm H, \mathrm H), (\mathrm H, \mathrm T),(\mathrm T, \mathrm H), (\mathrm T, \mathrm T) \}$

contains outcomes that all share the same probability. For instance,

$\displaystyle \mathbb P( (\mathrm H, \mathrm T) ) = \frac 14 = \frac 12 \cdot \frac 12 = \mathbb P(\mathrm H) \cdot \mathbb P(\mathrm T).$

In fancier language, we say that the coin tosses are independent of each other. This logic holds no matter the coin toss; letting $\xi_1,\xi_2,\xi_3 \in \{\mathrm H, \mathrm T\}$ denote individual coin tosses, we have

$\mathbb P( (\xi_1,\xi_2,\xi_3) ) = \mathbb P(\xi_1) \cdot \mathbb P(\xi_2) \cdot \mathbb P(\xi_3).$

In particular, each path with length $3$ will automatically have a probability of $1/8$ of occurring. Notice that there are three paths that get us to $X_3 = 3$ :

$\{X_3 = 3\} = \{ (2,3,4,3), (2,3,2,3), (2,1,2,3) \}.$

Therefore, $\mathbb P(X_3 = 3) = 3 \times 1/8 = 3/8$ .
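If you would rather let a computer walk the tree, here is a brute-force sketch of the argument above: list every sequence of coin tosses, follow the wealth until the game stops, and add up $1/8$ for each path that is still running at time $3$ with wealth $3$ . The function names are hypothetical, and I assume that a game which has already ended before the chosen time is simply not counted.

```python
from fractions import Fraction
from itertools import product


def wealth_path(start, low, high, tosses):
    """Follow the wealth toss by toss, stopping once it hits low or high."""
    path = [start]
    for toss in tosses:
        if path[-1] in (low, high):
            break  # the game has already ended
        path.append(path[-1] + (1 if toss == "H" else -1))
    return tuple(path)


def probability_of_wealth(start, low, high, steps, target):
    """P(X_steps = target) for a fair coin, by listing all toss sequences."""
    total = Fraction(0)
    for tosses in product("HT", repeat=steps):
        path = wealth_path(start, low, high, tosses)
        # Only count games that were actually played for the full number of steps.
        if len(path) == steps + 1 and path[-1] == target:
            total += Fraction(1, 2**steps)
    return total


print(probability_of_wealth(2, 0, 5, 3, 3))  # 3/8
```

The same function reproduces Example 2: with $L = 0$ , $M = 2$ , $W = 4$ it returns $1/4$ for $X_2 = 4$ and $1/2$ for $X_2 = 2$ .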

Remark 2. When presenting your work, you do not need to be as long-winded as this writeup. As long as you communicate your thought process through your calculation, you can obtain full credit.

Example 4. How would the answer in Example 3 change if we are working with a biased coin that lands ‘Head’ with probability $p$ ? That is, $\mathbb P(\mathrm H) = p$ and $\mathbb P(\mathrm T) = 1-p$ . Give your answer in terms of $p$ .

Solution. We modify our probability tree diagram as follows.

The visual is basically the same, and we would follow the same trajectories

$\{X_3 = 3\} = \{ (2,3,4,3), (2,3,2,3), (2,1,2,3) \}.$

However, each path has a slightly different probability to compute:

$\begin{aligned} \mathbb P((2,3,4,3)) &= p \cdot p \cdot (1-p) = p^2(1-p), \\ \mathbb P((2,3,2,3)) &= p \cdot (1-p) \cdot p = p^2(1-p), \\ \mathbb P((2,1,2,3)) &= (1-p) \cdot p \cdot p = p^2(1-p). \end{aligned}$

Since all paths are distinct, and turn out to have the same probabilities, we can sum the probabilities up as follows:

$\begin{aligned} \mathbb P(X_3 = 3) &= \mathbb P(\{ (2,3,4,3), (2,3,2,3), (2,1,2,3) \}) \\ &= \mathbb P( (2,3,4,3) ) + \mathbb P( (2,3,2,3) ) + \mathbb P( (2,1,2,3) ) \\ &= p^2(1-p) + p^2(1-p) + p^2(1-p) \\ &= 3p^2(1-p). \end{aligned}$
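Instead of tracing paths one at a time, we could also push the whole probability distribution forward one toss at a time. The sketch below does this for a biased coin. The name `wealth_distribution` is hypothetical, and treating a finished game as frozen at its final wealth is my own bookkeeping convention, not part of the problem statement.

```python
from fractions import Fraction


def wealth_distribution(start, low, high, steps, p):
    """Return a dict mapping each wealth level to its probability after `steps` tosses."""
    distribution = {start: Fraction(1)}
    for _ in range(steps):
        updated = {}
        for wealth, probability in distribution.items():
            if wealth in (low, high):
                # The game is over: the wealth stays where it stopped.
                updated[wealth] = updated.get(wealth, 0) + probability
                continue
            # 'Head' with probability p, 'Tail' with probability 1 - p.
            updated[wealth + 1] = updated.get(wealth + 1, 0) + probability * p
            updated[wealth - 1] = updated.get(wealth - 1, 0) + probability * (1 - p)
        distribution = updated
    return distribution


p = Fraction(2, 3)
distribution = wealth_distribution(2, 0, 5, 3, p)
print(distribution[3], 3 * p**2 * (1 - p))  # 4/9 4/9
```

For $p = 2/3$ both numbers equal $4/9$ , in agreement with $3p^2(1-p)$ , and setting $p = 1/2$ recovers the $3/8$ from Example 3.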

Remark 3. This process can be generalised to what is known in probability and statistics as the binomial distribution. The multiplication procedure is known as the multiplication principle, used to calculate probabilities of in-sequence events. The addition procedure is known as the addition principle, used to calculate probabilities of disjoint events.

Example 5. Toss a biased coin with $\mathbb P(\mathrm H) = p$ three times. What is the probability that you get exactly two ‘Heads’? What do you notice?

Solution. Draw the probability tree diagram as follows.

By following the chosen trajectories, the required probability is

$\begin{aligned} \mathbb P(\#(\mathrm H) = 2) &= p \cdot p \cdot (1-p) + p \cdot (1-p) \cdot p + (1-p) \cdot p \cdot p \\ &= p^2 (1-p) + p^2 (1-p) + p^2 (1-p) \\ &= 3p^2 (1-p). \end{aligned}$

The answer in Example 5 matches the answer in Example 4. This observation should not be a surprise—the wealth trajectories in Example 4 are directly determined by the coin tosses in Example 5 and vice versa.

Remark 4. These examples suggest that studying probability is less about the underlying random process itself, and more about the distributions of the possible outcomes.

Example 6. You have 4 black socks and 6 white socks in your drawer. You choose two socks at random, without replacement (obviously). What is the probability that you get two socks of the same color?

Solution. We draw the following probability tree diagram.

Notice that if the first sock chosen is black, then the probability that the second sock chosen is black changes (since there is one fewer black sock present). Following the two routes of matching-coloured socks, the required probability is given by

$\displaystyle \mathbb P(\text{matching colors}) = \frac 4{10} \cdot \frac 39 + \frac 6{10} \cdot \frac 59 = \frac{7}{15}.$
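We can cross-check this answer by counting instead of multiplying along branches. The sketch below labels the ten socks, lists every unordered pair, and counts the pairs with matching colours. This is a different route to the same number, offered as a sanity check rather than as the method intended by the question.

```python
from fractions import Fraction
from itertools import combinations

socks = ["black"] * 4 + ["white"] * 6

# Every unordered pair of distinct socks is equally likely to be drawn.
pairs = list(combinations(range(len(socks)), 2))
matching = [pair for pair in pairs if socks[pair[0]] == socks[pair[1]]]

probability = Fraction(len(matching), len(pairs))
print(len(matching), len(pairs), probability)  # 21 45 7/15
```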

Of course, the coin toss is one of the simpler examples to begin our discussions on probability theory. Another common toy that we use to discuss probabilities would be that of dice. A fair six-sided die has 6 faces: 1, 2, 3, 4, 5, 6, each occurring with equal probability $1/6$ .

Example 7. Roll two fair dice simultaneously. Assume that the outcomes of the dice are independent. What is the probability that the numbers sum to $8$ ? Which integer $n$ is the most probable sum?

Solution. Let $X_1$ denote the outcome of the first die and $X_2$ denote the outcome of the second die. We want to evaluate $\mathbb P(X_1 + X_2 = 8)$ . We can illustrate the outcome of sums using the possibility diagram below.

For example, if $X_1 = 3$ and $X_2 = 5$ , then $X_1 + X_2 = 8$ . Since the dice are independent, each cell has a probability of $1/6 \times 1/6 = 1/36$ , which also agrees intuitively with the possibility diagram. Since there are 5 cells whose sum yields 8, the required probability is

$\mathbb P(X_1 + X_2 = 8) = 5 \times 1/36 = 5/36.$

Among all possible sums, 7 has the largest number of cells, namely 6. Therefore, the required integer is $n = 7$ , and in fact,

$\mathbb P(X_1 + X_2 = 7) = 6 \times 1/36 = 1/6.$

Remark 5. If the dice are unfair (but still independent), we can still manually compute $\mathbb P(X_1 + X_2 = n)$ by adding up the separate cases $\mathbb P(X_1 = k) \cdot \mathbb P(X_2 = n-k)$ one after another for $k = 1,\dots, 6$ :

$\mathbb P(X_1 + X_2 = n) = \mathbb P(X_1 = 1) \cdot \mathbb P(X_2 = n-1) + \cdots + \mathbb P(X_1 = 6) \cdot \mathbb P(X_2 = n-6).$

Of course if $n-k < 1$ or $n-k > 6$ , then $\mathbb P(X_2 = n-k) = 0$ . This process is known as taking the discrete convolution between two probability mass functions.
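The discrete convolution described above is short enough to sketch in Python. Here each die is stored as a dictionary from face to probability (my own choice of representation), and the function name `convolve` is hypothetical. The fair dice of Example 7 serve as the test case, but any two dictionaries of probabilities would work the same way.

```python
from fractions import Fraction


def convolve(pmf_1, pmf_2):
    """Distribution of the sum of two independent outcomes."""
    total = {}
    for k, p in pmf_1.items():
        for j, q in pmf_2.items():
            # The case X_1 = k and X_2 = j contributes p * q to the sum k + j.
            total[k + j] = total.get(k + j, Fraction(0)) + p * q
    return total


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
sum_pmf = convolve(fair_die, fair_die)

most_probable_sum = max(sum_pmf, key=sum_pmf.get)
print(sum_pmf[8], most_probable_sum, sum_pmf[most_probable_sum])  # 5/36 7 1/6
```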

If there is one topic for which I insist on discussing applications, it would most certainly be probability. I do think it is a good idea to illustrate probability in the real world. To do that, I’ll need to discuss matrices, which also happens to be our final topic in O-Level mathematics. Of course, we can extend these ideas at great length into the study of stochastic processes and Markov chains, but let’s just touch base with some simple examples to augment our understanding.

—Joel Kindiak, 18 Mar 26, 1539H
May 28, 2026
Basic Probabilistic Events
Previously, we looked at evaluating and summarising data. Data is mostly generated by chance, though not necessarily in a purely unpredictable manner. It is in our interest, therefore, to now turn to games of chance.

Consider a fair 6-sided die (plural: dice) with possible values $1,2,3,4,5,6$ . We collect these outcomes into a set, denoted $\{ 1,2,3,4,5,6 \}$ , and collect sub-collections of these outcomes also as sets.

Example 1. Write down the sub-collection of even-numbered outcomes of the die.

Solution. Since the even-numbered outcomes of the die are $2,4,6$ , the required subset is $\{ 2,4,6\}$ .

Definition 1. Let $K, L$ be sets. We write:
- $K = L$ if the two sets have exactly the same elements,
- $K \subseteq L$ and call $K$ a subset of $L$ if $K$ is a sub-collection of $L$ ,
- $K \not\subseteq L$ if $K$ is not a subset of $L$ .
For instance, in Example 1, we have $\{2,4,6\} \subseteq \{1,2,3,4,5,6\}$ . We can illustrate this relationship using a Venn diagram.

Therefore, we will use sets in order to model chance. However, sets alone don’t get at the full picture. We also need to quantify certainty. Intuitively, since there are 3 even numbers in the set $\{ 1,2,3,4,5,6 \}$ , then the probability that we roll an even number on the die should be $3/6 = 1/2$ . We formalise this idea using sets.

For any (finite) set $K$ , let $|K|$ denote the number of elements in the set. For instance, $|\{ 1,2,3,4,5,6 \}| = 6$ and $|\{2,4,6\}| = 3$ .

Definition 2. Let $\Omega$ be a set, which we usually call the universal set of discourse. Given $K \subseteq \Omega$ , define the uniform probability of $K$ by

$\displaystyle \mathbb P(K) = \frac{|K|}{|\Omega|}.$

Using the language of Definition 2, the probability of rolling an even number on the die is displayed as

$\displaystyle \mathbb P(\{2,4,6\}) = \frac{|\{2,4,6\}|}{|\{1,2,3,4,5,6\}|} = \frac 36 = \frac 12.$

Denote $\Omega := \{1,2,3,4,5,6\}$ for simplicity, because we are lazy. Recall that $|\Omega| = 6$ .

Remark 1. Observe that $\Omega \subseteq \Omega$ , and $\displaystyle \mathbb P(\Omega) = \frac{|\Omega|}{|\Omega|} = 1$ . Furthermore, some education systems denote the universal set by the following alternate notation and use them in their assessments: $U, \varepsilon, \mathcal{E}, \xi, \mathscr{E}$ . In the spirit of learning set-formulated probability theory, we will not follow such practice in these blog posts.

Example 2. What is the probability that we would roll a multiple of $3$ ?

Solution. The multiples of $3$ are described by the subset $\{3,6\} \subseteq \Omega$ . Therefore, the required probability is

$\displaystyle \mathbb P(\{3,6\}) = \frac{ |\{3,6\}| }{ |\Omega| } = \frac 26 = \frac 13.$

Example 3. What is the probability that we would roll an even multiple of $3$ ?

Solution. The even numbers are given by the subset $\{2,4,6\}$ , and the multiples of $3$ are given by the subset $\{3,6\}$ . The common number is $6$ , and therefore the subset of $\Omega$ that contains all even multiples of $3$ is $\{6\}$ . Therefore, the required probability is

$\displaystyle \mathbb P(\{6\}) = \frac{|\{6\}|}{|\Omega|} = \frac 16.$

To capture the idea of “common elements”, we use the notion of the intersection. We can illustrate this common-ness using another Venn diagram.

In order to do that, we need to introduce the idea of “membership”.

Definition 3. Let $K$ be a set. We write $x \in K$ to mean that $x$ belongs to $K$ . In this case, we say that $x$ is an element of $K$ . We write $x \notin K$ to mean that $x$ does not belong to $K$ .

For instance, if $K = \{2,4,6\}$ , then $2 \in K$ and $3 \notin K$ . Furthermore, we can write $K$ in set-builder notation:

$K = \{x \in \{1,2,3,4,5,6\} : x\ \text{is even}\}.$

Definition 4. Let $K, L$ be subsets of $\Omega$ . We call the sub-collection $K \cap L$ of common elements the intersection of $K$ and $L$ . Formally, we define this intersection by

$K \cap L := \{x \in \Omega : x \in K\ \text{and}\ x \in L\}.$

For example $\{2,4,6\} \cap \{3,6\} = \{6\}$ .

Example 4. What is the probability that we would roll a number that is both odd and even?

Solution. The subset of odd numbers is $\{1,3,5\}$ and the subset of even numbers is $\{2,4,6\}$ . There…are no numbers in $1,2,3,4,5,6$ that belong to both subsets. The required subset is empty: $\{\}$ . In the language of Definition 4,

$\{1,3,5\} \cap \{2,4,6\} = \{\ \}.$

Therefore, the required probability is

$\displaystyle \mathbb P(\{\ \}) = \frac{|\{\ \}|}{|\Omega|} = \frac 06 = 0.$

Remark 2. We denote $\emptyset := \{\ \}$ , motivated by the observation $|\{\ \}| = 0$ . Furthermore, we say that two sets $K, L$ are mutually disjoint if $K \cap L = \emptyset$ .

Example 5. What is the probability that we would roll a number that is either even or a multiple of $3$ ?

Solution. If we require a number to satisfy at least one of these criteria, we allow it to be taken from either of the subsets $\{2,4,6\}$ or $\{3,6\}$ , so that the desired subset is $\{2,3,4,6\}$ . We can illustrate this “collaboration” using another Venn diagram.

Therefore, the required probability is

$\displaystyle \mathbb P(\{2,3,4,6\}) = \frac{|\{2,3,4,6\}|}{|\Omega|} = \frac 46 = \frac 23.$

Definition 5. Let $K, L$ be subsets of $\Omega$ . We call the sub-collection $K \cup L$ of “collaborated” elements the union of $K$ and $L$ . Formally, we define this union by

$K \cup L := \{x \in \Omega : x \in K\ \text{or}\ x \in L\}.$

For example $\{2,4,6\} \cup \{3,6\} = \{2,3,4,6\}$ .

Example 6. What is the probability that we would roll a number that is not a multiple of $3$ ?

Solution. By accepting all elements of $\{1,2,3,4,5,6\}$ that are not multiples of $3$ , the desired subset is $\{1,2,4,5\}$ . We can visualise this subset, once again, using a Venn diagram.

Therefore, the required probability is

$\displaystyle \mathbb P(\{1,2,4,5\}) = \frac{|\{1,2,4,5\}|}{|\Omega|} = \frac 46 = \frac 23.$

Definition 6. For any $K \subseteq \Omega$ , define the complement of $K$ by

$K' := \Omega \backslash K := \{x \in \Omega : x \notin K\}.$

For example, $\{1,2,4,5\} = \{3,6\}'$ .
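Python's built-in sets happen to mirror these definitions closely, so here is a small sketch that replays Examples 3, 5, and 6 on the die. The function name `prob` is my own, and it only implements the uniform probability of Definition 2.

```python
from fractions import Fraction

omega = frozenset({1, 2, 3, 4, 5, 6})


def prob(event):
    """Uniform probability |K| / |Omega| of a subset K of Omega."""
    assert event <= omega, "an event must be a subset of omega"
    return Fraction(len(event), len(omega))


# Set-builder notation carries over almost verbatim.
evens = {x for x in omega if x % 2 == 0}
multiples_of_3 = {x for x in omega if x % 3 == 0}

print(prob(evens & multiples_of_3))   # intersection: 1/6
print(prob(evens | multiples_of_3))   # union: 2/3
print(prob(omega - multiples_of_3))   # complement: 2/3
```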

At this point, alarm bells should ring, since by Remark 1, Example 2, and Example 6,

$\displaystyle \mathbb P(\{1,2,4,5\}) + \mathbb P(\{3,6\}) = \frac 23 + \frac 13 = 1 = \mathbb P(\{1,2,3,4,5,6\}).$

Furthermore, we notice that

$\begin{aligned} \{1,2,4,5\} \cup \{3,6\} &= \{1,2,3,4,5,6\}, \\ \{1,2,4,5\} \cap \{3,6\} &= \emptyset. \end{aligned}$

That is, the probability of a union of two mutually disjoint subsets equals the sum of their probabilities.

Theorem 1. Let $K, L$ be mutually disjoint subsets of $\Omega$ . Then

$\mathbb P(K \cup L) = \mathbb P(K) + \mathbb P(L).$

Proof. Since $K, L$ are mutually disjoint, every element in $K \cup L$ belongs either to $K$ and not $L$ , or $L$ and not $K$ .

Therefore, $|K \cup L|$ must equal $|K| + |L|$ , and hence,

$\begin{aligned} \mathbb P(K \cup L) &= \frac{|K \cup L|}{|\Omega|} \\ &= \frac{|K| + |L|}{|\Omega|} \\ &= \frac{|K|}{|\Omega|} + \frac{|L|}{|\Omega|} \\ &= \mathbb P(K) + \mathbb P(L). \end{aligned}$

Remark 3. This property holds for any number of mutually disjoint subsets:

$\mathbb P(K_1 \cup \cdots \cup K_n) = \mathbb P(K_1) + \cdots + \mathbb P(K_n)$

whenever $K_i \cap K_j = \emptyset$ for all $i \neq j$ . This result is called the (finite) additivity property of probability.

Corollary 1. Given $K \subseteq \Omega$ ,

$\mathbb P(K') = 1 - \mathbb P(K).$

Proof. By definition, $K \cap K' = \emptyset$ and $K \cup K' = \Omega$ . By Remark 1 and Theorem 1,

$\mathbb P(K) + \mathbb P(K') = \mathbb P(K \cup K') = \mathbb P(\Omega) = 1.$

Therefore, $\mathbb P(K') = 1 - \mathbb P(K)$ .

Remark 4. In particular, $\emptyset = \Omega'$ so that

$\mathbb P(\emptyset) = \mathbb P(\Omega') = 1 - \mathbb P(\Omega) = 1-1 = 0.$

Example 7. Given subsets $K, L$ , not necessarily disjoint, show that

$\mathbb P(K \cup L) = \mathbb P(K) + \mathbb P(L) - \mathbb P(K \cap L).$

Solution. Consider the Venn diagram below for illustrative purposes.

Given $x \in K \cup L$ , there are two non-overlapping cases:
- $x \in K$
- $x \in L$ and $x \notin K$ .
Denoting $L\backslash K := L \cap K'$ for brevity,

$\begin{aligned} K \cup (L \backslash K)&= K \cup L , \\ K \cap (L \backslash K) &= \emptyset. \end{aligned}$

By Theorem 1,

$\mathbb P(K \cup L) = \mathbb P(K) + \mathbb P(L \backslash K).$

On the other hand, we observe that given $x \in L$ , either $x \in K$ or $x \notin K$ . Refer to the zoomed-in Venn diagram below.

Therefore,

$\begin{aligned} (K \cap L) \cup (L \backslash K)&= L , \\ (K \cap L) \cap (L \backslash K) &= \emptyset. \end{aligned}$

By Theorem 1 again,

$\mathbb P(K \cap L) + \mathbb P(L\backslash K) = \mathbb P(L).$

Making $\mathbb P(L\backslash K)$ the subject of the equation and substituting it into the earlier expression for $\mathbb P(K \cup L)$ ,

$\mathbb P(K \cup L) = \mathbb P(K) + \mathbb P(L) - \mathbb P(K \cap L).$
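A proof settles the matter, but it can still be reassuring to watch the identity hold. The sketch below checks it by brute force for every pair of subsets of the die's outcomes, all $64 \times 64$ of them. The helper names are hypothetical, and the check covers only this one choice of $\Omega$ with the uniform probability.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))


def prob(event):
    """Uniform probability |K| / |Omega|."""
    return Fraction(len(event), len(omega))


def all_subsets(universe):
    """List every subset of the universe, from the empty set up to the whole set."""
    items = sorted(universe)
    return [
        frozenset(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]


subsets = all_subsets(omega)
violations = [
    (K, L)
    for K in subsets
    for L in subsets
    if prob(K | L) != prob(K) + prob(L) - prob(K & L)
]
print(len(subsets), len(violations))  # 64 0
```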

Therefore, we use set notation to describe our intuitive notions of probability. We can formalise these ideas with far more advanced tools, but we shall relegate that rabbit hole as an exercise for the keen reader. We keep these ideas simple for now.

Next time, we solve some simple problems involving probability.

—Joel Kindiak, 18 Mar 26, 1509H
May 27, 2026
Measures of Spread
In the mock data below, the scores of a class test (total score: 10) for two classes, each with 20 students, are plotted in a dot diagram.

Which of the two classes did better?

This question is vague. What do we mean by “better”? We would usually like to make this decision according to some summarised data (i.e. statistics). Previously, we have learned that the most computationally convenient statistic to describe the centre of the data is the mean, given by the formula

$\displaystyle \bar x = \frac{\Sigma x}{n}.$

Running the calculations, Class Epsilon has mean 7.45 and Class Delta has mean 7.15.

Using the mean as our measurement of aura, we might conclude that Class Epsilon is stronger than Class Delta in the exam.

But you can—and should—object: Class Delta has not just one, but two students who scored full marks! Furthermore, we notice that it seems like Class Delta’s mean score is lowered due to some poor-performing outliers. That is, Class Delta has a larger spread of data when compared to the data of Class Epsilon.

The tool that statisticians use to measure spread is called the standard deviation. The intuitive idea is that we want to find the average of the deviations of the data points from the sample mean. To ensure that this calculation is mathematically convenient, we square these deviations.

Definition 1. For each data point $x$ , determine its squared deviation by $\varepsilon := (x - \bar x)^2$ . The sample variance $s_x^2$ is then simply defined by $s_x^2 := \bar \varepsilon$ , and the standard deviation is defined by $s_x := \sqrt{\bar{\varepsilon}}$ .

Remark 1. This squared-deviation idea is responsible for linear regression—a fundamental algorithm in modern machine learning.

Theorem 1. The formula to compute the standard deviation $s_x$ of the sample is given by

$\displaystyle s_x = \sqrt{ \frac{ \Sigma x^2 }{n} - \left( \frac{\Sigma x}{n} \right)^2 }.$

Proof. Denote the data set by $\{x_1, x_2,\dots, x_n\}$ . Compute the squared deviations by

$\begin{aligned} \varepsilon_1 &= (x_1 - \bar x)^2 = x_1^2 - 2 \cdot x_1 \cdot \bar x + \bar x^2, \\ \varepsilon_2 &= (x_2 - \bar x)^2 = x_2^2 - 2 \cdot x_2 \cdot \bar x + \bar x^2, \\ \vdots & \quad \quad \quad \vdots \quad \quad \quad \quad \quad \quad \quad \vdots \\ \varepsilon_n &= (x_n - \bar x)^2 = x_n^2 - 2 \cdot x_n \cdot \bar x + \bar x^2. \end{aligned}$

By definition, $\bar{\varepsilon} = (\varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_n)/n$ . Therefore,

$\begin{aligned} n \cdot \bar{\varepsilon} &= \varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_n \\ &= (x_1 - \bar x)^2 + (x_2 - \bar x)^2 + \cdots + (x_n - \bar x)^2 \\ &= (x_1^2 + x_2^2 + \cdots + x_n^2) - 2 \cdot \underbrace{ (x_1 + x_2 + \cdots + x_n) }_{n \bar x} \cdot\, \bar x + n\bar x^2 \\ &= (x_1^2 + x_2^2 + \cdots + x_n^2) - 2 \cdot n \bar x^2 + n\bar x^2 \\ &= (x_1^2 + x_2^2 + \cdots + x_n^2) - n\bar x^2. \end{aligned}$

Dividing by $n$ on all sides,

$\displaystyle \bar{\varepsilon} = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \bar x^2 = \frac{\Sigma x^2}{n} - \left( \frac{ \Sigma x }{n} \right)^2.$

Taking square roots,

$\displaystyle s_x = \sqrt{\frac{\Sigma x^2}{n} - \left( \frac{ \Sigma x }{n} \right)^2}.$
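To see Theorem 1 in action, here is a minimal Python sketch that computes the standard deviation both from Definition 1 and from the formula in Theorem 1. The list `scores` is made-up data for illustration only, and the function names are our own.

```python
import math
import statistics

def sd_from_definition(xs):
    """Square root of the mean squared deviation from the mean (Definition 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_from_theorem(xs):
    """Square root of (sum of x^2)/n minus the squared mean (Theorem 1)."""
    n = len(xs)
    return math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)

scores = [2, 5, 6, 7, 7, 8, 9, 10]  # hypothetical test scores

a = sd_from_definition(scores)
b = sd_from_theorem(scores)
c = statistics.pstdev(scores)  # standard library: also divides by n
print(a, b, c)
```

All three values should agree up to floating-point rounding. Note that `statistics.stdev` (without the `p`) divides by $n - 1$ instead of $n$ , so it would give a slightly different answer.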

Remark 2. If we had a collection of paired data $\{(x_1,y_1),\dots, (x_n,y_n)\}$ , we can compute the sample covariance $c_{x,y}$ between the data set $\{x_1,\dots, x_n\}$ and $\{y_1,\dots, y_n\}$ by

$\displaystyle c_{x,y} = \frac{ \Sigma (x - \bar x)(y - \bar y)}{n}.$

Observe that $\bar{\varepsilon} = c_{x,x}$ . In this regard, the sample covariance generalises the sample variance. Here, the covariance measures the extent of connection between the two data sets.
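The observation $\bar{\varepsilon} = c_{x,x}$ can be checked with a short Python sketch. The paired data below are hypothetical, and `covariance` is simply our own name for the formula in Remark 2.

```python
def covariance(xs, ys):
    """Mean of the products of deviations, dividing by n as in Remark 2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

xs = [1, 2, 3, 4]  # hypothetical data
ys = [2, 4, 6, 8]  # hypothetical data, here exactly twice xs

variance_x = sum((x - sum(xs) / len(xs)) ** 2 for x in xs) / len(xs)
print(covariance(xs, xs), variance_x, covariance(xs, ys))
```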

Example 1. Using the standard deviation as the measure of spread, Class Epsilon has a standard deviation of approximately 2.01 and Class Delta has a standard deviation of approximately 1.28.

Since the former is larger, Class Epsilon has a larger spread of scores than Class Delta.

In layperson’s terms, the scores of students in Class Delta are more “bunched” together, and thus we can say that the students in Class Delta perform more consistently than the students in Class Epsilon.

However, we should object to this conclusion once again: why did we use the mean and the standard deviation? These statistics are sensitive to outlier data, be it exceedingly high-performing students or exceedingly low-performing students. Why not use the median?

We can, and should: in this case, Class Epsilon has a median score of 7 and Class Delta has a median score of 7. Not helpful. How would we measure the spread of the data?

Definition 2. Sort the dataset into a non-decreasing order

$x_1 \leq x_2 \leq \cdots \leq x_n.$

Denote:
- the minimum by $Q_0$ ,
- the median by $Q_2$ ,
- the maximum by $Q_4$ .
Define the range of the data set by $Q_4 - Q_0$ .

Obviously, $Q_0 = x_1$ and $Q_4 = x_n$ . If $n$ is even, then

$\displaystyle Q_2 = \textstyle \frac 12 \cdot (x_{n/2} + x_{n/2+1}).$

If $n$ is odd, then $Q_2 = x_{(n+1)/2}$ .

Remark 3. The letter Q denotes the word ‘quartile’. Therefore, the minimum can be thought of as the “zeroth” quartile, the median as the second quartile, and the maximum as the fourth quartile.

Example 2. The range in Class Epsilon is 8 and the range in Class Delta is 5. Therefore, there is larger spread in Class Epsilon than Class Delta.

But you should, once again, object to this conclusion. This measure of spread accounts for the vast outliers! Can we obtain a measure of spread that disregards outliers, just like how the median disregards outliers?

Definition 3. Suppose a data set $\{x_1,\dots, x_n\}$ , sorted in non-decreasing order, has an odd number $n$ of data points, so that its median is $x_{(n+1)/2}$ . Define:
- the lower quartile $Q_1$ by the median of the data set $\{x_1,\dots, x_{(n+1)/2}\}$ ,
- the upper quartile $Q_3$ by the median of the data set $\{x_{(n+1)/2},\dots, x_n\}$ ,
- the interquartile range by $Q_3 - Q_1$ .
Question 1. How would you define the interquartile range if $n$ were even?
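For the odd case covered by Definitions 2 and 3, the quartiles can be computed with the following Python sketch. It assumes the usual convention that the median of an even number of sorted values is the mean of the two middle values; the function names and the sample data are hypothetical, and the even-$n$ case is deliberately left unanswered.

```python
def median(sorted_xs):
    """Median of an already sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]  # x_{(n+1)/2} in 1-based indexing
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2  # mean of the two middle values

def five_number_summary(xs):
    """Return (Q0, Q1, Q2, Q3, Q4) following Definitions 2 and 3 for odd n."""
    data = sorted(xs)
    n = len(data)
    if n % 2 == 0:
        raise ValueError("Definition 3 is stated for odd n only")
    m = (n + 1) // 2         # 1-based position of the median
    q1 = median(data[:m])    # median of x_1, ..., x_{(n+1)/2}
    q3 = median(data[m - 1:])  # median of x_{(n+1)/2}, ..., x_n
    return data[0], q1, data[m - 1], q3, data[-1]

q0, q1, q2, q3, q4 = five_number_summary([7, 1, 9, 3, 8, 4, 6])  # made-up data
data_range = q4 - q0
iqr = q3 - q1
print(q0, q1, q2, q3, q4, data_range, iqr)
```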

Example 3. By definition,
- Class Epsilon has a lower quartile of 6.5 and upper quartile of 8.5, and hence, an interquartile range of 2.
- Class Delta has a lower quartile of 7 and upper quartile of 8.5, and hence, an interquartile range of 1.5.
Since the former is larger than the latter, we conclude that there is larger spread in Class Epsilon than Class Delta.

We can visualise the ordered information using box-and-whisker diagrams. The endpoints denote the minimum and maximum, the box denotes the interquartile range, and the centre line denotes the median. We can plot both box-and-whisker diagrams below.

Therefore, the box-and-whisker diagram helps us visualise the data in a sufficiently meaningful manner. The distinct vertical lines denote $Q_0, Q_1, Q_2, Q_3, Q_4$ respectively.

Remark 4. For Class Delta, $Q_1 = Q_2 = 7$ , explaining why it appears to have only four lines instead of the expected five.

I have one more idea to discuss: large data sets. So far, our class sizes are small, with just 20 data points each. However, if we consider all of the students in the school, we would need to deal with large data sets of, say, 1000 students. Suppose also that the total score of the assessment is 100, rather than 10. How do we interpret such data? We can use a cumulative frequency diagram.

The $y$ -axis denotes the number of data points, with $0 \leq y \leq 1000$ . The $x$ -axis denotes the score of the assessment, out of 100. The curve $y = f(x)$ plots the following information: $(x,y)$ lies on the curve precisely when $y$ students scored at most $x$ marks in the assessment.

Remark 5. Cumulative frequency diagrams, being discrete, tend to be more jagged than what we see displayed. Nevertheless, this smooth approximation turns out to be mostly accurate relative to our original data.

Example 4. Estimate the median, range, and interquartile range of the data. Use your estimates to represent the data using a box-and-whisker diagram.

Solution. It is clear that $Q_0 = 0$ and $Q_4 = 100$ , so that the range is 100. We estimate $Q_1,Q_2,Q_3$ as follows.

Therefore, we estimate the median to be 69 marks, and the interquartile range to be 18 marks.

Example 5. Using intervals of 10 marks each, estimate the mean and the standard deviation of the data.

Solution. We leave it as an exercise to tabulate the following summarised data.

In particular,

$n = 1000,\quad \Sigma x = 68\ 550,\quad \Sigma x^2 = 4\ 897\ 000.$

Therefore, we estimate the mean of the data to be

$\displaystyle \bar x = \frac{68\ 550}{1000} = 68.55,$

and by Theorem 1, the standard deviation of the data to be

$\displaystyle s_x = \sqrt{\frac{4\ 897\ 000}{1000} - \left(\frac{68\ 550}{1000}\right)^2} \approx 14.1.$
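The arithmetic in this solution can be reproduced in a few lines of Python. The sketch takes the summary values used in the calculation above as given; the variable names are our own.

```python
import math

n = 1000            # number of students
sum_x = 68_550      # sum of the (grouped) scores
sum_x2 = 4_897_000  # sum of the squared (grouped) scores

mean = sum_x / n
sd = math.sqrt(sum_x2 / n - mean ** 2)  # formula from Theorem 1
print(round(mean, 2), round(sd, 1))
```

This prints `68.55 14.1`, matching the estimates above.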

Remark 6. In the era of Microsoft Excel and Python, software can compute means and standard deviations of large datasets without using the grouped data approach. They can handle millions of computations—we can’t.

Would you still object? In the spirit of inquiry and scepticism, why not? However, I think my job here is done—I have introduced the key calculations required in secondary school statistics!

Just for fun, here is a note for those of you curious about quantitative finance, where you use mathematics and statistics to possibly beat the stock market or even the cryptocurrency market. Individuals working in these fields, called quants, use the Sharpe ratio, defined in simplified form by $\bar x / s_x$ , to weigh the average return of an asset against its riskiness. Another measure known as the mean-variance, defined in simplified form by $\bar x - s_x^2$ , helps quants optimise the proportion of their assets in order to minimise risk.

Finally, for Singaporeans who (or whose parents) remember the notion of a t-score in the high-stakes Primary School Leaving Examinations (PSLE), the student’s final score for a particular subject is computed using the formula

$\displaystyle 50 + 10 \times \frac{x - \bar x}{s_x},$

and these numbers are summed over the four subjects: English, Mother Tongue, Mathematics, and Science. My PSLE score was 242—make of that as you will. Contrary to popular expectation, I did *not* get A* for Mathematics due to less-than-academically-important reasons.

All of these statistical analyses arise from random phenomena, and are general grasps of otherwise un-graspable realities. But can we at least quantify such uncertainty? Our attempt at doing so is probability theory, and we will visit this idea briefly the next time.

—Joel Kindiak, 18 Mar 26, 1435H
May 26, 2026
Baby Quantitative Finance

In this post, we will explore some basic notions in quantitative finance.

More specifically, buying and selling stocks.

Suppose 1 unit of a stock KMATH costs $1 at time t = 0. Assume negligible trading fees.

Problem 1. At time t = 0, you buy 200 units of KMATH. What is the value of your position?

(Click for Solution)

Solution. The value of $200$ units of KMATH is

$200 \times \$1 = \$200.$

Problem 2. Suppose at time t = 1, the price per unit of KMATH increased by 10%. What is the value of your position at t = 1?

(Click for Solution)

Solution. The value of the position has increased by $10\%$ , that is,

$10\% \times \$200 = 0.1 \times \$200 = \$20.$

Therefore, the position has a new value of

$\$200 + \$20 = \$220.$

Alternate Solution. If the initial position has a value of $\$P$ and the price increased by a percentage of $r_1$ , then the position increases by the value

$r_1 \times \$P = \$(P \times r_1).$

Therefore, the position would have a new value of

$\begin{aligned} \$P + \$(P \times r_1) &= \$P + r_1 \times \$P \\ &= (1 + r_1) \times \$P \\ &= \$((1 + r_1) \times P).\end{aligned}$

In particular, setting $P = 200$ and

$\displaystyle r_1 = 10\% = \frac{ 10 }{ 100 } = 0.1$

in Problem 2 yields a new value of

$\$((1 + 0.1) \times 200) = \$220.$

Problem 3. Suppose at time t = 2, the price per unit of KMATH decreased by 10%. What is the overall change in your position from t = 0 to t = 2? How about its overall percentage change? Is your position in a profit or a loss?

(Click for Solution)

Solution. We will use the calculation in the alternate solution to Problem 2. Let

$\displaystyle r_2 = -10\% = -\frac{ 10 }{ 100 } = -0.1$

denote the percentage decrease of the price per unit of KMATH from $t = 1$ to $t = 2$ .

By the alternate solution to Problem 2, the position at $t = 1$ has a value of $\$( (1+r_1) \times 200 )$ . Therefore, the new position at $t = 2$ has a value of

$\$ ( (1 + r_2) \times ((1 + r_1) \times 200) ) = \$ ( (1 + r_1) \times (1+r_2) \times 200 ).$

Substituting $r_1 = 0.1$ and $r_2 = -0.1$ , the new position has a value of

$\begin{aligned} \$ ( (1 + r_1) \times (1+r_2) \times 200 )&= \$ ( (1 + 0.1) \times (1-0.1) \times 200 ) \\ &= \$ ( (1 - 0.1^2) \times 200 ) \\ &= \$ ( (1 - 0.01) \times 200 ) \\ &= \$ 198. \end{aligned}$

The overall change in the position is $\$198 - \$200 = -\$2$ , and the overall percentage change is

$\displaystyle -0.01 \times 100\% = -1\%.$

Since the percentage change is negative, our position currently sits in a loss.

Problem 4. For any positive integer $n$ , let $r_n$ denote the percentage change in your position from $t = n - 1$ to $t = n$ . Show that the overall percentage change $r$ between $t = 0$ and $t = n$ is calculated by

$1 + r = (1 + r_1) \times (1 + r_2) \times \cdots \times (1 + r_n).$

(Click for Solution)

Solution. Let $P_n$ denote the value of the position at time $t = n$ . Applying the alternate solution in Problem 2 repeatedly,

$\begin{aligned} \$P_n &= (1 + r_n) \times \$P_{n-1} \\ &= (1 + r_n) \times (1 + r_{n-1}) \times \$P_{n-2} \\ &= \, \vdots \\ &= (1 + r_n) \times (1 + r_{n-1}) \times \cdots \times (1 + r_1) \times \$P_0 \\ &= \$ ( (1 + r_n) \times (1 + r_{n-1}) \times \cdots \times (1 + r_1) \times P_0 ). \end{aligned}$

On the other hand, denoting the overall percentage change by $r$ , we have

$\$P_n = \$((1+r) \times P_0)$

Equating the two sides,

$\$ ((1 + r) \times P_0) = \$ ( (1 + r_n) \times (1 + r_{n-1}) \times \cdots \times (1 + r_1) \times P_0 ).$

Dividing by $P_0$ on both sides yields the desired result:

$1 + r = (1 + r_n) \times (1 + r_{n-1}) \times \cdots \times (1 + r_1).$
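The compounding rule can be sketched in Python as follows. Exact fractions are used to avoid rounding noise; the function name `overall_change` is our own, and the rates are the ones from Problems 2 and 3.

```python
from fractions import Fraction

def overall_change(rates):
    """Overall percentage change r, where 1 + r is the product of the (1 + r_k)."""
    growth = Fraction(1)
    for r_k in rates:
        growth *= 1 + r_k
    return growth - 1

# +10% from t = 0 to t = 1, then -10% from t = 1 to t = 2.
rates = [Fraction(10, 100), Fraction(-10, 100)]
r = overall_change(rates)
final_value = 200 * (1 + r)  # initial position of $200
print(r, final_value)
```

This prints `-1/100 198`, agreeing with the overall change of $-1\%$ and the final value of $\$198$ found in Problem 3.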

Problem 5. What is the minimum percentage increase of the price per unit of KMATH from t = 2 to t = 3 required for you to not incur loss?

(Click for Solution)

Solution. Let $r_3$ denote the required percentage increase of the price per unit of KMATH from $t = 2$ to $t = 3$ . By Problem 4, the overall percentage change $r$ is given by

$1 + r = (1 + r_1) \times (1 + r_2) \times (1 + r_3).$

Substituting $r_1 = 0.1$ and $r_2 = -0.1$ , since $1+r_1 > 0$ and $1+r_2 > 0$ , we can divide on both sides to obtain

$\displaystyle 1 + r_3 = \frac{ 1 + r }{ (1 + 0.1) \times (1 - 0.1) } = \frac{ 1 + r }{ 1 - 0.01}.$

Subtracting by $1$ on both sides,

$\displaystyle r_3 = \frac{ 1 + r }{ 1 - 0.01 } - 1.$

Since we do not want to incur loss, the overall percentage change must be non-negative, that is to say, $r \geq 0$ :

$\begin{aligned} r_3 = \frac{ 1 + r }{ 1 - 0.01 } - 1 \geq \frac{ 1 + 0 }{ 1 - 0.01 } - 1 = \frac{ 1 }{ 99 } > 0.01 = 1\%. \end{aligned}$

In particular, we need more than $1\%$ increase in order to compensate for an overall decrease of $1\%$ .
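The break-even rate can be double-checked with exact arithmetic. This is a small sketch under the same assumptions as the solution ($r_1 = 10\%$ , $r_2 = -10\%$ , and break-even meaning $r = 0$ ); the variable names are our own.

```python
from fractions import Fraction

r1 = Fraction(1, 10)   # +10%
r2 = Fraction(-1, 10)  # -10%

# Break-even means 1 + r = 1, so 1 + r3 = 1 / ((1 + r1) * (1 + r2)).
r3_min = 1 / ((1 + r1) * (1 + r2)) - 1
print(r3_min, float(r3_min))
```

The exact value is $1/99 \approx 1.0101\%$ , slightly more than $1\%$ .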

—Joel Kindiak, 22 Feb 26, 1324H

May 18, 2026
Solving Trigonometric Equations
Problem 1. Determine the value(s) of $x$ such that $\sin(x) = 1/2$ , where
- $0 < x < \pi/2$ ,
- $0 \leq x < 2\pi$ ,
- $x$ is a real number.
We call this process solving a trigonometric equation.

(Click for Solution)

Solution. Using special angles, we recall that for $0 < x < \pi/2$ ,

$\sin(x) = 1/2 \quad \iff \quad x = \pi/6.$

Denote $\alpha := \pi/6$ . Using the extended definitions of $\sin(x)$ , for $0 \leq x < 2\pi$ , we have

$\begin{aligned} \sin(\alpha) &= 1/2,\\ \sin(\pi - \alpha) &= \sin(\alpha) = 1/2,\\ \sin(\pi + \alpha) &= -{ \sin(\alpha) } = -1/2,\\ \sin(2\pi - \alpha) &= -{ \sin(\alpha) } = -1/2. \end{aligned}$

Since only the first two equations work, we have $x = \pi/6$ or

$x = \pi - \pi/6 = 5\pi/6.$

Graphically, we have the following.

Finally, for general $x$ , we recall that $\sin(x)$ is $2\pi$ -periodic:

$\sin(x) = \sin(x+2\pi).$

By repeating this process, given any integer $k$ , we have

$\sin(x + 2k\pi) = \sin(x).$

In particular,

$\sin(\pi/6 + 2k\pi) = \sin(\pi/6) = 1/2.$

Furthermore, for any $0 \leq x_0 < 2\pi$ with $x_0 \neq \pi/6$ and $x_0 \neq 5\pi/6$ ,

$\sin(x_0 + 2k\pi) = \sin(x_0) \neq 1/2.$

Therefore, we must have $x = \pi/6 + 2k\pi$ or $x = 5\pi/6 + 2k\pi$ , where $k$ is some integer.
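A numerical spot check of this general solution is sketched below. It only tests a handful of integers $k$ (so it is an illustration, not a proof), and the helper name `general_solutions` is our own.

```python
import math

def general_solutions(k_values):
    """Angles pi/6 + 2k*pi and 5*pi/6 + 2k*pi for each integer k supplied."""
    base = [math.pi / 6, 5 * math.pi / 6]
    return [x0 + 2 * k * math.pi for k in k_values for x0 in base]

angles = general_solutions(range(-3, 4))

# Every listed angle should satisfy sin(x) = 1/2 up to rounding error.
all_match = all(math.isclose(math.sin(x), 0.5, abs_tol=1e-12) for x in angles)
print(len(angles), all_match)
```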

Problem 2. Solve the equation $\cos(x) = 1/2$ , where
- $0 < x < \pi/2$ ,
- $0 \leq x < 2\pi$ .
(Click for Solution)

Solution. Using special angles, we recall that for $0 < x < \pi/2$ ,

$\cos(x) = 1/2 \quad \iff \quad x = \pi/3.$

Using the extended definitions of $\cos(x)$ , for $0 \leq x < 2\pi$ , we have

$\begin{aligned} \cos(\pi/3) &= 1/2,\\ \cos(\pi - \pi/3) &= -{ \cos(\pi/3) } = -1/2,\\ \cos(\pi + \pi/3) &= -{ \cos(\pi/3) } = -1/2,\\ \cos(2\pi - \pi/3) &= \cos(\pi/3) = 1/2. \end{aligned}$

Since only the first and last equations work, we have $x = \pi/3$ or

$x = 2\pi - \pi/3 = 5\pi/3.$

Graphically, we have the following.

Problem 3. Solve the equation $\tan(x) = 1$ , where $0 \leq x < 2\pi$ .

(Click for Solution)

Solution. Using special angles, we recall that for $0 < x < \pi/2$ ,

$\tan(x) = 1 \quad \iff \quad x = \pi/4.$

Using the extended definitions of $\tan(x)$ , for $0 \leq x < 2\pi$ , we have

$\begin{aligned} \tan(\pi/4) &= 1,\\ \tan(\pi - \pi/4) &= -{ \tan(\pi/4) } = -1,\\ \tan(\pi + \pi/4) &= \tan(\pi/4) = 1, \\ \tan(2\pi - \pi/4) &= -{ \tan(\pi/4) } = -1. \end{aligned}$

Since only the first and third equations work, we have $x = \pi/4$ or

$x = \pi + \pi/4 = 5\pi/4.$

Graphically, we have the following.

Problem 4. For each of the cases $r=-1, 0, 1$ , solve the equation $\sin(x) = r$ , where $0 \leq x \leq 2\pi$ .
(Click for Solution)

Solution. Firstly, we work with the case $r = 1$ . Using special angles, we recall that for $0 \leq x \leq \pi/2$ ,

$\sin(x) = 1 \quad \iff \quad x = \pi/2.$

Using the extended definitions of $\sin(x)$ , for $0 \leq x \leq 2\pi$ , we have

$\begin{aligned} \sin(\pi/2) &= 1,\\ \sin(\pi - \pi/2) &= \sin(\pi/2) = 1,\\ \sin(\pi + \pi/2) &= -{ \sin(\pi/2) } = -1, \\ \sin(2\pi - \pi/2) &= -{ \sin(\pi/2) } = -1. \end{aligned}$

Since only the first and second equations work, we have $x = \pi/2$ or

$x = \pi - \pi/2 = \pi/2.$

In fact, we get one and only one solution: $x = \pi/2$ .

Secondly, we work with the case $r = 0$ . Using special angles, we recall that for $0 \leq x \leq \pi/2$ ,

$\sin(x) = 0 \quad \iff \quad x = 0.$

Using the extended definitions of $\sin(x)$ , for $0 \leq x \leq 2\pi$ , we have

$\begin{aligned} \sin(0) &= 0,\\ \sin(\pi - 0) &= \sin(0) = 0,\\ \sin(\pi + 0) &= -{ \sin(0) } = 0, \\ \sin(2\pi - 0) &= -{ \sin(0) } = 0. \end{aligned}$

Now, all equations work, so that we have exactly three solutions: $x = 0, \pi, 2\pi$ .

Finally, we work with the case $r = -1$ . Using the same equations as the case for $r = 1$ , the solutions must be
- $x = \pi + \pi/2 = 3\pi/2$ , or
- $\quad x = 2\pi - \pi/2 = 3\pi/2$ .
In fact, we get one and only one solution: $x = 3\pi/2$ .

Notice that the $x$ -values of interest take the form $k\pi/2$ , where $0 \leq k \leq 4$ is an integer.

Graphically, we have the following.
Problem 5. For each of the cases $r=-1, 0, 1$ , solve the equation $\cos(x) = r$ , where $0 \leq x \leq 2\pi$ .
(Click for Solution)

Solution. We can either adopt the strategy in Problem 4, or simply focus on the special values $k\pi/2$ , where $0 \leq k \leq 4$ is an integer. Observe that

$\begin{aligned} \cos(0) &= \cos(2\pi) = 1, \\ \cos(\pi/2) &= \cos(3\pi/2) = 0, \\ \cos(\pi) &= -1. \end{aligned}$

Therefore, for $0 \leq x \leq 2\pi$ , we must have:
- $\cos(x) = 1 \iff x = 0, 2\pi$ ,
- $\cos(x) = 0 \iff x = \pi/2, 3\pi/2$ ,
- $\cos(x) = -1 \iff x = \pi$ .
Graphically, we have the following.
Problem 6. Fix $-1 \leq r \leq 1$ . Define $0 \leq \alpha \leq \pi/2$ such that

$\sin(\alpha) = |r|.$

Solve the equation $\sin(x) = r$ in terms of $\alpha$ , where $0 \leq x \leq 2\pi$ .
(Click for Solution)

Solution. Using the extended definitions of $\sin(x)$ , for $0 \leq x \leq 2\pi$ , we have

$\begin{aligned} \sin(\alpha) &= |r|,\\ \sin(\pi - \alpha) &= \sin(\alpha) = |r|,\\ \sin(\pi + \alpha) &= -{ \sin(\alpha) } = -|r|,\\ \sin(2\pi - \alpha) &= -{ \sin(\alpha) } = -|r|. \end{aligned}$

We consider the three cases:
- If $r > 0$ , then only the first two equations work: $x = \alpha, \pi -\alpha$ .
- If $r = 0$ , then $\alpha = 0$ : $x = 0, \pi, 2\pi$ by Problem 4.
- If $r < 0$ , then only the last two equations work: $x = \pi + \alpha, 2\pi -\alpha$ .
Graphically, we have the following.
Problem 7. Fix $-1 \leq r \leq 1$ . Define $0 \leq \alpha \leq \pi/2$ such that

$\cos(\alpha) = |r|.$

Solve the equation $\cos(x) = r$ in terms of $\alpha$ , where $0 \leq x \leq 2\pi$ .
(Click for Solution)

Solution. Using the extended definitions of $\cos(x)$ , for $0 \leq x \leq 2\pi$ , we have

$\begin{aligned} \cos(\alpha) &= |r|,\\ \cos(\pi - \alpha) &= -{ \cos(\alpha) } = -|r|,\\ \cos(\pi + \alpha) &= -{ \cos(\alpha) } = -|r|,\\ \cos(2\pi - \alpha) &= { \cos(\alpha) } = |r|. \end{aligned}$

We consider the three cases:
- If $r > 0$ , then only the first and last equations work: $x = \alpha, 2\pi -\alpha$ .
- If $r = 0$ , then $\alpha = \pi/2$ : $x = \pi/2, 3\pi/2$ by Problem 5.
- If $r < 0$ , then only the middle two equations work: $x = \pi - \alpha, \pi +\alpha$ .
Graphically, we have the following.
Problem 8. Fix any real number $r$ . Define $0 \leq \alpha < \pi/2$ such that

$\tan(\alpha) = |r|.$

Solve the equation $\tan(x) = r$ in terms of $\alpha$ , where $0 \leq x \leq 2\pi$ .
(Click for Solution)

Solution. Using the extended definitions of $\tan(x)$ , for $0 \leq x \leq 2\pi$ , we have

$\begin{aligned} \tan(\alpha) &= |r|,\\ \tan(\pi - \alpha) &= -{ \tan(\alpha) } = -|r|,\\ \tan(\pi + \alpha) &= { \tan(\alpha) } = |r|,\\ \tan(2\pi - \alpha) &= -{ \tan(\alpha) } = -|r|. \end{aligned}$

We consider the three cases:
- If $r > 0$ , then only the first and third equations work: $x = \alpha, \pi + \alpha$ .
- If $r < 0$ , then only the second and fourth equations work: $x = \pi - \alpha, 2\pi -\alpha$ .
- If $r = 0$ , then $\alpha = 0$ : $x = 0, \pi, 2\pi$ .
Graphically, we have the following.
Remark 1. Problems 6–8 detail the ASTC strategy used to solve trigonometric equations. Given a trigonometric equation $f(x) = r$ , the solutions for $0 \leq x \leq 2\pi$ depend on
- the type of trigonometric function $f(x)$ , and
- the sign of $r$ .
Example 1. Given the equation $\sin(x) = r$ , by calculating $0 \leq \alpha \leq \pi/2$ such that $\sin(\alpha) = |r|$ , the candidate solutions in the domain $0 \leq x < 2\pi$ for the equation are

$x = \alpha, \quad x = \pi-\alpha,\quad x = \pi+ \alpha,\quad x = 2\pi -\alpha.$

If $r \geq 0$ , we choose the first two candidates (i.e. the A and S quadrants in ASTC), yielding

$x = \alpha, \quad x = \pi-\alpha.$

If $r < 0$ , we choose the latter two candidates (i.e. the T and C quadrants in ASTC), yielding

$x = \pi + \alpha, \quad x = 2\pi-\alpha.$

Graphically, we have the following.

We then add and subtract enough angles by multiples of $2\pi$ until we recover all desired solutions, as per the solving strategy in Problems 1 and 3.

Example 2. Given the equation $\cos(x) = r$ , by calculating $0 \leq \alpha \leq \pi/2$ such that $\cos(\alpha) = |r|$ , the candidate solutions in the domain $0 \leq x < 2\pi$ for the equation are

$x = \alpha, \quad x = \pi-\alpha,\quad x = \pi+ \alpha,\quad x = 2\pi -\alpha.$

If $r \geq 0$ , we choose the first and last candidates (i.e. the A and C quadrants in ASTC), yielding

$x = \alpha, \quad x = 2\pi-\alpha.$

If $r < 0$ , we choose the middle two candidates (i.e. the S and T quadrants in ASTC), yielding

$x = \pi - \alpha, \quad x = \pi + \alpha.$

We then add and subtract enough angles by multiples of $2\pi$ until we recover all desired solutions, as per the solving strategy in Problems 2 and 5.

Graphically, we have the following.

Example 3. Given the equation $\tan(x) = r$ , by calculating $0 \leq \alpha < \pi/2$ such that $\tan(\alpha) = |r|$ , the candidate solutions in the domain $0 \leq x < 2\pi$ for the equation are

$x = \alpha, \quad x = \pi-\alpha,\quad x = \pi+ \alpha,\quad x = 2\pi -\alpha.$

If $r \geq 0$ , we choose the first and third candidates (i.e. the A and T quadrants in ASTC), yielding

$x = \alpha, \quad x = \pi+\alpha.$

If $r < 0$ , we choose the second and fourth candidates (i.e. the S and C quadrants in ASTC), yielding

$x = \pi - \alpha, \quad x = 2\pi - \alpha.$

Graphically, we have the following.

We then add and subtract enough angles by multiples of $2\pi$ until we recover all desired solutions, as per the solving strategy in Problems 2 and 5.
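The ASTC strategy of Examples 1 to 3 can be summarised in a short Python sketch. The function `solve_basic` is a hypothetical helper of our own: it computes the reference angle $\alpha$ with the inverse trigonometric functions, picks the two candidates according to the sign of $r$ , and then discards repeated values and anything outside $0 \leq x < 2\pi$ .

```python
import math

TWO_PI = 2 * math.pi

def solve_basic(func, r):
    """Solve func(x) = r for 0 <= x < 2*pi, where func is 'sin', 'cos' or 'tan'."""
    if func == "sin":
        alpha = math.asin(abs(r))
        candidates = [alpha, math.pi - alpha] if r >= 0 else [math.pi + alpha, TWO_PI - alpha]
    elif func == "cos":
        alpha = math.acos(abs(r))
        candidates = [alpha, TWO_PI - alpha] if r >= 0 else [math.pi - alpha, math.pi + alpha]
    elif func == "tan":
        alpha = math.atan(abs(r))
        candidates = [alpha, math.pi + alpha] if r >= 0 else [math.pi - alpha, TWO_PI - alpha]
    else:
        raise ValueError("func must be 'sin', 'cos' or 'tan'")

    solutions = []
    for x in sorted(candidates):
        in_domain = 0 <= x < TWO_PI
        is_new = not solutions or not math.isclose(x, solutions[-1])
        if in_domain and is_new:
            solutions.append(x)
    return solutions

print(solve_basic("sin", 0.5))  # approximately pi/6 and 5*pi/6
print(solve_basic("tan", -1))   # approximately 3*pi/4 and 7*pi/4
```

The final filtering step handles the boundary cases: for instance, $\cos(x) = 1$ produces the candidates $0$ and $2\pi$ , and only $0$ lies in the domain $0 \leq x < 2\pi$ .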

Problem 9. Solve the equation $\sin(2x) = 1/2$ for $0 \leq x \leq 2\pi$ .

(Click for Solution)

Solution. Write $u = 2x$ so that $0 \leq u \leq 4\pi$ . To solve the equation

$\sin(u) = 1/2,$

we appeal to Problem 1 (or ASTC techniques) to conclude that for $0 \leq u \leq 2\pi$ ,

$u = \pi/6,\quad 5\pi/6.$

To account for all solutions $0 \leq u \leq 4\pi$ , we add $2\pi$ to each solution:

$u = \pi/6,\quad 5\pi/6,\quad 13\pi/6,\quad 17\pi/6.$

Since $u = 2x$ , we divide by $2$ to obtain all solutions in $x$ :

$x = \pi/12,\quad 5\pi/12,\quad 13\pi/12,\quad 17\pi/12.$
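The bookkeeping in this solution (extend the range of $u$ , add multiples of $2\pi$ , then divide by $2$ ) can be sketched in Python. To keep the answers exact, every angle is stored as a rational multiple of $\pi$ ; the variable names are our own.

```python
from fractions import Fraction

# Solutions of sin(u) = 1/2 for 0 <= u <= 2*pi, in units of pi.
base_u = [Fraction(1, 6), Fraction(5, 6)]
upper_u = 4  # u = 2x ranges over 0 <= u <= 4*pi

us = []
for b in base_u:
    u = b
    while u <= upper_u:
        us.append(u)
        u += 2  # adding 2*pi, in units of pi

xs = sorted(u / 2 for u in us)  # x = u / 2
print([str(x) + "*pi" for x in xs])
```

This prints `['1/12*pi', '5/12*pi', '13/12*pi', '17/12*pi']`, matching the four solutions above.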

Problem 10. Solve the equation $\cos(4x) - 3\sin(2x) + 1 = 0$ for $0 \leq x \leq 2\pi$ .

Hint. Use the double-angle formula $\cos(2A) = 1 - 2 \sin^2(A)$ .

(Click for Solution)

Solution. Using the double-angle formula,

$\cos(4x) = \cos(2 \cdot 2x) = 1 - 2 \sin^2(2x).$

Hence, we can simplify the given equation in terms of $\sin(2x)$ :

$\begin{aligned} \cos(4x) - 3\sin(2x) + 1 &= 0 \\ (1 - 2 \sin^2(2x)) - 3\sin(2x) + 1 &= 0 \\ - 2 \sin^2(2x) - 3\sin(2x) + 2 &= 0 \\ 2\sin^2(2x) + 3\sin(2x) - 2 &= 0 \\ (\sin(2x) + 2)(2\sin(2x) - 1) &= 0. \end{aligned}$

Hence, $\sin(2x) + 2 = 0$ or $2 \sin(2x) - 1 = 0$ . In the former,

$\sin(2x) = -2.$

However, since for any $x$ we have $-1 \leq \sin(2x) \leq 1$ , this equation has no solutions. Therefore, $2 \sin(2x) - 1 = 0$ , yielding:

$\sin(2x) = 1/2,\quad 0 \leq x \leq 2\pi.$

We appeal to Problem 9 (or ASTC techniques) to conclude that

$x = \pi/12,\quad 5\pi/12,\quad 13\pi/12,\quad 17\pi/12.$
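As a final check, we can substitute the four solutions back into the original equation numerically. This is a minimal sketch; `lhs` is just our name for the left-hand side $\cos(4x) - 3\sin(2x) + 1$ .

```python
import math

def lhs(x):
    """Left-hand side of the equation cos(4x) - 3 sin(2x) + 1 = 0."""
    return math.cos(4 * x) - 3 * math.sin(2 * x) + 1

solutions = [k * math.pi / 12 for k in (1, 5, 13, 17)]
residuals = [lhs(x) for x in solutions]

# Each residual should be zero up to floating-point rounding.
print(all(abs(res) < 1e-9 for res in residuals))
```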

—Joel Kindiak, 2 Feb 26, 2302H
May 8, 2026
Multiplying Vectors
Problem 1. Illustrate the two vectors $\mathbf u = \begin{bmatrix} a \\ b \end{bmatrix} , \mathbf v = \begin{bmatrix} c \\ d \end{bmatrix}$ in the diagram below.

Show that the angle $\theta$ between $\mathbf u, \mathbf v$ is given by

$ac + bd = \| \mathbf u \| \| \mathbf v \| \cos(\theta).$

(Click for Solution)

Solution. Observe that

$\mathbf u - \mathbf v = \begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a-c \\ b-d \end{bmatrix}.$

Using the law of cosines,

$\begin{aligned} \| \mathbf u - \mathbf v \|^2 &= \| \mathbf u \|^2 + \| \mathbf v \|^2 - 2 \cdot \| \mathbf u \| \cdot \| \mathbf v \| \cdot \cos(\theta).\end{aligned}$

Expanding the display on the left-hand side by Pythagoras’ theorem,

$\begin{aligned} \| \mathbf u - \mathbf v \|^2 &= (a-c)^2 + (b-d)^2 \\ &= (a^2 - 2ac + c^2) + (b^2 - 2bd + d^2) \\ &= (a^2 + b^2) + (c^2 + d^2) - 2(ac + bd) \\ &= \|\mathbf u\|^2 + \|\mathbf v\|^2 - 2(ac + bd) .\end{aligned}$

Equating the two expressions for $\| \mathbf u - \mathbf v \|^2$ and solving for $ac + bd$ ,

$ac + bd = \| \mathbf u \| \cdot \| \mathbf v \| \cdot \cos(\theta),$

as required.

Remark 1. The left-hand side is called the dot product of two vectors, defined by

$\displaystyle \begin{bmatrix} a \\ b \end{bmatrix} \cdot \begin{bmatrix} c \\ d \end{bmatrix} := ac+ bd.$

Then the result of Problem 1 reduces to the dot product equation

$\mathbf u \cdot \mathbf v = \| \mathbf u \| \| \mathbf v \| \cos(\theta).$
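The dot product equation gives a practical way to compute the angle between two nonzero vectors. The Python sketch below represents vectors as pairs of numbers; the helper names `dot`, `norm`, and `angle_between` are our own, and the example vectors are arbitrary.

```python
import math

def dot(u, v):
    """Dot product ac + bd of two-dimensional vectors u = (a, b), v = (c, d)."""
    return u[0] * v[0] + u[1] * v[1]

def norm(v):
    """Length of v by Pythagoras' theorem."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

def angle_between(u, v):
    """Angle theta with u . v = |u| |v| cos(theta); u and v must be nonzero."""
    cosine = dot(u, v) / (norm(u) * norm(v))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return math.acos(cosine)

u, v = (1.0, 0.0), (1.0, 1.0)
theta = angle_between(u, v)
print(theta, math.pi / 4)
```

Here the angle between the horizontal vector and the diagonal vector comes out as approximately $\pi/4$ , as expected.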

Let $\alpha$ denote a real constant and $\mathbf w$ denote another two-dimensional vector.

Problem 2. Using Remark 1, show that the following properties always hold:
- $\mathbf v \cdot \mathbf v \geq 0$ .
- $\mathbf v \cdot \mathbf v = 0 \Rightarrow \mathbf v = \mathbf 0$ .
- $\mathbf u \cdot \mathbf v = \mathbf v \cdot \mathbf u$ .
- $\mathbf u \cdot (\alpha \mathbf v) = (\alpha \mathbf u) \cdot \mathbf v = \alpha (\mathbf u \cdot \mathbf v)$ .
- $\mathbf u \cdot (\mathbf v + \mathbf w) = \mathbf u \cdot \mathbf v + \mathbf u \cdot \mathbf w$ .
(Click for Solution)

Solution. Write $\mathbf u = \begin{bmatrix} a \\ b \end{bmatrix}$ and $\mathbf v = \begin{bmatrix} c \\ d \end{bmatrix}$ . The first result is almost immediate since $c,d$ are real numbers:

$\begin{aligned} \mathbf v\cdot\mathbf v &= c^2+d^2 \\ &\ge 0 + 0 \\ &= 0. \end{aligned}$

For the second result

$\mathbf v\cdot\mathbf v=0 \quad \Rightarrow \quad c^2+d^2=0.$

Furthermore,

$\begin{aligned} 0 \leq c^2 &= c^2 + 0 \\ &\leq c^2 + d^2 = 0. \end{aligned}$

Hence, $c = 0$ . Similarly, $d = 0$ . Hence $\mathbf v= \begin{bmatrix} 0 \\ 0 \end{bmatrix}= \mathbf 0$ .

The third property is straightforward:

$\begin{aligned} \mathbf u\cdot\mathbf v &= ac+bd \\ &=ca+db=\mathbf v\cdot\mathbf u. \end{aligned}$

Recall that $\alpha \mathbf v = \alpha \begin{bmatrix} c \\d \end{bmatrix} = \begin{bmatrix} \alpha c \\ \alpha d\end{bmatrix}$ . Then

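$\begin{aligned} \mathbf u\cdot(\alpha\mathbf v) &= a(\alpha c)+b(\alpha d) \\ &= \alpha(ac+bd) \\ &= \alpha(\mathbf u\cdot\mathbf v), \end{aligned}$

and similarly $(\alpha \mathbf u) \cdot \mathbf v = (\alpha a)c + (\alpha b)d = \alpha(ac + bd) = \alpha(\mathbf u \cdot \mathbf v)$ .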

Define $\mathbf w=\begin{bmatrix} p \\ q \end{bmatrix}$ . Then the fifth property is immediate:

$\begin{aligned} \mathbf u\cdot(\mathbf v+\mathbf w) &= a(c+p)+b(d+q) \\ &= (ac+bd)+(ap+bq) \\ &= \mathbf u\cdot\mathbf v+\mathbf u\cdot\mathbf w. \end{aligned}$

Problem 3. Explain why $\| \mathbf v \| = \sqrt{ \mathbf v \cdot \mathbf v }$ . Deduce the following:
- $\|\mathbf v \| \geq 0$ .
- $\|\mathbf v \| = 0 \Rightarrow \mathbf v = \mathbf 0$ .
- $\| \alpha \mathbf v \| = |\alpha| \|\mathbf v \|$ .
- $| \mathbf u \cdot \mathbf v | \leq \| \mathbf u \| \| \mathbf v\|$ .
- $\| \mathbf u + \mathbf v \| \leq \| \mathbf u \| + \| \mathbf v \|$ .
(Click for Solution)

Solution. Using Remark 1 and Pythagoras’ theorem,

$\| \mathbf v \| = \sqrt{c^2 + d^2} = \sqrt{ \mathbf v \cdot \mathbf v }.$

Therefore, we obtain the properties rather straightforwardly:

$\|\mathbf v \| = \sqrt{\mathbf v \cdot \mathbf v} \geq \sqrt{0} = 0$

with equality if and only if $\mathbf v \cdot \mathbf v = 0 \iff \mathbf v = \mathbf 0$ . Next,

$\begin{aligned} \| \alpha \mathbf v \| &= \sqrt{(\alpha \mathbf v) \cdot(\alpha \mathbf v)} \\ &= |\alpha | \sqrt{\mathbf v \cdot \mathbf v} \\ &= |\alpha| \|\mathbf v \|. \end{aligned}$

The fourth property, known as the Cauchy-Schwarz inequality, follows from Problem 1 and the observation that $|{\cos(\theta)}| \leq 1$ :

$\begin{aligned} |\mathbf u \cdot \mathbf v| &= \| \mathbf u \| \|\mathbf v\| |{\cos(\theta)}| \\ &\leq \| \mathbf u \| \|\mathbf v\| \cdot 1 \\ &=\| \mathbf u \| \|\mathbf v\|. \end{aligned}$

The fifth property follows from the fourth:

$\begin{aligned} \| \mathbf u + \mathbf v \|^2 &= (\mathbf u + \mathbf v) \cdot (\mathbf u + \mathbf v) \\ &= (\mathbf u \cdot \mathbf u) + (\mathbf u \cdot \mathbf v) + (\mathbf v \cdot \mathbf u) + (\mathbf v \cdot \mathbf v) \\ &= \| \mathbf u \|^2 + 2(\mathbf u \cdot \mathbf v) + \| \mathbf v \|^2 \\ &\leq \| \mathbf u \|^2 + 2 \| \mathbf u \| \| \mathbf v \| + \| \mathbf v \|^2 \\ &= (\| \mathbf u \| + \| \mathbf v \|)^2 \end{aligned}$

and taking square roots on both sides.
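The last two inequalities can be spot-checked numerically. The sketch below tests a few hand-picked pairs of vectors (an illustration, not a proof); the helper names and the small tolerance `EPS`, which absorbs floating-point rounding in the equality cases, are our own choices.

```python
import math

EPS = 1e-12  # tolerance for floating-point rounding

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def norm(v):
    return math.sqrt(dot(v, v))  # length as the square root of v . v

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

pairs = [
    ((3.0, 4.0), (1.0, 2.0)),
    ((1.0, 0.0), (-2.0, 0.0)),  # parallel vectors: Cauchy-Schwarz holds with equality
    ((0.0, 0.0), (5.0, -1.0)),
]

checks = []
for u, v in pairs:
    cauchy_schwarz = abs(dot(u, v)) <= norm(u) * norm(v) + EPS
    triangle = norm(add(u, v)) <= norm(u) + norm(v) + EPS
    checks.append((cauchy_schwarz, triangle))
print(checks)
```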

Problem 4. Define $d(\mathbf u, \mathbf v) := \| \mathbf u - \mathbf v\|$ . Show that the following hold:
- $d(\mathbf u, \mathbf v) \geq 0$ .
- $d(\mathbf u, \mathbf v) = 0 \Rightarrow \mathbf u = \mathbf v$ .
- $d(\mathbf u, \mathbf v) = d(\mathbf v, \mathbf u)$ .
- $d(\mathbf u, \mathbf v) \leq d(\mathbf u, \mathbf w) + d(\mathbf w, \mathbf v)$ .
(Click for Solution)

Solution. The results in Problem 4 come from Problem 3:

$d(\mathbf u, \mathbf v) = \| \mathbf u - \mathbf v \| \geq 0,$

then

$\begin{aligned} d(\mathbf u, \mathbf v) = 0 \quad &\Rightarrow \quad \| \mathbf u - \mathbf v \| = 0 \\ &\Rightarrow \quad \mathbf u - \mathbf v = \mathbf 0 \\&\Rightarrow \quad \mathbf u = \mathbf v. \end{aligned}$

Furthermore,

$\begin{aligned} d(\mathbf u, \mathbf v) &= \| \mathbf u - \mathbf v \| \\ &= \| (-1)(\mathbf v - \mathbf u) \| \\ &= |{-1}| \| \mathbf v - \mathbf u \| \\ &= \| \mathbf v - \mathbf u \| \\ &= d(\mathbf v , \mathbf u) \end{aligned}$

and

$\begin{aligned} d(\mathbf u, \mathbf v) &= \| \mathbf u - \mathbf v \| \\ &= \| (\mathbf u - \mathbf w) + (\mathbf w - \mathbf v) \| \\ &\leq \| \mathbf u - \mathbf w \| + \| \mathbf w - \mathbf v \| \\ &= d(\mathbf u, \mathbf w) + d(\mathbf w, \mathbf v). \end{aligned}$

Remark 2. In the language of linear algebra, Remark 1 and Problem 2 define a “multiplication” on the set of two-dimensional vectors, turning it into a real inner product space. Problem 3 shows that inner product spaces are normed spaces (in that two-dimensional vectors have lengths or norms), while Problem 4 shows that normed spaces are metric spaces (i.e. the notion of distance is a reasonable one).

Problem 5. Given that the point $(x_0, y_0)$ lies on the line $ax+by = c$ , show that any other point $(x,y)$ lies on the line if and only if

$\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \end{bmatrix} = 0.$

(Click for Solution)

Solution. Since $(x_0, y_0)$ lies on the line, we have

$ax_0 + by_0 = c.$

The point $(x,y)$ lies on the line if and only if

$ax + by = c.$

Subtracting the equations yields the equation

$a(x-x_0) + b(y-y_0) = 0.$

On the other hand,

$\begin{aligned} \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \end{bmatrix} &= (x-x_0)a + (y-y_0)b \\ &= a(x-x_0) + b(y-y_0). \end{aligned}$

Then $(x,y)$ lies on the line if and only if the right-hand side equals $0$ , that is,

$\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \end{bmatrix} = 0.$
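To illustrate the equivalence, the Python sketch below tests a few points against a sample line using both criteria. The line $2x + 3y = 12$ and the points are hypothetical choices, with integer coordinates so that the comparisons are exact.

```python
def on_line_direct(a, b, c, x, y):
    """Check ax + by = c directly."""
    return a * x + b * y == c

def on_line_via_dot(a, b, x0, y0, x, y):
    """Check that (x - x0, y - y0) has zero dot product with (a, b)."""
    return (x - x0) * a + (y - y0) * b == 0

a, b, c = 2, 3, 12  # hypothetical line 2x + 3y = 12
x0, y0 = 3, 2       # a known point on the line: 2*3 + 3*2 = 12

points = [(0, 4), (6, 0), (1, 1), (3, 2)]
results = [
    (on_line_direct(a, b, c, x, y), on_line_via_dot(a, b, x0, y0, x, y))
    for x, y in points
]
print(results)
```

For every point, the two checks agree: both are true for $(0, 4)$ , $(6, 0)$ , and $(3, 2)$ , and both are false for $(1, 1)$ .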

Remark 3. Problem 5 generalises to an $(n-1)$ -dimensional hyperplane: Given that the point $(v_1, \dots, v_n)$ lies on the $(n-1)$ -dimensional hyperplane with equation

$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = c,$

any other point $(u_1,\dots, u_n)$ lies on the hyperplane if and only if

$\begin{bmatrix} u_1 - v_1 \\ \vdots \\ u_n - v_n \end{bmatrix} \cdot \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = 0.$

—Joel Kindiak, 30 Jan 26, 1810H
May 6, 2026