The Infinite Coin Toss

The infinite coin toss is an intuitively straightforward idea that needs effort to properly construct: what do we mean by tossing a coin infinitely many times? And yet, such constructions are crucial to discuss discrete distributions like the geometric and Poisson distributions that find meaningful applications in mathematical finance and other fields of applied mathematics.

Consider the following experiment. Flip a biased coin with \mathbb P(\mathrm H) = p \in [0, 1]. Let \xi_n denote the outcome of the n-th flip (i.e. \xi_n = 1 if the n-th flip is Head, and \xi_n=0 otherwise, excluding the case when the coin lands on its side). Let X denote the number of flips needed to obtain the first Head. What is the value of \mathbb P(X = x) for x \in \mathbb N^+? Intuitively, the first (x-1) flips have to be Tails, and the x-th should be Head. Assuming all coin flips are independent, we should have the probability

\begin{aligned} \mathbb P(X = x) &= \mathbb P(\xi_1 = 0, \dots, \xi_{x-1} = 0, \xi_x = 1) \\ &= \mathbb P(\xi_1 = 0) \cdot \dots  \cdot \mathbb P(\xi_{x-1} = 0) \cdot \mathbb P(\xi_x = 1) \\ &= (1-p)^{x-1} \cdot p. \end{aligned}
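Before building the formal machinery, this intuition can be sanity-checked by simulation. A minimal Python sketch (the function name first_head and the chosen parameters are my own, not from the construction below):

```python
import random

def first_head(p, rng):
    """Flip a p-biased coin until the first Head; return the number of flips."""
    n = 1
    while rng.random() >= p:  # treat rng.random() < p as Head
        n += 1
    return n

rng = random.Random(0)  # fixed seed for reproducibility
p, trials = 0.3, 200_000
samples = [first_head(p, rng) for _ in range(trials)]

for x in (1, 2, 3):
    empirical = samples.count(x) / trials
    exact = (1 - p) ** (x - 1) * p
    print(f"P(X = {x}) ~ {empirical:.4f} (exact: {exact:.4f})")
```

The empirical frequencies match (1-p)^{x-1} \cdot p to within Monte Carlo error, which is exactly the claim the rest of this post makes rigorous.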

The problem is that x could take on any of the infinitely many values in \mathbb N^+, and so a reasonable sample space \Omega will require outcomes of the form (\omega_1,\omega_2,\dots). To be fair, that is not the challenging bit: define

\Omega := \{0, 1\}^{\mathbb N^+},\quad \mathcal F := \mathcal P(\Omega).

The problem arises in defining the underlying probability measure. The issue isn’t Head and Tail; we obtain effectively the same probability space working with 0, 1. The problem is this: how do we evaluate the quantity \mathbb P((0, 0, 0, \dots))?

Your first instinct should be to take the limit of (1-p)^n as n \to \infty, and that instinct is not wrong. But how do we set up the probability space so that this instinct does, in fact, yield a valid answer? Furthermore, by the same principle, \mathbb P(\{\omega\}) = 0 for every individual sequence \omega (at least when p \in (0, 1)). How do we obtain nonzero probabilities in that case?

To that end, let’s scale down our analyses a bit, lest we overthink in excessive panic. Rather than think of infinitely many coin tosses, let’s start with the first coin toss. No matter which infinite outcome \omega = (\omega_1,\omega_2,\dots) we obtain, we know that \omega_1 = 1 or \omega_1 = 0, and that these outcomes are assigned the probabilities p, 1-p respectively. This observation motivates us to define sub-\sigma-algebras of \mathcal F that agree with our intuitions.

To that end, equip \Omega_0 \equiv \Omega_1 := \{ 0, 1 \} with the usual \sigma-algebra \mathcal G_0 \equiv \mathcal G_1 = \mathcal P(\Omega_1) and probability measure \mathbb P_0(1) = \mathbb P_1(1) = p. Then for any K \in \mathcal P(\Omega_1), define

G_1(K) := \{ \omega \in \Omega :  \omega_1 \in K\} \subseteq \Omega.

Then define the \sigma-algebra \mathcal F_1 := \{ G_1(K) : K \in \mathcal G_1 \} on \Omega and equip it with the probability measure \mathbb P(G_1(K)) := \mathbb P_1(K). We can repeat this procedure n times as follows.

Definition 1. For any n \in \mathbb N^+, equip \Omega_n := \{ 0, 1 \}^n with the usual \sigma-algebra \mathcal G_n = \mathcal P(\Omega_n) and probability measure \mathbb P_n(\omega) = \prod_{i=1}^n \mathbb P_0( \omega_i ). Then for any K \in \mathcal P(\Omega_n), define

G_n(K) := \{ \omega \in \Omega : (\omega_1,\dots,\omega_n) \in K\} \subseteq \Omega.

Then define the \sigma-algebra \mathcal F_n := \{ G_n(K) : K \in \mathcal G_n \} on \Omega and equip it with the probability measure \mathbb P(G_n(K)) := \mathbb P_n(K). Define

\displaystyle \mathcal F^0 := \bigcup_{j=1}^\infty \mathcal F_j.
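The finite-prefix measures \mathbb P_n of Definition 1 are directly computable. A short sketch (the helper names P_0 and P_n mirror the notation above; the specific events are my own examples):

```python
from itertools import product

def P_0(w, p):
    """Single-flip probability: P_0(1) = p, P_0(0) = 1 - p."""
    return p if w == 1 else 1 - p

def P_n(K, p):
    """P_n(K): sum over the length-n prefixes in K of prod_i P_0(omega_i)."""
    total = 0.0
    for prefix in K:
        q = 1.0
        for w in prefix:
            q *= P_0(w, p)
        total += q
    return total

p = 0.5
# The cylinder event G_2({(0, 1)}): first flip Tail, second flip Head.
print(P_n({(0, 1)}, p))
# The event "at least one Head in the first 3 flips", i.e. {X <= 3}.
K = {w for w in product((0, 1), repeat=3) if any(w)}
print(P_n(K, p))
```

The first probability is (1-p) \cdot p and the second is 1 - (1-p)^3, consistent with the geometric intuition from the opening section.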

Lemma 1. We have \mathcal F_1 \subseteq \mathcal F_2 \subseteq \cdots \subseteq \mathcal F^0. Furthermore, \mathcal F^0 contains \emptyset and \Omega and is closed under complements and finite unions, though not necessarily countably infinite unions. We call \mathcal F^0 an algebra of subsets of \Omega.

Lemma 2. \mathbb P(\emptyset) = 0, and \mathbb P is countably additive on \mathcal F^0: for pairwise disjoint K_1, K_2, \dots \in \mathcal F^0 such that \bigsqcup_{i=1}^\infty K_i \in \mathcal F^0,

\displaystyle \mathbb P\left( \bigsqcup_{i=1}^\infty K_i \right) = \sum_{i=1}^\infty \mathbb P(K_i).

Proof. Consider \Omega = \{0, 1\}^{\mathbb N^+} as the topological space

\displaystyle \Omega = \prod_{n \in \mathbb N^+} \{0, 1\}.

Equip \{0, 1\} with the discrete topology; then \{0, 1\} is clearly compact, so \Omega equipped with the product topology is compact by the Tychonoff theorem. Moreover, every member of \mathcal F^0 is a finite union of cylinder sets G_n(K), each of which is both open and closed in the product topology.

Now fix pairwise disjoint K_1, K_2,\dots \in \mathcal F^0 such that K:=\bigsqcup_{i=1}^\infty K_i \in \mathcal F^0. Then \{K_1,K_2,\dots\} forms an open cover of K. Since K \subseteq \Omega is closed, it is compact, and thus it is covered by a finite subcollection, without loss of generality \{K_1,\dots, K_n\}. Since the union is disjoint, we conclude that K_j = \emptyset for j > n. Hence,

\displaystyle \mathbb P\left( \bigsqcup_{i=1}^\infty K_i \right) = \mathbb P\left( \bigsqcup_{i=1}^n K_i \right) = \sum_{i=1}^n \mathbb P(K_i) = \sum_{i=1}^{\infty} \mathbb P(K_i).

Unfortunately, we still need a \sigma-algebra on \Omega. Currently, we only have the algebra \mathcal F^0, which satisfies all the axioms of a \sigma-algebra except that it is closed only under finite, rather than countable, unions. Thankfully, an algebra together with a countably additive pre-measure allows us to recover a \sigma-algebra and a bona fide measure. This result is called Carathéodory’s extension theorem.

Theorem 1. Let \mathcal F^0 be an algebra of subsets of \Omega. Suppose there exists a countably additive map \mu_0 : \mathcal F^0 \to [0, \infty], which we call a pre-measure on \mathcal F^0. Then there exists a \sigma-algebra \mathcal F \supseteq \mathcal F^0 and a measure \mu : \mathcal F \to [0, \infty] such that \mu|_{\mathcal F^0} = \mu_0. We say that \mu_0 extends to a measure \mu on \mathcal F.

Carathéodory’s extension theorem does more than just help us establish the logical validity of infinite coin flips—it will help us create suitable \sigma-algebras on \mathbb R. But we will need to put in some nontrivial effort to prove it.

Henceforth, assume the hypotheses of Theorem 1. Let \mathcal F^0 \subseteq \mathcal P(\Omega) be an algebra of subsets of \Omega and \mu_0 be a pre-measure on \mathcal F^0.

Lemma 3. Define the outer measure \mu^* : \mathcal P(\Omega) \to [0, \infty] by

\displaystyle \mu^*(K) := \inf \left\{ \sum_{i=1}^\infty \mu_0(K_i) : K \subseteq \bigcup_{i=1}^\infty K_i,\ K_i \in \mathcal F^0 \right\}.

Then \mu^* agrees with \mu_0 on \mathcal F^0 (i.e. \mu^*(K) = \mu_0(K) for K \in \mathcal F^0) and satisfies the following properties:

  • \mu^*(K) \leq \mu^*(L) if K \subseteq L,
  • \mu^*(K \cup L) \leq \mu^*(K) + \mu^*(L),
  • \mu^*\left( \bigcup_{i=1}^\infty K_i \right) \leq \sum_{i=1}^\infty \mu^*(K_i).

Proof. For the agreement claim, the single-set cover \{K\} gives \mu^*(K) \leq \mu_0(K). Conversely, if K \subseteq \bigcup_{i=1}^\infty K_i with K_i \in \mathcal F^0, then writing K as the countable disjoint union of the sets K \cap K_i \backslash (K_1 \cup \dots \cup K_{i-1}) \in \mathcal F^0 and applying the countable additivity and monotonicity of \mu_0 yields \mu_0(K) \leq \sum_{i=1}^\infty \mu_0(K_i), so \mu_0(K) \leq \mu^*(K).

For the subset claim, any cover of L is a cover of K, so the result is immediate.

For the subadditivity claim, fix K, L \subseteq \Omega. Fix \epsilon > 0. By the definition of \mu^*(K), \mu^*(L), there exist \{K_i\}, \{L_j\} \subseteq \mathcal F^0 such that

\displaystyle \sum_{i=1}^\infty \mu_0(K_i) \leq \mu^*(K) + \frac{\epsilon}{2},\quad \sum_{j=1}^\infty \mu_0(L_j) \leq \mu^*(L) + \frac{\epsilon}{2},

and K \subseteq \bigcup_{i=1}^\infty K_i, L \subseteq \bigcup_{j=1}^\infty L_j. Define \{M_k\} := \{K_i\} \cup \{L_j\} so that K \cup L \subseteq \bigcup_{k=1}^\infty M_k. Since \{M_k\} \subseteq \mathcal F^0 covers K \cup L, the definition of \mu^* as an infimum gives

\begin{aligned} \mu^*(K \cup L) \leq \sum_{k=1}^\infty \mu_0(M_k) &\leq \sum_{i=1}^\infty \mu_0(K_i) + \sum_{j=1}^\infty \mu_0(L_j) \\ &\leq \mu^*(K) + \mu^*(L) + \epsilon. \end{aligned}

Taking \epsilon \to 0^+ yields the desired result.

For countable subadditivity, if there exists i such that \mu^*(K_i) = \infty, then the inequality holds trivially. Suppose therefore that \mu^*(K_i) < \infty for every i, and fix \epsilon > 0. For each i, find \{K_{i,j} \}_j \subseteq \mathcal F^0 that covers K_i (i.e. K_i \subseteq \bigcup_{j=1}^\infty K_{i,j}) and

\displaystyle \sum_{j=1}^\infty \mu_0(K_{i,j}) \leq \mu^*(K_i) + \frac{\epsilon}{2^i}.

Then we notice that \{K_{i,j}\}_{i,j} \subseteq \mathcal F^0 covers \bigcup_{i=1}^\infty K_i, so by the definition of \mu^* and the sum of a geometric series,

\begin{aligned} \mu^* \left( \bigcup_{i=1}^\infty K_i \right) \leq \sum_{i=1}^\infty \sum_{j=1}^\infty \mu_0(K_{i,j}) &\leq  \sum_{i=1}^\infty  \left( \mu^*(K_i) + \frac{\epsilon}{2^i} \right) \\ &= \sum_{i=1}^\infty  \mu^*(K_i) + \epsilon \cdot \frac{1/2}{1-1/2} \\ &= \sum_{i=1}^\infty  \mu^*(K_i) + \epsilon. \end{aligned}

Taking \epsilon \to 0^+ yields the desired result.

Lemma 4. The subset \mathcal F \subseteq \mathcal P(\Omega) defined by

\mathcal F := \{K \in \mathcal P(\Omega) : \mu^*(L) = \mu^*(K \cap L) + \mu^*(\Omega \backslash K \cap L) \text{ for all } L \subseteq \Omega \}

forms an algebra over \Omega, and \mu^* is countably additive on \mathcal F. We say that each K \in \mathcal F satisfies the Carathéodory condition.

Proof. By the subadditivity in Lemma 3, the inequality

\mu^*(L) \leq \mu^*(K \cap L) + \mu^*(\Omega \backslash K \cap L)

holds automatically for any L \subseteq \Omega, so only the reverse inequality needs to be checked:

\mu^*(L) \geq \mu^*(K \cap L) + \mu^*(\Omega \backslash K \cap L).

It is not hard to see that \emptyset, \Omega \in \mathcal F since \mu^*(\emptyset) = \mu_0(\emptyset) = 0. Furthermore, closure under complementation is immediate from the symmetry of the criterion: for any L \subseteq \Omega,

\mu^*(\Omega \backslash K \cap L) + \mu^*(\Omega \backslash (\Omega \backslash K) \cap L) \leq \mu^*(L)

since K \in \mathcal F. It remains to prove closure under finite unions, and by induction, it suffices to prove the two-set case. Fix K_1, K_2 \in \mathcal F. We claim that for any L \subseteq \Omega,

\displaystyle \mu^* ((K_1 \cup K_2) \cap L) + \mu^*(\Omega \backslash (K_1 \cup K_2) \cap L ) \leq \mu^*(L).

By subadditivity in Lemma 3,

\begin{aligned} &\mu^* ((K_1 \cup K_2) \cap L) \\ &= \mu^* ((K_1 \cup (\Omega \backslash K_1 \cap K_2)) \cap L) \\ &= \mu^* ((K_1 \cap L) \cup (\Omega \backslash K_1 \cap K_2 \cap L)) \\ &\leq \mu^*(K_1 \cap L) + \mu^* (\Omega \backslash K_1 \cap K_2 \cap L) \\  &\leq \mu^*(K_1 \cap K_2 \cap L) + \mu^*(K_1 \cap \Omega \backslash K_2 \cap L) + \mu^*(\Omega \backslash K_1 \cap K_2 \cap L) . \end{aligned}

By definition,

\mu^*(\Omega \backslash (K_1 \cup K_2) \cap L ) = \mu^*(\Omega \backslash K_1 \cap \Omega \backslash  K_2 \cap L ).

Applying the Carathéodory criterion,

\begin{aligned} \mu^*(K_1 \cap K_2 \cap L) + \mu^*(\Omega \backslash K_1 \cap K_2 \cap L) &\leq \mu^*(K_2 \cap L), \\ \mu^*(K_1 \cap \Omega \backslash K_2 \cap L) + \mu^*(\Omega \backslash K_1 \cap \Omega \backslash  K_2 \cap L ) &\leq \mu^*(\Omega \backslash  K_2 \cap L). \end{aligned}

Since K_2 also satisfies the Carathéodory criterion, adding yields the finite union result. For the countably additive claim, we first observe that

\begin{aligned} \mu^*(K_1 \sqcup K_2) &= \mu^*((K_1 \sqcup K_2) \cap K_1) + \mu^*((K_1 \sqcup K_2) \cap \Omega \backslash K_1) \\ &= \mu^*(K_1) + \mu^*(K_2). \end{aligned}

Inductively, \mu^*(\bigsqcup_{i=1}^n K_i) = \sum_{i=1}^n \mu^*(K_i). Now, fix pairwise disjoint \{K_i\} \subseteq \mathcal F. Then \mu^*(\bigsqcup_{i=1}^\infty K_i) \geq \mu^*(\bigsqcup_{i=1}^n K_i) = \sum_{i=1}^n \mu^*(K_i) by monotonicity and finite additivity, so that coupled with the countable subadditivity in Lemma 3,

\displaystyle \sum_{i=1}^n\mu^*(K_i) \leq \mu^* \left(\bigsqcup_{i=1}^\infty K_i \right) \leq \sum_{i=1}^\infty \mu^*(K_i).

Taking n \to \infty yields the desired result.
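The Carathéodory condition can be made concrete in a finite toy model (everything here, from the four-point \Omega to the helper names, is my own illustration, not part of the construction above). Take an algebra that cannot "see" inside the blocks \{0,1\} and \{2,3\}; brute-forcing the condition shows that exactly the members of the algebra are \mu^*-measurable, while e.g. the singleton \{0\} fails:

```python
from itertools import chain, combinations

# Toy model: Omega = {0,1,2,3}; the algebra F0 only "sees" the blocks {0,1}, {2,3}.
Omega = frozenset({0, 1, 2, 3})
F0 = [frozenset(), frozenset({0, 1}), frozenset({2, 3}), Omega]
mu0 = {A: len(A) / 4 for A in F0}  # uniform pre-measure on the algebra

def subsets(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def mu_star(K):
    """Outer measure: infimum of sum(mu0) over covers of K by members of F0
    (finite covers suffice here since F0 is finite)."""
    best = float("inf")
    for cover in subsets(F0):
        union = frozenset().union(*cover) if cover else frozenset()
        if K <= union:
            best = min(best, sum(mu0[A] for A in cover))
    return best

def caratheodory(K):
    """Check whether K splits every test set L mu*-additively."""
    return all(
        abs(mu_star(L) - (mu_star(K & L) + mu_star((Omega - K) & L))) < 1e-12
        for L in map(frozenset, subsets(Omega))
    )

measurable = [K for K in map(frozenset, subsets(Omega)) if caratheodory(K)]
print(sorted(map(sorted, measurable)))
# The singleton {0} fails: mu*({0}) = mu*({1}) = 1/2, yet mu*({0,1}) = 1/2.
```

In this model the extension adds nothing new, which matches the intuition: the outer measure can only distinguish what the underlying algebra can already distinguish.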

Lemma 5. The set \mathcal F as defined in Lemma 4 forms a \sigma-algebra over \Omega. Furthermore, \mathcal F^0 \subseteq \mathcal F.

Proof. Fix \{K_i\} \subseteq \mathcal F. Since

\displaystyle K := \bigcup_{i=1}^\infty K_i = \bigsqcup_{i=1}^\infty L_i,\quad L_1 := K_1, \quad L_{i+1} := K_{i+1} \backslash (K_1 \cup \dots \cup K_i),

where each L_i \in \mathcal F since \mathcal F is an algebra (Lemma 4), we may assume without loss of generality that \{ K_i \} is pairwise disjoint. We need to establish Carathéodory’s criterion for K, that is, prove that for any L \subseteq \Omega,

\mu^*(K \cap L) + \mu^*(\Omega \backslash K \cap L) \leq \mu^*(L).

By countable subadditivity,

\displaystyle \mu^*(K \cap L) \leq \sum_{i=1}^\infty \mu^*(K_i \cap L).

By Lemma 4, \bigsqcup_{i=1}^n K_i \in \mathcal F satisfies the Carathéodory criterion and \mu^* splits additively across the disjoint pieces K_i \cap L; combined with monotonicity (since \Omega \backslash K \subseteq \Omega \backslash \bigsqcup_{i=1}^n K_i), for any n \in \mathbb N^+,

\displaystyle \sum_{i=1}^n \mu^*(K_i \cap L) + \mu^*(\Omega \backslash K \cap L) \leq \mu^*(L).

Taking n \to \infty yields the desired result.

For the subset claim, fix K \in \mathcal F^0 and L \subseteq \Omega. Fix \epsilon > 0 and choose a cover \{L_i\} \subseteq \mathcal F^0 of L with \sum_{i=1}^\infty \mu_0(L_i) \leq \mu^*(L) + \epsilon. Then \{ K \cap L_i \} forms a cover for K \cap L and \{\Omega \backslash K \cap L_i\} forms a cover for \Omega \backslash K \cap L, with all sets involved lying in \mathcal F^0. The result follows from the finite additivity of \mu_0 on \mathcal F^0 and some bookkeeping:

\begin{aligned} \mu^*(K \cap L) + \mu^*(\Omega \backslash K \cap L) &\leq \sum_{i=1}^\infty \mu^*(K \cap L_i) + \sum_{i=1}^\infty \mu^*(\Omega \backslash K \cap L_i) \\ &= \sum_{i=1}^\infty (\mu^*(K \cap L_i) + \mu^*(\Omega \backslash K \cap L_i)) \\ &= \sum_{i=1}^\infty \mu_0(L_i) \leq \mu^*(L) + \epsilon. \end{aligned}

Taking \epsilon \to 0^+ yields the desired result.

Proof of Theorem 1. The map \mu := {\mu^*}|_{\mathcal F} : \mathcal F \to [0, \infty] is a rigorously defined measure (by Lemma 4) on the rigorously defined \sigma-algebra \mathcal F (by Lemma 5) that extends the pre-measure \mu_0 (by Lemma 3).

Corollary 1. The sample space \Omega := \{0, 1\}^{\mathbb N^+} equipped with the algebra \mathcal F^0 can be extended to a probability space (\Omega, \mathcal F, \mathbb P) such that \mathbb P(G_n(K)) = \mathbb P_n(K) for any n \in \mathbb N^+ and K \in \mathcal G_n.

Finally, we can rigorously define the geometric distribution.

Theorem 2. Define the random variable X : \Omega \to \mathbb N^+ \cup \{\infty\} by

X(\omega) := \min\{ n \in \mathbb N^+ : \omega_n = 1 \},

with the convention X(\omega) := \infty if \omega_n = 0 for every n (an event of probability 0 when p > 0).

Then for any x \in \mathbb N^+ and p \in (0, 1],

\mathbb P(X = x) = (1-p)^{x-1} \cdot p.

We say that X follows a geometric distribution, denoted X \sim \mathrm{Geo}(p). Furthermore, \mathbb E[X] = 1/p and \mathrm{Var}(X) = (1-p)/p^2.

Remark 1. X is also a stopping time: the event \{X = n\} depends only on the first n flips, i.e. \{X = n\} \in \mathcal F_n.

Proof. We observe that

\begin{aligned} \mathbb P(X = x) = \mathbb P(X^{-1}(\{x\})) &= \mathbb P(G_x(\{ (0,\dots,0, 1) \})) \\ &= \mathbb P_x((0,\dots,0,1)) \\ &= \left( \prod_{i=1}^{x-1} \mathbb P_0(0) \right) \cdot \mathbb P_0(1) \\ &= (1-p) \cdot \dots \cdot (1-p) \cdot p \\ &= (1-p)^{x-1} \cdot p. \end{aligned}

For the expectation,

\begin{aligned} \mathbb E[X] &= \sum_{x=1}^{\infty} x \cdot \mathbb P(X = x) = \sum_{x=1}^\infty x \cdot (1-p)^{x-1} \cdot p, \end{aligned}

provided the series on the right-hand side converges (it does, by the ratio test). We first define

\displaystyle f(n) := \sum_{x=1}^n x \cdot (1-p)^{x-1} \cdot p

and observe that

\begin{aligned} \sum_{x=1}^n x \cdot (1-p)^x &= \sum_{x=1}^n x \cdot (1-p)^{x-1}(1-p) \\ &= \sum_{x=1}^n x \cdot (1-p)^{x-1} -\sum_{x=1}^n x \cdot (1-p)^{x-1} \cdot p \\ &= \sum_{x=1}^n x \cdot (1-p)^{x-1} - f(n) \\ &= \sum_{x=1}^n (x-1) \cdot (1-p)^{x-1} + \sum_{x=1}^n (1-p)^{x-1} - f(n). \end{aligned}

By algebruh (noting that \sum_{x=1}^n (x-1) \cdot (1-p)^{x-1} = \sum_{x=1}^n x \cdot (1-p)^x - n \cdot (1-p)^n, so the sums cancel),

\begin{aligned} f(n) &= \frac {1 - (1-p)^n}{1-(1-p)} - n (1-p)^n.\end{aligned}

Taking n \to \infty, we have f_0(n) \cdot (1-p)^n \to 0 for any polynomial f_0, so that f(n) \to 1/p. Therefore, \mathbb E[X] = 1/p. For the variance, we first define

\displaystyle g(n) := \sum_{x=1}^n x^2 \cdot (1-p)^{x-1} \cdot p,

which converges as n \to \infty by the ratio test. By observation,

\begin{aligned} g(n)& = \sum_{x=1}^n x^2 \cdot (1-p)^{x-1} \cdot p \\ &= \sum_{x=1}^n (x-1)^2 \cdot (1-p)^{x-1} \cdot p + 2 \cdot \sum_{x=1}^n x \cdot \mathbb P(X = x) - \sum_{x=1}^n \mathbb P(X = x) \\ &= (1-p) \cdot g(n-1) + 2 \cdot \sum_{x=1}^n x \cdot \mathbb P(X = x) - \sum_{x=1}^n \mathbb P(X = x).\end{aligned}

Taking n \to \infty on both sides,

\mathbb E[X^2] = (1-p) \cdot \mathbb E[X^2] + 2 \cdot \mathbb E[X] - 1.

By algebruh,

\displaystyle \mathbb E[X^2] = \frac 1p \left( 2 \cdot \frac 1p - 1 \right) = \frac {2-p}{p^2}.

Hence,

\displaystyle \mathrm{Var}(X) = \mathbb E[X^2] - \mathbb E[X]^2 = \frac {2-p}{p^2} - \frac 1{p^2} = \frac {1-p}{p^2}.
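The closed forms \mathbb E[X] = 1/p and \mathrm{Var}(X) = (1-p)/p^2 can be checked numerically against truncated partial sums of the pmf; a small sketch (the helper names and the truncation point n_max are my own):

```python
def geom_pmf(x, p):
    """P(X = x) = (1 - p)**(x - 1) * p."""
    return (1 - p) ** (x - 1) * p

def geom_moments(p, n_max=10_000):
    """Truncated E[X] and Var(X); the tail beyond n_max is negligible."""
    mean = sum(x * geom_pmf(x, p) for x in range(1, n_max + 1))
    second = sum(x * x * geom_pmf(x, p) for x in range(1, n_max + 1))
    return mean, second - mean ** 2

for p in (0.25, 0.5, 0.75):
    mean, var = geom_moments(p)
    print(f"p={p}: E[X]={mean:.6f} (1/p={1/p:.6f}), "
          f"Var(X)={var:.6f} ((1-p)/p^2={(1 - p) / p ** 2:.6f})")
```

For each p the truncated sums agree with 1/p and (1-p)/p^2 to double precision, since the neglected tail decays geometrically.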

Not only can we define the geometric distribution, but we can finally define the length measure on \mathbb R in a measure-theoretically useful manner. This we will do next time.

—Joel Kindiak, 7 Jul 25, 2310H
