Probability Nomenclature

Probability is our attempt at making sense of random events in our rather random world. The simplest example—a coin toss—will surprisingly motivate many topics that we will formulate with greater and greater generality.

Flip a coin. You get 1 point if ‘Head’ comes up, and nothing if ‘Tail’ comes up. You start with a score of 0. After one flip, what is your score? Well, it depends. Does the coin land ‘Head’? Does it land ‘Tail’? It could go either way. How do we model this phenomenon? Well, there are 2 outcomes, and if they are equally likely, then we can say that each outcome has a probability of 1/2.

In mathematical notation, we let the sample space \Omega := \{\text{H}, \text{T}\} denote the set of possible outcomes after one flip of the coin. What are the different events that we can measure? Surely we have the non-negative probabilities

\mathbb P( \{ \text{H} \} ) = \frac 12,\quad \mathbb P( \{ \text{T} \} ) = \frac 12.

But more is true. What is the probability that we get either a ‘Head’ or a ‘Tail’? In the real world it could be the case that a coin lands on its edge, but in a mathematical world, we preclude such a possibility. This means that there are only two possible outcomes: a ‘Head’ or a ‘Tail’, and they don’t overlap. At least one of them must occur! Hence, it makes sense to assert

\mathbb P(\Omega) = \mathbb P(\{ \text{H}, \text{T} \}) = 1.
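If you like to think in code, here is a minimal Python sketch of the single-flip setup; the dictionary and its name are my own illustration, not notation from the text.

```python
# A minimal sketch of the single-flip probability measure; names are illustrative.
P = {"H": 0.5, "T": 0.5}   # P({H}) = P({T}) = 1/2

# P(Omega) = 1: the coin must land on one of the two faces.
print(sum(P.values()))     # 1.0
```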

This discussion almost seems trivial, until we make one modification. Now flip the coin two times. You get 1 point if ‘Head’ comes up, and nothing if ‘Tail’ comes up. You start with a score of 0. After two flips, what is your score?

Let’s list the possible outcomes:

\Omega = \{ ( \text{H}, \text{H} ), ( \text{H}, \text{T} ), ( \text{T}, \text{H} ), ( \text{T}, \text{T} ) \} = \{\text H, \text T\}^2.

How do we assign their probabilities? And furthermore, how do we determine our score? Assuming no funny business, we would think that all outcomes are equally likely. Once again, at least one outcome should hold, so we stipulate \mathbb P(\Omega) = 1. Furthermore, for any two distinct outcomes \omega_1 \neq \omega_2, we stipulate that the probability that either one occurs is the sum of their individual probabilities:

\mathbb P(\{\omega_1, \omega_2\}) = \mathbb P(\{\omega_1\} \cup \{\omega_2\}) := \mathbb P(\{\omega_1\}) + \mathbb P(\{\omega_2\}).
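Here is a rough Python sketch of the two-flip sample space, assuming, as above, that all four outcomes are equally likely; the variable names are illustrative only.

```python
from itertools import product

# Enumerate Omega = {H, T}^2 as tuples such as ('H', 'T').
omega = list(product("HT", repeat=2))

# Assumption: all four outcomes are equally likely.
P = {w: 1 / len(omega) for w in omega}

# Additivity: P({w1, w2}) = P({w1}) + P({w2}) for distinct outcomes.
w1, w2 = ("H", "T"), ("T", "H")
print(P[w1] + P[w2])          # 0.5
```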

We can now formalise probability theory for finite sample spaces; the infinite case deserves a much, much longer elaboration.

Definition 1. Let \Omega be a finite sample space and \mathcal F := \mathcal P(\Omega) denote its power set (i.e. the collection of all of its possible subsets). We call the map \mu : \mathcal F  \to [0, \infty) a finite measure on \Omega if

\displaystyle \mu(\{\omega_1, \dots, \omega_n\}) = \sum_{i=1}^n \mu(\{\omega_i\}) for distinct outcomes \omega_1,\dots,\omega_n \in \Omega.

Additionally, if \mu(\Omega) = 1, we call \mu a probability measure, and commonly denote it by \mu \equiv \mathbb P.
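Definition 1 translates almost directly into code: a finite measure is determined by its values on singletons, and the measure of a subset is the sum of those values. The following sketch, with a hypothetical helper `measure`, is one way to phrase it in Python.

```python
def measure(subset, point_mass):
    """mu(subset): sum the non-negative point masses mu({omega}) over the subset.

    This is the additivity condition of Definition 1, taken as a definition.
    """
    return sum(point_mass[w] for w in subset)

# Example: an unnormalised finite measure on the single-flip space.
mu = {"H": 3.0, "T": 1.0}
print(measure({"H", "T"}, mu))   # mu(Omega) = 4.0
```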

Lemma 1. For any finite sample space \Omega and finite measure \mu : \mathcal P(\Omega) \to [0, \infty) with \mu(\Omega) > 0, there exists a probability measure \mathbb P : \mathcal P(\Omega) \to [0, 1].

Proof. Define \displaystyle \mathbb P := \frac 1{\mu(\Omega)} \cdot \mu and verify Definition 1: for distinct outcomes \omega_1,\dots,\omega_n \in \Omega,

\begin{aligned} \mathbb P(\{\omega_1, \dots, \omega_n\}) &= \frac 1{\mu(\Omega)} \cdot \mu(\{\omega_1, \dots, \omega_n\}) \\ &= \frac 1{\mu(\Omega)}  \sum_{i=1}^n \mu( \{\omega_i\} ) \\ &=   \sum_{i=1}^n \frac 1{\mu(\Omega)} \cdot \mu( \{\omega_i\} ) =   \sum_{i=1}^n \mathbb P( \{\omega_i\} ). \end{aligned}

Moreover, \mathbb P(\Omega) = \mu(\Omega)/\mu(\Omega) = 1, so \mathbb P is indeed a probability measure.

Hence, we can discuss most ideas using the more general \mu and particularise to \mathbb P whenever 0 < \mu(\Omega) < \infty.
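Lemma 1 is just as mechanical in code: divide every point mass by \mu(\Omega). The snippet below is a sketch of that normalisation on a made-up measure.

```python
# Lemma 1 in code: divide every point mass by mu(Omega) to get a probability measure.
mu = {"H": 3.0, "T": 1.0}              # an unnormalised finite measure
total = sum(mu.values())               # mu(Omega) = 4.0

P = {w: weight / total for w, weight in mu.items()}
print(P)                               # {'H': 0.75, 'T': 0.25}
print(sum(P.values()))                 # P(Omega) = 1.0
```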

Theorem 1. If |\Omega| is finite and there exists m > 0 such that for any outcome \omega \in \Omega, \mu(\{\omega\}) = m, then m = \mu(\Omega)/|\Omega|. We call \mu the uniform measure on \Omega. In the case m = 1, we call \mu the counting measure, since \mu(\Omega) = |\Omega|.

Proof. Denoting \Omega = \{\omega_1,\dots,\omega_{|\Omega|}\},

\begin{aligned} m \cdot |\Omega| &= \sum_{i=1}^{|\Omega|} m = \sum_{i=1}^{|\Omega|} \mu(\{\omega_i\})  = \mu(\Omega). \end{aligned}

Therefore, m = \mu(\Omega)/|\Omega|.

In particular, for the counting measure (m = 1), \mu(K) = |K| for any K \subseteq \Omega, so we can regard the cardinality map | \cdot | : \mathcal P(\Omega) \to \mathbb N_0 as the counting measure.
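As a quick numerical sanity check of Theorem 1 (again a sketch, using the two-flip space and the counting measure m = 1):

```python
from itertools import product

omega = list(product("HT", repeat=2))   # the two-flip sample space, |Omega| = 4

# Uniform measure: every singleton has the same mass m; m = 1 gives the counting measure.
m = 1.0
mu_omega = sum(m for _ in omega)        # mu(Omega) = sum of the point masses

print(mu_omega == m * len(omega))       # True: mu(Omega) = m * |Omega|
print(mu_omega / len(omega))            # m = mu(Omega) / |Omega| = 1.0
```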

Corollary 1. Under the conditions of Theorem 1, if \mathbb P is a uniform probability measure, then for any K \subseteq \Omega, \mathbb P(K) = |K|/|\Omega|.

Proof. By Theorem 1, for any \omega \in \Omega,

\displaystyle \mathbb P(\{\omega\}) = \frac{\mathbb P(\Omega)}{|\Omega|} = \frac{1}{|\Omega|}.

Hence,

\displaystyle \mathbb P(K) = \sum_{\omega \in K} \mathbb P(\{\omega\}) =  \sum_{\omega \in K} \frac{1}{|\Omega|} = |K| \cdot \frac{1}{|\Omega|} = \frac{|K|}{|\Omega|}.

In particular,

\mathbb P(\{ ( \text{H}, \text{H} ) \} ) = \mathbb P( \{ ( \text{H}, \text{T} ) \} ) = \mathbb P( \{ ( \text{T}, \text{H} ) \} ) = \mathbb P(\{ ( \text{T}, \text{T} ) \}) = \frac 14.
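Corollary 1 is the familiar “favourable over total” rule, and a short sketch makes it concrete; the helper `uniform_prob` below is hypothetical, not notation from the text.

```python
from itertools import product

omega = list(product("HT", repeat=2))              # |Omega| = 4

def uniform_prob(K):
    """P(K) = |K| / |Omega| under the uniform probability measure on Omega."""
    return len(K) / len(omega)

print(uniform_prob({("H", "H")}))                  # 0.25, as claimed above
print(uniform_prob({("H", "T"), ("T", "H")}))      # 0.5
```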

Let’s return to the situation

\Omega = \{\text H, \text T\}^2 = \{ ( \text{H}, \text{H} ), ( \text{H}, \text{T} ), ( \text{T}, \text{H} ), ( \text{T}, \text{T} ) \}.

Recall that our goal is to determine the score X after 2 coin flips. There are three possible scores: X \in \{ 0, 1, 2 \}. Here’s a simple question: what is the probability that we end up with a score of 1? Well, we notice that we can interpret the score X as a function X : \Omega \to \{0, 1, 2\} defined as follows:

X(( \text{H}, \text{H} )) = 2, \quad X(( \text{H}, \text{T} )) = X(( \text{T}, \text{H} )) = 1,\quad X(( \text{T}, \text{T} )) = 0.

This means that the outcomes \omega \in \{ ( \text{H}, \text{T} ), ( \text{T}, \text{H} ) \} yield X(\omega) = 1. Using inverse-image notation,

X^{-1}(\{1\}) = \{\omega \in \Omega : X(\omega) = 1\} = \{ ( \text{H}, \text{T} ), ( \text{T}, \text{H} ) \}.

Well then, what do we mean by \mathbb P(X = 1)? Intuitively, since there are 2 outcomes with equal probability, we should have a combined probability of 2 \times 1/4 = 1/2. The probability is derived from the subset X^{-1}(\{1\}) \subseteq \Omega of desired outcomes:

\begin{aligned} \mathbb P(X = 1) &:= \mathbb P(X^{-1}(\{1\})) \\ &= \sum_{\omega \in X^{-1}(\{1\})} \mathbb P(\{\omega\}) \\ &= \sum_{\omega \in X^{-1}(\{1\})} \frac{1}{|\Omega|} \\ &= \frac{1}{|\Omega|} \cdot |X^{-1}(\{1\})| \\ &= \frac 14 \cdot 2 = \frac 12. \end{aligned}
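The same computation can be carried out mechanically: here is a sketch that builds X as a Python function, forms the inverse image X^{-1}(\{1\}), and counts.

```python
from itertools import product

omega = list(product("HT", repeat=2))    # the two-flip sample space

def X(w):
    """Score after two flips: one point for each 'H' in the outcome."""
    return w.count("H")

# Inverse image X^{-1}({1}) and its probability under the uniform measure.
preimage = [w for w in omega if X(w) == 1]
print(preimage)                          # [('H', 'T'), ('T', 'H')]
print(len(preimage) / len(omega))        # P(X = 1) = 0.5
```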

Definition 2. Let \Omega be a finite sample space and \mu be a finite measure on \Omega. We call any function X : \Omega \to \mathbb R a random variable on \Omega. Since \Omega is finite, the image X(\Omega) = \{x_1,\dots,x_n\} is also finite.

Theorem 2. As per the notation in Definition 2, denote \mathcal F_X := \mathcal P(X(\Omega)). Then the map \mu_X : \mathcal F_X \to [0, \infty) defined by \mu_X(K) := \mu(X^{-1}(K)) for any K \subseteq X(\Omega) is a measure on X(\Omega). We abbreviate \mu_X \equiv \mu \circ X^{-1} and call it the push-forward measure of \mu under X.

Proof. For distinct values x_1,\dots,x_n \in X(\Omega),

\begin{aligned} \mu_X(\{x_1,\dots,x_n\}) &= \mu(X^{-1}(\{x_1,\dots,x_n\}) ) \\ &= \mu(\{\omega \in \Omega : X(\omega) \in \{x_1, \dots, x_n\} \}) \\ &= \mu \left(\bigcup_{i=1}^n \{\omega \in \Omega : X(\omega) \in \{x_i\} \} \right) \\ &= \sum_{i=1}^n \mu(\{\omega \in \Omega : X(\omega) \in \{x_i\} \}) \\ &= \sum_{i=1}^n \mu(X^{-1}( \{x_i\} ) ) \\ &= \sum_{i=1}^n \mu_X ( \{x_i\} ). \end{aligned}

The fourth equality follows from Definition 1, since the preimages X^{-1}(\{x_i\}) are pairwise disjoint subsets of \Omega for distinct x_i.
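To see the push-forward in action, here is a sketch that computes \mu_X (here for the uniform probability measure \mathbb P on the two-flip space) by accumulating the mass of each preimage; `P_X` is an illustrative name.

```python
from collections import defaultdict
from itertools import product

omega = list(product("HT", repeat=2))
P = {w: 1 / len(omega) for w in omega}   # the uniform probability measure on Omega

def X(w):
    return w.count("H")                  # the score random variable from above

# Push-forward: P_X({x}) = P(X^{-1}({x})), built by accumulating each preimage's mass.
P_X = defaultdict(float)
for w, p in P.items():
    P_X[X(w)] += p

print(dict(P_X))                         # {2: 0.25, 1: 0.5, 0: 0.25}
```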

Before continuing, you might wonder: what’s with the rather complicated terminology? Well, it’s to introduce you early to the key vocabulary that one cannot escape when studying measure theory and probability.

In fact, there is an even simpler question that isn’t entirely obvious: given a set \Omega, how do we compute |\Omega|? This is where we need to discuss some basic combinatorics.

—Joel Kindiak, 22 Jun 25, 1249H
