Neither Random Nor Variable

A random variable is neither random nor a variable. Let me explain.

Flip a fair coin twice. Let X denote the number of Heads that you obtain. What is the value of X? Well, it depends on which outcome in the sample space \Omega occurs:

\Omega := \{(\mathrm H, \mathrm H), (\mathrm H, \mathrm T), (\mathrm T, \mathrm H), (\mathrm T, \mathrm T)\}.

Let \mathcal F := \mathcal P(\Omega), and denote the uniform probability measure on \mathcal F by \mathbb P(\cdot). Since X denotes the number of Heads, we have

X((\mathrm H, \mathrm H)) = 2, \quad X((\mathrm H, \mathrm T)) = X ((\mathrm T, \mathrm H)) = 1,\quad X((\mathrm T, \mathrm T)) = 0.

Notice that the randomness arises in selecting the outcome \omega \in \Omega, not in the measurement X(\omega) (though the former inevitably influences the latter). The quantity X(\omega) merely records the number of Heads, and each possible value occurs with some probability (which we will explore later on).
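
As a quick preview of those probabilities (a worked computation using the uniform measure \mathbb P defined above; the general machinery comes later):

\mathbb P(\{\omega \in \Omega : X(\omega) = 1\}) = \mathbb P(\{(\mathrm H, \mathrm T), (\mathrm T, \mathrm H)\}) = \tfrac{2}{4} = \tfrac{1}{2}, \quad \mathbb P(\{\omega : X(\omega) = 0\}) = \mathbb P(\{\omega : X(\omega) = 2\}) = \tfrac{1}{4}.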

Thus, while we call X a random variable, it is neither random (the randomness is carried by the choice of \omega) nor a variable (the variation is carried by the range X(\Omega)); X itself is just a function on \Omega.

Let (\Omega, \mathcal F) and (\Psi, \mathcal G) be measurable spaces.

Definition 1. A map f : \Omega \to \Psi is \mathcal F/\mathcal G-measurable if for any K \in \mathcal G, f^{-1}(K) \in \mathcal F. We omit the prefix \mathcal F/\mathcal G when the context is clear.
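
To see why the choice of \sigma-algebra on \Omega matters (a toy illustration; the \sigma-algebra \mathcal F_0 below is introduced only for this remark), equip the two-coin sample space \Omega from before with \mathcal F_0 := \{\varnothing, \Omega\}, and take \Psi = \mathbb N with \mathcal G = \mathcal P(\mathbb N). The head-counting map X is then not \mathcal F_0/\mathcal G-measurable, since

X^{-1}(\{2\}) = \{(\mathrm H, \mathrm H)\} \notin \mathcal F_0.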

Lemma 1. Suppose \mathcal F = \mathcal P(\Omega). Any map f : \Omega \to \Psi is measurable, where \Psi is equipped with the \sigma-algebra \mathcal P(\Psi).

Proof. For any K \in \mathcal P(\Psi),

K \subseteq \Psi \quad \Rightarrow \quad f^{-1}(K) \subseteq \Omega \quad \Rightarrow \quad f^{-1}(K) \in \mathcal P(\Omega) = \mathcal F.

Lemma 2. Suppose furthermore that there exists a probability measure \mathbb P on (\Omega, \mathcal F), and let X : \Omega \to \Psi be measurable. Then the map \mathbb P_X := \mathbb P \circ X^{-1} : \mathcal G \to [0, 1] is a probability measure on the measurable space ( \Psi, \mathcal G ), called the pushforward measure of X.
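
As a sanity check against the two-coin example (a worked computation, not a proof of Lemma 2), take \Psi = \mathbb N and \mathcal G = \mathcal P(\mathbb N):

\mathbb P_X(\{1\}) = \mathbb P(X^{-1}(\{1\})) = \mathbb P(\{(\mathrm H, \mathrm T), (\mathrm T, \mathrm H)\}) = \tfrac{1}{2}, \quad \mathbb P_X(\{0, 2\}) = \mathbb P(\{(\mathrm T, \mathrm T), (\mathrm H, \mathrm H)\}) = \tfrac{1}{2},

and indeed \mathbb P_X(\mathbb N) = \mathbb P(X^{-1}(\mathbb N)) = \mathbb P(\Omega) = 1.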

Lemmas 1 and 2 are crucial for what follows: they illustrate that, all things considered, the underlying probability space (\Omega, \mathcal F, \mathbb P) is not nearly as relevant as the measures induced on \mathbb N. Eventually, we will want to develop measures on \mathbb R too, though that will take substantially more effort.

Equip \mathbb N with the \sigma-algebra \mathcal P(\mathbb N).

Definition 2. Let (\Omega, \mathcal F, \mathbb P) be any probability space. A discrete random variable is a measurable map X : \Omega \to \mathbb N, where \mathbb N carries the \sigma-algebra \mathcal P(\mathbb N). (In the common case \mathcal F = \mathcal P(\Omega), Lemma 1 tells us that every map X : \Omega \to \mathbb N is automatically measurable.) We call its pushforward measure \mathbb P_X the distribution of X, and for convenience denote, for x \in \mathbb N,

\mathbb P_X(\{x\}) = \mathbb P(X^{-1}(\{x\})) = \mathbb P(\{\omega \in \Omega : X(\omega) = x \} ) \equiv \mathbb P(X=x).

Without loss of generality, given that a discrete random variable X has distribution \mathbb P_X, we may assume that (\Omega, \mathcal F, \mathbb P) = (\mathbb N, \mathcal P(\mathbb N), \mathbb P_X) and that, with this choice, X = \mathrm{id}.
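
Concretely, for the two-coin head count from the opening example (an illustration of this convention, not a new definition), the canonical choice is \Omega = \mathbb N, \mathcal F = \mathcal P(\mathbb N), \mathbb P = \mathbb P_X with

\mathbb P_X(\{0\}) = \mathbb P_X(\{2\}) = \tfrac{1}{4}, \quad \mathbb P_X(\{1\}) = \tfrac{1}{2}, \quad \mathbb P_X(\{x\}) = 0 \text{ for } x \geq 3,

and X = \mathrm{id}, i.e. X(x) = x for every x \in \mathbb N.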

Lemma 3. For any discrete random variable X, \displaystyle \sum_{x = 0}^\infty \mathbb P(X = x) = 1.
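
A one-line justification (a sketch relying on the countable additivity of \mathbb P): the events X^{-1}(\{x\}), x \in \mathbb N, are pairwise disjoint and their union is \Omega, so

\displaystyle \sum_{x = 0}^\infty \mathbb P(X = x) = \mathbb P\left( \bigcup_{x = 0}^\infty X^{-1}(\{x\}) \right) = \mathbb P(\Omega) = 1.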

The support of a discrete random variable X is the set \{x \in \mathbb N : \mathbb P(X = x) > 0\}. The probability mass function or p.m.f. of X is defined by f_X := \mathbb P(X = \cdot) : \mathbb R \to [0, \infty). The cumulative distribution function or c.d.f. of X is defined by

\displaystyle F_X : \mathbb R \to [0, 1],\quad F_X(x) := \mathbb P(X \leq x) \equiv \sum_{t = 0}^{\lfloor x \rfloor} f_X(t).
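
For the two-coin head count X from earlier (a worked evaluation of this formula):

F_X(x) = 0 \text{ for } x < 0, \quad F_X(x) = \tfrac{1}{4} \text{ for } 0 \leq x < 1, \quad F_X(x) = \tfrac{1}{4} + \tfrac{1}{2} = \tfrac{3}{4} \text{ for } 1 \leq x < 2, \quad F_X(x) = 1 \text{ for } x \geq 2.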

To illustrate these definitions with meaningful examples, let’s consider flipping a biased coin with parameter p \in [0, 1]. This means the probability of obtaining Heads is some fixed value p, which may or may not be 1/2.

Example 1. Consider the usual sample space \Omega := \{ \mathrm H, \mathrm T\} equipped with the \sigma-algebra \mathcal F := \mathcal P(\Omega). Define the probability measure \mathbb P : \mathcal F \to [0, 1] by \mathbb P(\{\mathrm H\}) = p, which implies that

\mathbb P(\{\mathrm T\}) = \mathbb P(\Omega \backslash \{\mathrm H\}) = 1 - p.

Define the random variable X : \Omega \to \mathbb N by X(\mathrm H) = 1 and X(\mathrm T) = 0. Then

\mathbb P(X = 1) = \mathbb P(X^{-1}(\{1\})) = \mathbb P(\{\mathrm H\}) = p,\quad \mathbb P(X = 0) = 1 - p,

and \mathbb P(X = x) = 0 for x \notin \{0, 1\}, since X takes no other values.

Definition 3. Let (\Omega, \mathcal F, \mathbb P) be any probability space and X : \Omega \to \mathbb N be a discrete random variable. We say that X follows a Bernoulli distribution with parameter p \in [0, 1], denoted X \sim \mathrm{Ber}(p), if

\mathbb P(X = x) = \begin{cases}1-p, & x = 0, \\ p, & x = 1, \\ 0, & \text{otherwise.}\end{cases}
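
Plugging this p.m.f. into the c.d.f. formula above gives, as a quick worked consequence,

\displaystyle F_X(x) = \begin{cases} 0, & x < 0, \\ 1 - p, & 0 \leq x < 1, \\ 1, & x \geq 1. \end{cases}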

Example 2. The random variable X defined in Example 1 has the distribution \mathrm{Ber}(p). The random variable Y := 1 - X has the distribution \mathrm{Ber}(1-p) since

\mathbb P(Y = 1) = \mathbb P(X = 0) = 1 - p \quad \text{and} \quad \mathbb P(Y = 0) = \mathbb P(X = 1) = p.

Example 3. Given integers a and b with -1 \leq a < b, we say a discrete random variable U is uniform on (a, b] if

\displaystyle \mathbb P(U = u) = \frac{1}{b-a},\quad u \in (a, b] \cap \mathbb N.
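
For instance (a concrete instance of this definition, with a and b chosen purely for illustration), taking a = 0 and b = 6 models a fair six-sided die:

\displaystyle \mathbb P(U = u) = \frac{1}{6}, \quad u \in (0, 6] \cap \mathbb N = \{1, 2, 3, 4, 5, 6\}.

Note that since these b - a probabilities already sum to 1, Lemma 3 forces \mathbb P(U = u) = 0 for every other u.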

Having properly defined discrete random variables, we arrive at a natural question: are there ways to create new random variables from old ones? For instance, given two random variables X and Y, what is the distribution of their sum X + Y? In the special case of Example 2, it is obvious that, by construction, X + Y = X + (1 - X) = 1. But what if X and Y don’t have any obvious connection?

Next time, we explore the idea of independence of random variables and think about ways to combine them.

—Joel Kindiak, 27 Jun 25, 1300H
