The Mathematical Average

Given a discrete random variable X taking values in \mathbb Z, what is its expectation, if it exists?

Let’s suppose X takes on finitely many values, i.e. the set

\mathrm{supp}(X) := \{x \in \mathbb Z : \mathbb P(X = x) > 0\}

is finite. Denote \mathrm{supp}(X) = \{x_1,\dots,x_n\}. The expectation of X is the value that X takes on average: we seek the “center value” \mu of the distribution of X.

We can think of the center value \mu as a “pivot” on a “balance beam”. Each x_i < \mu will induce a “weight” of \mathbb P(X = x_i) that tilts the beam anticlockwise, and similarly, each x_i > \mu will induce a “weight” of \mathbb P(X = x_i) that tilts the beam clockwise. Intuitively, the total contributions ought to cancel out, yielding the equality

\displaystyle \sum_{x \in \mathrm{supp}(X)} (x - \mu) \cdot \mathbb P(X = x) = 0.

Expanding the left-hand side,

\begin{aligned} \sum_{x \in \mathrm{supp}(X)} (x - \mu) \cdot \mathbb P(X = x) &= \sum_{x \in \mathrm{supp}(X)} x \cdot \mathbb P(X = x) - \mu \cdot \sum_{x \in \mathrm{supp}(X)} \mathbb P(X = x) \\ &= \sum_{x \in \mathrm{supp}(X)} x \cdot \mathbb P(X = x) - \mu \cdot 1 \\ &= \sum_{x \in \mathrm{supp}(X)} x \cdot \mathbb P(X = x) - \mu. \end{aligned}

Therefore,

\displaystyle \sum_{x \in \mathrm{supp}(X)} x \cdot \mathbb P(X = x) = \mu.

Using measure notation \mathbb P_X \equiv \mathbb P(X \in \cdot) and \mathbb P_X(x) \equiv \mathbb P_X(\{x\}) for brevity,

\displaystyle  \sum_{x \in \mathbb Z} x \cdot \mathbb P_X(x) = \mu,

where the left-hand side is well-defined since \mathrm{supp}(X) is finite (every term outside the support vanishes). We take this quantity as the formal definition of the expectation whenever the sum exists, even when \mathrm{supp}(X) is infinite.

Let X be a \mathbb Z-valued random variable.

Definition 1. The expectation of X, denoted \mathbb E[X], is defined by

\displaystyle \mathbb E[X] := \sum_{x \in \mathbb Z} x \cdot \mathbb P_X(x)

whenever the right-hand side exists.

Example 1. For p \in [0, 1], if X \sim \mathrm{Ber}(p), then

\begin{aligned} \mathbb E[X] &= 0 \cdot \mathbb P_X(0) + 1 \cdot \mathbb P_X(1) \\ &= 0 \cdot (1-p) + 1 \cdot p = p. \end{aligned}
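As a quick sanity check, Definition 1 translates directly into code. The helper `expectation` below is a hypothetical name of my own choosing (not part of the text), with the pmf passed as a dictionary from values to probabilities:

```python
# Definition 1 as code: E[X] = sum over x of x * P_X(x),
# with the pmf represented as a dict {value: probability}.

def expectation(pmf):
    """Expectation of a discrete random variable given its pmf."""
    return sum(x * p for x, p in pmf.items())

# Example 1: X ~ Ber(p) with p = 0.3, so P_X(0) = 1 - p and P_X(1) = p.
p = 0.3
print(expectation({0: 1 - p, 1: p}))  # 0.3
```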

Lemma 1. Let (\Omega, \mathcal F, \mathbb P) be a probability space with \Omega countable (so that sums over \Omega make sense), and let X : \Omega \to \mathbb Z be a random variable. Then

\displaystyle \mathbb P_X(x) = \sum_{\omega \in \Omega} \mathbb I\{X(\omega) = x\} \cdot \mathbb P(\{\omega\}).

Whenever both sides are well-defined,

\displaystyle \mathbb E[X] = \sum_{\omega \in \Omega} X(\omega) \cdot \mathbb P(\{ \omega \}).

Proof. We first observe that

\displaystyle \mathbb P_X(x) = \mathbb P(X = x) = \sum_{\omega : X(\omega)=x} \mathbb P(\{ \omega \}) = \sum_{\omega \in \Omega} \mathbb I\{X(\omega)=x\} \cdot \mathbb P(\{ \omega \}).

Hence,

\begin{aligned} \sum_{x \in \mathbb Z} x \cdot \mathbb P_X(x) &= \sum_{x \in \mathbb Z} x \cdot \sum_{\omega \in \Omega} \mathbb I\{X(\omega)=x\} \cdot \mathbb P(\{ \omega \}) \\ &=\sum_{x \in \mathbb Z}  \sum_{\omega \in \Omega} x \cdot \mathbb I\{X(\omega)=x\} \cdot \mathbb P(\{ \omega \}) \\ &= \sum_{\omega \in \Omega} \sum_{x \in \mathbb Z}  x \cdot \mathbb I\{X(\omega)=x\} \cdot \mathbb P(\{ \omega \}) \\ &= \sum_{\omega \in \Omega} X(\omega) \cdot \mathbb P(\{ \omega \}) . \end{aligned}

The interchange of the two sums in the third equality is justified because the double sum converges absolutely whenever \mathbb E[X] is well-defined.
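Both formulas in Lemma 1 can be checked on a small finite sample space. The setup below, two fair coin flips with X counting heads, is just an illustrative choice:

```python
# Lemma 1 on a concrete finite sample space: compute E[X] both as
# sum_x x * P_X(x) and as sum_omega X(omega) * P({omega}).

omegas = ["HH", "HT", "TH", "TT"]          # two fair coin flips
P = {w: 0.25 for w in omegas}              # uniform probability measure
X = {w: w.count("H") for w in omegas}      # X(omega) = number of heads

# Pushforward pmf: P_X(x) = sum of P({omega}) over {omega : X(omega) = x}.
P_X = {}
for w in omegas:
    P_X[X[w]] = P_X.get(X[w], 0.0) + P[w]

lhs = sum(x * p for x, p in P_X.items())   # sum over values of X
rhs = sum(X[w] * P[w] for w in omegas)     # sum over outcomes omega
print(lhs, rhs)  # 1.0 1.0
```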

Lemma 2. For any map g : \mathbb Z \to \mathbb Z, g(X) := g \circ X is a \mathbb Z-valued random variable.

Theorem 1. Whenever both sides are well-defined,

\displaystyle \mathbb E[g(X)] = \sum_{x \in \mathbb Z} g(x) \cdot \mathbb P_X(x).

Proof. Define Y := g(X), which is a random variable by Lemma 2. By the proof and result of Lemma 1,

\begin{aligned} \mathbb E[g(X)] = \mathbb E[Y] &= \sum_{\omega \in  \Omega} Y(\omega) \cdot \mathbb P(\{\omega\}) \\ &= \sum_{\omega \in  \Omega} g(X(\omega)) \cdot \mathbb P(\{\omega\}) \\ &= \sum_{\omega \in  \Omega} \sum_{x \in \mathbb Z} g(X(\omega)) \cdot \mathbb I\{X(\omega) = x\} \cdot \mathbb P(\{\omega\}) \\ &= \sum_{x \in \mathbb Z} \sum_{\omega \in  \Omega} g(X(\omega)) \cdot \mathbb I\{X(\omega) = x\} \cdot \mathbb P(\{\omega\}) \\ &= \sum_{x \in \mathbb Z} g(x) \cdot \sum_{\omega \in  \Omega} \mathbb I\{X(\omega) = x\} \cdot \mathbb P(\{\omega\}) \\ &= \sum_{x \in \mathbb Z} g(x) \cdot \mathbb P_X(x). \end{aligned}
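Theorem 1 says \mathbb E[g(X)] can be computed without ever constructing the pmf of g(X). A minimal sketch with a made-up pmf and g(x) = x^2:

```python
# Theorem 1: sum_x g(x) * P_X(x) agrees with E[Y] where Y = g(X).

pmf_X = {-1: 0.25, 0: 0.25, 1: 0.5}   # a hypothetical pmf on Z

def g(x):
    return x * x

# Right-hand side of Theorem 1: no pmf of g(X) needed.
rhs = sum(g(x) * p for x, p in pmf_X.items())

# Left-hand side: build the pmf of Y = g(X), then apply Definition 1.
pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[g(x)] = pmf_Y.get(g(x), 0.0) + p
lhs = sum(y * p for y, p in pmf_Y.items())

print(lhs, rhs)  # 0.75 0.75
```

Note how (-1) and 1 collapse onto the single value g(x) = 1 when the pmf of Y is built, which is exactly the bookkeeping Theorem 1 lets us skip.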

Corollary 1. Let Y be a \mathbb Z-valued random variable. For any map g : \mathbb Z^2 \to \mathbb Z, g(X,Y) := g \circ (X,Y) is a \mathbb Z-valued random variable. Furthermore,

\displaystyle \mathbb E[g(X,Y)] = \sum_{(x, y) \in \mathbb Z^2} g(x,y) \cdot \mathbb P_{X,Y}(x,y).

Moreover, if \mathbb E[X] and \mathbb E[Y] exist, then the following hold:

  • \mathbb E[X + Y] = \mathbb E[X] + \mathbb E[Y],
  • \mathbb E[\alpha X] = \alpha \cdot \mathbb E[X] for any \alpha \in \mathbb Z,
  • \mathbb E[X - \mathbb E[X]] = 0.

Proof. We prove the first identity for simplicity. Define g(x,y) = x+y. Then

\begin{aligned} \mathbb E[g(X,Y)] &= \sum_{(x, y) \in \mathbb Z^2} g(x,y) \cdot \mathbb P_{X,Y}(x,y) \\ \mathbb E[X+Y]&= \sum_{(x, y) \in \mathbb Z^2} (x+y) \cdot \mathbb P_{X,Y}(x,y) \\ &= \sum_{(x, y) \in \mathbb Z^2} x \cdot \mathbb P_{X,Y}(x,y) + \sum_{(x, y) \in \mathbb Z^2} y \cdot \mathbb P_{X,Y}(x,y) \\ &= \sum_{x \in \mathbb Z} x \cdot \mathbb P_{X}(x) + \sum_{y \in \mathbb Z} y \cdot \mathbb P_{Y}(y) \\ &= \mathbb E[X] + \mathbb E[Y], \end{aligned}

where the simplifications arise from

\begin{aligned} \sum_{(x, y) \in \mathbb Z^2} x \cdot \mathbb P_{X,Y}(x,y) &= \sum_{x \in \mathbb Z} \sum_{y \in \mathbb Z} x \cdot \mathbb P_{X,Y}(x,y) \\ &= \sum_{x \in \mathbb Z} \sum_{y \in \mathbb Z} x \cdot \mathbb P(Y = y \mid X = x) \cdot \mathbb P_X(x) \\ &= \sum_{x \in \mathbb Z} x \cdot \mathbb P_X(x) \cdot \sum_{y \in \mathbb Z} \mathbb P(Y = y \mid X = x) \\ &= \sum_{x \in \mathbb Z} x \cdot \mathbb P_X(x) \cdot 1 \\ &= \sum_{x \in \mathbb Z} x \cdot \mathbb P_X(x).\end{aligned}

All series manipulations above are valid because the sums converge absolutely whenever \mathbb E[X] and \mathbb E[Y] exist; terms with \mathbb P_X(x) = 0 vanish, so conditioning on \{X = x\} is only ever needed when \mathbb P_X(x) > 0.
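Linearity can also be verified directly on a joint pmf. The table below is a made-up example; note that X and Y need not be independent:

```python
# E[X + Y] = E[X] + E[Y] via Corollary 1 with g(x, y) = x + y.
# The joint pmf is an arbitrary (dependent) example; probabilities sum to 1.

joint = {(0, 0): 0.125, (0, 1): 0.25, (1, 0): 0.375, (1, 1): 0.25}

# Left-hand side: sum over the joint pmf with g(x, y) = x + y.
E_sum = sum((x + y) * p for (x, y), p in joint.items())

# Right-hand side via the marginals P_X and P_Y (summing out y and x).
E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())

print(E_sum, E_X + E_Y)  # 1.125 1.125
```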

Example 2. For n \in \mathbb N and p \in [0, 1], if X \sim \mathrm{Bin}(n, p), then \mathbb E[X] = np.

Proof. Realize X as a sum of independent and identically distributed (i.i.d.) Bernoulli random variables \xi_1, \dots, \xi_n \sim \mathrm{Ber}(p), so that

\displaystyle X = \sum_{i=1}^n \xi_i.

By Corollary 1,

\displaystyle \mathbb E[X] = \mathbb E\left[ \sum_{i=1}^n \xi_i \right] = \sum_{i=1}^n \mathbb E[\xi_i] = \sum_{i=1}^n p = np.

Note that only linearity of expectation, not independence, is used here.
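The same value falls out of Definition 1 applied to the binomial pmf directly, which makes a nice cross-check (the choices of n and p below are arbitrary):

```python
from math import comb

# Example 2 cross-check: E[X] for X ~ Bin(n, p) computed from the pmf
# P_X(k) = C(n, k) * p^k * (1 - p)^(n - k), compared against n * p.

n, p = 5, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
E_X = sum(k * q for k, q in pmf.items())
print(E_X, n * p)  # 2.5 2.5
```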

Example 3. Equip the finite sample \{x_1,\dots,x_n\} \subseteq \mathbb Z of distinct values with the uniform probability measure, and let X := \mathrm{id} be the induced random variable. Then

\begin{aligned} \mathbb E[X] &= \sum_{x \in \mathbb Z} x \cdot \mathbb P_X(x) \\ &= \sum_{i=1}^n x_i \cdot \mathbb P_X(x_i) \\ &= \sum_{i=1}^n x_i \cdot \frac 1n = \frac 1n \cdot \sum_{i=1}^n x_i =: \bar x. \end{aligned}

Thus, the right-hand side is called the mean of the sample (x_1,\dots,x_n).
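In code, the uniform-measure construction of Example 3 reduces to the familiar average. The sample below is an arbitrary choice, with distinct values so that each carries mass 1/n:

```python
# Example 3: the uniform measure on a finite sample of distinct integers
# turns the expectation into the sample mean.

sample = [2, 4, 6, 8]
n = len(sample)
pmf = {x: 1 / n for x in sample}          # uniform mass 1/n on each value

E_X = sum(x * p for x, p in pmf.items())  # Definition 1
mean = sum(sample) / n                    # the usual sample mean
print(E_X, mean)  # 5.0 5.0
```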

The expectation has a cousin, the covariance, and its child, the variance. We will discuss these ideas in the next post.

—Joel Kindiak, 1 Jul 25, 1915H
