Adding Random Variables

Given two random variables X, Y with known distributions (for example, X \sim \mathrm{Ber}(p) and Y \sim \mathrm{Ber}(q)), what is the distribution of their sum X + Y? This question is highly nontrivial, and is worth some elementary experimentation.

Let’s work with the Bernoulli example and see what we can get. The first case is, in a sense, the “easiest” to handle: X = 1 and Y = 1, or using ordered-pair notation, (X, Y) = (1, 1). In this case, X + Y = 2. Conversely, if X = 0 or Y = 0, then X + Y \leq 1 < 2, so (1, 1) is the only pair yielding a sum of 2. Therefore, at least informally,

\mathbb P(X + Y = 2) = \mathbb P((X, Y) = (1, 1)).

However, the probability on the right side raises a new question: what do we mean by \mathbb P((X, Y) = ( 1, 1))? If we insist (rather intuitively) that

\mathbb P((X, Y) = ( 1, 1)) = \mathbb P(X = 1) \cdot \mathbb P(Y = 1),

we are implicitly assuming that X and Y are independent in some sense (a notion we formalise below), which need not be the case! Our first task therefore isn’t even to compute X + Y, but to examine what we mean by \mathbb P((X, Y) = \cdot ), i.e. the joint distribution of X and Y.
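To see that the product rule can genuinely fail, here is a minimal Python sketch (the choice Y := X, a perfectly dependent pair, is an illustrative extreme, not part of the formal development):

```python
from fractions import Fraction

# Extreme dependence: Y is an exact copy of X, with X ~ Ber(1/2).
p = Fraction(1, 2)

joint_11 = p       # P(X = 1, Y = 1) = P(X = 1), since Y = X
product = p * p    # P(X = 1) * P(Y = 1)

print(joint_11)    # 1/2
print(product)     # 1/4 -- the product rule fails without independence
```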

Lemma 1. Let (\Omega, \mathcal F, \mathbb P) be a probability space and X_1, \dots, X_n : \Omega \to \mathbb Z be discrete random variables. The product map \mathbf X_n := (X_1, \dots, X_n) : \Omega \to \mathbb Z^n defined by

\mathbf X_n(\omega) := (X_1(\omega), \dots, X_n(\omega)),\quad \omega \in \Omega

is measurable, when \mathbb Z^n is equipped with the power-set \sigma-algebra \mathcal P(\mathbb Z^n), and induces a push-forward measure \mathbb P_{\mathbf X_n} : \mathcal P(\mathbb Z^n) \to [0, 1] defined by

\begin{aligned} \mathbb P_{\mathbf X_n}(K) &:= \mathbb P(\mathbf X_n \in K) = \mathbb P(\mathbf X_n^{-1}(K)),\quad K \in \mathcal P(\mathbb Z^n).\end{aligned}

While \mathbf X_n shares many properties with discrete random variables, its range is not a subset of \mathbb Z, so to reduce confusion we reserve the term “discrete random variable” for the original definition.
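To make the push-forward concrete, here is a small Python sketch of Lemma 1 on a finite sample space (the space, the uniform measure, and the two maps are all illustrative choices):

```python
from fractions import Fraction
from itertools import product

# A toy probability space: Omega = {0, 1, 2, 3} with the uniform measure.
Omega = [0, 1, 2, 3]
P = {w: Fraction(1, 4) for w in Omega}
X1 = lambda w: w % 2    # a Z-valued discrete random variable
X2 = lambda w: w // 2   # another one

def pushforward(K):
    """P_{(X1, X2)}(K) = P({w : (X1(w), X2(w)) in K})."""
    return sum(P[w] for w in Omega if (X1(w), X2(w)) in K)

print(pushforward({(1, 1)}))                        # 1/4
print(pushforward(set(product([0, 1], repeat=2))))  # 1, as any probability measure must give
```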

But what would be the joint distribution of (X,Y)? That is, what is the value of

\begin{aligned} \mathbb P_{X,Y}(\{ (x, y) \}) &= \mathbb P(\{ X = x \} \cap \{Y = y\}) \\ &\equiv \mathbb P( X = x , Y = y ) \end{aligned}

for any x, y \in \mathbb Z? Perhaps \mathbb P_{X,Y} could be defined arbitrarily, as long as each value lies in [0, 1] and

\displaystyle \sum_{ (x, y) \in \mathbb Z^2 } \mathbb P( X = x , Y = y ) = 1.

However, there are two more sneaky conditions.

Lemma 2. For any y \in \mathbb Z with \mathbb P(Y = y) > 0,

\displaystyle \sum_{x \in \mathbb Z} \mathbb P( X =x ,  Y = y ) = \mathbb P(Y = y).

Similarly, for any x \in \mathbb Z with \mathbb P(X = x) > 0,

\displaystyle \sum_{y \in \mathbb Z} \mathbb P( X = x ,  Y = y ) = \mathbb P(X = x).

Proof. Recall that for any y \in \mathbb Z such that \mathbb P(Y = y) \equiv \mathbb P_Y(\{y\}) > 0, the conditional probability

\displaystyle \mathbb P( \cdot  \mid  Y = y )  : \mathcal F \to [0, 1]

is a probability measure, which induces a probability measure

\mathbb P( X \in \cdot  \mid  Y = y ) : \mathcal P(\mathbb Z) \to [0, 1].

In particular,

\displaystyle \sum_{x \in \mathbb Z} \mathbb P( X = x \mid  Y = y ) = 1.

Hence,

\begin{aligned} \sum_{x \in \mathbb Z} \mathbb P( X =x ,  Y = y ) &= \sum_{x \in \mathbb Z} \mathbb P( X =x \mid  Y = y ) \cdot \mathbb P(Y = y) \\ &= \mathbb P(Y = y) \cdot \sum_{x \in \mathbb Z} \mathbb P( X =x \mid  Y = y ) \\ & = \mathbb P(Y = y) \cdot 1 \\ &= \mathbb P(Y = y). \end{aligned}

Thankfully, convergence poses no issue here: every term is nonnegative, so the series converges in [0, 1] and the manipulations above are justified.
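A quick numerical illustration of Lemma 2 (the joint table below is a hand-picked pmf, chosen only to exhibit the marginal sums):

```python
from fractions import Fraction

# An arbitrary joint pmf on {0, 1}^2; the four values sum to 1.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_X(x):
    # Sum out y: Lemma 2's second identity.
    return sum(p for (u, y), p in joint.items() if u == x)

def marginal_Y(y):
    # Sum out x: Lemma 2's first identity.
    return sum(p for (x, v), p in joint.items() if v == y)

print(marginal_X(0), marginal_X(1))  # 1/2 1/2
print(marginal_Y(0), marginal_Y(1))  # 3/8 5/8
print(sum(joint.values()))           # 1, the normalisation condition
```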

Still, do we have an algorithmic way to compute the joint distribution? If the events X^{-1}(\{ x \}) \equiv X^{-1}(x), Y^{-1}(y) \in \mathcal F are independent for any x, y \in \mathbb Z, then

\begin{aligned} \mathbb P(X = x, Y = y) &= \mathbb P(\{X = x \} \cap \{ Y = y\}) \\ &= \mathbb P(X^{-1}(x) \cap Y^{-1}(y)) \\ &= \mathbb P(X^{-1}(x)) \cdot \mathbb P(Y^{-1}(y)) \\ &= \mathbb P(X = x) \cdot \mathbb P (Y = y).\end{aligned}

Hence, we define the independence of the random variables in this manner.

Definition 1. The discrete random variables X_1,\dots , X_n : \Omega \to \mathbb Z are independent if for any x_1,\dots,x_n \in \mathbb Z, the events X_1^{-1}(x_1),\dots, X_n^{-1}(x_n) are independent. In this case,

\mathbb P((X_1,\dots,X_n) = (x_1,\dots,x_n)) = \mathbb P(X_1 = x_1) \cdot \cdots \cdot \mathbb P(X_n = x_n).

Using p.m.f. notation,

f_{X_1,\dots,X_n}(x_1,\dots, x_n) = f_{X_1}(x_1) \cdot \cdots \cdot f_{X_n}(x_n),\quad (x_1,\dots,x_n) \in \mathbb Z^n.
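In code, Definition 1 says that the joint pmf table of independent variables is the outer product of the marginal tables. A minimal sketch with two Bernoulli marginals (p and q are illustrative parameters):

```python
from fractions import Fraction
from itertools import product

p, q = Fraction(1, 3), Fraction(1, 5)
f_X = {0: 1 - p, 1: p}   # pmf of X ~ Ber(p)
f_Y = {0: 1 - q, 1: q}   # pmf of Y ~ Ber(q)

# Joint pmf as the product of the marginals (independence).
f_XY = {(x, y): f_X[x] * f_Y[y] for x, y in product(f_X, f_Y)}

print(f_XY[(1, 1)])        # p * q = 1/15
print(sum(f_XY.values()))  # 1
```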

Example 1. Suppose X \sim \mathrm{Ber}(p) and Y \sim \mathrm{Ber}(q) are independent random variables. Evaluate the distribution of X + Y : \Omega \to \mathbb Z.

Solution. We note that the joint distribution f_{X,Y} is given by

\begin{aligned} f_{X,Y}(0, 0) &= (1-p) \cdot (1-q),\\ f_{X,Y}(0, 1) &= (1-p) \cdot q,\\ f_{X,Y}(1, 0) &= p \cdot (1-q), \\ f_{X,Y}(1, 1) &= p \cdot q.\end{aligned}

We observe that X + Y \in \{0, 1, 2\}. Hence,

\begin{aligned} f_{X+Y}(0) &= f_{X,Y}(0, 0) = (1-p) \cdot (1-q), \\ f_{X+Y}(1) &= f_{X,Y}(0, 1) + f_{X,Y}(1, 0) \\ &=  (1-p) \cdot q + p \cdot (1-q), \\ f_{X+Y}(2) &= f_{X,Y}(1, 1)  =  p \cdot q. \end{aligned}

Notice that f_{X+Y}(1) is evaluated by summing over the cases (0, 1) and (1, 0). More generally, for independent X and Y we have

\begin{aligned} f_{X+Y}(k) &= \sum_{x + y = k} f_{X,Y}(x, y) \\ &= \sum_{x + y = k} f_{X}(x) \cdot f_Y(y) \\ &= \sum_{x \in \mathbb Z} f_X(x) \cdot f_Y(k-x) \\ &=: (f_X * f_Y)(k), \end{aligned}

where the quantity on the right is called the discrete convolution of f_X and f_Y. Time to generalise!
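Before we do, a quick sanity check: a direct Python implementation of the discrete convolution (a minimal sketch, with illustrative parameters) reproduces the table from Example 1.

```python
from fractions import Fraction

def convolve(f, g):
    """(f * g)(k) = sum_x f(x) g(k - x), for pmfs on Z with finite support,
    represented as dicts mapping integers to probabilities."""
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + fx * gy
    return h

p, q = Fraction(1, 2), Fraction(1, 3)
f_X = {0: 1 - p, 1: p}
f_Y = {0: 1 - q, 1: q}
print(convolve(f_X, f_Y))  # {0: (1-p)(1-q), 1: (1-p)q + p(1-q), 2: pq}
```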

Theorem 1. Let X, Y : \Omega \to \mathbb Z be discrete random variables. Given any function g : \mathbb Z^2 \to \mathbb Z, the composition g(X,Y) := g \circ (X, Y) : \Omega \to \mathbb Z is a discrete random variable, with distribution

\displaystyle \mathbb P_{g(X,Y)}(k) = \sum_{g(x,y) = k} f_{X,Y}(x,y) = \sum_{(x,y) \in g^{-1}(k)} f_{X,Y}(x,y).

If X, Y are independent, then the distribution is given by

\displaystyle \mathbb P_{g(X,Y)}(k) = \sum_{(x,y) \in g^{-1}(k)} f_{X}(x) \cdot f_Y(y).

In particular,

\displaystyle \mathbb P_{X+Y}(k) = \sum_{x\in \mathbb Z} f_{X}(x) \cdot f_Y(k-x).
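Computationally, Theorem 1 amounts to grouping the joint pmf by the value of g, i.e. summing over each fibre g^{-1}(k). A sketch with the illustrative choice g = \max and independent \mathrm{Ber}(1/2) marginals:

```python
from fractions import Fraction

p = Fraction(1, 2)
# Joint pmf of two independent Ber(1/2) variables.
f_XY = {(x, y): (p if x else 1 - p) * (p if y else 1 - p)
        for x in (0, 1) for y in (0, 1)}

g = max  # any function of two integer arguments would do

# Push the joint pmf forward through g, fibre by fibre.
dist = {}
for (x, y), prob in f_XY.items():
    k = g(x, y)
    dist[k] = dist.get(k, Fraction(0)) + prob

print(dist)  # k = 0 with probability 1/4, k = 1 with probability 3/4
```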

Example 2. Fix p \in [0, 1]. Given that the random variables X_1, X_2, \dots \sim \mathrm{Ber}(p) are independent, X_1 + X_2 has the distribution

\displaystyle \mathbb P_{X_1+X_2}(k) = \begin{cases} (1-p)^2, & k = 0, \\ 2p(1-p), & k = 1, \\ p^2, & k = 2. \end{cases}

Furthermore, for any n \in \mathbb N^+, the distribution of S_n := X_1 + \cdots + X_n is given by

\displaystyle \mathbb P_{S_n}(k) = {n \choose k} p^k(1-p)^{n-k},\quad 0 \leq k \leq n.

Proof. The case n = 1 is trivial, and Example 1 (with q = p) establishes the case n = 2. For the general case, we proceed by induction. Suppose that for some m \geq 1,

\displaystyle \mathbb P_{S_m}(k) = {m \choose k} p^k(1-p)^{m-k},\quad 0 \leq k \leq m.

Let X_{m+1} \sim \mathrm{Ber}(p) be independent of X_1, \dots, X_m, so that S_m and X_{m+1} are independent, and write S_{m+1} = S_m + X_{m+1}. By the discrete convolution formula and Pascal’s identity, with the convention {m \choose j} = 0 for j < 0 or j > m,

\begin{aligned} \mathbb P_{S_{m+1}}(k) &= \mathbb P_{S_m + X_{m+1}}(k) \\ &= \sum_{x\in \mathbb Z} \mathbb P_{S_m}(x) \cdot \mathbb P_{X_{m+1}}(k-x) \\ &= \mathbb P_{S_m}(k) \cdot \mathbb P_{X_{m+1}}(0) + \mathbb P_{S_m}(k-1) \cdot \mathbb P_{X_{m+1}}(1) \\ &= {m \choose k} p^k(1-p)^{m-k} \cdot (1-p) + {m \choose k-1} p^{k-1}(1-p)^{m-(k-1)} \cdot p \\ &=\left( {m \choose k} + {m \choose k-1} \right) \cdot p^k(1-p)^{(m+1)-k} \\ &= {m+1 \choose k} p^k(1-p)^{(m+1)-k}.\end{aligned}
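The induction can also be checked numerically: convolving the \mathrm{Ber}(p) pmf with itself n times should reproduce the binomial coefficients exactly. A minimal sketch in exact rational arithmetic (p and n are illustrative choices):

```python
from fractions import Fraction
from math import comb

def convolve(f, g):
    """(f * g)(k) = sum_x f(x) g(k - x), for finitely supported pmfs."""
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + fx * gy
    return h

p, n = Fraction(2, 5), 6
ber = {0: 1 - p, 1: p}

pmf = {0: Fraction(1)}        # pmf of the empty sum S_0
for _ in range(n):
    pmf = convolve(pmf, ber)  # S_{m+1} = S_m + X_{m+1}

# Exact agreement with the binomial pmf, term by term.
assert all(pmf[k] == comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(pmf)
```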

Example 2 is basically the definition of the Binomial distribution.

Definition 2. Fix n \in \mathbb N and p \in [0, 1]. We say that the random variable X follows a Binomial distribution with parameters n, p, denoted X \sim \mathrm{Bin}(n, p), if there exist independent random variables \xi_1, \dots, \xi_n \sim \mathrm{Ber}(p) such that

\displaystyle X = \xi_1 + \cdots + \xi_n.

Trivially, \mathrm{Bin}(1, p) = \mathrm{Ber}(p).

Corollary 1. If X \sim \mathrm{Bin}(m, p) and Y \sim \mathrm{Bin}(n, p) are independent, then X + Y \sim \mathrm{Bin}(m+n, p).

Proof. Find independent random variables \xi_1, \dots, \xi_{m+n} \sim \mathrm{Ber}(p) such that

X = \xi_1 + \cdots + \xi_m,\quad Y = \xi_{m+1} + \cdots + \xi_{m+n}.

Then

X + Y = \xi_1 + \cdots + \xi_m + \xi_{m+1} + \cdots + \xi_{m+n} \sim \mathrm{Bin}(m+n, p).
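Corollary 1 can likewise be verified numerically: convolving the pmfs of \mathrm{Bin}(m, p) and \mathrm{Bin}(n, p) yields exactly the pmf of \mathrm{Bin}(m+n, p), an instance of Vandermonde’s identity. A sketch with illustrative m, n, p:

```python
from fractions import Fraction
from math import comb

m, n, p = 3, 4, Fraction(1, 4)

def binom_pmf(size):
    """Exact pmf of Bin(size, p) as a dict over {0, ..., size}."""
    return {k: comb(size, k) * p**k * (1 - p)**(size - k)
            for k in range(size + 1)}

def convolve(f, g):
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + fx * gy
    return h

# Bin(m, p) * Bin(n, p) == Bin(m + n, p), exactly.
assert convolve(binom_pmf(m), binom_pmf(n)) == binom_pmf(m + n)
print("verified for m =", m, "and n =", n)
```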

With some distributions at hand, we ask a reasonable question: how do we calculate the averages of these distributions? We will answer this question using expectations next time.

—Joel Kindiak, 29 Jun 25, 2229H
