Adding Random Variables

Given two random variables X, Y with known distributions (for example, X \sim \mathrm{Ber}(p) and Y \sim \mathrm{Ber}(q)), what is the distribution of their sum X + Y? This question is highly nontrivial, and is worth some elementary experimentation.

Let’s work with the Bernoulli example and see what we can get. The first case is, in a sense, the “easiest” to handle: X = 1 and Y = 1, or using ordered-pair notation, (X, Y) = (1, 1). In this case, X + Y = 2. Conversely, if X = 0 or Y = 0, then X + Y \leq 1 < 2, so (1, 1) is the only pair yielding a sum of 2. Therefore, at least informally,

\mathbb P(X + Y = 2) = \mathbb P((X, Y) = (1, 1)).

However, the probability on the right side raises a new question: what do we mean by \mathbb P((X, Y) = ( 1, 1))? If we insist (rather intuitively) that

\mathbb P((X, Y) = ( 1, 1)) = \mathbb P(X = 1) \cdot \mathbb P(Y = 1),

we are implicitly assuming that X and Y are independent in some sense (a notion we formalise below), which need not be the case! Our first task therefore isn’t even to compute X + Y, but to examine what we mean by \mathbb P((X, Y) = \cdot ), i.e. the joint distribution of X and Y.
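To see that the product rule can genuinely fail, here is a minimal Python sketch (the choice Y := X, a perfectly dependent pair, is an illustrative extreme, not part of the formal development):

```python
from fractions import Fraction

# Extreme dependence: Y is an exact copy of X, with X ~ Ber(1/2).
p = Fraction(1, 2)

joint_11 = p       # P(X = 1, Y = 1) = P(X = 1), since Y = X
product = p * p    # P(X = 1) * P(Y = 1)

print(joint_11)    # 1/2
print(product)     # 1/4 -- the product rule fails without independence
```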

Lemma 1. Let (\Omega, \mathcal F, \mathbb P) be a probability space and X_1, \dots, X_n : \Omega \to \mathbb Z be discrete random variables. The product map \mathbf X_n := (X_1, \dots, X_n) : \Omega \to \mathbb Z^n defined by

\mathbf X_n(\omega) := (X_1(\omega), \dots, X_n(\omega)),\quad \omega \in \Omega

is measurable, when \mathbb Z^n is equipped with the power-set \sigma-algebra \mathcal P(\mathbb Z^n), and induces a push-forward measure \mathbb P_{\mathbf X_n} : \mathcal P(\mathbb Z^n) \to [0, 1] defined by

\begin{aligned} \mathbb P_{\mathbf X_n}(K) &:= \mathbb P(\mathbf X_n \in K) = \mathbb P(\mathbf X_n^{-1}(K)),\quad K \in \mathcal P(\mathbb Z^n).\end{aligned}

While \mathbf X_n shares many properties with discrete random variables, its range is not a subset of \mathbb Z, so to reduce confusion we reserve the term “discrete random variable” for the original definition.
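To make the push-forward concrete, here is a small Python sketch of Lemma 1 on a finite sample space (the space, the uniform measure, and the two maps are all illustrative choices):

```python
from fractions import Fraction
from itertools import product

# A toy probability space: Omega = {0, 1, 2, 3} with the uniform measure.
Omega = [0, 1, 2, 3]
P = {w: Fraction(1, 4) for w in Omega}
X1 = lambda w: w % 2    # a Z-valued discrete random variable
X2 = lambda w: w // 2   # another one

def pushforward(K):
    """P_{(X1, X2)}(K) = P({w : (X1(w), X2(w)) in K})."""
    return sum(P[w] for w in Omega if (X1(w), X2(w)) in K)

print(pushforward({(1, 1)}))                        # 1/4
print(pushforward(set(product([0, 1], repeat=2))))  # 1, as any probability measure must give
```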

But what would be the joint distribution of (X,Y)? That is, what is the value of

\begin{aligned} \mathbb P_{X,Y}(\{ (x, y) \}) &= \mathbb P(\{ X = x \} \cap \{Y = y\}) \\ &\equiv \mathbb P( X = x , Y = y ) \end{aligned}

for any x, y \in \mathbb Z? Perhaps \mathbb P_{X,Y} could be defined arbitrarily, as long as each value lies in [0, 1] and

\displaystyle \sum_{ (x, y) \in \mathbb Z^2 } \mathbb P( X = x , Y = y ) = 1.

However, there are two more sneaky conditions.

Lemma 2. For any y \in \mathbb Z with \mathbb P(Y = y) > 0,

\displaystyle \sum_{x \in \mathbb Z} \mathbb P( X =x ,  Y = y ) = \mathbb P(Y = y).

Similarly, for any x \in \mathbb Z with \mathbb P(X = x) > 0,

\displaystyle \sum_{y \in \mathbb Z} \mathbb P( X = x ,  Y = y ) = \mathbb P(X = x).

Proof. Recall that for any y \in \mathbb Z such that \mathbb P(Y = y) \equiv \mathbb P_Y(\{y\}) > 0, the conditional probability

\displaystyle \mathbb P( \cdot  \mid  Y = y )  : \mathcal F \to [0, 1]

is a probability measure, which induces a probability measure

\mathbb P( X \in \cdot  \mid  Y = y ) : \mathcal P(\mathbb Z) \to [0, 1].

In particular,

\displaystyle \sum_{x \in \mathbb Z} \mathbb P( X = x \mid  Y = y ) = 1.

Hence,

\begin{aligned} \sum_{x \in \mathbb Z} \mathbb P( X =x ,  Y = y ) &= \sum_{x \in \mathbb Z} \mathbb P( X =x \mid  Y = y ) \cdot \mathbb P(Y = y) \\ &= \mathbb P(Y = y) \cdot \sum_{x \in \mathbb Z} \mathbb P( X =x \mid  Y = y ) \\ & = \mathbb P(Y = y) \cdot 1 \\ &= \mathbb P(Y = y). \end{aligned}

Thankfully, convergence poses no issue here: every term is nonnegative, so the series converges in [0, 1] and the manipulations above are justified.
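A quick numerical illustration of Lemma 2 (the joint table below is a hand-picked pmf, chosen only to exhibit the marginal sums):

```python
from fractions import Fraction

# An arbitrary joint pmf on {0, 1}^2; the four values sum to 1.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_X(x):
    # Sum out y: Lemma 2's second identity.
    return sum(p for (u, y), p in joint.items() if u == x)

def marginal_Y(y):
    # Sum out x: Lemma 2's first identity.
    return sum(p for (x, v), p in joint.items() if v == y)

print(marginal_X(0), marginal_X(1))  # 1/2 1/2
print(marginal_Y(0), marginal_Y(1))  # 3/8 5/8
print(sum(joint.values()))           # 1, the normalisation condition
```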

Still, do we have an algorithmic way to compute the joint distribution? If the events X^{-1}(\{ x \}) \equiv X^{-1}(x), Y^{-1}(y) \in \mathcal F are independent for any x, y \in \mathbb Z, then

\begin{aligned} \mathbb P(X = x, Y = y) &= \mathbb P(\{X = x \} \cap \{ Y = y\}) \\ &= \mathbb P(X^{-1}(x) \cap Y^{-1}(y)) \\ &= \mathbb P(X^{-1}(x)) \cdot \mathbb P(Y^{-1}(y)) \\ &= \mathbb P(X = x) \cdot \mathbb P (Y = y).\end{aligned}

Hence, we define the independence of the random variables in this manner.

Definition 1. The discrete random variables X_1,\dots , X_n : \Omega \to \mathbb Z are independent if for any x_1,\dots,x_n \in \mathbb Z, the events X_1^{-1}(x_1),\dots, X_n^{-1}(x_n) are independent. In this case,

\mathbb P((X_1,\dots,X_n) = (x_1,\dots,x_n)) = \mathbb P(X_1 = x_1) \cdot \cdots \cdot \mathbb P(X_n = x_n).

Using p.m.f. notation,

f_{X_1,\dots,X_n}(x_1,\dots, x_n) = f_{X_1}(x_1) \cdot \cdots \cdot f_{X_n}(x_n),\quad (x_1,\dots,x_n) \in \mathbb Z^n.
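In code, Definition 1 says that the joint pmf table of independent variables is the outer product of the marginal tables. A minimal sketch with two Bernoulli marginals (p and q are illustrative parameters):

```python
from fractions import Fraction
from itertools import product

p, q = Fraction(1, 3), Fraction(1, 5)
f_X = {0: 1 - p, 1: p}   # pmf of X ~ Ber(p)
f_Y = {0: 1 - q, 1: q}   # pmf of Y ~ Ber(q)

# Joint pmf as the product of the marginals (independence).
f_XY = {(x, y): f_X[x] * f_Y[y] for x, y in product(f_X, f_Y)}

print(f_XY[(1, 1)])        # p * q = 1/15
print(sum(f_XY.values()))  # 1
```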

Example 1. Suppose X \sim \mathrm{Ber}(p) and Y \sim \mathrm{Ber}(q) are independent random variables. Evaluate the distribution of X + Y : \Omega \to \mathbb Z.

Solution. We note that the joint distribution f_{X,Y} is given by

\begin{aligned} f_{X,Y}(0, 0) &= (1-p) \cdot (1-q),\\ f_{X,Y}(0, 1) &= (1-p) \cdot q,\\ f_{X,Y}(1, 0) &= p \cdot (1-q), \\ f_{X,Y}(1, 1) &= p \cdot q.\end{aligned}

We observe that X + Y \in \{0, 1, 2\}. Hence,

\begin{aligned} f_{X+Y}(0) &= f_{X,Y}(0, 0) = (1-p) \cdot (1-q), \\ f_{X+Y}(1) &= f_{X,Y}(0, 1) + f_{X,Y}(1, 0) \\ &=  (1-p) \cdot q + p \cdot (1-q), \\ f_{X+Y}(2) &= f_{X,Y}(1, 1)  =  p \cdot q. \end{aligned}

Notice that f_{X+Y}(1) is evaluated by summing over the cases (0, 1) and (1, 0). More generally, for independent X and Y we have

\begin{aligned} f_{X+Y}(k) &= \sum_{x + y = k} f_{X,Y}(x, y) \\ &= \sum_{x + y = k} f_{X}(x) \cdot f_Y(y) \\ &= \sum_{x \in \mathbb Z} f_X(x) \cdot f_Y(k-x) \\ &=: (f_X * f_Y)(k), \end{aligned}

where the quantity on the right is called the discrete convolution of f_X and f_Y. Time to generalise!
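Before we do, a quick sanity check: a direct Python implementation of the discrete convolution (a minimal sketch, with illustrative parameters) reproduces the table from Example 1.

```python
from fractions import Fraction

def convolve(f, g):
    """(f * g)(k) = sum_x f(x) g(k - x), for pmfs on Z with finite support,
    represented as dicts mapping integers to probabilities."""
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + fx * gy
    return h

p, q = Fraction(1, 2), Fraction(1, 3)
f_X = {0: 1 - p, 1: p}
f_Y = {0: 1 - q, 1: q}
print(convolve(f_X, f_Y))  # {0: (1-p)(1-q), 1: (1-p)q + p(1-q), 2: pq}
```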

Theorem 1. Let X, Y : \Omega \to \mathbb Z be discrete random variables. Given any function g : \mathbb Z^2 \to \mathbb Z, the composition g(X,Y) := g \circ (X, Y) : \Omega \to \mathbb Z is a discrete random variable, with distribution

\displaystyle \mathbb P_{g(X,Y)}(k) = \sum_{g(x,y) = k} f_{X,Y}(x,y) = \sum_{(x,y) \in g^{-1}(k)} f_{X,Y}(x,y).

If X, Y are independent, then the distribution is given by

\displaystyle \mathbb P_{g(X,Y)}(k) = \sum_{(x,y) \in g^{-1}(k)} f_{X}(x) \cdot f_Y(y).

In particular,

\displaystyle \mathbb P_{X+Y}(k) = \sum_{x\in \mathbb Z} f_{X}(x) \cdot f_Y(k-x).
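Computationally, Theorem 1 amounts to grouping the joint pmf by the value of g, i.e. summing over each fibre g^{-1}(k). A sketch with the illustrative choice g = \max and independent \mathrm{Ber}(1/2) marginals:

```python
from fractions import Fraction

p = Fraction(1, 2)
# Joint pmf of two independent Ber(1/2) variables.
f_XY = {(x, y): (p if x else 1 - p) * (p if y else 1 - p)
        for x in (0, 1) for y in (0, 1)}

g = max  # any function of two integer arguments would do

# Push the joint pmf forward through g, fibre by fibre.
dist = {}
for (x, y), prob in f_XY.items():
    k = g(x, y)
    dist[k] = dist.get(k, Fraction(0)) + prob

print(dist)  # k = 0 with probability 1/4, k = 1 with probability 3/4
```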

Example 2. Fix p \in [0, 1]. Given that the random variables X_1, X_2, \dots \sim \mathrm{Ber}(p) are independent, X_1 + X_2 has the distribution

\displaystyle \mathbb P_{X_1+X_2}(k) = \begin{cases} (1-p)^2, & k = 0, \\ 2p(1-p), & k = 1, \\ p^2, & k = 2. \end{cases}

Furthermore, for any n \in \mathbb N^+, the distribution of S_n := X_1 + \cdots + X_n is given by

\displaystyle \mathbb P_{S_n}(k) = {n \choose k} p^k(1-p)^{n-k},\quad 0 \leq k \leq n.

Proof. The case n = 1 is trivial, and Example 1 (with q = p) establishes the case n = 2. For the general case, we proceed by induction. Suppose that for some m \geq 1,

\displaystyle \mathbb P_{S_m}(k) = {m \choose k} p^k(1-p)^{m-k},\quad 0 \leq k \leq m.

Let X_{m+1} \sim \mathrm{Ber}(p) be independent of X_1, \dots, X_m, so that S_m and X_{m+1} are independent, and write S_{m+1} = S_m + X_{m+1}. By the discrete convolution formula and Pascal’s identity, with the convention {m \choose j} = 0 for j < 0 or j > m,

\begin{aligned} \mathbb P_{S_{m+1}}(k) &= \mathbb P_{S_m + X_{m+1}}(k) \\ &= \sum_{x\in \mathbb Z} \mathbb P_{S_m}(x) \cdot \mathbb P_{X_{m+1}}(k-x) \\ &= \mathbb P_{S_m}(k) \cdot \mathbb P_{X_{m+1}}(0) + \mathbb P_{S_m}(k-1) \cdot \mathbb P_{X_{m+1}}(1) \\ &= {m \choose k} p^k(1-p)^{m-k} \cdot (1-p) + {m \choose k-1} p^{k-1}(1-p)^{m-(k-1)} \cdot p \\ &=\left( {m \choose k} + {m \choose k-1} \right) \cdot p^k(1-p)^{(m+1)-k} \\ &= {m+1 \choose k} p^k(1-p)^{(m+1)-k}.\end{aligned}
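The induction can also be checked numerically: convolving the \mathrm{Ber}(p) pmf with itself n times should reproduce the binomial coefficients exactly. A minimal sketch in exact rational arithmetic (p and n are illustrative choices):

```python
from fractions import Fraction
from math import comb

def convolve(f, g):
    """(f * g)(k) = sum_x f(x) g(k - x), for finitely supported pmfs."""
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + fx * gy
    return h

p, n = Fraction(2, 5), 6
ber = {0: 1 - p, 1: p}

pmf = {0: Fraction(1)}        # pmf of the empty sum S_0
for _ in range(n):
    pmf = convolve(pmf, ber)  # S_{m+1} = S_m + X_{m+1}

# Exact agreement with the binomial pmf, term by term.
assert all(pmf[k] == comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(pmf)
```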

Example 2 is basically the definition of the Binomial distribution.

Definition 2. Fix n \in \mathbb N and p \in [0, 1]. We say that the random variable X follows a Binomial distribution with parameters n, p, denoted X \sim \mathrm{Bin}(n, p), if there exist independent random variables \xi_1, \dots, \xi_n \sim \mathrm{Ber}(p) such that

\displaystyle X = \xi_1 + \cdots + \xi_n.

Trivially, \mathrm{Bin}(1, p) = \mathrm{Ber}(p).

Corollary 1. If X \sim \mathrm{Bin}(m, p) and Y \sim \mathrm{Bin}(n, p) are independent, then X + Y \sim \mathrm{Bin}(m+n, p).

Proof. Find independent random variables \xi_1, \dots, \xi_{m+n} \sim \mathrm{Ber}(p) such that

X = \xi_1 + \cdots + \xi_m,\quad Y = \xi_{m+1} + \cdots + \xi_{m+n}.

Then

X + Y = \xi_1 + \cdots + \xi_m + \xi_{m+1} + \cdots + \xi_{m+n} \sim \mathrm{Bin}(m+n, p).
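Corollary 1 can likewise be verified numerically: convolving the pmfs of \mathrm{Bin}(m, p) and \mathrm{Bin}(n, p) yields exactly the pmf of \mathrm{Bin}(m+n, p), an instance of Vandermonde’s identity. A sketch with illustrative m, n, p:

```python
from fractions import Fraction
from math import comb

m, n, p = 3, 4, Fraction(1, 4)

def binom_pmf(size):
    """Exact pmf of Bin(size, p) as a dict over {0, ..., size}."""
    return {k: comb(size, k) * p**k * (1 - p)**(size - k)
            for k in range(size + 1)}

def convolve(f, g):
    h = {}
    for x, fx in f.items():
        for y, gy in g.items():
            h[x + y] = h.get(x + y, Fraction(0)) + fx * gy
    return h

# Bin(m, p) * Bin(n, p) == Bin(m + n, p), exactly.
assert convolve(binom_pmf(m), binom_pmf(n)) == binom_pmf(m + n)
print("verified for m =", m, "and n =", n)
```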

With some distributions at hand, we ask a reasonable question: how do we calculate the averages of these distributions? We will answer this question using expectations next time.

—Joel Kindiak, 29 Jun 25, 2229H
