Given two random variables with known distributions (for example, $X \sim \mathrm{Ber}(p)$ and $Y \sim \mathrm{Ber}(q)$), what is the distribution of their sum $X + Y$? This question is highly nontrivial, and is worth some elementary experimentation.
Let’s work with the Bernoulli example and see what we can get. The first case is, in a sense, the “easiest” case to work with: $X = 0$ and $Y = 0$. Using ordered-pair notation, $(X, Y) = (0, 0)$. In this case, $X + Y = 0$. Furthermore, if $X + Y = 0$, then $(X, Y) = (0, 0)$, so that there is, in a sense, only one possible case to handle. Therefore, at least informally,
$$\Pr(X + Y = 0) = \Pr((X, Y) = (0, 0)) = \Pr(X = 0, Y = 0).$$
However, the probability on the right side raises a new question: what do we mean by $\Pr(X = 0, Y = 0)$? If we insist (rather intuitively) that
$$\Pr(X = 0, Y = 0) = \Pr(X = 0)\Pr(Y = 0),$$
we are implicitly assuming that $X$ and $Y$ are independent in some sense (a notion we will formalise later on), which may not always be the case! Our first task therefore isn’t even to compute $\Pr(X + Y = 0)$, but to examine what we mean by $\Pr(X = x, Y = y)$, i.e. the joint distribution of $X$ and $Y$.
Lemma 1. Let $(\Omega, \mathcal{F}, \Pr)$ be a probability space and $X, Y \colon \Omega \to \mathbb{R}$ be discrete random variables. The product map $(X, Y) \colon \Omega \to \mathbb{R}^2$ defined by
$$(X, Y)(\omega) = (X(\omega), Y(\omega))$$
is measurable, when $\mathbb{R}^2$ is equipped with the usual $\sigma$-algebra $\mathcal{B}(\mathbb{R}^2)$, and induces a push-forward measure $\Pr_{(X, Y)}$ defined by
$$\Pr_{(X, Y)}(B) = \Pr((X, Y) \in B), \qquad B \in \mathcal{B}(\mathbb{R}^2).$$
While $(X, Y)$ shares many properties of discrete random variables, its range is not a subset of $\mathbb{R}$, and to reduce confusion we restrict the term “discrete random variables” to the original definition.
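If you want to experiment, here is a minimal Python sketch of the push-forward of the product map (the sample space, the probabilities, and the variables $X$, $Y$ below are illustrative choices of mine, not part of the lemma):

```python
from collections import defaultdict

# Illustrative finite probability space: two fair coin tosses.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
prob = {w: 0.25 for w in omega}           # P({w}) for each outcome

def X(w): return 1 if w[0] == "H" else 0  # indicator of heads on the first toss
def Y(w): return 1 if w[1] == "H" else 0  # indicator of heads on the second toss

# Push-forward of the product map (X, Y): sum P({w}) over the outcomes w
# sent to each pair (x, y).
joint = defaultdict(float)
for w in omega:
    joint[(X(w), Y(w))] += prob[w]

print(dict(joint))  # each of the four pairs gets probability 0.25
```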
But what would be the joint distribution of $(X, Y)$? That is, what is the value of $\Pr(X = x, Y = y)$ for any $(x, y) \in \mathbb{R}^2$? Perhaps $\Pr(X = x, Y = y)$ could be defined arbitrarily, as long as we have each value in $[0, 1]$ and
$$\sum_{x} \sum_{y} \Pr(X = x, Y = y) = 1.$$
However, there are two more sneaky conditions.
Lemma 2. For any $x \in \mathbb{R}$ with $\Pr(X = x) > 0$,
$$\sum_{y} \Pr(X = x, Y = y) = \Pr(X = x).$$
Similarly, for any $y \in \mathbb{R}$ with $\Pr(Y = y) > 0$,
$$\sum_{x} \Pr(X = x, Y = y) = \Pr(Y = y).$$
Proof. Recall that for any $x$ such that $\Pr(X = x) > 0$, the conditional probability $\Pr(\,\cdot \mid X = x)$ is a probability measure, which induces a probability measure
$$\Pr_{Y \mid X = x}(B) = \Pr(Y \in B \mid X = x).$$
Thus, we must require that
$$\sum_{y} \Pr(Y = y \mid X = x) = 1.$$
Hence,
$$\sum_{y} \Pr(X = x, Y = y) = \sum_{y} \Pr(Y = y \mid X = x)\Pr(X = x) = \Pr(X = x).$$
Thankfully, convergence issues resolve themselves here: every term in these series is nonnegative, so they converge regardless of the order of summation.
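As a quick sanity check on Lemma 2, the following sketch (with an arbitrarily chosen joint p.m.f. of my own; the variable names are not standard notation) recovers both marginals by summing the joint probabilities:

```python
# An arbitrarily chosen joint p.m.f. of (X, Y) on {0, 1}^2: nonnegative,
# sums to 1, and deliberately NOT a product of marginals.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Lemma 2: summing over y recovers P(X = x); summing over x recovers P(Y = y).
p_X, p_Y = {}, {}
for (x, y), pr in joint.items():
    p_X[x] = p_X.get(x, 0.0) + pr
    p_Y[y] = p_Y.get(y, 0.0) + pr

print(p_X)  # {0: 0.5, 1: 0.5}
print(p_Y)  # {0: 0.6, 1: 0.4} (up to floating-point rounding)
```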
Still, do we have an algorithmic way to compute the joint distribution? If the events $\{X = x\}$ and $\{Y = y\}$ are independent for any $(x, y) \in \mathbb{R}^2$, then
$$\Pr(X = x, Y = y) = \Pr(X = x)\Pr(Y = y).$$
Hence, we define the independence of the random variables in this manner.
Definition 1. The discrete random variables $X$ and $Y$ are independent if for any $(x, y) \in \mathbb{R}^2$, the events $\{X = x\}$ and $\{Y = y\}$ are independent. In this case,
$$\Pr(X = x, Y = y) = \Pr(X = x)\Pr(Y = y).$$
Using p.m.f. notation,
$$p_{X, Y}(x, y) = p_X(x)\,p_Y(y).$$
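In code, independence makes the joint p.m.f. a simple product of the marginals; here is a minimal sketch with illustrative parameter values of my own:

```python
# Marginal p.m.f.s of two independent Bernoulli variables (parameters are
# illustrative choices).
p, q = 0.3, 0.7
p_X = {0: 1 - p, 1: p}
p_Y = {0: 1 - q, 1: q}

# Under independence, p_{X,Y}(x, y) = p_X(x) * p_Y(y).
joint = {(x, y): p_X[x] * p_Y[y] for x in p_X for y in p_Y}

print(joint)  # approximately {(0, 0): 0.21, (0, 1): 0.49, (1, 0): 0.09, (1, 1): 0.21}
```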
Example 1. Suppose $X \sim \mathrm{Ber}(p)$ and $Y \sim \mathrm{Ber}(p)$ are independent random variables. Evaluate the distribution of $X + Y$.

Solution. We note that the joint distribution is given by
$$\Pr(X = 0, Y = 0) = (1 - p)^2, \quad \Pr(X = 0, Y = 1) = \Pr(X = 1, Y = 0) = p(1 - p), \quad \Pr(X = 1, Y = 1) = p^2.$$
We observe that $X + Y \in \{0, 1, 2\}$. Hence,
$$\Pr(X + Y = 0) = (1 - p)^2, \qquad \Pr(X + Y = 1) = 2p(1 - p), \qquad \Pr(X + Y = 2) = p^2.$$
Notice that $\Pr(X + Y = 1)$ is evaluated by summing the cases of $(X, Y) = (1, 0)$ and $(X, Y) = (0, 1)$. More generally, we have
$$\Pr(X + Y = z) = \sum_{x} \Pr(X = x)\Pr(Y = z - x) = \sum_{x} p_X(x)\,p_Y(z - x),$$
where the quantity on the right is called the discrete convolution of $p_X$ and $p_Y$. Time to generalise!
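The discrete convolution is straightforward to compute; here is a minimal sketch (the `convolve` helper is a name of my own, not a library routine) that reproduces the distribution we just derived:

```python
from collections import defaultdict

def convolve(p_A, p_B):
    """Discrete convolution of two p.m.f.s given as {value: probability} dicts."""
    out = defaultdict(float)
    for a, pa in p_A.items():
        for b, pb in p_B.items():
            out[a + b] += pa * pb  # P(A + B = a + b) accumulates p_A(a) p_B(b)
    return dict(out)

p = 0.3
bern = {0: 1 - p, 1: p}
print(convolve(bern, bern))
# approximately {0: 0.49, 1: 0.42, 2: 0.09}, i.e. (1-p)^2, 2p(1-p), p^2
```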
Theorem 1. Let $X, Y$ be discrete random variables. Given any function $g \colon \mathbb{R}^2 \to \mathbb{R}$, the random variable $g(X, Y)$ exists and is given by the distribution
$$\Pr(g(X, Y) = z) = \sum_{(x, y)\,:\,g(x, y) = z} \Pr(X = x, Y = y).$$
If $X, Y$ are independent, then the distribution is given by
$$\Pr(g(X, Y) = z) = \sum_{(x, y)\,:\,g(x, y) = z} \Pr(X = x)\Pr(Y = y).$$
In particular,
$$\Pr(X + Y = z) = \sum_{x} \Pr(X = x)\Pr(Y = z - x).$$
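Theorem 1 is equally easy to experiment with; the sketch below (with a helper name and a choice of $g$ that are my own illustrative assumptions) pushes a joint p.m.f. forward through an arbitrary function $g$:

```python
from collections import defaultdict

def push_forward(joint, g):
    """Distribution of g(X, Y), computed from a joint p.m.f. {(x, y): prob}."""
    dist = defaultdict(float)
    for (x, y), pr in joint.items():
        dist[g(x, y)] += pr  # collect P(X = x, Y = y) over pairs with g(x, y) = z
    return dict(dist)

# Illustrative joint p.m.f.: two independent Ber(0.3) variables.
p = 0.3
bern = {0: 1 - p, 1: p}
joint = {(x, y): bern[x] * bern[y] for x in bern for y in bern}

print(push_forward(joint, lambda x, y: x + y))  # approx {0: 0.49, 1: 0.42, 2: 0.09}
print(push_forward(joint, max))                 # approx {0: 0.49, 1: 0.51}
```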
Example 2. Fix $n \in \mathbb{N}$ and $p \in [0, 1]$. Given that $X_1, X_2, \ldots, X_n \sim \mathrm{Ber}(p)$ are independent, $X_1 + X_2 + \cdots + X_n$ has a distribution given by
$$\Pr(X_1 + \cdots + X_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k \in \{0, 1, \ldots, n\}.$$
Furthermore, for any $k \notin \{0, 1, \ldots, n\}$, the distribution of $X_1 + \cdots + X_n$ is given by
$$\Pr(X_1 + \cdots + X_n = k) = 0.$$
Proof. The case $n = 1$ is trivial. Example 1 establishes the case $n = 2$. For the general case, we prove by induction. Suppose for any $k \in \{0, 1, \ldots, n\}$,
$$\Pr(X_1 + \cdots + X_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}.$$
Suppose $k \in \{0, 1, \ldots, n + 1\}$. Then $X_1 + \cdots + X_{n+1} = (X_1 + \cdots + X_n) + X_{n+1}$. By the discrete convolution and Pascal’s identity,
$$\begin{aligned}
\Pr(X_1 + \cdots + X_{n+1} = k) &= \Pr(X_1 + \cdots + X_n = k)\Pr(X_{n+1} = 0) + \Pr(X_1 + \cdots + X_n = k - 1)\Pr(X_{n+1} = 1) \\
&= \binom{n}{k} p^k (1 - p)^{n - k}(1 - p) + \binom{n}{k - 1} p^{k - 1}(1 - p)^{n - k + 1}\,p \\
&= \left[\binom{n}{k} + \binom{n}{k - 1}\right] p^k (1 - p)^{n + 1 - k} \\
&= \binom{n + 1}{k} p^k (1 - p)^{n + 1 - k}.
\end{aligned}$$
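As a numerical check of Example 2 (purely illustrative; `convolve` is the same home-made helper as before, not a library function), convolving $n$ copies of the $\mathrm{Ber}(p)$ p.m.f. agrees with the binomial formula:

```python
from collections import defaultdict
from math import comb

def convolve(p_A, p_B):
    """Discrete convolution of two p.m.f.s given as {value: probability} dicts."""
    out = defaultdict(float)
    for a, pa in p_A.items():
        for b, pb in p_B.items():
            out[a + b] += pa * pb
    return dict(out)

n, p = 5, 0.3
bern = {0: 1 - p, 1: p}

# n-fold convolution of the Ber(p) p.m.f. with itself.
dist = bern
for _ in range(n - 1):
    dist = convolve(dist, bern)

# Compare against the binomial p.m.f. C(n, k) p^k (1-p)^(n-k).
for k in range(n + 1):
    print(k, round(dist[k], 6), round(comb(n, k) * p**k * (1 - p)**(n - k), 6))
```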
Example 2 is basically the definition of the Binomial distribution.
Definition 2. Fix $n \in \mathbb{N}$ and $p \in [0, 1]$. We say that the random variable $X$ follows a Binomial distribution with parameters $(n, p)$, denoted $X \sim \mathrm{Bin}(n, p)$, if there exist independent Bernoulli random variables $X_1, X_2, \ldots, X_n \sim \mathrm{Ber}(p)$ such that
$$X = X_1 + X_2 + \cdots + X_n.$$
Trivially, $\mathrm{Bin}(1, p) = \mathrm{Ber}(p)$.
Corollary 1. If $X \sim \mathrm{Bin}(m, p)$ and $Y \sim \mathrm{Bin}(n, p)$ are independent, then $X + Y \sim \mathrm{Bin}(m + n, p)$.

Proof. Find independent random variables $X_1, \ldots, X_m, Y_1, \ldots, Y_n \sim \mathrm{Ber}(p)$ such that
$$X = X_1 + \cdots + X_m, \qquad Y = Y_1 + \cdots + Y_n.$$
Then
$$X + Y = X_1 + \cdots + X_m + Y_1 + \cdots + Y_n \sim \mathrm{Bin}(m + n, p).$$
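Finally, a quick numerical check of Corollary 1 (again with home-made helpers and arbitrary choices of $m$, $n$, $p$): convolving the p.m.f.s of $\mathrm{Bin}(m, p)$ and $\mathrm{Bin}(n, p)$ matches the p.m.f. of $\mathrm{Bin}(m + n, p)$.

```python
from collections import defaultdict
from math import comb

def binom_pmf(n, p):
    """Binomial p.m.f. as a {value: probability} dict."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def convolve(p_A, p_B):
    """Discrete convolution of two p.m.f.s given as {value: probability} dicts."""
    out = defaultdict(float)
    for a, pa in p_A.items():
        for b, pb in p_B.items():
            out[a + b] += pa * pb
    return dict(out)

m, n, p = 3, 4, 0.3
lhs = convolve(binom_pmf(m, p), binom_pmf(n, p))  # distribution of X + Y
rhs = binom_pmf(m + n, p)                         # Bin(m + n, p)
print(all(abs(lhs[k] - rhs[k]) < 1e-12 for k in range(m + n + 1)))  # True
```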
With some distributions at hand, we ask a reasonable question: how do we calculate the averages of these distributions? We will answer this question using expectations next time.
—Joel Kindiak, 29 Jun 25, 2229H