Common Continuous Probability Distributions

Having ventured far into the measure-theoretic world and taken great pains to verify that assigning the length \lambda([a,b]) = b-a to a compact interval [a,b] is mathematically legitimate, we turn our attention to modelling continuous data. If the uniform distribution arising from the counting measure is considered the “basic” distribution in discrete probability, then the continuous uniform distribution would naturally be considered the “basic” distribution in continuous probability.

Definition 1. Let \Omega be a sample space. The continuous uniform distribution on [a, b] is the random variable X : \Omega \to \mathbb R whose distribution is defined as follows: for any measurable K \subseteq [a, b],

\displaystyle \begin{aligned} \mathbb P_X(K) &= \frac{1}{b-a} \cdot \int_K \mathbb I_{[a, b]} \, \mathrm d\lambda \\ &\equiv \frac{1}{b-a} \cdot \int_a^b \mathbb I_K(x) \, \mathrm dx. \end{aligned}

Here, the probability density function is given by \displaystyle f_X = \frac{1}{b-a} \cdot \mathbb I_{[a,b]}. Henceforth, we will also disregard the sample space unless we need to undertake some theoretical construction.

As usual, \lambda here denotes the Lebesgue measure on \mathbb R, and in this case, we write X \sim \mathcal U(a, b).
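To see the definition in action, here is a minimal sketch in Python (assuming NumPy is available; the endpoints, the interval K, and the seed are arbitrary illustrative choices) comparing the exact probability \mathbb P_X([c, d]) = (d - c)/(b - a) with a Monte Carlo estimate from pseudorandom uniform draws.

import numpy as np

rng = np.random.default_rng(0)

a, b = 2.0, 5.0    # parameters of the continuous uniform distribution U(a, b)
c, d = 3.0, 4.5    # an interval K = [c, d] contained in [a, b]

# Exact probability from the density: P_X([c, d]) = (d - c) / (b - a).
exact = (d - c) / (b - a)

# Monte Carlo estimate: the fraction of pseudorandom uniform draws landing in K.
samples = rng.uniform(a, b, size=100_000)
estimate = np.mean((samples >= c) & (samples <= d))

print(f"exact = {exact:.4f}, Monte Carlo estimate = {estimate:.4f}")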

The expected value and variance tend to be useful statistical quantities for our purposes. Moreover, there is a special expected value and a special variance that make the rest of our calculations trivial.

Lemma 1. Suppose \mathbb E[X] = \mu and \mathrm{Var}(X) = \sigma^2. Define the standardised random variable \tilde X by

\displaystyle \tilde X := \frac{X - \mu}{\sigma}.

Then \mathbb E[\tilde X] = 0 and \mathrm{Var}(\tilde X) = 1. Conversely, for any X = a + b \cdot U with b \neq 0, if the p.d.f. f_U of U is continuous, then the p.d.f. f_X of X can be written in terms of the p.d.f. of U:

\displaystyle f_X(x) = \frac 1{|b|} \cdot f_U\left( \frac{x-a}{b} \right).

Furthermore, \mathbb E[X] = a + b \cdot \mathbb E[U] and \mathrm{Var}(X) = b^2 \cdot \mathrm{Var}(U).

Proof. For the p.d.f. result, suppose first that b > 0 (the case b < 0 is analogous and produces the factor 1/|b|). By the fundamental theorem of calculus and the chain rule,

\begin{aligned} f_X(x) &= \frac{\mathrm d}{\mathrm dx} \int_{-\infty}^x f_X(t)\, \mathrm dt \\ &= \frac{\mathrm d}{\mathrm dx} (\mathbb P(X \leq x)) \\ &= \frac{\mathrm d}{\mathrm dx} (\mathbb P(a + b \cdot U \leq x)) \\ &= \frac{\mathrm d}{\mathrm dx} \left( \mathbb P\left( U \leq \frac{x - a}{b} \right) \right) \\ &= \frac{\mathrm d}{\mathrm dx} \int_{-\infty}^{(x-a)/b} f_U(t)\, \mathrm dt \\ &= \frac 1b \cdot f_U\left( \frac{x-a}{b} \right). \end{aligned}

The other results follow from the linearity of expectation and the scaling property of variance: \mathbb E[X] = a + b \cdot \mathbb E[U] and \mathrm{Var}(X) = b^2 \cdot \mathrm{Var}(U); applying these with a = -\mu/\sigma and b = 1/\sigma gives \mathbb E[\tilde X] = 0 and \mathrm{Var}(\tilde X) = 1.
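As a quick numerical sanity check of Lemma 1, the following Python sketch (assuming NumPy; the uniform parameters are arbitrary, and the true mean and variance are the ones computed in Theorem 1 below) standardises the samples and confirms that the empirical mean and variance land close to 0 and 1.

import numpy as np

rng = np.random.default_rng(1)

# U(2, 5) has mean (2 + 5) / 2 = 3.5 and variance (5 - 2)^2 / 12 = 0.75 (Theorem 1 below).
mu, sigma = 3.5, np.sqrt(0.75)

x = rng.uniform(2.0, 5.0, size=200_000)
x_tilde = (x - mu) / sigma    # the standardised random variable of Lemma 1

print(f"empirical mean of the standardised samples     = {x_tilde.mean():.4f}")  # close to 0
print(f"empirical variance of the standardised samples = {x_tilde.var():.4f}")   # close to 1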

Theorem 1. If X \sim \mathcal U(a, b), then \mathbb E[X] = (a+b)/2 and \mathrm{Var}(X) = (b-a)^2/12.

Proof. Let U \sim \mathcal U(0, 1) for simplicity, so that

\displaystyle f_X(x) = \frac{1}{b-a}\cdot \mathbb I_{[a, b]}(x) =  \mathbb I_{[0, 1]} \left( \frac{x-a}{b-a} \right) \cdot \frac{1}{b-a}

shows, by Lemma 1, that X has the same distribution (and hence the same mean and variance) as

a + (b-a) \cdot U.

By definition,

\begin{aligned} \mathbb E[U] &= \int_{-\infty}^{\infty} x \cdot \mathbb I_{[0,1]}(x)\, \mathrm dx = \int_{0}^{1} x\, \mathrm dx =\frac 12.\end{aligned}

Similarly,

\begin{aligned} \mathbb E[U^2] &= \int_{0}^{1} x^2\, \mathrm dx = \frac 13,\end{aligned}

so that

\displaystyle \mathrm{Var}(U) = \mathbb E[U^2] - \mathbb E[U]^2 = \frac 13 - \left(\frac 12\right)^2 = \frac 1{12}.

Then the general case follows from

\begin{aligned} \mathbb E[X] &= a + (b-a) \cdot \frac 12 = \frac{a+b}{2}, \\ \mathrm{Var}(X) &= (b-a)^2 \cdot \frac 1{12} = \frac{(b-a)^2}{12}. \end{aligned}
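A rough empirical check of Theorem 1 (again a Python/NumPy sketch; the endpoints are arbitrary) compares the sample mean and variance of uniform draws against (a + b)/2 and (b - a)^2/12.

import numpy as np

rng = np.random.default_rng(2)

a, b = -1.0, 4.0
x = rng.uniform(a, b, size=500_000)

print(f"sample mean     = {x.mean():.4f}   theory (a + b) / 2    = {(a + b) / 2:.4f}")
print(f"sample variance = {x.var():.4f}   theory (b - a)^2 / 12 = {(b - a) ** 2 / 12:.4f}")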

We can generalise Lemma 1 to any differentiable, invertible transformation of U.

Lemma 2. For any X = g(U), where g is an invertible, differentiable transformation with nonvanishing derivative, if the p.d.f. f_U of U is continuous, then the p.d.f. f_X of X can be written in terms of the p.d.f. of U:

\displaystyle f_X(x) = f_U (g^{-1}(x)) \cdot \left| (g^{-1})'(x) \right|.

Proof. Apply the same argument as in Lemma 1, differentiating \mathbb P(g(U) \leq x) and using the chain rule; when g is decreasing, the inequality reverses and the absolute value appears.
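For a concrete instance of Lemma 2, take U \sim \mathcal U(0, 1) and g(u) = u^2, so that g^{-1}(x) = \sqrt x and the lemma predicts f_X(x) = 1/(2\sqrt x) on (0, 1), with c.d.f. F_X(x) = \sqrt x. The sketch below (Python with NumPy, purely illustrative) compares the empirical c.d.f. of simulated values of U^2 with this prediction.

import numpy as np

rng = np.random.default_rng(3)

u = rng.uniform(0.0, 1.0, size=500_000)
x = u ** 2    # X = g(U) with g(u) = u^2

# Lemma 2 gives f_X(x) = 1 / (2 * sqrt(x)) on (0, 1), hence c.d.f. F_X(x) = sqrt(x).
grid = np.linspace(0.0, 1.0, 101)
empirical_cdf = np.array([np.mean(x <= t) for t in grid])
predicted_cdf = np.sqrt(grid)

print(f"largest gap between empirical and predicted c.d.f. = {np.max(np.abs(empirical_cdf - predicted_cdf)):.4f}")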

The continuous analogue of the geometric distribution is the exponential distribution.

Theorem 2. The random variable X follows an exponential distribution with rate parameter \lambda > 0, denoted X \sim \mathrm{Exp}(\lambda), if it is defined by the probability density function

f_X(x) = \lambda e^{-\lambda x} \cdot \mathbb I_{[0,\infty)}(x).

Then \mathbb E[X] = 1/\lambda and \mathrm{Var}(X) = 1/\lambda^2.

Proof. Let U \sim \mathrm{Exp}(1) for simplicity, so that

\displaystyle f_X(x) = f_U(\lambda x) \cdot \lambda \quad \Rightarrow \quad X = \frac 1\lambda \cdot U

gives \mathbb E[X] = \mathbb E[U]/\lambda and \mathrm{Var}(X) = \mathrm{Var}(U)/\lambda^2 by Lemma 1, so it suffices to show that \mathbb E[U] = 1 and \mathrm{Var}(U) = 1. Integrating by parts,

\begin{aligned}\mathbb E[U] &= \int_0^\infty xe^{-x}\, \mathrm dx \\ &= [-xe^{-x} - e^{-x}]_0^\infty \\ &= (0 - 0) - (0 - 1) = 1, \\ \mathbb E[U^2] &= \int_0^\infty x^2 e^{-x}\, \mathrm dx \\ &= \left[-x^2 e^{-x} \right]_0^\infty + 2\int_0^\infty x \cdot e^{-x}\, \mathrm dx  \\ &= (0 - 0) + 2 \cdot \mathbb E[U] = 2,\end{aligned}

so that \mathrm{Var}(U) = \mathbb E[U^2] - \mathbb E[U]^2 = 2- 1^2 = 1.
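An empirical check of Theorem 2 (Python with NumPy; the rate below is an arbitrary choice). Note that NumPy's exponential sampler is parameterised by the scale 1/\lambda rather than the rate \lambda.

import numpy as np

rng = np.random.default_rng(4)

lam = 2.5                                            # rate parameter lambda
x = rng.exponential(scale=1.0 / lam, size=500_000)   # NumPy uses the scale 1/lambda

print(f"sample mean     = {x.mean():.4f}   theory 1/lambda   = {1 / lam:.4f}")
print(f"sample variance = {x.var():.4f}   theory 1/lambda^2 = {1 / lam ** 2:.4f}")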

Another crucial distribution in probability and statistics is the normal distribution. It is a useful model for all kinds of continuous data: heights of humans, test scores for exams, and even month-on-month returns on investment. Actually more is true: as long as we have a continuous, integrable, nonzero function f \geq 0, we obtain a corresponding probability distribution.

Lemma 3. For any continuous, integrable function f \geq 0 that is not identically zero, there exists a continuous random variable X with probability density function

\displaystyle f_X = C_f \cdot f,

where C_f := (\int_{-\infty}^{\infty} f(x)\, \mathrm dx)^{-1} is called the normalising constant of f.
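As an illustration of Lemma 3, the following Python/NumPy sketch numerically approximates the normalising constant of f(x) = 1/(1 + x^2). The exact integral over \mathbb R is \pi, so C_f = 1/\pi, and C_f \cdot f is the density of the standard Cauchy distribution.

import numpy as np

# Approximate the integral of f(x) = 1 / (1 + x^2) over [-10^4, 10^4] by a Riemann sum;
# over all of R the exact value is pi, so the normalising constant C_f is 1/pi.
dx = 0.01
x = np.arange(-1e4, 1e4, dx)
f = 1.0 / (1.0 + x ** 2)
integral = np.sum(f) * dx

print(f"integral ~= {integral:.4f}   (exact over R: pi = {np.pi:.4f})")
print(f"C_f      ~= {1.0 / integral:.4f}   (exact: 1/pi = {1.0 / np.pi:.4f})")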

Theorem 3. The random variable X follows a normal distribution with mean \mu and variance \sigma^2, denoted X \sim \mathcal{N}(\mu, \sigma^2), if it is defined by the probability density function

\displaystyle f_X(x) = \frac 1{\sigma\sqrt{2\pi}}\exp\left( - \frac{ (x - \mu)^2 }{2\sigma^2} \right).

Then \mathbb E[X] = \mu and \mathrm{Var}(X) = \sigma^2.

Proof. Denoting Z \sim \mathcal N(0, 1),

\displaystyle f_X(x) = \frac 1{\sigma} \cdot f_Z\left( \frac{x-\mu}{\sigma} \right) \quad \Rightarrow \quad X = \mu + \sigma \cdot Z,

so it suffices to prove that \mathbb E[Z] = 0 and \mathrm{Var}(Z) = 1. We remark that the normalising constant of e^{-z^2/2} is 1/\sqrt{2\pi} by the Gaussian integral

\displaystyle \int_{-\infty}^{\infty} e^{-z^2}\, \mathrm dz = \sqrt{\pi}

and a change of variables. The former is immediate since the function ze^{-z^2/2} is odd and its integral over \mathbb R converges absolutely, so that

\displaystyle \sqrt{2\pi} \cdot \mathbb E[Z] = \int_{-\infty}^{\infty} ze^{-z^2/2}\, \mathrm dz = 0.

The latter requires integration by parts (here, as with the exponential distribution calculation, all integrals converge absolutely so we do not need to worry about the limits):

\begin{aligned}\sqrt{2\pi} \cdot \mathbb E[Z^2] &= \int_{-\infty}^{\infty} z^2 e^{-z^2/2}\, \mathrm dz \\ &= \int_{-\infty}^{\infty} -z \cdot (-z e^{-z^2/2})\, \mathrm dz \\ &= \left[ -z \cdot e^{-z^2/2} \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} (-1) \cdot e^{-z^2/2} \, \mathrm dz \\ &= (0 - 0) - (-1) \cdot \sqrt{2\pi} = \sqrt{2\pi}. \end{aligned}

Therefore,

\mathrm{Var}(Z) = \mathbb E[Z^2] - \mathbb E[Z]^2 = 1 - 0^2 = 1.
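The integration-by-parts computation can also be sanity-checked numerically: the Python/NumPy sketch below approximates \mathbb E[Z^2] = \int_{-\infty}^{\infty} z^2 e^{-z^2/2} / \sqrt{2\pi}\, \mathrm dz by a Riemann sum over [-10, 10] (the tails beyond that range are negligible) and should return a value very close to 1.

import numpy as np

# Approximate E[Z^2] = integral of z^2 * exp(-z^2 / 2) / sqrt(2 * pi) by a Riemann sum.
dz = 0.001
z = np.arange(-10.0, 10.0, dz)
integrand = z ** 2 * np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

print(f"E[Z^2] ~= {np.sum(integrand) * dz:.6f}   (exact value: 1)")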

What makes the normal distribution so ubiquitous is how it arises from samples of virtually any distribution.

Theorem 4 (Central Limit Theorem). Let X_1,X_2,\dots be i.i.d. random variables with mean 0 and variance 1. Define

\displaystyle \bar X_n := \frac{1}{n} \sum_{i=1}^n X_i.

Then \sqrt{n} \cdot \bar X_n \to Z, where Z \sim \mathcal N(0, 1), in the following sense: for any \epsilon > 0, there exists N > 0 such that for all n \geq N and all z \in \mathbb R,

\displaystyle |\mathbb P(\sqrt{n} \cdot \bar X_n \leq z) - \mathbb P(Z \leq z)| < \epsilon.

We say that \sqrt{n} \cdot \bar X_n converges in distribution to the standard normal distribution Z \sim \mathcal N(0, 1).

Proof. Deferred to a later post.

Our main goal in these posts on probability is to prove this statement rigorously. The central limit theorem is responsible for our intuition about probability arising from repeated experiments.
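Although the proof is deferred, the statement is easy to probe by simulation. The Python/NumPy sketch below (the sample size n, the number of repetitions, and the choice of standardised uniform draws as the underlying distribution are all illustrative assumptions) compares the empirical c.d.f. of \sqrt{n} \cdot \bar X_n with the standard normal c.d.f.

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)

n, reps = 200, 20_000
# i.i.d. variables with mean 0 and variance 1: standardised U(0, 1) draws.
u = rng.uniform(0.0, 1.0, size=(reps, n))
x = (u - 0.5) * np.sqrt(12.0)

s = np.sqrt(n) * x.mean(axis=1)    # sqrt(n) * sample mean, one value per repetition

grid = np.linspace(-3.0, 3.0, 61)
empirical = np.array([np.mean(s <= z) for z in grid])
normal_cdf = np.array([0.5 * (1.0 + erf(z / sqrt(2.0))) for z in grid])

print(f"largest gap between empirical and normal c.d.f. = {np.max(np.abs(empirical - normal_cdf)):.4f}")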

A more immediate application comes from simulation. It turns out that the humble uniform random variable U \sim \mathcal U(0, 1) (implemented using pseudorandom numbers) can be used to generate all other random variables.

Theorem 5. Suppose U \sim \mathcal U(0, 1). For any random variable X \sim \mathbb Q, there exists a measurable function F such that F(U) \sim \mathbb Q.

Proof. Let F_X := \mathbb Q((-\infty, \cdot ]) denote the c.d.f. of X. We observe that F_X(x) \in [0, 1] by definition. The idea is to generate U \in [0, 1] and then set V := F_X^{-1}(U), so that F_X(V) = U; since F_X need not be invertible, we use a generalised inverse in its place.

To that end, define the measurable map F by

\displaystyle F(u) := \inf \{x \in \mathbb R : F_X(x) \geq u\}.

Then define V := F(U). Since F_X is nondecreasing and right-continuous, F(u) \leq v if and only if u \leq F_X(v), so that \{V \leq v\} = \{U \leq F_X(v)\}. Therefore

\begin{aligned} \mathbb P(V \leq v) &= \mathbb P(U \leq F_X(v)) = F_X(v) = \mathbb P(X \leq v). \end{aligned}

Hence F(U) = V \sim \mathbb Q.
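Theorem 5 is the basis of inverse transform sampling, one common way of generating non-uniform variates from pseudorandom uniforms. As a hedged illustration (Python with NumPy; the rate is arbitrary): for X \sim \mathrm{Exp}(\lambda), the c.d.f. F_X(x) = 1 - e^{-\lambda x} has the explicit inverse F(u) = -\ln(1 - u)/\lambda, so applying F to uniform draws should reproduce the mean and variance from Theorem 2.

import numpy as np

rng = np.random.default_rng(6)

lam = 1.5
u = rng.uniform(0.0, 1.0, size=500_000)

# For Exp(lambda), F_X(x) = 1 - exp(-lambda * x), so the inverse is F(u) = -ln(1 - u) / lambda.
x = -np.log(1.0 - u) / lam

print(f"sample mean     = {x.mean():.4f}   theory 1/lambda   = {1 / lam:.4f}")
print(f"sample variance = {x.var():.4f}   theory 1/lambda^2 = {1 / lam ** 2:.4f}")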

But we have a slightly more urgent question to answer: what’s the distribution of X + Y in general? We need to construct the relevant sample space, and prove the relevant properties in the multivariable setting, before we can legitimately continue our quest to prove the central limit theorem. In fact, once we have done so at least for normal distributions, we will be in a sufficiently good place to discuss the principle of statistical hypothesis testing: a quantitative implementation of the scientific method.

—Joel Kindiak. 19 Jul 25, 2258H
