These problems arise from my actual experience, but numbers have been fudged to protect confidentiality.
Problem 1 (Population Mean). While teaching my classes, I noticed that my students are considerably taller than I am. My height is 160 cm, so I suspect that the average height of students is not 160 cm. By collecting the heights (in cm) of 30 randomly chosen students, I obtained the following data:
Test at the 5% significance level to determine whether my suspicion is justified.
Solution. Let denote the height of a randomly chosen student in cm, and .
We first set up the null and alternative hypotheses:
Denote the population variance by and . Assume holds, so that . Since , by the central limit theorem,
Since is unknown, we need to estimate it using :
Furthermore, we estimate using :
Hence, our calculated test statistic will be
Since , , so that using either a – or a -test would yield similar results. Denote and the significance level .
Using a -table, .
Using a -table, .
In either case, the test statistic exceeds the critical value. Therefore, there is sufficient evidence to reject $latex H_0$ and conclude that Joel’s suspicion is justified, i.e. the average height of students is larger than 160 cm.
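The computation above can be sketched in Python using only the standard library. This is a minimal illustration rather than the exact calculation from the solution: the function name `z_test_mean` is my own, and the list `heights` is a hypothetical sample, not the data collected in Problem 1.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_test_mean(sample, mu0):
    """Return (z, upper-tail p-value, two-sided p-value) for H0: mu = mu0."""
    n = len(sample)
    x_bar = mean(sample)   # estimate of the population mean
    s = stdev(sample)      # square root of the unbiased variance estimate
    z = (x_bar - mu0) / (s / sqrt(n))
    p_upper = 1 - NormalDist().cdf(z)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_upper, p_two_sided


# Hypothetical heights in cm (an assumption for illustration only).
heights = [158, 162, 165, 170, 168, 172, 160, 166, 169, 171,
           163, 175, 167, 159, 164, 173, 161, 168, 170, 166,
           174, 162, 165, 169, 157, 171, 167, 163, 176, 168]

z, p_upper, p_two_sided = z_test_mean(heights, mu0=160)
critical = NormalDist().inv_cdf(0.95)  # one-sided 5% critical value
print(z, critical, p_upper, p_two_sided)
```

We reject the null hypothesis at the 5% level when `z` exceeds `critical` (one-sided), or equivalently when the corresponding p-value is below 0.05.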
Problem 2 (Confidence Intervals). Keep the same scenario as in Problem 1, but denote the true population mean by $latex \mu$. Use the $latex z$-test for simplicity. Determine the interval of values that $latex \mu$ can take such that there is insufficient evidence to reject the null hypothesis at the 5% significance level.
Solution. By definition,
We do not reject if and only if . Therefore,
Therefore,
Remark 1. We call this calculated interval the -confidence interval for . Denoting a specific sample , let denote the corresponding computed unbiased estimators for respectively. Then the computed corresponding confidence interval will equal
Hence, different samples would yield different confidence intervals. Since is random, so is . Furthermore, defining , mimicking the computation above yields
Thus, we have the following interpretation of a $latex (1-\alpha)$-confidence interval: the probability that a randomly chosen confidence interval will contain the (deterministic though unknown) population mean is $latex 1-\alpha$.
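This interpretation can be checked empirically. The sketch below repeatedly draws samples from a normal population with a known mean, computes a $latex z$-based confidence interval from each sample, and counts how often the interval contains the true mean. The function names and the population parameters (165, 8) are hypothetical choices of mine, not values from the problems above.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_confidence_interval(sample, alpha=0.05):
    """Two-sided z-interval for the mean, using the sample standard deviation."""
    n = len(sample)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * stdev(sample) / sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(mu, sigma, n, trials, alpha=0.05, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_confidence_interval(sample, alpha)
        hits += lower <= mu <= upper
    return hits / trials


# Hypothetical population: mean 165 cm, standard deviation 8 cm.
print(coverage(mu=165, sigma=8, n=30, trials=2000))
```

With 30 observations per sample, the observed coverage should land close to (typically slightly below) 95%, since the $latex z$-interval ignores the extra variability from estimating the variance.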
Problem 3 (Population Proportion). I went to a nearby café, and noticed that there were more women than men in the café. Out of 50 people present, 32 were women.
I suspect that, in general, there are more women than men in Starbucks on average. Test at the 5% significance level to determine whether my suspicion is justified.
Solution. Let $latex X$ be a Bernoulli random variable that represents the gender of a person. Here $latex X = 0$ denotes that the person is a man and $latex X = 1$ denotes that the person is a woman. Denote $latex p := \mathbb P(X = 1)$, which yields the proportion of women in the café.
We first set up the null and alternative hypotheses:
Assume holds, so that . We next estimate using :
Since and , by the central limit theorem,
Hence, our calculated test statistic, the $latex z$-value, will be as follows:
Using a $latex z$-table, the calculated $latex z$-value exceeds the critical value at the 5% significance level. Therefore, there is sufficient evidence to reject $latex H_0$ and conclude that Joel’s suspicion is justified, i.e. there are more women than men on average.
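The numbers in this problem (32 women out of 50 people) are enough to reproduce the test numerically. The sketch below assumes a one-sided test of $latex p = 0.5$ against $latex p > 0.5$, with the standard error computed under the null hypothesis; the function name `proportion_z_test` is my own.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0=0.5):
    """Upper-tailed z-test for a proportion, standard error taken under H0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z, p_value = proportion_z_test(successes=32, n=50)
critical = NormalDist().inv_cdf(0.95)  # one-sided 5% critical value
print(z, critical, p_value)
```

Here `z` is roughly 1.98, which exceeds the one-sided critical value of roughly 1.645, in line with the conclusion above.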
Problem 4 (Goodness-of-Fit). A total of 750 students took an assessment worth 10 marks. For each score $latex k$, let $latex n_k$ denote the number of students who scored $latex k$ marks out of 10. We have the following data:
Assuming that scores are continuous, determine at the 5% significance level if the scores can be well-approximated using a normal distribution.
Solution. Let denote the score of a randomly chosen student with and . We first set up the null and alternative hypotheses:
We first estimate and using and respectively. Denoting the scores by , the summary statistics are
Hence,
Now we assume holds, so that . Denoting
we will use the test statistic
which follows a -distribution with degrees of freedom. For a proof for why this distribution works, refer to this document. Using relevant -table look-up values (or a spreadsheet application), we obtain the following values for (rounded to the nearest integer for readability, but whose original value we use in the final computation):
Piecing all of the values together,
Using a $latex \chi^2$-table, the test statistic does not exceed the critical value. Therefore, there is (woefully) insufficient evidence to reject $latex H_0$, and we cannot conclude that the scores do not follow a normal distribution.
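As a rough sketch of how such a statistic can be computed, the code below fits a normal distribution to integer scores and forms the Pearson statistic. Several things here are my assumptions rather than details taken from the solution: each score $latex k$ is treated as the bin from $latex k - 0.5$ to $latex k + 0.5$ (with the two end bins extended to infinity), the degrees of freedom are taken as the number of bins minus three, and the `counts` list is hypothetical data.

```python
from statistics import NormalDist


def normal_gof_statistic(counts):
    """Pearson chi-square statistic for integer scores against a fitted normal.

    counts[k] is the number of observations with score k (k = 0, 1, ...).
    """
    n = sum(counts)
    top = len(counts) - 1
    mean = sum(k * c for k, c in enumerate(counts)) / n
    var = sum(c * (k - mean) ** 2 for k, c in enumerate(counts)) / (n - 1)
    fitted = NormalDist(mean, var ** 0.5)

    stat = 0.0
    for k, observed in enumerate(counts):
        # Assumed binning: (k - 0.5, k + 0.5], with open-ended end bins.
        lower = 0.0 if k == 0 else fitted.cdf(k - 0.5)
        upper = 1.0 if k == top else fitted.cdf(k + 0.5)
        expected = n * (upper - lower)
        stat += (observed - expected) ** 2 / expected

    # Assumed convention: bins - 1 - (two estimated parameters).
    df = len(counts) - 1 - 2
    return stat, df


# Hypothetical counts for scores 0..10 (NOT the data from the problem).
counts = [2, 8, 25, 60, 110, 150, 160, 120, 75, 30, 10]
stat, df = normal_gof_statistic(counts)
print(stat, df)
```

The resulting statistic would then be compared against the $latex \chi^2$ critical value for `df` degrees of freedom from a table or a spreadsheet application.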
Problem 5 (Population Variance). Using the data in Problem 4, and assuming that the scores are normally distributed, test at the 5% significance level to determine if the standard deviation of assessment scores is greater than 2.
Solution. We first set up the null and alternative hypotheses:
We use the test statistic :
Using a spreadsheet application, the test statistic falls in the rejection region. Therefore, there is sufficient evidence to reject $latex H_0$ and conclude that $latex \sigma^2 > 4$, which implies $latex \sigma > 2$.
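Definition 1. A continuous random variable $latex X$ is said to follow an exponential distribution with rate parameter $latex \lambda > 0$, denoted $latex X \sim \mathrm{Exp}(\lambda)$, if it has a p.d.f. given by
$latex \displaystyle f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0, \\ 0, & x < 0. \end{cases}$
Suppose $latex X \sim \mathrm{Exp}(\lambda)$.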
Problem 1. Prove the following properties:
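$latex \mathbb P(X > x) = e^{-\lambda x}$ for $latex x \geq 0$,
$latex \mathbb E[X] = 1/\lambda$,
$latex \mathrm{Var}(X) = 1/\lambda^2$,
$latex X$ satisfies the memoryless property.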
Solution. The c.d.f. of for is given by
Hence,
For the second result, we use the tail-probability characterisation of the expectation, where the interchange of integrals is valid by Fubini’s theorem:
Hence, for ,
For the variance, we adopt a similar approach:
Therefore,
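For the memoryless property, for any $latex s, t \geq 0$,
$latex \displaystyle \mathbb P(X > s + t \mid X > s) = \frac{\mathbb P(X > s + t)}{\mathbb P(X > s)} = \frac{e^{-\lambda (s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = \mathbb P(X > t).$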
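The memoryless property can also be checked numerically. The sketch below compares the conditional survival probability with the unconditional one, both exactly (via the survival function $latex e^{-\lambda x}$) and by simulation. The function names and the values of `rate`, `s` and `t` are hypothetical choices for illustration.

```python
import random
from math import exp


def survival(x, rate):
    """P(X > x) for X exponential with the given rate."""
    return exp(-rate * x) if x >= 0 else 1.0


def conditional_survival_mc(s, t, rate, trials=200_000, seed=0):
    """Monte Carlo estimate of P(X > s + t | X > s)."""
    rng = random.Random(seed)
    beyond_s = 0
    beyond_s_plus_t = 0
    for _ in range(trials):
        x = rng.expovariate(rate)
        if x > s:
            beyond_s += 1
            if x > s + t:
                beyond_s_plus_t += 1
    return beyond_s_plus_t / beyond_s


rate, s, t = 0.5, 1.0, 2.0  # illustrative values only
exact_conditional = survival(s + t, rate) / survival(s, rate)
print(exact_conditional, survival(t, rate), conditional_survival_mc(s, t, rate))
```

All three printed values should agree up to simulation error, since having already waited `s` units does not change the distribution of the remaining waiting time.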
Problem 2. Suppose is independent to .
Calculate the distribution of .
If , evaluate the p.d.f. of .
Solution. Denoting ,
Hence, . To evaluate the p.d.f. of , we compute the convolution of their individual p.d.f.s:
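Definition 2. A continuous random variable $latex X$ is said to follow a gamma distribution with shape parameter $latex \alpha > 0$ and rate parameter $latex \lambda > 0$, denoted $latex X \sim \mathrm{Gamma}(\alpha, \lambda)$, if it has a p.d.f. given by
$latex \displaystyle f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}, \quad x > 0.$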
Problem 3. Prove the following properties:
if , then , ,
if are i.i.d., then ,
if and , then .
Solution. Suppose . By definition of the expectation,
Hence, , and
We prove the second result by induction. Suppose and are independent. To evaluate the p.d.f. of , we compute the convolution of their individual p.d.f.s:
Therefore, . Inductively, if are i.i.d.,
For the final property, denoting ,
Hence, .
Given probability distributions , write if there exists a random variable such that and .
Problem 4. Prove the following properties:
,
,
for i.i.d. , ,
for any fixed , if , then .
Solution. We note that if , since ,
so that . If , then
The last two results are immediate corollaries of Problem 3.
These probability distributions are examples of the exponential family of probability distributions.
Feynman’s trick of differentiating under the integral sign has been creatively wielded to evaluate otherwise intractable integrals. In this exercise, we prove Feynman’s trick and use it to evaluate the seemingly intractable Dirichlet integral
$latex \displaystyle \int_0^\infty \frac{\sin x}{x}\, \mathrm dx.$
Let be a measure space and be a function such that for each , is measurable.
Problem 1. Suppose the following conditions:
For any , is continuous.
There exists some non-negative integrable such that for any , .
Prove that the map defined by is continuous.
Solution. Fix . For any , since is continuous,
so that pointwise. Furthermore,
so that and are all integrable.
Since is integrable, by Lebesgue’s dominated convergence theorem,
so that is continuous, as required.
Problem 2. Suppose the following conditions:
There exists some such that is integrable.
For each , is differentiable with derivative at denoted by .
There exists some non-negative integrable such that for any , .
Prove that the map defined by is differentiable on and
Solution. We first check that is well-defined. By hypothesis, is well-defined. Fix . By the mean value theorem, there exists between and $latex t$ such that
By performing more analysis, is integrable, so that is well-defined.
Now fix . For any , since each is measurable,
is measurable. Furthermore, pointwise. We claim that , since the mean value theorem gives between and such that
By algebra and the triangle inequality, each is integrable. Hence, by Lebesgue’s dominated convergence theorem,
On the other hand, by bookkeeping
Therefore,
Remark 1. Thanks to Problem 2, our proof that in the study of differential equations becomes a logically correct one.
Problem 3. Use Problem 2 to evaluate the Dirichlet integral $latex \displaystyle \int_0^\infty \frac{\sin x}{x}\, \mathrm dx$.
Solution. Define the function by
that satisfies the hypotheses of Problem 2, and our goal is to evaluate . Applying Problem 2 and integrating by parts,
Integrating and applying the first fundamental theorem of calculus,
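As a numerical sanity check, the sketch below assumes the standard choice of auxiliary function for this trick, namely $latex I(t) = \int_0^\infty e^{-tx} \frac{\sin x}{x}\, \mathrm dx$ for $latex t > 0$, whose closed form is $latex \frac{\pi}{2} - \arctan t$. Whether this matches the function used in the solution above is an assumption on my part. The integral is truncated at a finite upper limit and approximated with Simpson's rule; the names `damped_dirichlet` and `simpson` are my own.

```python
from math import atan, exp, pi, sin


def integrand(x, t):
    """e^{-t x} sin(x)/x, with sin(x)/x extended continuously by 1 at x = 0."""
    sinc = 1.0 if x == 0 else sin(x) / x
    return exp(-t * x) * sinc


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def damped_dirichlet(t, upper=60.0, n=60_000):
    """Approximate I(t) by truncating the integral at `upper` (needs t > 0)."""
    return simpson(lambda x: integrand(x, t), 0.0, upper, n)


for t in (0.5, 1.0, 2.0):
    print(t, damped_dirichlet(t), pi / 2 - atan(t))
```

The two printed columns should agree to several decimal places, and letting $latex t \to 0^+$ in the closed form recovers the value $latex \pi/2$ of the Dirichlet integral.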
Problem 1. Let be i.i.d.. Let denote the permutation
such that . Denoting , evaluate for each .
Solution. Since whenever , we can assume .
We will obtain the distribution of . Fix . Let denote the number of sample points that are less than , which follows a binomial distribution. It follows that , so that
Hence, by recalling the properties of the Beta distribution,
Problem 2. Calculate the expected number of rolls of a fair six-sided die needed for the sum of all rolls to be a multiple of $latex 6$.
Solution. Let $latex X_n$ denote the $latex n$-th roll and $latex S_n := X_1 + \dots + X_n$ denote the sum of the first $latex n$ rolls. Define the stopping time $latex N$ by
$latex \displaystyle N := \inf \{ n \in \mathbb N : 6 \mid S_n \}.$
We claim that . For any ,
For each ,
which is one of the six possible numbers with equal probability:
Therefore, so that as well.
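A quick simulation gives a sanity check on this argument. Whatever the current sum is, exactly one of the six faces makes the new sum a multiple of 6, so the number of rolls is geometric with success probability 1/6 and has mean 6. The sketch below (function names are my own) estimates that mean empirically.

```python
import random


def rolls_until_multiple_of_six(rng):
    """Roll a fair die until the running sum is divisible by 6; return the count."""
    total = 0
    rolls = 0
    while True:
        total += rng.randint(1, 6)
        rolls += 1
        if total % 6 == 0:
            return rolls


def average_rolls(trials=100_000, seed=0):
    """Monte Carlo estimate of the expected number of rolls."""
    rng = random.Random(seed)
    return sum(rolls_until_multiple_of_six(rng) for _ in range(trials)) / trials


print(average_rolls())
```

The estimate should be close to 6, up to simulation error.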
Problem 3. What is the probability of getting an odd number of heads out of $latex n$ independent flips of a fair coin?
Solution. Let denote the number of heads out of independent flips of a fair coin. Then the required probability is
Using properties involving the binomial coefficient,
Therefore,
In particular,
Since , we must have , as required.
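The result can be verified exactly for small numbers of flips by summing the binomial probabilities over odd counts of heads. This is a minimal check with exact rational arithmetic; the function name `prob_odd_heads` is my own.

```python
from fractions import Fraction
from math import comb


def prob_odd_heads(n):
    """Exact probability of an odd number of heads in n fair coin flips."""
    odd_outcomes = sum(comb(n, k) for k in range(1, n + 1, 2))
    return Fraction(odd_outcomes, 2 ** n)


for n in range(1, 8):
    print(n, prob_odd_heads(n))
```

For every number of flips of at least one, the probability comes out as exactly one half.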
Problem 4. Given , calculate .
Solution. Denoting , we observe that
Therefore, by the tail integral for expectation,
Problem 5. You’re the second-best player in a single-elimination tournament with $latex 2^n$ players. Assume the brackets are randomly seeded, and the better player always wins each match. What is the probability you reach the finals?
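Solution. Each tournament will have $latex n$ stages, and at stage $latex k$, there will be $latex 2^{n-k+1}$ players. In order to reach the final stage, we need to be in a different “bracket” from the best player. At stage $latex 1$, there are two “brackets”, and each bracket has $latex 2^{n-1}$ players. Since the best player occupies one of the $latex 2^n$ slots, $latex 2^{n-1}$ of the remaining $latex 2^n - 1$ slots lie in the other bracket. Therefore, the required probability is
$latex \displaystyle \frac{2^{n-1}}{2^n - 1}.$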
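For small tournaments, the bracket argument can be verified by brute force. The sketch below assumes the tournament has a power-of-two number of players, ranks them 1 (best) upwards, enumerates every seeding, and plays out the bracket with the better player always winning. The function names are my own.

```python
from fractions import Fraction
from itertools import permutations


def reaches_final(seeding):
    """seeding lists player ranks (1 = best) in bracket order."""
    players = list(seeding)
    while len(players) > 2:
        # Adjacent players meet; the lower rank number (better player) advances.
        players = [min(players[i], players[i + 1])
                   for i in range(0, len(players), 2)]
    return 2 in players


def prob_second_best_in_final(n):
    """Exact probability over all seedings of a tournament with 2**n players."""
    size = 2 ** n
    favourable = 0
    total = 0
    for seeding in permutations(range(1, size + 1)):
        total += 1
        favourable += reaches_final(seeding)
    return Fraction(favourable, total)


for n in (1, 2, 3):
    print(n, prob_second_best_in_final(n), Fraction(2 ** (n - 1), 2 ** n - 1))
```

The enumerated probability agrees with the ratio of the opposite bracket's size to the number of remaining slots.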
Problem 6. Consider the sample space and the sequence of random variables with the property that
Assuming that has identical distribution, evaluate .
Solution. Denote . By the law of total probability,
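Recall that the quantity $latex \binom{n}{k}$ was defined inductively using Pascal’s identity
$latex \displaystyle \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k},$
and denotes the number of $latex k$-subsets of a set of size $latex n$ (i.e. $latex n$ distinct objects).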
Problem 1. Prove that $latex \displaystyle \binom{n}{k} = \binom{n}{n-k}$.
Solution. Fix a set of $latex n$ distinct objects. There are $latex \binom{n}{n-k}$ possible $latex (n-k)$-subsets of items that we can remove from that set. Every $latex k$-subset of items left behind is obtained by exactly one corresponding $latex (n-k)$-subset of items removed. Therefore, the number of $latex k$-subsets (left behind) equals the number of $latex (n-k)$-subsets (removed), yielding
$latex \displaystyle \binom{n}{k} = \binom{n}{n-k}.$
Problem 2. Prove that $latex \displaystyle \binom{n}{k} \binom{k}{m} = \binom{n}{m} \binom{n-m}{k-m}$.
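Solution. We can interpret the identity as counting the number of $latex k$-committees out of $latex n$ persons, where among the $latex k$ persons, we choose $latex m$ persons in a “core team”. This is the quantity counted by the left-hand side:
$latex \displaystyle \binom{n}{k} \binom{k}{m}.$
On the right-hand side, we count the same quantity differently: first choose the $latex m$ “core team” members, then choose the remaining $latex k - m$ members out of the remaining $latex n - m$ persons:
$latex \displaystyle \binom{n}{m} \binom{n-m}{k-m}.$
Since both types of counting give the same total,
$latex \displaystyle \binom{n}{k} \binom{k}{m} = \binom{n}{m} \binom{n-m}{k-m}.$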
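The double-counting argument can be mirrored directly in code by enumerating committees with a core team in both orders. In this sketch, the names `n`, `k` and `m` (number of persons, committee size, core team size) are my own labels for the quantities in the argument.

```python
from itertools import combinations
from math import comb


def count_committee_then_core(n, k, m):
    """Choose a k-committee first, then an m-person core team inside it."""
    people = range(n)
    return sum(1
               for committee in combinations(people, k)
               for _core in combinations(committee, m))


def count_core_then_rest(n, k, m):
    """Choose the m core members first, then the other k - m committee members."""
    people = set(range(n))
    return sum(1
               for core in combinations(sorted(people), m)
               for _rest in combinations(sorted(people - set(core)), k - m))


n, k, m = 7, 4, 2  # illustrative values only
print(count_committee_then_core(n, k, m),
      count_core_then_rest(n, k, m),
      comb(n, k) * comb(k, m),
      comb(n, m) * comb(n - m, k - m))
```

All four printed numbers coincide, since each counts the same set of (committee, core team) pairs.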
Problem 3. Prove that .
Solution. Replacing with in Problem 2,
Problem 4. Prove that .
Solution. Using Problems 1 and 3,
Problem 5. Prove that .
Solution. Replacing with in Problem 2,
Problem 6. For any , count the number of -tuples such that
Solution. Consider a row of stars and bars. Let denote the number of stars before the 1st bar, denote the number of stars after the 1st bar and before the 2nd bar, and so on and so forth. Each arrangement of the stars and bars then corresponds to each desired -tuple. Thus, the required number is the number of places to place the bars:
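The stars-and-bars count can be checked against a brute-force enumeration for small cases. The sketch below assumes the tuples consist of non-negative integers with a prescribed sum, so that a tuple of length `n` summing to `k` corresponds to `k` stars and `n - 1` bars; the names `n` and `k` are my own labels and may differ from the notation in the problem.

```python
from itertools import product
from math import comb


def count_tuples_brute_force(n, k):
    """Count n-tuples of non-negative integers whose entries sum to k."""
    return sum(1 for xs in product(range(k + 1), repeat=n) if sum(xs) == k)


def count_tuples_stars_and_bars(n, k):
    """Choose positions for the n - 1 bars among n + k - 1 symbols."""
    return comb(n + k - 1, n - 1)


for n in range(1, 5):
    for k in range(0, 6):
        print(n, k, count_tuples_brute_force(n, k), count_tuples_stars_and_bars(n, k))
```

The two counts agree in every case tried, which is what the bijection between arrangements and tuples predicts.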
Problem 7. For any , suppose is prime. Count the number of -tuples such that
Solution. Since is prime, we require one of the sums to equal , and the other to equal . If the terms sum to , then there are a total of possible options of . By Problem 6, there are a total of