Lecture 6

Two-sample hypothesis tests

- Two-sample hypothesis tests
- Two-sample t-test
- Two-sample t-test: Example
- Two-sample F-test
- Two-sample F-test: Example

Two-sample hypothesis tests

In Lecture 4:

- Looked at data before and after the 2008 crash
- In that case the data for each month are directly comparable
- Can then construct the difference between the 2007 and 2009 values
- The analysis reduces from a two-sample to a one-sample problem

How do we analyze two samples that cannot be paired together in this way?

Want to compare the mean and the variance of 2 independent samples

- First sample
- Sample size n_1, sample mean \overline{x}_1, sample variance s^2_1

- Second sample
- Sample size n_2, sample mean \overline{x}_2, sample variance s^2_2

- We may have n_1 \neq n_2
- Samples cannot be paired!

What we can do:

- Use a two-sample t-test to test for a difference in means
- Use a two-sample F-test to test for a difference in variances

Hypothesis testing starts to get interesting with 2 or more samples

t-test and F-test show the normal distribution family in action

This is also the maths behind regression

- Same method thus applies to seemingly unrelated problems
- Regression is a big subject in statistics

- Want to compare the means of two independent samples
- At the same time the common variance is unknown
- Therefore both variances are estimated with the sample variances
- The test statistic turns out to have a t-distribution, with degrees of freedom linked to the total number of observations

Want to compare the variance of two independent samples

This can be done by studying the ratio of the sample variances s^2_1/s^2_2

We have already shown that \frac{(n_1-1)S^2_1}{\sigma^2_1} \sim \chi^2_{n_1-1} \qquad \frac{(n_2-1)S^2_2}{\sigma^2_2}\sim\chi^2_{n_2-1}

Hence we can study statistic \frac{S^2_1 / \sigma_1^2}{S^2_2 / \sigma_2^2}

We will see that this statistic has an F-distribution
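As a sanity check (a sketch, not part of the lecture), the chi-squared fact above can be verified by simulation: for repeated normal samples, the statistic (n-1)S^2/\sigma^2 should have the mean n-1 and variance 2(n-1) of a \chi^2_{n-1} variable. The sample size and \sigma below are illustrative choices.

```python
import numpy as np

# Monte Carlo check that (n-1) S^2 / sigma^2 behaves like a
# chi-squared variable with n-1 degrees of freedom:
# its mean should be n-1 and its variance 2(n-1).
rng = np.random.default_rng(0)
n, sigma, reps = 10, 2.0, 200_000     # illustrative values

samples = rng.normal(0.0, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)      # sample variance S^2
stat = (n - 1) * s2 / sigma**2        # (n-1) S^2 / sigma^2

print(stat.mean())   # close to n - 1 = 9
print(stat.var())    # close to 2(n - 1) = 18
```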

Two-sample t-test

**Assumptions**: Suppose we are given samples from 2 normal populations

- X_1, \ldots ,X_n iid with distribution N(\mu_X,\sigma^2)
- Y_1, \ldots ,Y_m iid with distribution N(\mu_Y,\sigma^2)

**Goal**: Compare means \mu_X and \mu_Y. We consider test
H_0 \colon \mu_X = \mu_Y \qquad H_1 \colon \mu_X \neq \mu_Y

**Note:**

- In general n \neq m, so the paired one-sample t-test cannot be applied
- The two populations have the same variance \sigma^2

**t-statistic**: The general form is
T = \frac{\text{Estimate}-\text{Hypothesised value}}{\text{e.s.e}}

In our case we can estimate \mu_X - \mu_Y with the sample means \text{Estimate} = \overline{X} - \overline{Y}

Since we are testing for difference in mean, we have \text{Hypothesised value} = \mu_X - \mu_Y

The *Estimated Standard Error* is the standard deviation of the estimator: \text{e.s.e} = \text{Standard Deviation of } \overline{X} -\overline{Y}

Therefore the two-sample t-statistic is T = \frac{\overline{X} - \overline{Y} - (\mu_X - \mu_Y)}{\text{e.s.e}}

Under the Null Hypothesis that \mu_X = \mu_Y the t-statistic becomes T = \frac{\overline{X} - \overline{Y} }{\text{e.s.e}}

The general rule is \text{df} = \text{Sample size} - \text{No. of estimated parameters}

Sample size in two-sample t-test:

- n in the first sample
- m in the second sample
- Hence total number of observations is n + m

No. of estimated parameters is 2, namely \mu_X and \mu_Y

Hence the degrees of freedom in the two-sample t-test are df = n + m - 2

Recall: We are assuming populations have same variance \sigma^2_X = \sigma^2_Y = \sigma^2

We need to compute the estimated standard error \text{e.s.e} = \text{Standard Deviation of } \overline{X} -\overline{Y}

We have already computed the variance of the sample mean in the Lemma in Slide 17 Lecture 13

Since X_i \sim N(\mu_X,\sigma^2) and Y_j \sim N(\mu_Y,\sigma^2), by the Lemma we get {\rm Var}[\overline{X}] = \frac{\sigma^2}{n} \qquad {\rm Var}[\overline{Y}] = \frac{\sigma^2}{m}

Since X_i and Y_j are independent we get {\rm Cov}(X_i,Y_j)=0

By bilinearity of covariance we infer {\rm Cov}( \overline{X} , \overline{Y} ) = \frac{1}{nm} \sum_{i=1}^n \sum_{j=1}^m {\rm Cov}(X_i,Y_j) = 0

We can then compute \begin{align*} {\rm Var}[ \overline{X} - \overline{Y} ] & = {\rm Var}[ \overline{X} ] + {\rm Var}[ \overline{Y} ] - 2 {\rm Cov}( \overline{X} , \overline{Y} ) \\ & = {\rm Var}[ \overline{X} ] + {\rm Var}[ \overline{Y} ] \\ & = \sigma^2 \left( \frac{1}{n} + \frac{1}{m} \right) \end{align*}

Taking the square root gives \text{S.D.}(\overline{X} - \overline{Y} )= \sigma \ \sqrt{\frac{1}{n}+\frac{1}{m}}
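The derivation above can be checked numerically: simulating many independent sample-mean pairs, the empirical variance of \overline{X} - \overline{Y} should match \sigma^2(1/n + 1/m). All parameter values below are illustrative choices.

```python
import numpy as np

# Monte Carlo check that Var[Xbar - Ybar] = sigma^2 (1/n + 1/m).
rng = np.random.default_rng(1)
n, m, sigma, reps = 8, 12, 1.5, 200_000   # illustrative values

xbar = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)
ybar = rng.normal(0.0, sigma, size=(reps, m)).mean(axis=1)

empirical = np.var(xbar - ybar)
theoretical = sigma**2 * (1 / n + 1 / m)
print(empirical, theoretical)   # the two values nearly agree
```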

Thus the form of the t-statistic becomes T = \frac{\overline{X} - \overline{Y} - (\mu_X - \mu_Y)}{\text{e.s.e}} = \frac{\overline{X} - \overline{Y} - (\mu_X - \mu_Y)}{\sigma \ \sqrt{\dfrac{1}{n}+\dfrac{1}{m}}}

In practice \sigma is unknown and is replaced by its pooled estimate computed from the two sample variances
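A minimal sketch of the test in code, assuming the unknown common \sigma is replaced by its pooled estimate; `scipy.stats.ttest_ind` with `equal_var=True` implements this pooled two-sample t-test. The data arrays are made-up illustrative numbers.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data for the two samples
x = np.array([5.1, 4.9, 6.2, 5.5, 5.8, 5.0])
y = np.array([4.2, 4.8, 4.5, 5.0, 4.1, 4.6, 4.4])
n, m = len(x), len(y)

# Pooled estimate of the common variance sigma^2 (df = n + m - 2)
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)

# Two-sample t-statistic under H0: mu_X = mu_Y
t_manual = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))

# scipy's equal-variance t-test uses the same pooled formula
t_scipy, p = stats.ttest_ind(x, y, equal_var=True)
print(t_manual, t_scipy)   # the two statistics coincide
```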

Two-sample F-test

Suppose given random samples from 2 normal populations:

- X_1, \ldots, X_n iid random sample from N(\mu_X, \sigma_X^2)
- Y_1, \ldots, Y_m iid random sample from N(\mu_Y, \sigma_Y^2)

**Problem**:

- We want to compare the **variability** of the 2 populations
- A way to do it is by estimating the variance ratio \frac{\sigma_X^2}{\sigma_Y^2}

**Question**:

- Suppose the variances \sigma_X^2 and \sigma_Y^2 are **unknown**
- How can we estimate the ratio \sigma_X^2 /\sigma_Y^2 \, ?

**Answer**:

Estimate the ratio \sigma_X^2 /\sigma_Y^2 \, using sample variances S^2_X / S^2_Y

The F-distribution allows us to compare the quantities \sigma_X^2 /\sigma_Y^2 \qquad \text{and} \qquad S^2_X / S^2_Y

Suppose given random samples from 2 normal populations:

- X_1, \ldots, X_n iid random sample from N(\mu_X, \sigma_X^2)
- Y_1, \ldots, Y_m iid random sample from N(\mu_Y, \sigma_Y^2)

The random variable F = \frac{ S_X^2 / \sigma_X^2 }{ S_Y^2 / \sigma_Y^2 } has F-distribution with n-1 and m-1 degrees of freedom.

We need to prove F = \frac{ S_X^2 / \sigma_X^2 }{ S_Y^2 / \sigma_Y^2 } \sim F_{n-1,m-1}

By the Theorem in Slide 48 Lecture 3 we have that \frac{S_X^2}{ \sigma_X^2} \sim \frac{\chi_{n-1}^2}{n-1} \,, \qquad \frac{S_Y^2}{ \sigma_Y^2} \sim \frac{\chi_{m-1}^2}{m-1}

Therefore F = \frac{ S_X^2 / \sigma_X^2 }{ S_Y^2 / \sigma_Y^2 } = \frac{U/p}{V/q} where we have U \sim \chi_{p}^2 \,, \qquad V \sim \chi_q^2 \,, \qquad p = n-1 \,, \qquad q = m - 1

By the Theorem in Slide 43 we conclude that F = \frac{U/p}{V/q} \sim F_{n-1,m-1}
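A minimal sketch of the resulting F-test, assuming made-up data: under H_0 \colon \sigma_X^2 = \sigma_Y^2 the observed ratio S_X^2/S_Y^2 is referred to the F_{n-1,m-1} distribution to obtain a two-sided p-value.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data for the two samples
x = np.array([2.1, 2.5, 1.9, 2.8, 2.2, 2.4, 2.0])
y = np.array([3.1, 2.2, 4.0, 1.8, 2.9, 3.5])
n, m = len(x), len(y)

# Under H0 the ratio of sample variances is F_{n-1, m-1}
f_stat = x.var(ddof=1) / y.var(ddof=1)

# Two-sided p-value: double the smaller tail of F_{n-1, m-1}
cdf = stats.f.cdf(f_stat, n - 1, m - 1)
p_value = 2 * min(cdf, 1 - cdf)
print(f_stat, p_value)
```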

The estimator S_X^2 / S_Y^2 is asymptotically unbiased: one can show that {\rm I\kern-.3em E}\left[ \frac{S_X^2}{S_Y^2} \right] = \frac{m-1}{m-3} \frac{\sigma_X^2}{\sigma_Y^2} \to \frac{\sigma_X^2}{\sigma_Y^2} as m \to \infty. Therefore the sample variance ratio S_X^2 /S_Y^2 is an asymptotically unbiased estimator of the variance ratio \sigma_X^2 /\sigma_Y^2
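The bias factor (m-1)/(m-3) can be checked by Monte Carlo simulation; this is a sketch with illustrative parameter values, not part of the lecture.

```python
import numpy as np

# Monte Carlo check that E[S_X^2 / S_Y^2] = (m-1)/(m-3) * sigma_X^2/sigma_Y^2,
# so the variance ratio is biased for finite m but unbiased as m -> infinity.
rng = np.random.default_rng(2)
n, m = 6, 8                    # illustrative sample sizes
sigma_x, sigma_y = 2.0, 1.0    # illustrative true standard deviations
reps = 400_000

s2x = rng.normal(0, sigma_x, size=(reps, n)).var(axis=1, ddof=1)
s2y = rng.normal(0, sigma_y, size=(reps, m)).var(axis=1, ddof=1)

empirical = (s2x / s2y).mean()
exact = (m - 1) / (m - 3) * sigma_x**2 / sigma_y**2   # (7/5) * 4 = 5.6
print(empirical, exact)   # the two values nearly agree
```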