```
# Probability woman exceeds 180cm in height
# P(X > 180) = 1 - P(X <= 180)
1 - pnorm(180, mean = 163, sd = 8)
```

`[1] 0.01679331`

Lecture 5

Hypothesis tests in R

Part 2

- One-sample variance test
- One-sample variance test: Example
- F-distribution

One-sample variance test

- Suppose to have a population with normal distribution N(\mu,\sigma^2)
- Mean \mu and variance \sigma^2 are
**unknown**

- Mean \mu and variance \sigma^2 are
**Questions**about \mu and \sigma^2- What is my best guess of the value?
- How far away from the true value am I likely to be?

**Answers:**- The one-sample
**t-test**answers questions about \mu - The one-sample
**variance ratio test**answers questions about \sigma^2

- The one-sample

The-one sample variance test uses

**chi-squared distribution****Recall:**Chi-squared distribution with p degrees of freedom is \chi_p^2 = Z_1^2 + \ldots + Z_p^2 where Z_1, \ldots, Z_p are iid N(0, 1)

**Assumption:** Suppose given sample X_1,\ldots, X_n iid from N(\mu,\sigma^2)

**Goal:** Estimate variance \sigma^2 of population

**Test:**

Suppose \sigma_0 is guess for \sigma

The one-sided hypothesis test for \sigma is H_0 \colon \sigma = \sigma_0 \qquad H_1 \colon \sigma > \sigma_0

Consider the sample variance S^2 = \frac{ \sum_{i=1}^n X_i^2 - n \overline{X}^2 }{n-1}

Since we believe H_0, the variance is \sigma = \sigma_0

S^2 cannot be too far from the true variance \sigma

Therefore we

**cannot**have that S^2 \gg \sigma^2 = \sigma_0^2

If we observe S^2 \gg \sigma_0^2 then our guess \sigma_0 is probably wrong

Therefore we

**reject**H_0 if S^2 \gg \sigma_0^2The

**rejection**condition S^2 \gg \sigma_0^2 is equivalent to \frac{(n-1)S^2}{\sigma_0^2} \gg 1 where n is the sample size

We define our

**test statistic**as \chi^2 := \frac{(n-1)s^2}{\sigma_0^2}The

**rejection**condition is hence \chi^2 \gg 1

Recall that \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2

Assuming \sigma=\sigma_0, we therefore have \chi^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2

We

**reject**H_0 if \chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \gg 1As \chi^2 \sim \chi_{n-1}^2, we decide to rejct H_0 if \chi^2 > \chi_{n-1}^2(0.05)

By definition the

**critical value**\chi_{n-1}^2(0.05) is such that P(\chi_{n-1}^2 > \chi_{n-1}^2(0.05) ) = 0.05

x^* := \chi_{n-1}^2(0.05) is point on x-axis such that P(\chi_{n-1}^2 > x^* ) = 0.05

In the picture we have n = 12 and \chi_{11}^2(0.05) = 19.68

- Find Table 13.5 in this file
- Look at the row with Degree of Freedom n-1 (or its closest value)
- Find
**critical value**\chi^2_{n-1}(0.05) in column \alpha = 0.05 **Example**: n=12, DF =11, \chi^2_{11}(0.05) = 19.68

Given the test statistic \chi^2 the

**p-value**is defined as p := P( \chi_{n-1}^2 > \chi^2 )Notice that p < 0.05 \qquad \iff \qquad \chi^2 > \chi_{n-1}^2(0.05)

This is because \chi^2 > \chi_{n-1}^2(0.05) iff p = P(\chi_{n-1}^2 > \chi^2) < P(\chi_{n-1}^2 > \chi_{n-1}^2(0.05) ) = 0.05

Suppose given

- Sample x_1, \ldots, x_n of size n from N(\mu,\sigma^2)
- Guess \sigma_0 for \sigma

The one-sided hypothesis test is
H_0 \colon \sigma = \sigma_0 \qquad H_1 \colon \sigma > \sigma_0
The **variance ratio test** consists of 3 steps

**Calculation**: Compute the chi-squared statistic \chi^2 = \frac{(n-1) s^2}{\sigma_0^2} where sample mean and variance are \overline{x} = \frac{1}{n} \sum_{i=1}^n x_i \,, \qquad s^2 = \frac{\sum_{i=1}^n x_i^2 - n \overline{x}^2}{n-1}

**Statistical Tables or R**: Find either- Critical value in Table 13.5 \chi_{n-1}^2(0.05)
- p-value in R p := P( \chi_{n-1}^2 > \chi^2 )

**Interpretation**:- Reject H_0 if \chi^2 > \chi_{n-1}^2(0.05) \qquad \text{ or } \qquad p < 0.05
- Do not reject H_0 if \chi^2 \leq \chi_{n-1}^2(0.05) \qquad \text{ or } \qquad p \geq 0.05

One-sample variance test: Example

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|

Cons. Expectation | 66 | 53 | 62 | 61 | 78 | 72 | 65 | 64 | 61 | 50 | 55 | 51 |

Cons. Spending | 72 | 55 | 69 | 65 | 82 | 77 | 72 | 78 | 77 | 75 | 77 | 77 |

Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |

**Data:***Consumer Expectation*(CE) and*Consumer Spending*(CS) in 2011**Assumption:**CE and CS are normally distributed

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|

Cons. Expectation | 66 | 53 | 62 | 61 | 78 | 72 | 65 | 64 | 61 | 50 | 55 | 51 |

Cons. Spending | 72 | 55 | 69 | 65 | 82 | 77 | 72 | 78 | 77 | 75 | 77 | 77 |

Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |

**Remark**: Monthly data on CE and CS can be matched- Hence consider: \quad Difference = CE - CS
- CE and CS normal \quad \implies \quad Difference \sim N(\mu,\sigma^2)

**Question:**Test the following hypothesis: H_0 \colon \sigma = 1 \qquad H_1 \colon \sigma > 1

If X \sim N(\mu,\sigma^2) then P( \mu - 2 \sigma \leq X \leq \mu + 2\sigma ) \approx 0.95

Recall: \quad Difference = (CE - CS) \sim N(\mu,\sigma^2)

Hence if \sigma = 1 P( \mu - 2 \leq {\rm CE} - {\rm CS} \leq \mu + 2 ) \approx 0.95

**Meaning of variance ratio test:**\sigma=1 \quad \implies \quad \text{CS index is within } \pm{2\sigma} \text{ of CE index with probability } 0.95

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|

Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |

**Calculations**: With data on Difference compute

- Sample mean: \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{-6-2-7-4- \ldots -22-26}{12} = -\frac{138}{12} = -11.5

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|

Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |

**Calculations**: With data on Difference compute

- Sample variance: \begin{align*} \sum_{i=1}^{n} x_i^2 & = (-6)^2 + (-2)^2 + (-7)^2 + \ldots + (-22)^2 + (-26)^2 = 2432 \\ s^2 & = \frac{\sum_{i=1}^n x^2_i- n \bar{x}^2}{n-1} = \frac{2432-12(-11.5)^2}{11} = \frac{845}{11} = 76.8182 \end{align*}

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|

Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |

**Calculations**: With data on Difference compute

- Chi-squared statistic: \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{11 \left(\frac{845}{11}\right) }{1} = 845

**Statistical Tables**:- Sample size is n = 12
- In Table 13.5 find \chi_{11}^2(0.05) = 19.68

**Interpretation**:- Test statistic is \chi^2 = 845
- We
**reject**H_0 because \chi^2 = 845 > 19.68 = \chi_{n-1}^2(0.05)

**Conclusion**:- We accept H_1: The standard deviation satisfies \sigma > 1
- A better estimate for \sigma could be sample standard deviation s=\sqrt{\frac{845}{11}}=8.765
- This suggests: With probability 0.95 \text{CS index is within } \pm{2 \times 8.765 = \pm 17.53 } \text{ of CE index}

**Goal**: Perform chi-squared variance ratio test in R

For this we need to compute p-value p = P(\chi_{n-1}^2 > \chi^2)

Thus, we need to compute probabilities for chi-squared distribution in R

- R can natively do calculations with known probability distrubutions
- Example: Let X be r.v. with N(\mu,\sigma^2) distribution

R command |
Meaning |
---|---|

`pnorm(x, mean = mu, sd = sig)` |
P(X \leq x) |

`qnorm(p, mean = mu, sd = sig)` |
P(X \leq q) = p |

`dnorm(x, mean = mu, sd = sig)` |
f(x) where f is distr. of X |

`rnorm(n, mean = mu, sd = sig)` |
n random samples from distr. of X |

**Note**: Syntax of commands

**norm** = normal \qquad **p** = probability \qquad **q** = quantile

**d** = distribution \qquad **r** = random

- Suppose average height of women is normally distributed N(\mu,\sigma^2)
- Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm

- Suppose average height of women is normally distributed N(\mu,\sigma^2)
- Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm

- Suppose average height of women is normally distributed N(\mu,\sigma^2)
- Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm

- Suppose average height of women is normally distributed N(\mu,\sigma^2)
- Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm

- Commands for chi-squared distrubution are similar
`df = n`

denotes n degrees of feedom

R command |
Meaning |
---|---|

`pchisq(x, df = n)` |
P(X \leq x) |

`qchisq(p, df = n)` |
P(X \leq q) = p |

`dchisq(x, df = n)` |
f(x) where f is distr. of X |

`rchisq(m, df = n)` |
m random samples from distr. of X |

- From Tables we found quantile \chi_{11}^2 (0.05) = 19.68
**Question:**Compute such quantile in R

- From Tables we found quantile \chi_{11}^2 (0.05) = 19.68
**Question:**Compute such quantile in R

The \chi^2 statistic for variance ratio test has distribution \chi_{n-1}^2

**Question:**Compute the p-value p := P(\chi_{n-1}^2 > \chi^2)

The \chi^2 statistic for variance ratio test has distribution \chi_{n-1}^2

**Question:**Compute the p-value p := P(\chi_{n-1}^2 > \chi^2)Observe that p := P(\chi_{n-1}^2 > \chi^2) = 1 - P(\chi_{n-1}^2 \leq \chi^2)

The code is therefore

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|

Cons. Expectation | 66 | 53 | 62 | 61 | 78 | 72 | 65 | 64 | 61 | 50 | 55 | 51 |

Cons. Spending | 72 | 55 | 69 | 65 | 82 | 77 | 72 | 78 | 77 | 75 | 77 | 77 |

Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |

**Back to Example:**Monthly data on CE and CS**Question:**Test the following hypothesis: H_0 \colon \sigma = 1 \qquad H_1 \colon \sigma > 1

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|

Cons. Expectation | 66 | 53 | 62 | 61 | 78 | 72 | 65 | 64 | 61 | 50 | 55 | 51 |

Cons. Spending | 72 | 55 | 69 | 65 | 82 | 77 | 72 | 78 | 77 | 75 | 77 | 77 |

Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |

- Start by entering data into R

- Compute chi-squared statistic \chi^2 = \frac{(n-1) s^2}{\sigma^2_0}

- Compute the p-value p = P(\chi_{n-1}^2 > \chi^2) = 1 - P(\chi_{n-1}^2 \leq \chi^2)

```
# Compute p-value
p_value <- 1 - pchisq(chi_squared, df = n - 1)
# Print p-value
cat("The p-value for one-sided variance test is", p_value)
```

- The full code can be downloaded here variance_ratio_test.R

- Running variance_ratio_test.R gives the following output:

`The p-value for one-sided variance test is 0`

- Since p < 0.05 we
**reject**H_0 - Therefore the true variance seems to be \sigma^2 > 1

F-distribution

**Recall**: Chi-squared distribution with p degrees of freedom is
\chi_p^2 = Z_1^2 + \ldots + Z_p^2
where Z_1, \ldots, Z_n are iid N(0, 1)

Chi-squared distribution was used to:

Describe distribution of sample variance S^2: \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2

Define t-distribution: t_p \sim \frac{U}{\sqrt{V/p}} where U \sim N(0,1) and V \sim \chi_p^2

F-distribution is:

- Defined in terms of ratio of \chi_p^2
- Describes
**variance estimators**for independent samples

The r.v. F has F-distribution with p and q degrees of freedom if the pdf is
f_F(x) = \frac{ \Gamma \left(\frac{p+q}{2} \right) }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) }
\left( \frac{p}{q} \right)^{p/2} \,
\frac{ x^{ (p/2) - 1 } }{ [ 1 + (p/q) x ]^{(p+q)/2} } \,, \quad x > 0

**Notation**: F-distribution with p and q degrees of freedom is denoted by F_{p,q}

The F-distribution is obtained as ratio of 2 independent chi-squared distributions

Suppose that U \sim \chi_p^2 and V \sim \chi_q^2 are independent. Then
X := \frac{U/p}{V/q} \sim F_{p,q}

Similar to the proof (seen in Homework 3) that \frac{U}{\sqrt{V/p}} \sim t_p where U \sim N(0,1) and V \sim \chi_p^2 are independent

In our case we need to prove X := \frac{U/p}{V/q} \sim F_{p,q} where U \sim \chi_p^2 and V \sim \chi_q^2 are independent

U \sim \chi_{p}^2 and V \sim \chi_q^2 are independent. Therefore \begin{align*} f_{U,V} (u,v) & = f_U(u) f_V(v) \\ & = \frac{ 1 }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) 2^{(p+q)/2} } u^{\frac{p}{2} - 1} v^{\frac{q}{2} - 1} e^{-(u+v)/2} \end{align*}

Consider the change of variables x(u,v) := \frac{u/p}{v/q} \,, \quad y(u,v) := u + v

This way we have X = \frac{U/p}{V/q} \,, \qquad Y = U + V

We can compute f_X via f_{X}(x) = \int_{0}^\infty f_{X,Y}(x,y) \, dy

- The joint pdf f_{X,Y} can be computed by inverting the change of variables x(u,v) := \frac{u/p}{v/q} \,, \quad y(u,v) := u + v and using the formula f_{X,Y}(x,y) = f_{U,V}(u(x,y),v(x,y)) \, |\det J| where J is the Jacobian of the inverse transformation (x,y) \mapsto (u(x,y),v(x,y))

Since f_{U,V} is known, then also f_{X,Y} is known

Moreover the integral f_{X}(x) = \int_{0}^\infty f_{X,Y}(x,y) \, dy can be explicitly computed, yielding the thesis f_{X}(x) = \frac{ \Gamma \left(\frac{p+q}{2} \right) }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) } \left( \frac{p}{q} \right)^{p/2} \, \frac{ x^{ (p/2) - 1 } }{ [ 1 + (p/q) x ]^{(p+q)/2} }

Suppose X \sim F_{p,q} with q>2. Then {\rm I\kern-.3em E}[X] = \frac{q}{q-2}

If X \sim F_{p,q} then 1/X \sim F_{q,p}

If X \sim t_q then X^2 \sim F_{1,q}

Requires a bit of work. It will be left as Homework assignment

By the Theorem in Slide 43 we have X \sim F_{p,q} \quad \implies \quad X = \frac{U/p}{V/q} with U \sim \chi_p^2 and V \sim \chi_q^2 independent. Therefore \frac{1}{X} = \frac{V/q}{U/p} \sim F_{q,p}

- Suppose X \sim t_q. The Theorem in Slide 66 of Lecture 3 guarantees that X = \frac{U}{\sqrt{V/q}} where U \sim N(0,1) and V \sim \chi_q^2 are independent. Therefore X^2 = \frac{U^2}{V/q}

- Since U \sim N(0,1), by definition U^2 \sim \chi_1^2.
- Moreover U^2 and V are independet, since U and V are independent
- Theorem in Slide 43 implies X^2 = \frac{U^2}{V/q} \sim \frac{\chi_1^2/1}{\chi_q^2/q} \sim F_{1,q}