Statistical Models

Lecture 5

Lecture 5:
Hypothesis tests in R
Part 2

Outline of Lecture 5

  1. One-sample variance test
  2. F-distribution

Part 1:
One-sample variance test

Estimating mean and variance

  • Suppose to have a population with normal distribution N(\mu,\sigma^2)
    • Mean \mu and variance \sigma^2 are unknown
  • Questions about \mu and \sigma^2
    1. What is my best guess of the value?
    2. How far away from the true value am I likely to be?
  • Answers:
    • The one-sample t-test answers questions about \mu
    • The one-sample variance ratio test answers questions about \sigma^2

Chi-squared distribution

  • The-one sample variance test uses chi-squared distribution

  • Recall: Chi-squared distribution with p degrees of freedom is \chi_p^2 = Z_1^2 + \ldots + Z_p^2 where Z_1, \ldots, Z_n are iid N(0, 1)

One-sample one-sided variance ratio test

Goal

  • Estimate variance \sigma^2 of normal population N(\mu,\sigma^2)

  • Suppose \sigma_0 is guess for \sigma

  • The one-sided hypotheses test is H_0 \colon \sigma \leq \sigma_0 \qquad H_1 \colon \sigma > \sigma_0

One-sample one-sided variance ratio test

What to do?

  • Consider the sample variance S^2

  • The sample variance S^2 cannot be too far from the true variance \sigma

  • Since we believe H_0, true variance is \sigma \leq \sigma_0

  • Therefore we cannot have that S^2 \gg \sigma_0^2 (bacause then S^2 \gg \sigma^2)

One-sample one-sided variance ratio test

What to do?

  • Therefore we reject H_0 if S^2 \gg \sigma_0^2

  • The rejection condition S^2 \gg \sigma_0^2 is equivalent to \frac{(n-1)S^2}{\sigma_0^2} \gg 1

One-sample one-sided variance ratio test

What to do?

  • Assuming \sigma=\sigma_0, the rejection condition becomes \frac{(n-1)S^2}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma^2} \gg 1

  • Recall that \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2

One-sample one-sided variance ratio test

What to do?

  • We define our test statistic as \chi^2 := \frac{(n-1)s^2}{\sigma_0^2}

  • We reject H_0 if \chi^2 is too large

  • As \chi^2 is observed from \chi_{n-1}^2, we decide to rejct if \chi^2 > \chi_{n-1}^2(0.05)

  • By definition the critical value \chi_{n-1}^2(0.05) is such that P(\chi_{n-1}^2 > \chi_{n-1}^2(0.05) ) = 0.05

One-sample one-sided variance ratio test

Critical values of chi-squared

  • x^* := \chi_{n-1}^2(0.05) is point on x-axis such that P(\chi_{n-1}^2 > x^* ) = 0.05

  • In the picture we have n = 12 and \chi_{11}^2(0.05) = 19.68

One-sample one-sided variance ratio test

Critical values of chi-squared – Table 13.5 (file here)

  • Look at the row with Degree of Freedom n-1 (or its closest value)
  • Find critical value \chi^2_{n-1}(0.05) in column \alpha = 0.05
  • Example: n=12, DF =11, \chi^2_{11}(0.05) = 19.68

One-sample one-sided variance ratio test

p-value

  • Given the test statistic \chi^2 the p-value is defined as p := P( \chi_{n-1}^2 > \chi^2 )

  • Notice that p < 0.05 \qquad \iff \qquad \chi^2 > \chi_{n-1}^2(0.05)

This is because \chi^2 > \chi_{n-1}^2(0.05) iff p = P(\chi_{n-1}^2 > \chi^2) < P(\chi_{n-1}^2 > \chi_{n-1}^2(0.05) ) = 0.05

One-sample one-sided variance ratio test

Procedure

Suppose \sigma_0 is guess for \sigma. The one-sided hypothesis is H_0 \colon \sigma \leq \sigma_0 \qquad H_1 \colon \sigma > \sigma_0 The variance ratio test or chi-squared test consists of 3 steps:

  1. Calculation: Compute the chi-squared statistics \chi^2 = \frac{(n-1) s^2}{\sigma_0^2} where sample mean and variance are \overline{x} = \frac{1}{n} \sum_{i=1}^n x_i \,, \qquad s^2 = \frac{\sum_{i=1}^n x_i^2 - n \overline{x}^2}{n-1}

One-sample one-sided variance ratio test

Procedure

  1. Statistical Tables or R: Find either
    • Critical value in Table 13.5 \chi_{n-1}^2(0.05)
    • p-value in R p := P( \chi_{n-1}^2 > \chi^2 )

One-sample one-sided variance ratio test

Procedure

  1. Interpretation:
    • Reject H_0 if \chi^2 > \chi_{n-1}^2(0.05) \qquad \text{ or } \qquad p < 0.05
    • Do not reject H_0 if \chi^2 \leq \chi_{n-1}^2(0.05) \qquad \text{ or } \qquad p \geq 0.05

Variance ratio test - Example

Month J F M A M J J A S O N D
Cons. Expectation 66 53 62 61 78 72 65 64 61 50 55 51
Cons. Spending 72 55 69 65 82 77 72 78 77 75 77 77
Difference -6 -2 -7 -4 -4 -5 -7 -14 -16 -25 -22 -26


  • Data: Consumer Expectation (CE) and Consumer Spending (CS) in 2011
  • Assumption: CE and CS are normally distributed

Variance ratio test - Example

Month J F M A M J J A S O N D
Cons. Expectation 66 53 62 61 78 72 65 64 61 50 55 51
Cons. Spending 72 55 69 65 82 77 72 78 77 75 77 77
Difference -6 -2 -7 -4 -4 -5 -7 -14 -16 -25 -22 -26


  • Remark: Monthly data on CE and CS can be matched
    • Hence consider: \quad Difference = CE - CS
    • CE and CS normal \quad \implies \quad Difference \sim N(\mu,\sigma^2)
  • Question: Test the following hypothesis: H_0 \colon \sigma \leq 1 \qquad H_1 \colon \sigma > 1

Variance ratio test - Example

Motivation of test

  • If X \sim N(\mu,\sigma^2) then P( \mu - 2 \sigma \leq X \leq \mu + 2\sigma ) \approx 0.95

  • Recall: \quad Difference = (CE - CS) \sim N(\mu,\sigma^2)

  • Hence if \sigma = 1 P( \mu - 2 \leq {\rm CE} - {\rm CS} \leq \mu + 2 ) \approx 0.95

  • Meaning: \sigma=1 \quad \implies \quad \text{CS index is within } \pm{2\sigma} \text{ of CE index with probability } 0.95

Variance ratio test - Example

Test by hand

Month J F M A M J J A S O N D
Difference -6 -2 -7 -4 -4 -5 -7 -14 -16 -25 -22 -26


  1. Calculations: With data on Difference compute
  • Sample mean: \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{-6-2-7-4- \ldots -22-26}{12} = -\frac{138}{12} = -11.5

Variance ratio test - Example

Test by hand

Month J F M A M J J A S O N D
Difference -6 -2 -7 -4 -4 -5 -7 -14 -16 -25 -22 -26


  1. Calculations: With data on Difference compute
  • Sample variance: \begin{align*} \sum_{i=1}^{n} x_i^2 & = (-6)^2 + (-2)^2 + (-7)^2 + \ldots + (-22)^2 + (-26)^2 = 2432 \\ s^2 & = \frac{\sum_{i=1}^n x^2_i- n \bar{x}^2}{n-1} = \frac{2432-12(-11.5)^2}{11} = \frac{845}{11} = 76.8182 \end{align*}

Variance ratio test - Example

Test by hand

Month J F M A M J J A S O N D
Difference -6 -2 -7 -4 -4 -5 -7 -14 -16 -25 -22 -26


  1. Calculations: With data on Difference compute
  • Chi-squared statistic: \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{11 \left(\frac{845}{11}\right) }{1} = 845

Variance ratio test - Example

Test by hand

  1. Statistical Tables: Find either
    • We have n = 12
    • In Table 13.5 find \chi_{11}^2(0.05) = 19.68
  2. Interpretation:
    • Test statistic is \chi^2 = 845
    • We reject H_0 because \chi^2 = 845 > 19.68 = \chi_{n-1}^2(0.05)

Variance ratio test - Example

Test by hand

  1. Conclusion:
    • We accept H_1: The standard deviation satisfies \sigma > 1
    • A better estimate for \sigma could be sample standard deviation s=\sqrt{\frac{845}{11}}=8.765
    • This suggests: With probability 0.95 \text{CS index is within } \pm{2 \times 8.765 = \pm 17.53 } \text{ of CE index}

Variance ratio test - Example

Test in R

Goal: Perform chi-squared variance ratio test in R

  • For this we need to compute p-value p = P(\chi_{n-1}^2 > \chi^2)

  • Thus, we need to compute probabilities for chi-squared distribution in R

Probability Distributions in R

  • R can natively do calculations with known probability distrubutions
  • Example: Let X be r.v. with N(\mu,\sigma^2) distribution
R command Meaning
pnorm(x, mean = mu, sd = sig) P(X \leq x)
qnorm(p, mean = mu, sd = sig) P(X \leq q) = p
dnorm(x, mean = mu, sd = sig) f(x) where f is distr. of X
rnorm(n, mean = mu, sd = sig) n random samples from distr. of X

Note: Syntax of commands

norm = normal \qquad p = probability \qquad q = quantile

d = distribution \qquad r = random

Probability Distributions in R

Example

  • Suppose average height of women is normally distributed N(\mu,\sigma^2)
  • Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm


# Probability woman exceeds 180cm in height
# P(X > 180) = 1 - P(X <= 180)

1 - pnorm(180, mean = 163, sd = 8)
[1] 0.01679331

Probability Distributions in R

Example

  • Suppose average height of women is normally distributed N(\mu,\sigma^2)
  • Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm


# The upper 10th percentile for women height, that is,
# height q such that P(X <= q) = 0.9

qnorm(0.90, mean = 163, sd = 8)
[1] 173.2524

Probability Distributions in R

Example

  • Suppose average height of women is normally distributed N(\mu,\sigma^2)
  • Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm


# Value of density at 163

dnorm(163, mean = 163, sd = 8)
[1] 0.04986779

Probability Distributions in R

Example

  • Suppose average height of women is normally distributed N(\mu,\sigma^2)
  • Assume mean \mu = 163 cm and standard deviation \sigma = 8 cm


# Generate random sample of size 5

rnorm(5, mean = 163, sd = 8)
[1] 168.5487 150.9497 167.6267 167.8335 162.1466

Probability Distributions in R

Chi-squared distribution

  • Commands for chi-squared distrubution are similar
  • df = n denotes n degrees of feedom
R command Meaning
pchisq(x, df = n) P(X \leq x)
qchisq(p, df = n) P(X \leq q) = p
dchisq(x, df = n) f(x) where f is distr. of X
rchisq(m, df = n) m random samples from distr. of X

Probability Distributions in R

Example

  • From Tables we found quantile \chi_{11}^2 (0.05) = 19.68
  • Compute such quantile in R


# Compute 0.95 quantile for chi-squared with 11 degrees of freedom

quantile <- qchisq(0.95, df = 11)

cat("The 0.95 quantile for chi-squared with df = 11 is", quantile)
The 0.95 quantile for chi-squared with df = 11 is 19.67514

Probability Distributions in R

Example

  • The \chi^2 statistic for variance ratio test has distribution \chi_{n-1}^2

  • Compute the p-value p := P(\chi_{n-1}^2 > \chi^2) = 1 - P(\chi_{n-1}^2 \leq \chi^2)

Probability Distributions in R

Example

Suppose that

  • Degrees of freedom = 18
  • \chi^2 = 56.88


# Compute p-value for chi^2 = 56.88 and df = 18

chi_squared <- 56.88

p_value <- pchisq(chi_squared, df = 18)

cat("The p-value for chi^2 =", chi_squared, "with df = 18 is", p_value)
The p-value for chi^2 = 56.88 with df = 18 is 0.9999935

Variance ratio test - Example

Test in R

Month J F M A M J J A S O N D
Cons. Expectation 66 53 62 61 78 72 65 64 61 50 55 51
Cons. Spending 72 55 69 65 82 77 72 78 77 75 77 77
Difference -6 -2 -7 -4 -4 -5 -7 -14 -16 -25 -22 -26
  • Back to Example: Monthly data on CE and CS

  • Question: Test the following hypothesis: H_0 \colon \sigma \leq 1 \qquad H_1 \colon \sigma > 1

Variance ratio test - Example

Test in R

Month J F M A M J J A S O N D
Cons. Expectation 66 53 62 61 78 72 65 64 61 50 55 51
Cons. Spending 72 55 69 65 82 77 72 78 77 75 77 77
Difference -6 -2 -7 -4 -4 -5 -7 -14 -16 -25 -22 -26
  • Start by entering data into R
# Enter Consumer Expectation and Consumer Spending data
CE <- c(66, 53, 62, 61, 78, 72, 65, 64, 61, 50, 55, 51)
CS <- c(72, 55, 69, 65, 82, 77, 72, 78, 77, 75, 77, 77)

# Compute difference
difference <- CE - CS

Variance ratio test - Example

Test in R

  • Compute chi-squared statistic \chi^2 = \frac{(n-1) s^2}{\sigma^2_0}
# Compute sample size
n <- length(difference)

# Enter null hypothesis
sigma_0 <- 1

# Compute sample variance
variance <- var(difference)

# Compute chi-squared statistic
chi_squared <- (n - 1) * variance / sigma_0 ^ 2

Variance ratio test - Example

Test in R

  • Compute the p-value p = P(\chi_{n-1}^2 > \chi^2) = 1 - P(\chi_{n-1}^2 \leq \chi^2)
# Compute p-value
p_value <- 1 - pchisq(chi_squared, df = n - 1)

# Print p-value
cat("The p-value for one-sided variance test is", p_value)

Variance ratio test - Example

Running the code

The p-value for one-sided variance test is 0
  • Since p < 0.05 we reject H_0

Part 2:
F-distribution

Overview

Recall: Chi-squared distribution with p degrees of freedom is \chi_p^2 = Z_1^2 + \ldots + Z_p^2 where Z_1, \ldots, Z_n are iid N(0, 1)

Overview

Chi-squared distribution was used to:

  • Describe distribution of sample variance S^2: \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2

  • Define t distribution: t_p \sim \frac{U}{\sqrt{V/p}} where U \sim N(0,1) and V \sim \chi_p^2

F-distribution

F-distribution is:

  • Defined in terms of ratio of \chi_p^2
  • Describes variance estimators for independent samples

F-distribution

Definition
The r.v. F has F-distribution with p and q degrees of freedom if the pdf is f_F(x) = \frac{ \Gamma \left(\frac{p+q}{2} \right) }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) } \left( \frac{p}{q} \right)^{p/2} \, \frac{ x^{ (p/2) - 1 } }{ [ 1 + (p/q) x ]^{(p+q)/2} } \,, \quad x > 0

Notation: F-distribution with p and q degrees of freedom is denoted by F_{p,q}

F-distribution

The F-distribution is obtained as ratio of 2 independent chi-squared distributions

Theorem
Suppose that U \sim \chi_p^2 and V \sim \chi_q^2 are independent. Then X := \frac{U/p}{V/q} \sim F_{p,q}

F-distribution

Idea of Proof

  • Similar to the proof (seen in Homework 3) that \frac{U}{\sqrt{V/p}} \sim t_p where U \sim N(0,1) and V \sim \chi_p^2 are independent

  • In our case we need to prove X := \frac{U/p}{V/q} \sim F_{p,q} where U \sim \chi_p^2 and V \sim \chi_q^2 are independent

F-distribution

Idea of Proof

  • U \sim \chi_{p}^2 and V \sim \chi_q^2 are independent. Therefore \begin{align*} f_{U,V} (u,v) & = f_U(u) f_V(v) \\ & = \frac{ 1 }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) 2^{(p+q)/2} } u^{\frac{p}{2} - 1} v^{\frac{q}{2} - 1} e^{-(u+v)/2} \end{align*}

  • Consider the change of variables x(u,v) := \frac{u/p}{v/q} \,, \quad y(u,v) := u + v

F-distribution

Idea of Proof

  • This way we have X = \frac{U/p}{V/q} \,, \qquad Y = U + V

  • We can compute f_X via f_{X}(x) = \int_{0}^\infty f_{X,Y}(x,y) \, dy

F-distribution

Idea of Proof

  • The joint pdf f_{X,Y} can be computed by inverting the change of variables x(u,v) := \frac{u/p}{v/q} \,, \quad y(u,v) := u + v and using the formula f_{X,Y}(x,y) = f_{U,V}(u(x,y),v(x,y)) \, |\det J| where J is the Jacobian of the inverse transformation (x,y) \mapsto (u(x,y),v(x,y))

F-distribution

Idea of Proof

  • Since f_{U,V} is known, then also f_{X,Y} is known

  • Moreover the integral f_{X}(x) = \int_{0}^\infty f_{X,Y}(x,y) \, dy can be explicitly computed, yielding the thesis f_{X}(x) = \frac{ \Gamma \left(\frac{p+q}{2} \right) }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) } \left( \frac{p}{q} \right)^{p/2} \, \frac{ x^{ (p/2) - 1 } }{ [ 1 + (p/q) x ]^{(p+q)/2} }

Properties of F-distribution

Theorem
  1. Suppose X \sim F_{p,q} with q>2. Then {\rm I\kern-.3em E}[X] = \frac{q}{q-2}

  2. If X \sim F_{p,q} then 1/X \sim F_{q,p}

  3. If X \sim t_q then X^2 \sim F_{1,q}

Properties of F-distribution

Proof of Theorem

  1. Requires a bit of work. It will be left as Homework assignment

  2. By the Theorem in Slide 43 we have X \sim F_{p,q} \quad \implies \quad X = \frac{U/p}{V/q} with U \sim \chi_p^2 and V \sim \chi_q^2 independent. Therefore \frac{1}{X} = \frac{V/q}{U/p} \sim F_{q,p}

Properties of F-distribution

Proof of Theorem

  1. Suppose X \sim t_q. The Theorem in Slide 66 of Lecture 3 guarantees that X = \frac{U}{\sqrt{V/q}} where U \sim N(0,1) and V \sim \chi_q^2 are independent. Therefore X^2 = \frac{U^2}{V/q}

Properties of F-distribution

Proof of Theorem

  • Since U \sim N(0,1), by definition U^2 \sim \chi_1^2.
  • Moreover U^2 and V are independet, since U and V are independent
  • Theorem in Slide 43 implies X^2 = \frac{U^2}{V/q} \sim \frac{\chi_1^2/1}{\chi_q^2/q} \sim F_{1,q}