Statistical Models


Lecture 5:
Two-sample F-test &
Goodness-of-fit test

Outline of Lecture 5

  1. The goodness-of-fit test
  2. Worked Examples

Part 1:
The goodness-of-fit test

Scenario 1: Simple counts

Data: in the form of numerical counts

Test: difference between observed counts and predictions of theoretical model

Example: Blood counts

  • We conducted blood type testing on a sample of 6004 individuals, and the results are summarized below.

Blood type A B AB O
Count 2162 738 228 2876
  • We want to compare the above data to the theoretical probability model

Blood type A B AB O
Probability 1/3 1/8 1/24 1/2

Scenario 2: Counts with multiple factors

Manager Won Drawn Lost
Moyes 27 9 15
Van Gaal 54 25 24
Mourinho 84 32 28
Solskjaer 91 37 40
Rangnick 11 10 8
ten Hag 61 12 28

Example: Relative performance of Manchester United managers

  • Each football manager has Win, Draw and Loss counts


Questions:

  • Is the number of Wins, Draws and Losses uniformly distributed?
  • Are there differences between the performances of each manager?

Plan

  1. In this Lecture:
    • We study Scenario 1 – Simple counts
    • Chi-squared goodness-of-fit test
  2. Next week:
    • We will study Scenario 2 – Counts with multiple factors
    • Chi-squared test of independence

Categorical Data

  • Finite number of possible categories or types
  • Observations can only belong to one category
  • O_i refers to observed count of category i
Type 1 Type 2 \ldots Type n
O_1 O_2 \ldots O_n
  • E_i refers to expected count of category i
Type 1 Type 2 \ldots Type n
E_1 E_2 \ldots E_n

Chi-squared goodness-of-fit test

Goal: Compare expected counts E_i with observed counts O_i

Null hypothesis: Expected counts match the observed counts

H_0 \colon O_i = E_i \,, \qquad \forall \, i = 1, \ldots, n

Method: Look for evidence against the null hypothesis

  • Distance between observed counts and expected counts is large

  • For example, if (O_i - E_i)^2 \geq c for some chosen constant c

Chi-squared statistic

Definition
The chi-squared statistic is \chi^2 := \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i}

Remark:

  • \chi^2 represents the global distance between observed and expected counts

  • Indeed, we have that \chi^2 = 0 \qquad \iff \qquad O_i = E_i \,\,\,\, \text{ for all } \,\,\,\, i = 1 , \, \ldots , \, n

Tests using Chi-squared statistic

Null hypothesis: Expected counts match the observed counts

H_0 \colon O_i = E_i \,, \qquad \forall \, i = 1, \ldots, n

Remarks:

  • If H_0 is to be believed, we expect small differences between O_i and E_i
    • Therefore \chi^2 will be small (and non-negative)
  • If H_0 is wrong, some O_i will be far from the corresponding E_i
    • Therefore \chi^2 will be large
  • The above imply that tests on \chi^2 should be one-sided (right-tailed)

The Multinomial distribution

Models the following experiment

  • The experiment consists of m independent trials
  • Each trial results in one of n distinct possible outcomes
  • The probability of the i-th outcome is p_i on every trial, with 0 \leq p_i \leq 1 \qquad \qquad \sum_{i=1}^n p_i = 1
  • X_i counts the number of times i-th outcome occurred in the m trials. It holds \sum_{i=1}^n X_i = m

Multinomial distribution

Schematic visualization


Outcome type 1 \ldots n Total
Counts X_1 \ldots X_n X_1 + \ldots + X_n = m
Probabilities p_1 \ldots p_n p_1 + \ldots + p_n = 1

The case n = 2

For n = 2, the multinomial reduces to a binomial:

  • Each trial has n = 2 possible outcomes
  • X_1 counts the number of successes
  • X_2 = m − X_1 counts the number of failures in m trials
  • Probability of success is p_1
  • Probability of failure is p_2 = 1 - p_1
Outcome types 1 2
Counts X_1 X_2 = m - X_1
Probabilities p_1 p_2 = 1 - p_1

Formal definition

Definition
Let m,n \in \mathbb{N} and p_1, \ldots, p_n numbers such that 0 \leq p_i \leq 1 \,, \qquad \quad \sum_{i=1}^n p_i = 1 The random vector \mathbf{X}= (X_1, \ldots, X_n) has multinomial distribution with m trials and cell probabilities p_1,\ldots,p_n if joint pmf is f (x_1, \ldots , x_n) = \frac{m!}{x_1 ! \cdot \ldots \cdot x_n !} \ p_1^{x_1} \cdot \ldots \cdot p_n^{x_n} \,, \qquad \forall \, x_i \in \mathbb{N}\, \, \text{ s.t. } \, \sum_{i=1}^n x_i = m We denote \mathbf{X}\sim \mathop{\mathrm{Mult}}(m,p_1,\ldots,p_n)

Properties of Multinomial distribution

Suppose that \mathbf{X}= (X_1, \ldots, X_n) \sim \mathop{\mathrm{Mult}}(m,p_1,\ldots,p_n)

  • If we are only interested in outcome i, the remaining outcomes are failures

  • This means X_i is binomial with m trials and success probability p_i

  • We write X_i \sim \mathop{\mathrm{Bin}}(m,p_i) and the pmf is f(x_i) = P(X_i = x_i) = \frac{m!}{x_i! \cdot (m-x_i)!} \, p_i^{x_i} (1-p_i)^{m-x_i} \qquad \forall \, x_i = 0 , \ldots , m

  • Since X_i \sim \mathop{\mathrm{Bin}}(m,p_i) it holds {\rm I\kern-.3em E}[X_i] = m p_i \qquad \qquad {\rm Var}[X_i] = m p_i (1-p_i)

Statistical Model: Multinomial Counts

  • O_i refers to observed count of category i

  • E_i refers to expected count of category i

  • We suppose that Type i is observed with probability p_i and 0 \leq p_i \leq 1 \,, \qquad \quad p_1 + \ldots + p_n = 1

  • Total number of observations is m

  • The counts are modelled by (O_1, \ldots, O_n) \sim \mathop{\mathrm{Mult}}(m, p_1, \ldots, p_n)

  • The expected counts are modelled by E_i := {\rm I\kern-.3em E}[ O_i ] = m p_i

The chi-squared statistic

Consider counts and expected counts (O_1, \ldots, O_n) \sim \mathop{\mathrm{Mult}}(m, p_1, \ldots, p_n) \qquad \qquad E_i := m p_i

Definition
The chi-squared statistic for multinomial counts is defined by \chi^2 = \sum_{i=1}^n \frac{(O_i-E_i)^2}{E_i} = \sum_{i=1}^n \frac{( O_i - m p_i )^2}{ m p_i }

Question: What is the distribution of \chi^2 \,?

Distribution of chi-squared statistic

Theorem
Suppose the counts (O_1, \ldots, O_n) \sim \mathop{\mathrm{Mult}}(m,p_1, \ldots, p_n). Then \chi^2 = \sum_{i=1}^n \frac{( O_i - m p_i )^2}{ m p_i } \ \stackrel{{\rm d}}{\longrightarrow} \ \chi_{n-1}^2 when sample size m \to \infty, where the convergence is in distribution

  • Hence, the distribution of \chi^2 is approximately \chi_{n-1}^2 when m is large

  • The above Theorem is due to Karl Pearson, in his 1900 paper

  • The proof is difficult; there is a paper presenting seven different proofs of this Theorem

Heuristic proof of Theorem

  • Since O_i \sim \mathop{\mathrm{Bin}}(m, p_i), the Central Limit Theorem implies \frac{O_i - {\rm I\kern-.3em E}[O_i]}{ \sqrt{{\rm Var}[O_i] } } = \frac{O_i - m p_i }{ \sqrt{m p_i(1 - p_i) } } \ \stackrel{{\rm d}}{\longrightarrow} \ N(0,1) as m \to \infty

  • In particular, since (1-p_i) is constant, we have \frac{O_i - m p_i }{ \sqrt{m p_i } } \ \approx \ \frac{O_i - m p_i }{ \sqrt{m p_i(1 - p_i) } } \ \approx \ N(0,1)

Heuristic proof of Theorem

  • Squaring the previous expression, we get \frac{(O_i - m p_i)^2 }{ m p_i } \ \approx \ N(0,1)^2 = \chi_1^2

  • If the above random variables were pairwise independent, we would obtain \chi^2 = \sum_{i=1}^n \frac{(O_i - m p_i)^2 }{ m p_i } \ \approx \ \sum_{i=1}^n \chi_1^2 = \chi_n^2

  • However the O_i are not independent, because of the linear constraint O_1 + \ldots + O_n = m (total counts have to sum to m)

Heuristic proof of Theorem

  • A priori, there should be n degrees of freedom

  • However, the linear constraint reduces degrees of freedom by 1

    • because one can choose the first n-1 counts, and the last one is given by O_n = m - O_1 - \ldots - O_{n-1}
  • Thus, we have n-1 degrees of freedom, implying that \chi^2 = \sum_{i=1}^n \frac{(O_i - m p_i)^2 }{ m p_i } \ \approx \ \chi_{n-1}^2

This is not a proof! The actual proof is super technical

Quality of chi-squared approximation

  • Define expected counts E_i := m p_i

  • Consider approximation from Theorem: \chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2 }{ E_i } \ \approx \ \chi_{n-1}^2

  • The approximation is:

    Good: E_i \geq 5 \, \text{ for all } \, i = 1 , \ldots, n
    Bad: E_i < 5 \, \text{ for some } \, i = 1 , \ldots, n
  • How to compute the p-value: When approximation is

    • Good: Use \chi_{n-1}^2 approximation of \chi^2
    • Bad: Use Monte Carlo simulations (more on this later)

The chi-squared goodness-of-fit test

Setting:

  • Population consists of individuals of n different types
  • p_i is the probability that an individual selected at random is of type i

Problem: p_i is unknown and needs to be estimated

Hypothesis: As guess for p_1,\ldots,p_n, we take p_1^0, \ldots, p_n^0 such that 0 \leq p_i^0 \leq 1 \qquad \qquad \sum_{i=1}^n p_i^0 = 1

The chi-squared goodness-of-fit test

Formal Hypothesis: We test for equality of p_i to p_i^0 \begin{align*} & H_0 \colon p_i = p_i^0 \qquad \text{ for all } \, i = 1, \ldots, n \\ & H_1 \colon p_i \neq p_i^0 \qquad \text{ for at least one } \, i \end{align*}

Sample:

  • We draw m items from population
  • O_i denotes the number of items of type i drawn
  • According to our model, (O_1, \ldots, O_n) \sim \mathop{\mathrm{Mult}}(m,p_1, \ldots, p_n)

The chi-squared goodness-of-fit test

Data: Vector of counts (o_1,\ldots,o_n)

Schematically: We can represent probabilities and counts in a table


Type 1 \ldots n Total
Counts o_1 \ldots o_n m
Probabilities p_1 \ldots p_n 1

Procedure: 3 Steps

  1. Calculation:
    • Compute total counts and expected counts m = \sum_{i=1}^n o_i \qquad \quad E_i = m p_i^0
    • Compute the chi-squared statistic \chi^2 = \sum_{i=1}^n \frac{ (o_i - E_i)^2 }{E_i}

  2. Statistical Tables or R:
    • Check that E_i \geq 5 for all i = 1, \ldots, n
    • In this case \chi^2 \ \approx \ \chi_{n-1}^2
    • Find critical value \chi^2_{n-1}(0.05) in Table 2
    • Alternatively, compute p-value in R


  3. Interpretation: Reject H_0 when either p < 0.05 \qquad \text{ or } \qquad \chi^2 \in \,\,\text{Rejection Region}
Alternative Rejection Region p-value
\exists \, i \,\, s.t. \,\, p_i \neq p_i^0 \chi^2 > \chi^2_{n-1}(0.05) P(\chi_{n-1}^2 > \chi^2)

Reject H_0 if the \chi^2 statistic falls in the Rejection Region

The chi-squared goodness-of-fit test in R

  1. Store the counts o_1,\ldots, o_n in R vector
    • counts <- c(o1, ..., on)
  2. Store the null hypothesis probabilities p_1^0,\ldots, p_n^0 in R vector
    • null.p <- c(p1, ..., pn)
  3. Perform a goodness-of-fit test on counts with null.p
Alternative R command
\exists \, i \,\, s.t. \,\, p_i \neq p_i^0 chisq.test(counts, p = null.p)
  4. Read output: similar to two-sample t-test and F-test
    • The main quantity of interest is p-value

Comment 1

Default null probabilities

If null probabilities are not specified:

  • R assumes equal probability for each type

  • For example, assume that counts <- c(o1, ..., on)

  • Equal probability means that p_i^0 = \frac{1}{n} \,, \qquad \forall \, i = 1, \ldots, n

  • Test counts against equal probabilities with the command

    • chisq.test(counts)

Comment 2

Quality of chi-squared approximation

  • By default, R computes p-value via the \chi_{n-1}^2 approximation

  • R will warn you if the expected counts are too low; in that case, use Monte Carlo

  • To compute p-value via Monte Carlo simulation, use option

    • simulate.p.value = TRUE

Part 2:
Worked Examples

Example 1: Blood counts

  • Suppose we have blood type counts for a sample of people

  • We also have theoretical probabilities that we want to test

    Blood type A B AB O
    Count 2162 738 228 2876
    Probability 1/3 1/8 1/24 1/2
  • Hence the null hypothesis probabilities are p_1^0 = \frac{1}{3} \qquad \quad p_2^0 = \frac{1}{8} \qquad \quad p_3^0 = \frac{1}{24} \qquad \quad p_4^0 = \frac{1}{2}

  • The alternative hypothesis is: \qquad H_1 \colon p_i \neq p_i^0 \,\, \text{ for at least one } \, i

Goodness-of-fit test by hand

  1. Calculation:
    • Compute total counts m = \sum_{i=1}^n o_i = 2162 + 738 + 228 + 2876 = 6004
    • Compute expected counts \begin{align*} E_1 & = m p_1^0 = 6004 \times \frac{1}{3} = 2001.3 \\ E_2 & = m p_2^0 = 6004 \times \frac{1}{8} = 750.5 \\ E_3 & = m p_3^0 = 6004 \times \frac{1}{24} = 250.2 \\ E_4 & = m p_4^0 = 6004 \times \frac{1}{2} = 3002 \end{align*}

    • Compute the chi-squared statistic \begin{align*} \chi^2 & = \sum_{i=1}^n \frac{ (o_i - E_i)^2 }{E_i} \\ & = \frac{ (2162 - 2001.3)^2 }{ 2001.3 } + \frac{ (738 - 750.5)^2 }{ 750.5 } \\ & \phantom{ = } + \frac{ (228 - 250.2)^2 }{ 250.2 } + \frac{ (2876 - 3002)^2 }{ 3002 } \\ & = 20.36 \end{align*}

  2. Statistical Tables:
    • Degrees of freedom are \, {\rm df} = n - 1 = 3
    • We have computed E_1 = 2001.3 \qquad E_2 = 750.5 \qquad E_3 = 250.2 \qquad E_4 = 3002
    • Hence E_i \geq 5 for all i = 1, \ldots, n
    • In this case \chi^2 \ \approx \ \chi_{3}^2
    • In chi-squared Table 2 we find critical value \chi^2_{3} (0.05) = 7.82

  3. Interpretation:
    • We have that \chi^2 = 20.36 > 7.82 = \chi_{3}^2 (0.05)
    • Therefore we reject H_0
    • This means we accept the alternative H_1 \colon p_i \neq p_i^0 \qquad \text{ for at least one } \, i

Conclusion: Observed counts suggest that at least one of the null probabilities p_i^0 is wrong

Goodness-of-fit test in R

  • We use R to perform a goodness-of-fit test on alternative H_1 \colon p_i \neq p_i^0 \qquad \text{ for at least one } \, i

  • The code is available in the file good_fit.R

# Enter counts and null hypothesis probabilities
counts <- c(2162, 738, 228, 2876)
null.p <- c(1/3, 1/8, 1/24, 1/2)

# Perform goodness-of-fit test
# Store the output and print

ans <- chisq.test(counts, p = null.p)
print(ans)

Output

  • Running the code in the previous slide we obtain

    Chi-squared test for given probabilities

data:  counts
X-squared = 20.359, df = 3, p-value = 0.000143


  • Chi-squared statistic is \, \chi^2 = 20.359

  • Degrees of freedom are \, {\rm df} = 3

  • p-value is p \approx 0

  • Therefore p < 0.05

  • We reject H_0

Conclusion: At least one of the theoretical probabilities appears to be wrong

Example 2: Fair dice

  • A die with 6 faces is rolled 100 times
  • The counts observed are
Outcome 1 2 3 4 5 6
Count 13 17 9 17 18 26
  • Exercise: Is the die fair?
    • Formulate appropriate goodness-of-fit test
    • Implement this test in R

Solution

  • The die is fair if

P(\text{rolling }\, i) = \frac16 \,, \quad \forall \, i = 1, \ldots, 6

  • Therefore, we test the following hypothesis \begin{align*} & H_0 \colon p_i = \frac{1}{6} \qquad \text{ for all } \, i = 1, \ldots, 6 \\ & H_1 \colon p_i \neq \frac{1}{6} \qquad \text{ for at least one } \, i \end{align*}

  • The R code is given below
# Enter counts and null hypothesis probabilities
counts <- c(13, 17, 9, 17, 18, 26)
null.p <- rep(1/6, 6)

# Perform goodness-of-fit test
chisq.test(counts, p = null.p)


  • Note that each type is equally likely

  • Therefore, we can achieve the same result without specifying null probabilities

# Enter counts
counts <- c(13, 17, 9, 17, 18, 26)

# Perform goodness-of-fit test assuming equal probabilities
chisq.test(counts)

Solution

  • Both codes in the previous slide give the following output

    Chi-squared test for given probabilities

data:  counts
X-squared = 9.68, df = 5, p-value = 0.08483


  • Chi-squared statistic is \, \chi^2 = 9.68

  • Degrees of freedom are \, {\rm df} = 5

  • p-value is p \approx 0.08

  • Therefore p > 0.05

  • We cannot reject H_0

Conclusion: The die appears to be fair

Example 3: Voting data

  • Assume there are two candidates: Republican and Democrat
  • Voters can choose one of these, or be undecided
  • 100 people are surveyed, and the results are
Republican Democrat Undecided
35 40 25
  • Historical data suggest that undecided voters are 30\% of the population

  • Exercise: Is the difference between Republican and Democrat support significant?

    • Formulate appropriate goodness-of-fit test
    • Implement this test in R
    • You are not allowed to use chisq.test

Solution

  • Historical data suggest that undecided voters are 30\% of the population. Hence p_3^0 = 0.3

  • We want to test whether there is a difference between Republican and Democrat support

  • Hence null hypothesis is p_1^0 = p_2^0

  • Since probabilities must sum to 1, we get p_1^0 + p_2^0 + p_3^0 = 1 \quad \implies \quad p_1^0 = 0.35\,, \qquad p_2^0 = 0.35 \,, \qquad p_3^0 = 0.3

  • We test the hypothesis H_0 \colon p_i = p_i^0 \,\, \text{ for all } \, i = 1, \ldots, 3 \,, \qquad H_1 \colon p_i \neq p_i^0 \, \text{ for some }\, i

  • Perform goodness-of-fit test without using chisq.test

  • First, we enter the data

# Enter counts and null hypothesis probabilities
counts <- c(35, 40, 25)
null.p <- c(0.35, 0.35, 0.3)
  • Compute the total number of counts m = o_1 + \ldots + o_n
# Compute total counts
m <- sum(counts)
  • Compute degrees of freedom \, {\rm df} = n - 1
# Compute degrees of freedom
degrees <- length(counts) - 1

  • Compute the expected counts E_i = m p_i^0
# Compute expected counts
exp.counts <- m * null.p
  • We now check that the expected counts satisfy E_i \geq 5
# Print expected counts with a message
cat("The expected counts are:", exp.counts)

# Check if the expected counts are larger than 5
if (all(exp.counts >= 5)) {
  cat("Expected counts are larger than 5.", 
      "\nThe chi-squared approximation is valid!")
} else {
  cat("Warning, low expected counts.",
      "\nMonte Carlo simulation must be used.")
}
The expected counts are: 35 35 30
Expected counts are larger than 5. 
The chi-squared approximation is valid!

  • Compute the chi-squared statistic \chi^2 = \sum_{i=1}^n \frac{( o_i - E_i )^2}{E_i}
# Compute chi-squared statistic
chi.squared <- sum( (counts - exp.counts)^2 / exp.counts )
  • We know that all the expected counts are at least 5. Thus \chi^2 \approx \chi_{n-1}^2

  • We can therefore compute the p-value with the formula p = P( \chi_{n-1}^2 > \chi^2 ) = 1 - P( \chi_{n-1}^2 \leq \chi^2 )

# Compute p-value
p_value <- 1 - pchisq(chi.squared, df = degrees)

# Print p-value
cat("The p-value is:", p_value)

The p-value is: 0.4612526


  • Therefore the p-value satisfies p > 0.05

  • We do not reject H_0

Conclusion: There is no evidence against Republicans and Democrats being tied