16  Errors in Hypothesis Testing

In this chapter you will learn about two types of errors that we can potentially make in hypothesis testing when we draw conclusions.


16.1 Type I and Type II Errors

The conclusions we draw about hypotheses are based on a p-value. Remember the p in p-value stands for probability. That means the conclusions are probabilistic in nature. For example, in our teen sleep case study the p-value was .0000000000000433. If we were interpreting this value, we would say:

If the null hypothesis is true, the probability of seeing a sample mean at least as extreme as 7.5 (the mean we observed in the data), is .0000000000000433.

From this we reasoned that because seeing the sample mean we observed was so unlikely that the null hypothesis was probably not true; we rejected the null hypothesis. In reality, however, this decision to reject might be wrong. It may be that the null hypothesis is true, and we just observed something that was a really, really low probability event. After all, the probability of seeing such an extreme sample mean did not have a zero probability.


16.1.1 Type I Errors

Anytime we reject the null hypothesis in a hypothesis test, we might make what is called a Type I error.1 A Type I error is rejecting the null hypothesis when it is, in fact, true. In our example this would mean concluding that the average amount of sleep for all teens was less than 9 hours, when in fact it really wasn’t.

Unfortunately, when we reject the null hypothesis, we can never know if we made a Type I error or not. In order to determine that, you would actually need to know what the population mean was, in which case you wouldn’t have to test what it was. While we can’t know if we made a Type I error, we can set the probability of making a Type I error. The probability of making a Type I error is called \(\alpha\) (alpha)2 and is the value we compare our p-value to. In the social sciences, this probability is almost always .05. That is why we compare our p-value to .05 in order to decide whether to reject the null hypothesis.


16.1.2 Type II Errors

A Type II error is failing to reject the null hypothesis when it is, in fact, false. For example, if we had found a p-value in the teen sleep case study that was bigger than .05, we would have failed to reject the null hypothesis. In this case, our conclusion would have been that the average amount of sleep for all teens was NOT less than 9 hours. Once again, once you have drawn a conclusion about the null hypothesis, there is some probability that the conclusion you drew was wrong.

Similar to our potential for making an error when we reject a null hypothesis, we can never know for sure whether we have made a Type II error or not after we fail to reject a null hypothesis. We can only know the probability of making a Type II error which is conventionally termed \(\beta\) (beta)3. The value of \(\beta\) is somewhat complex, but has an inverse relationship with the value of \(\alpha\), such that as the value of \(\alpha\) gets smaller, the value of \(\beta\) gets larger, and vice versa. This means that if you make \(\alpha\) smaller (to protect against making a Type I error), you will make the probability of making a Type II error larger!

The type of error you can make is completely dependent on the conclusion you draw about the null hypothesis. For example, if we reject the null hypothesis, the only type of error you can make is a Type I error. You cannot make a Type II error since that requires failing to reject the null hypothesis. That is, when you reject the null hypothesis, the probability of making a Type II error is 0. Similarly, if you fail to reject the null hypothesis, then you cannot make a Type I error; the probability of making a Type I error after you fail to reject the null hypothesis is 0.


16.1.3 Some Practice with Type I and Type II Errors

Your Turn

In the continuous assessment case study from Chapter 10, we evaluated the following set of hypotheses about whether, on average, Ethiopian primary school teachers agree/disagree with the statement that they assess students’ prior knowledge:

\[ \begin{split} H_0: \mu = 2.5 \\[1ex] H_A: \mu \neq 2.5 \end{split} \]

Based on the results of the one-sample t-test, \(t(29) = -0.41\), \(p = 0.687\), which type of error could we have made if we were using an \(\alpha\)-value of .05? Explain.

Using the context of the problem (i.e., survey responses of the assessment of prior knowledge) and the hypotheses tested, what does it mean to make this type of error?

In the house prices case study from Chapter 10, we evaluated the following set of hypotheses about whether, on average, houses near the University of Minnesota campus more expensive than $322.46k (the average price of a single-family house in Minneapolis as of May 2023).

\[ \begin{split} H_0: \mu = 322.46 \\[1ex] H_A: \mu > 322.46 \end{split} \]

Based on the results of the one-sample t-test, \(t(14) = 3.12\), \(p = 0.004\), which type of error could we have made if we were using an \(\alpha\)-value of .01? Explain.

Using the context of the problem (i.e., house prices near the UMN campus) and the hypotheses tested, what does it mean to make a this type of error?

In the case study from Chapter 13, we evaluated the following set of hypotheses about whether the average decline in IQ scores for all persistent marijuana users that became dependent as adults is the same as the average decline in IQ scores for all persistent marijuana users that became dependent as teens.

\[ \begin{split} H_0: \mu_{\text{Adult-onset}} = \mu_{\text{Teen-onset}} \\[1ex] H_0: \mu_{\text{Adult-onset}} > \mu_{\text{Teen-onset}} \end{split} \] Based on the results of the two-sample t-test, \(t(35) = 2.47\), \(p = 0.009\), which type of error could we have made if we were using an \(\alpha\)-value of .05? Explain. Also, using the context of the problem and the hypotheses tested, what does it mean to make a this type of error?

In the case study from Chapter 15, we evaluated the following set of hypotheses about whether the proportion of Bachelor contestants that are still together with the suitor they chose is different than the proportion of Bachelorette contestants that are still together with the suitor they chose.

\[ \begin{split} H_0: \pi_{\text{Bachelor}} = \pi_{\text{Bachelorette}} \\[1ex] H_A: \pi_{\text{Bachelor}} \neq \pi_{\text{Bachelorette}} \end{split} \]

Based on the results of the two-sample z-test, \(z = -.772\), \(p = 0.440\), which type of error could we have made if we were using an \(\alpha\)-value of .05? Explain. Also, using the context of the problem and the hypotheses tested, what does it mean to make a this type of error?


16.2 References


  1. Conventionally, we use Roman numerals to indicate the different types of errors we can make in hypothesis testing.↩︎

  2. Fun Fact: \(\alpha\) is the first letter in the Greek alphabet (equivalent to the Roman letter “a”) so that is why it is associated with the first type (i.e., Type I) error.↩︎

  3. Fun Fact: \(\beta\) is the second letter in the Greek alphabet (equivalent to the Roman letter “b”) so that is why it is associated with the second type (i.e., Type II) error.↩︎