| Name | Group | Followers |
|---|---|---|
| Emo Phillips | Comedian | 16.3 |
| Steven Wright | Comedian | 6.5 |
| Maria Bamford | Comedian | 183 |
| Arthur Greenleaf Holmes | Comedian | 12.9 |
| Kristin Key | Comedian | 334 |
| Tim Minchin | Musician | 357 |
| Jonathan Coulton | Musician | 12.8 |
| Susanna Hoffs | Musician | 85.8 |
| Madi Diaz | Musician | 83.1 |
| Arlo Guthrie | Musician | 18.1 |
Bootstrap Test: Modeling Sampling Variation
In some studies, researchers do not assign study participants to groups/conditions. As an example, imagine that Dr. Bunsen Honeydew wants to study whether comedians or musicians have more followers on Instagram (IG). He collects data by selecting 5 of his favorite comedians and 5 of his favorite musicians and then recording each person's number of IG followers. These data are shown in Table 1.
In this study, the two groups being compared are comedians and musicians. The 10 subjects in the data, of course, were not assigned to these groups by Dr. Honeydew; they “self-selected” into the groups based on whether they chose to become a comedian or a musician. When the study participants are not assigned to conditions by a researcher, the study is referred to as an observational study.
In observational studies, the variation we need to account for when making statistical inferences differs from the variation in studies where the researcher assigns participants to groups. In the latter situation (i.e., statistical experiments), we model and account for the experimental variation that arises from random assignment. In observational studies there is no random assignment, so we instead model and account for sampling variation, as we did when we were accounting for random sampling.
Bootstrapping: Modeling Sampling Variation when Comparing Groups
To analyze data from an observational study, we need to adapt our randomization test to account for sampling variation rather than experimental variation. To do this, we change the replacement option for the sampling device that produces the response/outcome attribute so that the outcome values are sampled with replacement. (Note that the group labels should still be sampled without replacement, since we want each simulated trial to have the same number of participants in each group as in the observed data.) Sampling the outcome values with replacement is called bootstrapping, and the subsequent test to compare groups is referred to as a bootstrap test.
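For readers who want to see the same idea outside of TinkerPlots, the following is a minimal Python sketch of the procedure described above, using the follower counts from Table 1. It is an illustration, not the chapter's own implementation: the function name `bootstrap_differences`, the 5000 trials, the fixed seed, the use of the difference in means as the summary measure, and the two-sided p-value are all assumptions made for this example.

```python
import random
from statistics import mean

# Follower counts from Table 1, in the units reported there.
comedians = [16.3, 6.5, 183, 12.9, 334]
musicians = [357, 12.8, 85.8, 83.1, 18.1]


def bootstrap_differences(group_a, group_b, trials=5000, seed=1):
    """Simulate differences in means (group_b minus group_a) under a
    'no difference between groups' model. Hypothetical helper for this sketch."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    pooled = group_a + group_b
    labels = ["a"] * len(group_a) + ["b"] * len(group_b)
    diffs = []
    for _ in range(trials):
        # Outcome values: sampled WITH replacement (the bootstrap step).
        values = rng.choices(pooled, k=len(pooled))
        # Group labels: sampled WITHOUT replacement, so group sizes stay fixed.
        shuffled = rng.sample(labels, k=len(labels))
        mean_a = mean(v for v, g in zip(values, shuffled) if g == "a")
        mean_b = mean(v for v, g in zip(values, shuffled) if g == "b")
        diffs.append(mean_b - mean_a)
    return diffs


observed = mean(musicians) - mean(comedians)
simulated = bootstrap_differences(comedians, musicians)

# Two-sided p-value (an assumption): proportion of simulated differences
# at least as far from 0 as the observed difference.
p_value = sum(abs(d) >= abs(observed) for d in simulated) / len(simulated)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.3f}")
```

The only change from a randomization test is the line that draws `values`: `rng.choices` samples with replacement, whereas a randomization test would keep each observed value exactly once in every trial.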
Figure 1 shows the TinkerPlots sampler to carry out a bootstrap test using Dr. Honeydew’s data.