Observational Studies and the Bootstrap Test

In some studies, researchers do not assign study participants to groups/conditions. One example of this is the second study described in Validity Evidence and Conclusions: Peanut Allergies. In this study, the two groups being compared, mothers that ate peanuts during pregnancy and mothers that did not eat peanuts during pregnancy, were not assigned by the researchers — they “self-selected” into the groups based on whether or not they ate peanuts during pregnancy. When the study participants are not assigned to conditions by a researcher the study is referred to as an observational study.

Inferences a researcher can draw from an observational study are much weaker. Typically cause-and-effect conclusions cannot be made based on data collected from observational studies. This is because the possibility of alternative explanations always exists.

Drawing causal conclusions from an observational study is misleading and can even be construed as unethical. In 1988, results released to the public from the National Household Survey on Drug Abuse created the false perception that crack cocaine smoking was related to ethnicity. The analysis, which was based on observational data (researchers cannot assign race) showed that the rates of crack use among blacks and Hispanics were twice as high as among whites. The data were re-analyzed in 1992 by researchers from Johns Hopkins University to take into account social factors such as where the users lived and how easily the drug could be obtained. They found that after adjusting for these factors, there were no differences among blacks, Hispanics and whites in the use of crack cocaine.

Analyzing Data from Observational Studies

Although you probably won’t be able to draw a cause-and-effect inference, it is still useful to study and analyze observational data. Observational studies are often precursors to randomized studies— recall that one criteria of cause is that correlation exists, and observational data is a good for examining correlation.

To analyze data from an observational study, researchers use similar methods as they use for data from a study in which the study participants were randomly assigned to groups. The key difference, however, is that the variation in results is not due to experimental variation. It turns out that the variation is more akin to sampling variation. This means we need to adapt our randomization test to account for sampling variation rather than experimental variation.

To account for sampling variation in the randomization test, we change replacement option so that the outcome variable is sampled with replacement. Note that the group labels should still be sampled without replacement. (We want to model the same number of participants in each group as in any observational data.)

Sampling the outcome values with replacement will increase the variation in the simulated results (larger standard deviation of the mean values). This increased variation is compatible with sampling from a larger population. Sampling with replacement from the observed data is called bootstrapping. Subsequently, this test is referred to as a bootstrap test. The table below shows a comparison of the randomization test and the bootstrap test.


Test Variation Modeled TinkerPlots™
Randomization Experimental variation Two sampling devices:
  (1) outcomes (sampled without replacement); and
  (2) group labels (sampled without replacement)
Bootstrap Sampling variation Two sampling devices:
  (1) outcomes (sampled with replacement); and
  (2) group labels (sampled without replacement)