💪 Model Selection (In-Class Activity)
The data in states-2019.csv include statistics collected from Wikipedia, the 2019 American Community Survey, and the National Centers for Environmental Information.
Your goal is to use the data (and everything you’ve learned so far in your coursework) to create a model to predict variation in life expectancy. Keep track of the process you use to create this model, including:
- Which predictors should be included in the model?
- What criteria/evidence are you using to make these decisions?
- When in the process do you identify problematic observations?
- Do you remove those problematic observations or not?
- What criteria/evidence are you using to make these decisions?
- When in the process do you examine the model for collinearity?
- What criteria/evidence are you using to decide whether collinearity is a problem?
- What (if anything) will you do to fix this?
- When in the process do you examine the tenability of assumptions?
Also pay attention to when in the process you are making decisions based on sample evidence (graphs/statistics) versus when those decisions are being made using statistical inference (hypothesis tests, confidence intervals). Your group will be asked to report back to the class on the process, criteria, and evidence you used.
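As one possible starting point for the collinearity and problematic-observation questions above, here is a minimal Python sketch of two common pieces of sample evidence: variance inflation factors (VIF) and Cook's distance. It runs on synthetic data, not on states-2019.csv. The variable names (`x1`, `x2`, `x3`, `y`), the helper functions, and the cutoffs (VIF above 10, Cook's distance above 4/n) are illustrative assumptions and rules of thumb, not criteria prescribed by this activity.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def vif(predictors):
    """VIF for each column of a predictor matrix (no intercept column)."""
    n, p = predictors.shape
    out = []
    for j in range(p):
        target = predictors[:, j]
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        resid = target - X @ fit_ols(X, target)
        # R^2 from regressing predictor j on all the other predictors
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)


def cooks_distance(predictors, y):
    """Cook's distance for each observation in an OLS fit with an intercept."""
    n = predictors.shape[0]
    X = np.column_stack([np.ones(n), predictors])
    k = X.shape[1]  # number of estimated coefficients
    resid = y - X @ fit_ols(X, y)
    h = np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)  # leverage values
    mse = (resid @ resid) / (n - k)
    return (resid ** 2 / (k * mse)) * h / (1 - h) ** 2


# Synthetic stand-in data (hypothetical; NOT the columns of states-2019.csv)
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1: collinear
x3 = rng.normal(size=n)
x3[0] = 4.0                              # give observation 0 high leverage
y = 78 + 1.0 * x1 + 0.5 * x3 + rng.normal(scale=0.5, size=n)
y[0] += 8.0                              # and an unusual outcome

predictors = np.column_stack([x1, x2, x3])
vifs = vif(predictors)
d = cooks_distance(predictors, y)
flagged = np.flatnonzero(d > 4 / n)      # rule-of-thumb cutoff (assumption)

print("VIF per predictor:", np.round(vifs, 2))
print("Observations flagged by Cook's distance:", flagged)
```

Both quantities are descriptive sample evidence rather than statistical inference, so your group still needs to decide and justify what to do with any predictor or observation they flag.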