The goal of this assignment is to build your understanding of using information criteria for model selection.
In this assignment, you will use the data from the file wine.csv to examine several different predictors of wine rating (a measure of the wine’s quality). The literature suggests that the price of a wine is quite predictive of its quality. You will be carrying out a replication study (using a different data set) of a study published by Snipes and Taylor (2014).
Submit either an HTML file or, if you are not using R Markdown, a PDF file of your responses to the following questions. Please adhere to the following guidelines for further formatting your assignment:
This assignment is worth 15 points.
Read the article Model selection and Akaike Information Criteria: An example from wine ratings and prices.
Fit the same nine candidate models that Snipes and Taylor fitted in their analysis, using the wine.csv data. In these models, use wine rating (rating) as the outcome. The point is not to replicate their exact data but to use the same set of predictors, even though the predictors in our dataset have different levels (e.g., our data include more regions than Snipes and Taylor’s data). Treat year as a continuous variable; do not categorize it. By using a different set of data, we can more rigorously evaluate the underlying working hypotheses.
Compute and report the likelihood for Model 1 given the residuals and set of model assumptions. Use dnorm() for this computation, and show your syntax for full credit.
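As a rough illustration of the dnorm() approach, the sketch below computes a likelihood from residuals for a model fitted to simulated toy data. It does not use the wine data. The object names (toy, lm_toy, sigma_ml) are hypothetical, and it assumes the residual standard deviation is the maximum likelihood estimate (SSE divided by n), which is the estimate logLik() relies on for lm objects.

```r
# Toy data for illustration only; this is NOT the wine data
set.seed(42)
toy <- data.frame(x = rnorm(50))
toy$y <- 2 + 0.5 * toy$x + rnorm(50)

lm_toy <- lm(y ~ x, data = toy)

# Residuals from the fitted model
errors <- residuals(lm_toy)

# Assumption: ML estimate of the residual SD (divide SSE by n, not n - p)
sigma_ml <- sqrt(sum(errors^2) / length(errors))

# Likelihood: product of the normal densities of the residuals
lik <- prod(dnorm(errors, mean = 0, sd = sigma_ml))

# Log-likelihood: sum of the log densities (numerically more stable)
log_lik <- sum(dnorm(errors, mean = 0, sd = sigma_ml, log = TRUE))

lik
log_lik
logLik(lm_toy)  # Should agree with log_lik under the ML assumption above
```

If the residual standard error from summary() is used for the SD instead, the result will differ slightly from the logLik() output.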
Create a table of the log-likelihoods for the nine candidate models. (Use the logLik() function to compute these values.)
Compute and interpret the likelihood ratio for comparing the empirical support between Model 3 and Model 4.
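For reference, a likelihood ratio comparing two generic fitted models can be written as follows. Here "Model A" and "Model B" are placeholder labels, not the assignment's numbered models, and \(\mathcal{L}\) denotes the likelihood of a model given the data.

\[
\mathrm{LR} = \frac{\mathcal{L}(\text{Model A})}{\mathcal{L}(\text{Model B})} = \exp\left[\ln \mathcal{L}(\text{Model A}) - \ln \mathcal{L}(\text{Model B})\right]
\]

Because logLik() returns log-likelihoods, the second form (exponentiating the difference) is usually the more convenient one to compute.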
Can we carry out a likelihood ratio test to evaluate whether the difference in empirical support between Model 3 and Model 4 is more than we would expect from sampling error alone? If so, compute and report the results from the \(\chi^2\)-test. If not, explain why not.
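The mechanics of a likelihood ratio test can be sketched as below, using simulated toy data and two hypothetical models (lm_reduced and lm_full). It does not use the wine data. The sketch assumes the reduced model is nested in the full model, which is the condition under which the \(\chi^2\) reference distribution applies.

```r
# Toy data for illustration only; this is NOT the wine data
set.seed(1)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y <- 1 + 0.8 * toy$x1 + 0.3 * toy$x2 + rnorm(100)

# Assumption: lm_reduced is nested in lm_full
lm_reduced <- lm(y ~ x1, data = toy)
lm_full <- lm(y ~ x1 + x2, data = toy)

ll_reduced <- logLik(lm_reduced)
ll_full <- logLik(lm_full)

# Likelihood ratio: exponentiate the difference in log-likelihoods
lr <- exp(as.numeric(ll_full) - as.numeric(ll_reduced))

# Test statistic: twice the difference in log-likelihoods
chi_sq <- 2 * (as.numeric(ll_full) - as.numeric(ll_reduced))

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- attr(ll_full, "df") - attr(ll_reduced, "df")

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(chi_sq, df = df_diff, lower.tail = FALSE)

c(lr = lr, chi_sq = chi_sq, df = df_diff, p = p_value)
```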
Compute and interpret the likelihood ratio for comparing the empirical support between Model 3 and Model 6.
Can we carry out a likelihood ratio test to evaluate whether the difference in empirical support between Model 3 and Model 6 is more than we would expect from sampling error alone? If so, compute and report the results from the \(\chi^2\)-test. If not, explain why not.
Create a table of model evidence that includes the following information for each of the nine candidate models. (2pts.)
Use this table of model evidence to answer Questions 8–14.
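One way such a table might be assembled in base R is sketched below with simulated toy data and three hypothetical candidate models. It does not use the wine data, and the columns shown (log-likelihood, number of parameters, AICc, delta AICc, and AICc weight) are an assumption about what a table of model evidence typically contains.

```r
# Toy data for illustration only; this is NOT the wine data
set.seed(7)
toy <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
toy$y <- 1 + 0.6 * toy$x1 + rnorm(80)

# Hypothetical candidate models
candidates <- list(
  "Model A" = lm(y ~ 1, data = toy),
  "Model B" = lm(y ~ x1, data = toy),
  "Model C" = lm(y ~ x1 + x2, data = toy)
)

n <- nrow(toy)

# Log-likelihood and number of estimated parameters (includes the error variance)
log_lik <- sapply(candidates, function(m) as.numeric(logLik(m)))
k <- sapply(candidates, function(m) attr(logLik(m), "df"))

# AICc: AIC plus the small-sample correction term
aicc <- -2 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

# Delta AICc relative to the best model, and AICc weights (model probabilities)
delta <- aicc - min(aicc)
weight <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

evidence <- data.frame(
  model = names(candidates), log_lik, k, aicc, delta, weight,
  row.names = NULL
)

# Sort so the model with the most empirical support is listed first
evidence <- evidence[order(evidence$aicc), ]
evidence
```

Add-on packages can produce similar tables; the manual version is shown here only to make each quantity explicit.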
Use the AICc values to select the working hypothesis with the most empirical evidence.
Interpret the model probability/AICc weight for the working hypothesis with the most empirical evidence.
Compute and interpret the evidence ratio that compares the two working hypotheses with the most empirical evidence.
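For reference, the evidence ratio between two candidate models can be written in terms of their AICc weights. In this sketch the indices \(i\) and \(j\) are generic labels for two models out of \(R\) candidates, and \(\Delta_i\) is the difference between model \(i\)'s AICc and the smallest AICc in the candidate set.

\[
w_i = \frac{\exp\left(-\tfrac{1}{2}\Delta_i\right)}{\sum_{r=1}^{R} \exp\left(-\tfrac{1}{2}\Delta_r\right)}, \qquad \mathrm{ER}_{ij} = \frac{w_i}{w_j} = \exp\left[\tfrac{1}{2}\left(\Delta_j - \Delta_i\right)\right]
\]

The normalizing sum cancels in the ratio, so the evidence ratio depends only on the two models' \(\Delta\) values.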
Based on previous literature, Snipes and Taylor hypothesized that price was an important predictor of wine quality. Based on your analyses, is price an important predictor of wine quality? Justify your response by referring to the model evidence. (Hint: Pay attention to which models include price and which do not.)
Does the empirical evidence support adopting more than one working hypothesis? Justify your response by referring to the model evidence.
Does the empirical evidence from the Snipes and Taylor analyses support adopting more than one candidate model? Justify your response by referring to the model evidence.
Based on your responses to the last two questions, which set of analyses (yours or Snipes and Taylor’s) has more model selection uncertainty? Explain.