Assignment 05

CATEGORICAL PREDICTORS

The goal of this assignment is to give you experience fitting and interpreting regression models with categorical predictors. This assignment is worth 13 points. Each question is worth 1 point unless otherwise noted.

Copyright EPSY 8251, 2024
Published: September 26, 2024

For this assignment, you will fit several regression models to examine whether the engagement level of IMDb reviewers for Scooby-Doo episodes/movies differs based on which members of Mystery Inc. caught the villain. To do so, you will use the data in the file scoobydoo.csv.

Submit your responses to each of the questions below in a PDF document. All graphics should be resized so that they do not take up more room than necessary and also should have an appropriate caption.


Preparation: Create Dummy Variables

Create four dummy variables for the analysis, one for each condition of the caught_by attribute. Also create another dummy variable to represent media format.
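
As a rough illustration of dummy coding (not a required workflow for this assignment), the sketch below builds 0/1 indicators from a categorical column using only the Python standard library. The toy rows, the category labels other than Shaggy/Scooby, and the column name `format` are hypothetical; the actual names in scoobydoo.csv may differ.

```python
# Toy rows standing in for scoobydoo.csv; all values are made up for illustration.
rows = [
    {"caught_by": "Fred", "format": "TV Series"},
    {"caught_by": "Shaggy/Scooby", "format": "Movie"},
    {"caught_by": "Velma", "format": "TV Series"},
    {"caught_by": "Daphne", "format": "TV Series"},
]


def add_dummies(rows, column):
    """Add one 0/1 indicator per distinct value of `column` to every row."""
    levels = sorted({row[column] for row in rows})
    for row in rows:
        for level in levels:
            row[f"{column}_{level}"] = 1 if row[column] == level else 0
    return levels


caught_levels = add_dummies(rows, "caught_by")

# A two-level attribute needs only a single indicator (hypothetical coding: 1 = movie).
for row in rows:
    row["movie"] = 1 if row["format"] == "Movie" else 0
```

Each row ends up with exactly one caught_by indicator equal to 1. When fitting a model, the indicator for the reference group is the one left out of the predictors.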


Unadjusted Group Differences Model: ANOVA

Fit the regression model that uses the dummy predictors for which Mystery Inc. members caught the villain to predict variation in IMDb engagement. In this model, use Shaggy/Scooby as the reference group.

  1. Write the fitted regression equation.

  2. Which conditions of caught_by, if any, differ from Shaggy/Scooby in average IMDb engagement (by more than we would expect from sampling variation alone)? Explain.

  3. Report and interpret the \(R^2\) value for this model.

  4. Which comparisons of the caught_by condition reflected in the omnibus null hypothesis are not represented in this fitted model?
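
For reference, a dummy-coded model with four conditions and one reference group has the general form below. The symbols \(D_1\), \(D_2\), and \(D_3\) are placeholder names for the three non-reference dummy variables; they are assumptions for illustration and are not column names from the data file.

\[
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 D_{1i} + \hat{\beta}_2 D_{2i} + \hat{\beta}_3 D_{3i}
\]

In a model with no other predictors, \(\hat{\beta}_0\) is the sample mean of the reference group, and each \(\hat{\beta}_j\) is the difference between the sample mean of condition \(j\) and the sample mean of the reference group.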


Adjusted Group Differences Model: ANCOVA

Again, fit the regression model that uses the dummy predictors for which Mystery Inc. members caught the villain to predict variation in average IMDb engagement, but this time control for differences in (1) IMDb rating, (2) number of catchphrases uttered, and (3) media format. Again, use Shaggy/Scooby as the reference group.

  1. Write the fitted regression equation.

  2. Which conditions of caught_by, if any, differ from Shaggy/Scooby in average IMDb engagement (by more than we would expect from sampling variation alone) after controlling for differences in these other predictors? Explain.

  3. Report and interpret the \(R^2\) value for this model.

  4. Compute and report the adjusted mean engagement levels for each level of the caught_by variable. Show your work. (You can check your work using the emmeans() function, but do not use that to show your work for this question!)
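
One common way to compute adjusted means by hand is to substitute each covariate's sample mean into the fitted equation and then switch the group dummies on one at a time. The sketch below shows only the arithmetic; every coefficient, covariate mean, and group label in it is hypothetical and does not come from scoobydoo.csv.

```python
# Hypothetical fitted coefficients; none of these numbers come from the real data.
intercept = 2.0  # prediction for the reference group when all covariates are 0
group_effects = {"reference": 0.0, "group_b": 0.5, "group_c": -0.25}
covariate_slopes = {"rating": 1.5, "catchphrases": 0.1}
covariate_means = {"rating": 7.0, "catchphrases": 10.0}

# Contribution of the covariates when each one is held at its sample mean.
covariate_part = sum(
    covariate_slopes[name] * covariate_means[name] for name in covariate_slopes
)

# Adjusted mean = intercept + group effect + covariate contribution.
adjusted_means = {
    group: intercept + effect + covariate_part
    for group, effect in group_effects.items()
}
```

Because the covariate contribution is the same for every group, differences between adjusted means equal the differences between the group coefficients.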


Assumptions

  1. Create the density plot of the marginal distribution of the standardized residuals from the ANCOVA model. Add the confidence envelope for the normal distribution. Explain whether or not this plot suggests a problem with meeting the normality assumption.

  2. Create the scatterplot of the standardized residuals versus the fitted values from the ANCOVA model. Include any smoothers and confidence envelopes that will allow you to evaluate the linearity assumption. In the plot, identify any observations with extreme residuals (\(\leq -3\) or \(\geq 3\)) by labeling each with its row number.

  3. Explain whether or not the residual plot from the previous question suggests problems with meeting the linearity and homogeneity of variance assumptions.
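
To make the "extreme residual" rule concrete, the sketch below standardizes raw residuals with the usual formula (residual divided by the residual standard error times \(\sqrt{1 - h_i}\), where \(h_i\) is the leverage) and then lists the 1-based row numbers that meet the cutoff. It assumes the raw residuals and leverage values have already been extracted from a fitted model; the function names and all numbers are hypothetical.

```python
import math


def standardized_residuals(residuals, leverages, n_coefficients):
    """Divide each raw residual by its estimated standard deviation."""
    n = len(residuals)
    rss = sum(e ** 2 for e in residuals)
    sigma = math.sqrt(rss / (n - n_coefficients))  # residual standard error
    return [e / (sigma * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]


def extreme_rows(std_resid, cutoff=3.0):
    """Return 1-based row numbers whose standardized residual is <= -cutoff or >= cutoff."""
    return [i for i, r in enumerate(std_resid, start=1) if abs(r) >= cutoff]


# Made-up standardized residuals, used only to show the flagging rule.
example = [0.4, -3.2, 1.1, 3.0, -0.7]
flagged = extreme_rows(example)
```

The flagged row numbers are the ones that would be printed as labels next to the corresponding points in the residual plot.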


Pairwise Differences and Adjusted Means

Use the results from comparing the pairwise differences between the conditions of caught_by in the ANCOVA model to answer the questions in this section.

  1. Create a table (suitable for publication) that presents each of the possible pairwise contrasts (null hypotheses) of interest, the unadjusted p-values from the controlled model, and the Benjamini–Hochberg adjusted p-values for the controlled differences. (Note: To obtain all of these, you may need to fit additional models.)

  2. Create a visualization that shows the adjusted mean engagement level for each condition of caught_by (that is, by which members of Mystery Inc. caught the villain). This visualization should mimic Figure 20.2 in the textbook and should be based on the Benjamini–Hochberg adjusted p-values. Feel free to use any software tool you want to create this visualization.
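
For readers who want to see what the Benjamini–Hochberg adjustment does to a set of p-values, here is a minimal sketch of the step-up procedure. The function name and the six p-values are hypothetical; they stand in for the unadjusted p-values of six pairwise contrasts and are not results from the assignment data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Visit the p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for position, index in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up unadjusted p-values for six pairwise contrasts.
unadjusted = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
bh_adjusted = benjamini_hochberg(unadjusted)
```

Each p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the unadjusted values increase. In this toy example the third and fourth adjusted values come out equal for that reason.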