Assignment 02

INTRODUCTION TO REGRESSION

This goal of this assignment is to give you experience fitting and interpreting simple regression models. This assignment is worth 13 points. Each question is worth 1 point unless otherwise noted.

Copyright EPSY 8251, 2024
Published

October 8, 2024

Should more money be spent on public schools or should that money be spent elsewhere? Both sides of this ongoing public debate have been argued passionately, using a multitude of anecdotal evidence. Although we will not settle this debate, we will examine data akin to the types of data that policy makers use to make funding decisions. Specifically, you will use the data from the file state-education.csv to examine whether teacher salaries are related to SAT scores at the state level.

Submit a PDF document of your responses to the following questions. Please adhere to the following guidelines for further formatting your assignment:


Preparation

Before carrying out any analyses, create a predictor called salary_thousand that indicates the average 2020–21 state salary in thousands of dollars (e.g., salary = 52143; salary_thousand = 52.143). This variable (not salary) should be used in all analyses in the assignment.


Questions

  1. Create a density plot of the distribution of total SAT scores. Make sure your plot has a caption.

  2. Examine the structure and formatting of Table 1 at https://zief0002.github.io/musings/creating-tables-to-present-statistical-results.html. Mimic the format and structure of this table to create a table to present the numerical summary information for the distributions of SAT total scores and salaries. Provide the same measures for these variables as is given in Table 1 in the article. Re-create the formatting of Table 1 as closely as you can. Finally, make sure the table you create also has an appropriate caption.

  3. Create a scatterplot of the distribution of SAT total scores (Y) conditioned on teacher salaries (X). Make sure your plot has a caption.

  4. Describe the relationship between teacher salaries and SAT total scores. Be sure to comment on the structural form, direction and strength of the relationship. Also comment on any potential observations that deviate from following this relationship (unusual observations or clusters of observations).

  5. Compute and report the correlation coefficient.

  6. Is the correlation coefficient and appropriate measure for this relationship? Explain by referring to the functional form you described in Question 4.

  7. Regress total SAT scores on teacher salaries. Write the fitted equation using Equation Editor (or some other program that correctly types mathematical expressions).

  8. Interpret the value of the intercept from the regression equation using the context of the data.

  9. Interpret the value of the slope from the regression equation using the context of the data.

  10. Compute, report, and interpret the value for \(R^2\) based on values from the ANOVA decomposition. Show your work for full credit.

  11. Compute and report the predicted mean SAT score for Minnesota students based on the average teacher salary in the state using the fitted regression equation. Show your work for full credit.

  12. Compute and report the residual for Minnesota. Show your work for full credit.

  13. Explain what the sign and magnitude of the residual value you computed in Question 12 tells you about how Minnesota’s mean SAT score compares to the mean SAT score for states having the same average teacher salary as Minnesota.