Assignment 02

SIMPLE LINEAR REGRESSION: DESCRIPTION

Published

July 31, 2023

Should more money be spent on public schools or should that money be spent elsewhere? Both sides of this ongoing public debate have been argued passionately, using a multitude of anecdotal evidence. Although we will not settle this debate, we will examine data akin to the types of data that policy makers use to make funding decisions.

The goal of this assignment is to give you experience fitting and interpreting simple regression models. In this assignment, you will use the data from the file state-education.csv to examine the relationship between teacher salaries and student total SAT scores at the state level.

Instructions

Submit a PDF document of your responses to the following questions. Please adhere to the following guidelines for further formatting your assignment:

  • All plots should be resized so that they do not take up more room than necessary.
  • All figures and tables should have a name (e.g., Figure 1) and an appropriate caption.

This assignment is worth 10 points.


Preparation

Before carrying out any analyses, create a predictor called salary_thousand that indicates the average 2020–21 state salary in thousands of dollars (e.g., salary = 52143; salary_thousand = 52.143). This variable (not salary) should be used in all analyses in the assignment.


Questions

  1. Create a density plot of the distribution of total SAT scores. Make sure your plot has a caption.

  2. Examine the structure and formatting of Table 1 at https://zief0002.github.io/musings/creating-tables-to-present-statistical-results.html. Mimic the format and structure of this table to create a table to present the numerical summary information for the distributions of SAT total scores and salaries. Provide the same measures for these variables as is given in Table 1 in the article. Re-create the formatting of Table 1 as closely as you can. Finally, make sure the table you create also has an appropriate caption.

  3. Create a scatterplot of the distribution of SAT total scores (Y) conditioned on teacher salaries (X). Make sure your plot has a caption.

  4. Describe the relationship between teacher salaries and SAT total scores. Be sure to comment on the structural form, direction and strength of the relationship. Also comment on any potential observations that deviate from following this relationship (unusual observations or clusters of observations).

  5. Regress total SAT scores on teacher salaries. Write the fitted equation using Equation Editor (or some other program that correctly types mathematical expressions).

  6. Interpret the value of the intercept from the regression equation using the context of the data.

  7. Interpret the value of the slope from the regression equation using the context of the data.

  8. Compute, report, and interpret the value for \(R^2\) based on values from the ANOVA decomposition. Show your work for full credit.

  9. Compute and report the predicted mean total SAT score for Minnesota students based on the average teacher salary in the state using the fitted regression equation. Show your work for full credit.

  10. Compute and report the residual for Minnesota. Show your work for full credit.

  11. Explain what the sign and magnitude of the residual value you computed in Question 10 tells you about how Minnesota’s mean SAT score compares to the mean SAT score for states having the same average teacher salary as Minnesota.