Simple Regression Worksheet
Directions
Work with one or more other students to complete each of the tasks in this document. Record the syntax you use to complete each task in a script file. As you write your script file, adhere to good coding practices:
- Include comments
- Include spaces
- Include a line break after every pipe operator you use.
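As an illustration of these practices, here is a minimal sketch. It assumes you are working in R with the dplyr package and its `%>%` pipe; the data frame `toy` and its columns are hypothetical and are not part of the worksheet data.

```r
# Load dplyr for the data-wrangling verbs and the %>% pipe operator
library(dplyr)

# Hypothetical toy data, used only to illustrate formatting
toy <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(10, 12, 20, 24)
)

# Compute the mean score within each group;
# every pipe operator ends its line, so each step sits on its own line
group_means <- toy %>%
  group_by(group) %>%
  summarize(mean_score = mean(score))

# Print the result
group_means
```

Note the comment above each step, the spaces around `<-`, `=`, and `%>%`, and the line break after every pipe.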
You will also need to answer some questions in a Word or Google document.
Task 1: Import Data
Import the riverview.csv data into an object named city. Also, examine the data codebook so you are familiar with the different attributes.
Task 2: Marginal Distribution of Seniority-level
Create a density plot of the years of seniority attribute (seniority). You may also want to produce summary statistics for this attribute. Describe the shape, center (i.e., typical value), and variability. Be sure to use the data context in this description.
Task 3: Relationship between Seniority-level and Income
Create a scatterplot of the relationship between seniority-level and income. In this plot, treat income as the outcome and seniority-level as the predictor. Describe this relationship by indicating the functional form, direction, magnitude, strength, and any potential outliers. Be sure to use the data context in this description.
Task 4: Compute the Correlation Coefficient
Compute and report the correlation coefficient between seniority-level and income.
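For reference, one common way to write the sample correlation coefficient is sketched below. The symbols are assumptions introduced for illustration: x_i and y_i are the seniority and income values for employee i, the barred symbols are the sample means, s_x and s_y are the sample standard deviations, and n is the number of employees.

```latex
\[
r = \frac{1}{n - 1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
\]
```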
Task 5: Fit the Regression Model
Fit the regression model that uses seniority-level to predict variation in income. Write the fitted equation. Be sure you can write the fitted equation using Equation Editor in Microsoft Word or Google Docs, including any hats and subscripts.
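As a guide to the notation, the general form of a fitted simple regression equation can be sketched as follows. The variable labels are assumptions chosen to match the data context; replace the two estimated coefficients with the values from your fitted model.

```latex
\[
\widehat{\mathrm{Income}}_i = \hat{\beta}_0 + \hat{\beta}_1 \left( \mathrm{Seniority}_i \right)
\]
```

The hat over the outcome indicates a predicted value, the hats over the coefficients indicate sample estimates, and the subscript i indexes the individual cases.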
Task 6: Coefficient Interpretations
Interpret the intercept and slope from the fitted equation.
Task 7: Compute the Sum of Squared Error (SSE) for the Fitted Model
Compute the SSE for the model. Include the syntax you used to compute this.
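For reference, the SSE is the sum of the squared residuals. In the sketch below, the notation is an assumption introduced for illustration: Y_i is the observed income for case i, the hatted Y_i is the income predicted for that case by the fitted model, and n is the number of cases.

```latex
\[
\mathrm{SSE} = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2
\]
```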
Task 8: Fit an Intercept-Only Model and Compute the SSE for It
Fit an intercept-only model predicting variation in income. Use that model to compute the SSE. Include the syntax you used to compute this.
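As a reminder of what this model does, a least-squares intercept-only model predicts the same value for every case, and that value is the sample mean of the outcome. A sketch, using notation assumed for illustration (Y_i is observed income and the barred Y is mean income):

```latex
\[
\hat{Y}_i = \hat{\beta}_0 = \bar{Y},
\qquad
\mathrm{SSE}_{\text{intercept-only}} = \sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2
\]
```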
Task 9: Compute the Proportion Reduction in Error (PRE)
Use the two SSE measures to compute the PRE. Show your work. Also, interpret the value using the data’s context.
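For reference, the PRE compares the error from the intercept-only model with the error from the model that includes the predictor. The subscript labels in this sketch are assumptions introduced for illustration.

```latex
\[
\mathrm{PRE} =
\frac{\mathrm{SSE}_{\text{intercept-only}} - \mathrm{SSE}_{\text{seniority}}}
     {\mathrm{SSE}_{\text{intercept-only}}}
\]
```

The result is the proportion of the intercept-only model's error that is eliminated by adding the predictor. In a simple regression fitted by least squares, it falls between 0 and 1.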