Assignment 05

Using WLS to Model Data with Outliers

Published

August 5, 2023

The goal of this assignment is to give you experience using methods for estimating regression results under violation of homoskedasticity. You will again use the data from the file stack-1979.csv to evaluate the hypothesis from political science that suggests that countries that have a stronger Socialist party have less income inequality.


Instructions

Submit a printed document of your responses to the following questions. Please adhere to the following guidelines for further formatting your assignment:

  • All plots should be resized so that they do not take up more room than necessary.
  • All figures and tables should have a name (e.g., Figure 1) and an appropriate caption.

In questions that ask you to “use matrix algebra” to solve the problem, you can either show your syntax and output from carrying out the matrix operations, or you can use Equation Editor to input the matrices involved in your calculations.

This assignment is worth 20 points.


Exploratory Analysis

  1. Start by creating a scatterplot to examine the relationship between socialist party strength (predictor) and income inequality (outcome).

  2. Are there observations that look problematic in this plot? If so, identify the country(ies).

  3. Fit a linear model regressing income inequality on socialist party strength. Examine and report a set of regression diagnostics that allow you to identify any observations that are regression outliers.


Weighted Least Squares Estimation

Rather than removing regression outliers from the data, we can instead fit a model that accounts for these observations. For example, fitting a model that allows for higher variance at x-values that have outliers. With higher variances, we would expect more extreme observations because of the increased variance. The WLS model allows for heteroskedasticity and can be used to model data that have extreme observations.

  1. Compute the empirical weights that you will use in the WLS estimation. Report the weight for the United States. (Hint: We do not know the true variances in the population.)

  2. Fit the WLS model. Report the fitted equation.

  3. Based on the model results, what is suggested about the research hypothesis that countries with more socialist tendencies have less income inequality?

  4. Create a scatterplot that shows the relationship between socialist party strength and income inequality. Include the country names as labels (or instead of the points). Include both the OLS and WLS regression lines on this plot.

  5. Based on the plot, comment on how the residuals from the WLS model compare to the residuals from the OLS model.

  6. Based on your response to Question #8, how will the model-level \(R^2\) value from the WLS model compare to the model-level \(R^2\) from the OLS model. Explain.

  7. The mathematical formulaa for computing the studentized residuals for both the OLS and WLS models is given below. Compute and report the studentized residuals, using this formula, from both the OLS and WLS models for any regression outliers you identified in Question #2. (Hint: Remember that in an OLS regression the weight is 1 for each observation.)

\[ e^{\prime}_i = \frac{e_i}{s_{e(-i)}\sqrt{1-h_{ii}}} \times \sqrt{w_i} \]

  1. Based on the values of the studentized residuals in the WLS model, are the observations you identified as regression outliers from the OLS model still regression outliers in the WLS model? Why or why not?

  2. Explain why the is the case by referring to the formula.

  3. Create and report residual plots of the studentized residuals versus the fitted values for the OLS and WLS models. Comment on which model better fits the assumptions.


Including Covariates

Now include the energy covariate into the model to examine the effect of socialist strength after controlling for economic development. Since the model has changed, we need to re-compute the weights and re-carry out the WLS analysis.

  1. Use matrix algebra to compute the empirical weights based on the two-predictor model and report the weight for the United States. (That is use matrix algebra to carry out the steps in the 5-step process of computing weights when error variances are unknown for the WLS.)

  2. Fit the two-predictor WLS model using matrix algebra. Report the fitted equation.

  3. Compute and report the standard errors of the two-predictor WLS model using matrix algebra.

  4. Using your results from Questions #14 and #15, compute and report the t-values and p-values. While you can use the output of the tidy(), summary(), or other functions that automatically compute p-values to check your work, use the pt() function to answer this question. (Show your work or syntax for full credit.)

  5. Based on the two-predictor WLS model results, what is suggested about the research hypothesis that countries with more socialist tendencies have less income inequality?

  6. Based on the two-predictor OLS model results, what is suggested about the research hypothesis that countries with more socialist tendencies have less income inequality?

  7. Which set of the model results should we trust. Explain by referring to the tenability of the assumptions.