Assignment 01

Matrix Algebra for Linear Regression

Published

July 27, 2022


The goal of this assignment is to give you experience using matrix algebra to compute the analytic output of a regression analysis. You will use the data given below, which include three measurements for each of 10 countries: infant mortality rate per 1000 live births (infant), per-capita income (pci), and world region (region).

country             infant  pci  region
Algeria               86.3  400  Africa
Bolivia               60.4  200  Americas
Burundi              150.0   68  Africa
Dominican Republic    48.8  406  Americas
Kenya                 55.0  169  Africa
Malawi               148.3  130  Africa
Nicaragua             46.0  507  Americas
Paraguay              38.6  347  Americas
Rwanda               132.9   61  Africa
Trinidad & Tobago     26.2  732  Americas
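
To carry out the computations in R, the table can be entered directly as a data frame. The following is a minimal sketch; the object name `infant_data` is an assumption chosen for illustration, not a name the assignment requires.

```r
# Values transcribed from the table above; `infant_data` is a hypothetical name
infant_data <- data.frame(
  country = c("Algeria", "Bolivia", "Burundi", "Dominican Republic", "Kenya",
              "Malawi", "Nicaragua", "Paraguay", "Rwanda", "Trinidad & Tobago"),
  infant = c(86.3, 60.4, 150.0, 48.8, 55.0, 148.3, 46.0, 38.6, 132.9, 26.2),
  pci = c(400, 200, 68, 406, 169, 130, 507, 347, 61, 732),
  region = c("Africa", "Americas", "Africa", "Americas", "Africa",
             "Africa", "Americas", "Americas", "Africa", "Americas")
)

# Confirm that all 10 rows and 4 columns were entered
dim(infant_data)
```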


Instructions

Submit a printed document of your responses to the following questions. Please adhere to the following formatting guidelines:

  • All plots should be resized so that they do not take up more room than necessary.
  • All figures and tables should have a name (e.g., Figure 1) and an appropriate caption.

In questions that ask you to “use matrix algebra” to solve the problem, you can either show your syntax and output from carrying out the matrix operations, or you can use Equation Editor to input the matrices involved in your calculations.
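
If you choose to show syntax, the base R operations most often needed are `t()` for the transpose, `%*%` for matrix multiplication, and `solve()` for the inverse. The sketch below applies them to a made-up 2 × 2 matrix; the matrix `A` and its values are illustrative only and unrelated to the assignment data.

```r
# Hypothetical matrix, filled column by column
A <- matrix(c(2, 1, 1, 3), nrow = 2)

t(A)                # transpose
A %*% A             # matrix product (A * A would multiply element-wise instead)
A_inv <- solve(A)   # inverse; solve() stops with an error if A is singular
A %*% A_inv         # recovers the identity matrix, up to rounding error
det(A)              # determinant; a zero determinant means no inverse exists
```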

This assignment is worth 20 points.


Unstandardized Regression

You will be fitting the model lm(infant ~ 1 + pci + region + pci:region). Within this model, use dummy coding to encode the region predictor and make Americas the reference group.
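
As an illustration of how dummy coding and an interaction term enter a design matrix, the sketch below builds one by hand for a small made-up data set and then compares it with the matrix R constructs once the reference level is set. The variables `x` and `group`, and the choice of "B" as the reference group, are assumptions for the example and are not the assignment data.

```r
# Hypothetical toy data (not the assignment data)
x     <- c(1, 2, 3, 4, 5, 6)
group <- c("A", "B", "A", "B", "A", "B")

# Dummy coding with "B" as the reference group: 1 for "A", 0 for "B"
d_A <- ifelse(group == "A", 1, 0)

# Columns: intercept, x, dummy indicator, and the x-by-dummy product
X <- cbind(intercept = 1, x = x, d_A = d_A, x_by_d_A = x * d_A)
X

# Cross-check against R's own coding after setting the reference level
g <- relevel(factor(group), ref = "B")
X_check <- model.matrix(~ 1 + x + g + x:g)
all.equal(unname(X), unname(X_check), check.attributes = FALSE)
```

Rows belonging to the reference group have zeros in both the dummy and the interaction columns, which is why the reference group's intercept and slope are carried by the first two coefficients.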

  1. Write out the elements of the matrix \(\mathbf{X}^{\intercal}\mathbf{X}\), where \(\mathbf{X}\) is the design matrix.

  2. Does \(\mathbf{X}^{\intercal}\mathbf{X}\) have an inverse? Explain.

  3. Compute (using matrix algebra) and report the vector of coefficients, b, for the OLS regression.

  4. Compute (using matrix algebra) and report the variance–covariance matrix of the coefficients.

  5. Use the values from b (Question 3) and from the variance–covariance matrix you reported in the previous question to find the 95% CI for the coefficient associated with the main effect of PCI. (Hint: If you need to refresh yourself on how CIs are computed, see here.)

  6. Compute (using matrix algebra) and report the hat-matrix, H. Also show how you would use the values in the hat-matrix to find \(\hat{y}_1\) (the predicted value for Algeria).

  7. Compute (using matrix algebra) and report the vector of residuals, e.

  8. Compute (using matrix algebra) and report the estimated value for the RMSE.

  9. Given the assumptions of the OLS model and the RMSE estimate you computed in the previous question, compute and report the variance–covariance matrix of the residuals.
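
The quantities requested in this section all follow from a few matrix expressions. The sketch below walks through them for a small made-up data set with a single predictor; the values of `x` and `y` and all object names are assumptions for illustration, not the assignment data or its model. It is one possible way to organize the computations, not the required one.

```r
# Hypothetical toy data (not the assignment data): one predictor, n = 5
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

X <- cbind(1, x)   # design matrix: intercept column and predictor
n <- nrow(X)
k <- ncol(X)       # number of coefficients, including the intercept

XtX_inv <- solve(t(X) %*% X)

b     <- XtX_inv %*% t(X) %*% y   # coefficients: (X'X)^{-1} X'y
H     <- X %*% XtX_inv %*% t(X)   # hat matrix
y_hat <- H %*% y                  # fitted values; row i of H times y gives y_hat[i]
e     <- y - y_hat                # residuals, equivalently (I - H) y

sigma2_hat <- as.numeric(t(e) %*% e) / (n - k)   # estimated error variance
rmse       <- sqrt(sigma2_hat)

V_b <- sigma2_hat * XtX_inv         # variance-covariance matrix of the coefficients
V_e <- sigma2_hat * (diag(n) - H)   # variance-covariance matrix of the residuals

# 95% CI for the slope: estimate +/- t-quantile times standard error
se_b     <- sqrt(diag(V_b))
t_crit   <- qt(0.975, df = n - k)
ci_slope <- c(b[2] - t_crit * se_b[2], b[2] + t_crit * se_b[2])

# Cross-check the coefficients against lm()
fit <- lm(y ~ 1 + x)
all.equal(as.numeric(b), unname(coef(fit)))
```

Comparing each hand-computed quantity with the corresponding `lm()` output (for example `vcov(fit)` or `confint(fit)`) is a convenient way to catch mistakes in the matrix algebra.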


ANOVA Decomposition

In this section you will be re-creating the output from the ANOVA decomposition for the model fitted in the previous section.

  1. Compute (using matrix algebra) and report the model, residual, and total sum of squares terms in the ANOVA decomposition table. (2pts)

  2. Compute (using matrix algebra) and report the model, residual, and total degrees of freedom terms in the ANOVA decomposition table. (2pts)

  3. Use the values you obtained in the previous two questions to compute the model and residual mean square terms.

  4. Use the mean square terms you found in the previous question to compute the F-value for the model (i.e., to test \(H_0:\rho^2=0\)). Also compute the p-value associated with this F-value. (Hint: If you need to refresh yourself on how F-values or p-values are computed, see here.)
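
Each sum of squares in the ANOVA table can be written as a quadratic form in y, and each degrees-of-freedom term as the trace of the matrix in that quadratic form. The sketch below shows this for a small made-up data set; the data, the object names, and the averaging matrix `J` (every element equal to 1/n) are assumptions introduced for the example, not notation taken from the assignment.

```r
# Hypothetical toy data (not the assignment data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

X   <- cbind(1, x)
n   <- nrow(X)
H   <- X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix
J   <- matrix(1 / n, nrow = n, ncol = n)   # averaging matrix: J %*% y repeats mean(y)
I_n <- diag(n)

# Sums of squares as quadratic forms in y
ss_total <- as.numeric(t(y) %*% (I_n - J) %*% y)
ss_model <- as.numeric(t(y) %*% (H - J) %*% y)
ss_resid <- as.numeric(t(y) %*% (I_n - H) %*% y)

# Degrees of freedom as traces (rounded to remove floating-point error)
df_total <- round(sum(diag(I_n - J)))
df_model <- round(sum(diag(H - J)))
df_resid <- round(sum(diag(I_n - H)))

# Mean squares, F-value, and upper-tail p-value
ms_model <- ss_model / df_model
ms_resid <- ss_resid / df_resid
F_value  <- ms_model / ms_resid
p_value  <- pf(F_value, df1 = df_model, df2 = df_resid, lower.tail = FALSE)
```

The model and residual terms should add up to the total for both the sums of squares and the degrees of freedom, which gives a quick check on the arithmetic.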


Regression: Effects-Coding

Now consider fitting a model that examines whether there is an effect of region (with no other predictors) on infant mortality. In this model, we will use effects-coding to encode the region variable (see here). This model is often expressed as:

\[ \mathrm{Infant~Mortality}_i = \mu + \alpha_{\mathrm{Region}} + \epsilon_i \]
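
Effects coding replaces the 0/1 dummy indicator with a +1/−1 indicator, so the intercept estimates \(\mu\) and the remaining coefficient estimates a group's deviation from it. The sketch below illustrates this on made-up data with two groups; the values of `y` and `group` and the object names are assumptions for the example and are not the assignment data.

```r
# Hypothetical toy data (not the assignment data): two groups of three
y     <- c(4, 6, 5, 9, 11, 10)
group <- c("A", "A", "A", "B", "B", "B")

# Effects coding: +1 for "A", -1 for "B"
e_A <- ifelse(group == "A", 1, -1)
X <- cbind(intercept = 1, e_A = e_A)

b <- solve(t(X) %*% X) %*% t(X) %*% y
b

# The intercept is the unweighted mean of the group means, and the
# second coefficient is group A's deviation from that mean
group_means <- tapply(y, group, mean)
c(mean(group_means), group_means["A"] - mean(group_means))

# Cross-check against R's built-in sum-to-zero contrasts
g <- factor(group)
X_check <- model.matrix(~ 1 + g, contrasts.arg = list(g = "contr.sum"))
all.equal(unname(X), unname(X_check), check.attributes = FALSE)
```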

  1. Write out the design matrix that would be used to fit this model.

  2. Compute (using matrix algebra) and report the vector of coefficients, b, from the OLS regression.

  3. Compute (using matrix algebra) and report the variance–covariance matrix for the coefficients.

  4. Explain why the sampling variances for the coefficients are the same and why the sampling covariance is zero by referring to computations produced in the matrix algebra. (2pts)