country | infant | pci | region |
---|---|---|---|
Algeria | 86.3 | 400 | Africa |
Bolivia | 60.4 | 200 | Americas |
Burundi | 150.0 | 68 | Africa |
Dominican Republic | 48.8 | 406 | Americas |
Kenya | 55.0 | 169 | Africa |
Malawi | 148.3 | 130 | Africa |
Nicaragua | 46.0 | 507 | Americas |
Paraguay | 38.6 | 347 | Americas |
Rwanda | 132.9 | 61 | Africa |
Trinidad & Tobago | 26.2 | 732 | Americas |
Assignment 01
Matrix Algebra for Linear Regression
The goal of this assignment is to give you experience using matrix algebra to compute various analytic outputs for regression. In this assignment, you will use the data given in the table above, which includes measurements for 10 countries on the infant mortality rate per 1000 live births (`infant`), the per-capita income (`pci`), and the world region (`region`) of the country.
Instructions
Submit a printed document of your responses to the following questions. Please adhere to the following formatting guidelines:
- All plots should be resized so that they do not take up more room than necessary.
- All figures and tables should have a name (e.g., Figure 1) and an appropriate caption.
In questions that ask you to “use matrix algebra” to solve the problem, you can either show your syntax and output from carrying out the matrix operations, or you can use Equation Editor to input the matrices involved in your calculations.
This assignment is worth 20 points.
Unstandardized Regression
You will be fitting the model `lm(infant ~ 1 + pci + region + pci:region)`. Within this model, use dummy coding to encode the `region` predictor and make `Americas` the reference group.
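As a starting point, the design matrix for this model could be assembled by hand along the following lines. This is a minimal sketch rather than a required approach: the object names (`africa`, `X`, `y`) and the column labels are assumptions chosen for illustration, and the data values are copied from the table.

```r
# Data copied from the assignment table
infant <- c(86.3, 60.4, 150.0, 48.8, 55.0, 148.3, 46.0, 38.6, 132.9, 26.2)
pci    <- c(400, 200, 68, 406, 169, 130, 507, 347, 61, 732)
region <- c("Africa", "Americas", "Africa", "Americas", "Africa",
            "Africa", "Americas", "Americas", "Africa", "Americas")

# Dummy code region: 1 = Africa, 0 = Americas (the reference group)
africa <- ifelse(region == "Africa", 1, 0)

# Design matrix: intercept, pci, region dummy, and pci-by-region interaction
X <- cbind(intercept = 1, pci = pci, africa = africa, pci_africa = pci * africa)

# Outcome as a column vector
y <- matrix(infant, ncol = 1)

dim(X)
```

From here, base R's `t()`, `%*%`, and `solve()` carry out the transpose, matrix product, and inverse that the questions below call for.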
1. Write out the elements of the matrix \(\mathbf{X}^{\intercal}\mathbf{X}\), where \(\mathbf{X}\) is the design matrix.
2. Does \(\mathbf{X}^{\intercal}\mathbf{X}\) have an inverse? Explain.
3. Compute (using matrix algebra) and report the vector of coefficients, \(\mathbf{b}\), for the OLS regression.
4. Compute (using matrix algebra) and report the variance–covariance matrix of the coefficients.
5. Use the values from \(\mathbf{b}\) (Question 3) and from the variance–covariance matrix you reported in the previous question to find the 95% CI for the coefficient associated with the main effect of `pci`. (Hint: If you need to refresh yourself on how CIs are computed, see here.)
6. Compute (using matrix algebra) and report the hat-matrix, \(\mathbf{H}\). Also show how you would use the values in the hat-matrix to find \(\hat{y}_1\) (the predicted value for Algeria).
7. Compute (using matrix algebra) and report the vector of residuals, \(\mathbf{e}\).
8. Compute (using matrix algebra) and report the estimated value for the RMSE.
9. Given the assumptions of the OLS model and the RMSE estimate you computed in the previous question, compute and report the variance–covariance matrix of the residuals.
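For reference, the standard OLS matrix expressions that underlie the questions in this section can be summarized as follows. This is a sketch under stated assumptions: \(n\) is the number of observations, \(k\) is the number of columns in \(\mathbf{X}\) (including the intercept), \(\mathbf{X}^{\intercal}\mathbf{X}\) is invertible, and the symbols \(n\), \(k\), \(\hat\sigma^2_{\epsilon}\), and \(h_{1j}\) are notation introduced here rather than taken from the assignment.

\[
\begin{aligned}
\mathbf{b} &= (\mathbf{X}^{\intercal}\mathbf{X})^{-1}\mathbf{X}^{\intercal}\mathbf{y} \\
\mathbf{H} &= \mathbf{X}(\mathbf{X}^{\intercal}\mathbf{X})^{-1}\mathbf{X}^{\intercal}, \qquad \hat{\mathbf{y}} = \mathbf{H}\mathbf{y}, \qquad \hat{y}_1 = \sum_{j=1}^{n} h_{1j}\, y_j \\
\mathbf{e} &= \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y} \\
\hat\sigma^2_{\epsilon} &= \frac{\mathbf{e}^{\intercal}\mathbf{e}}{n - k}, \qquad \mathrm{RMSE} = \sqrt{\hat\sigma^2_{\epsilon}} \\
\widehat{\mathrm{Var}}(\mathbf{b}) &= \hat\sigma^2_{\epsilon}\,(\mathbf{X}^{\intercal}\mathbf{X})^{-1} \\
\widehat{\mathrm{Var}}(\mathbf{e}) &= \hat\sigma^2_{\epsilon}\,(\mathbf{I} - \mathbf{H})
\end{aligned}
\]

A 95% confidence interval for a single coefficient \(b_j\) then takes the usual form \(b_j \pm t^{*}\,\mathrm{SE}(b_j)\), where \(\mathrm{SE}(b_j)\) is the square root of the corresponding diagonal element of \(\widehat{\mathrm{Var}}(\mathbf{b})\) and \(t^{*}\) is the 0.975 quantile of the \(t\)-distribution with \(n - k\) degrees of freedom.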
ANOVA Decomposition
In this section you will be re-creating the output from the ANOVA decomposition for the model fitted in the previous section.
10. Compute (using matrix algebra) and report the model, residual, and total sum of squares terms in the ANOVA decomposition table. (2pts)
11. Compute (using matrix algebra) and report the model, residual, and total degrees of freedom terms in the ANOVA decomposition table. (2pts)
12. Use the values you obtained in Questions 10 and 11 to compute the model and residual mean square terms.
13. Use the mean square terms you found in Question 12 to compute the F-value for the model (i.e., to test \(H_0:\rho^2=0\)). Also compute the p-value associated with this F-value. (Hint: If you need to refresh yourself on how F-values or p-values are computed, see here.)
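The pieces of the ANOVA decomposition can also be written in matrix form. The summary below is a sketch for a model that includes an intercept; the symbols \(n\) (number of observations), \(k\) (number of columns in \(\mathbf{X}\)), \(\bar{y}\) (mean of the outcome), and \(\mathbf{1}\) (an \(n \times 1\) vector of ones) are notation assumed here for illustration.

\[
\begin{aligned}
\mathrm{SS}_{\mathrm{Total}} &= (\mathbf{y} - \bar{y}\mathbf{1})^{\intercal}(\mathbf{y} - \bar{y}\mathbf{1}) = \mathbf{y}^{\intercal}\mathbf{y} - n\bar{y}^2 \\
\mathrm{SS}_{\mathrm{Residual}} &= \mathbf{e}^{\intercal}\mathbf{e} \\
\mathrm{SS}_{\mathrm{Model}} &= \mathrm{SS}_{\mathrm{Total}} - \mathrm{SS}_{\mathrm{Residual}} \\
\mathit{df}_{\mathrm{Model}} &= \mathrm{tr}(\mathbf{H}) - 1 = k - 1, \qquad \mathit{df}_{\mathrm{Residual}} = \mathrm{tr}(\mathbf{I} - \mathbf{H}) = n - k, \qquad \mathit{df}_{\mathrm{Total}} = n - 1 \\
\mathrm{MS}_{\mathrm{Model}} &= \frac{\mathrm{SS}_{\mathrm{Model}}}{\mathit{df}_{\mathrm{Model}}}, \qquad \mathrm{MS}_{\mathrm{Residual}} = \frac{\mathrm{SS}_{\mathrm{Residual}}}{\mathit{df}_{\mathrm{Residual}}}, \qquad F = \frac{\mathrm{MS}_{\mathrm{Model}}}{\mathrm{MS}_{\mathrm{Residual}}}
\end{aligned}
\]

Under \(H_0\), the \(F\)-value is referred to an \(F\)-distribution with \(k - 1\) and \(n - k\) degrees of freedom; the p-value is the upper-tail area beyond the observed \(F\).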
Regression: Effects-Coding
Now consider fitting a model to examine whether there is an effect of region (with no other predictors) on infant mortality. In this model, we will use effects-coding to encode the `region` variable (see here). This model is often expressed as:
\[ \mathrm{Infant~Mortality}_i = \mu + \alpha_{\mathrm{Region}} + \epsilon_i \]
14. Write out the design matrix that would be used to fit this model.
15. Compute (using matrix algebra) and report the vector of coefficients, \(\mathbf{b}\), from the OLS regression.
16. Compute (using matrix algebra) and report the variance–covariance matrix for the coefficients.
17. Explain why the sampling variances for the coefficients are the same and why the sampling covariance is zero by referring to computations produced in the matrix algebra. (2pts)
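One way to set up effects-coding for a two-level predictor is sketched below. The direction of the coding (Africa as +1, Americas as -1) and the object names (`effect`, `X_eff`, `XtX`) are assumptions made for illustration; the linked material may use the opposite sign convention.

```r
# Region labels copied from the assignment table (same row order)
region <- c("Africa", "Americas", "Africa", "Americas", "Africa",
            "Africa", "Americas", "Americas", "Africa", "Americas")

# Effects coding (assumed direction): Africa = +1, Americas = -1
effect <- ifelse(region == "Africa", 1, -1)

# Design matrix: a column of ones and the effects-coded column
X_eff <- cbind(intercept = 1, region_effect = effect)

# Cross-product matrix; crossprod(X_eff) is equivalent to t(X_eff) %*% X_eff
XtX <- crossprod(X_eff)
XtX
```

Because each region contributes five countries, the +1 and -1 codes sum to zero, so the off-diagonal elements of `XtX` vanish and both diagonal elements equal the sample size. Inspecting this matrix and its inverse is a useful first step toward the explanation the last question asks for.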