Chapter 30 Assumptions of the Regression Model

In this chapter, you will learn how matrix algebra is used to express and understand the distributional assumptions underlying the regression model. To illustrate the computations, we will again use the following toy data set to fit a regression model that uses SAT and Self-Esteem to predict variation in GPA.

Table 30.1: Example set of education data.

ID   SAT   GPA   Self-Esteem   IQ
1    560   3.0   11            112
2    780   3.9   10            143
3    620   2.9   19            124
4    600   2.7   7             129
5    720   3.7   18            130
6    380   2.4   13            82


30.1 Regression Assumptions

There are several distributional assumptions about the errors for the regression model, namely that the residuals (conditioned on each fitted value):

  • Are independent;
  • Are normally distributed;
  • Have a mean of 0; and
  • Have constant variance.

Mathematically, we express these assumptions as part of the model:

\[ \epsilon_{i|\hat{y}} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}\left(0, \sigma^2_{\epsilon}\right) \]

This expression says that the residuals conditioned on \(\hat{y}\) (i.e., residuals having the same fitted value) are independent and identically distributed, following a normal distribution with a mean of 0 and some variance (\(\sigma^2_{\epsilon}\)).
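To have concrete residuals to refer to, the sketch below fits the toy-data model with base R's lm(); the variable names (gpa, sat, self_esteem) are simply illustrative, and the same fit is reproduced with matrix algebra later in the chapter.

# Fit the toy-data model with lm() so its residuals can be inspected
gpa = c(3.0, 3.9, 2.9, 2.7, 3.7, 2.4)
sat = c(560, 780, 620, 600, 720, 380)
self_esteem = c(11, 10, 19, 7, 18, 13)

lm_fit = lm(gpa ~ sat + self_esteem)

# Sample residuals and fitted values
residuals(lm_fit)
fitted(lm_fit)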


30.2 Expressing the Assumptions Using Matrix Algebra

We can express these same assumptions using matrix notation. First recall that the residuals from the model can be expressed as a vector:

\[ \boldsymbol{\epsilon} = \begin{bmatrix}\epsilon_1 \\ \epsilon_2\\ \epsilon_3\\ \vdots \\ \epsilon_n\end{bmatrix} \]

Note: Here we use \(\boldsymbol{\epsilon}\) rather than \(\mathbf{e}\) to indicate that the vector contains the residuals in the population.

The assumption that the average residual (at each \(\hat{y}\)) is zero can be written using expectations as:

\[ \begin{split} E(\boldsymbol{\epsilon}) = \mathbf{0} \\[2em] E\begin{pmatrix}\begin{bmatrix}\epsilon_1 \\ \epsilon_2\\ \epsilon_3\\ \vdots \\ \epsilon_n\end{bmatrix}\end{pmatrix} = \begin{bmatrix}0 \\ 0\\ 0\\ \vdots \\ 0\end{bmatrix} \end{split} \]
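A sample analogue of this assumption can be seen in the fitted residuals: when the model includes an intercept, the ordinary least squares residuals average to zero. Continuing the lm() sketch from above:

# OLS residuals from a model with an intercept average to zero
# (up to floating-point rounding)
mean(residuals(lm_fit))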

The constant variance assumption says that the variance of the residuals (at each \(\hat{y}\)) is the same, \(\sigma^2_{\epsilon}\). Remember that this is the residual variance for the model, which we estimate as:

\[ \hat\sigma^2_{\epsilon} = \frac{\mathbf{e}^\intercal\mathbf{e}}{n - \mathrm{tr}(\mathbf{H})} \]
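As a quick cross-check of this formula (continuing the same lm() sketch), sigma() extracts the residual standard error from a fitted lm object, and squaring it gives the same estimate; for the toy data it is about 0.091, the value that appears on the diagonal of the variance-covariance matrix computed later in the chapter.

# Squaring the residual standard error gives the estimated residual variance
sigma(lm_fit)^2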

The independence assumption implies that the covariance between two residuals is 0. Namely,

\[ \mathrm{Cov}(\epsilon_i, \epsilon_j) = 0 \]

for all \(i\neq j\).


30.3 Variance-Covariance Matrix of the Residuals

If we combine the assumption of independence with the assumption of constant variance, we can write out the variance-covariance matrix of the residuals.

\[ \boldsymbol{\Sigma_\epsilon} = \begin{bmatrix}\sigma^2_{\epsilon} & 0 & 0 & \ldots & 0 \\ 0 & \sigma^2_{\epsilon} & 0 & \ldots & 0\\ 0 & 0 & \sigma^2_{\epsilon} & \ldots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \sigma^2_{\epsilon}\end{bmatrix} \]

Notice that the variance-covariance matrix of the residuals is a diagonal, scalar matrix. The variances are given along the main diagonal, and the covariances are the off-diagonal elements. This matrix indicates that the variances all have the same value and that the covariances between residuals are 0 (i.e., independence).

This matrix can also be expressed as the product of \(\sigma^2_{\epsilon}\) and an \(n \times n\) identity matrix:

\[ \begin{split} \boldsymbol{\Sigma_\epsilon} &= \begin{bmatrix}\sigma^2_{\epsilon} & 0 & 0 & \ldots & 0 \\ 0 & \sigma^2_{\epsilon} & 0 & \ldots & 0\\ 0 & 0 & \sigma^2_{\epsilon} & \ldots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \sigma^2_{\epsilon}\end{bmatrix} \\[2ex] &= \sigma^2_{\epsilon} \begin{bmatrix}1 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & \ldots & 0\\ 0 & 0 & 1 & \ldots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1\end{bmatrix}\\[2ex] &= \sigma^2_{\epsilon}~ \mathbf{I} \end{split} \]

Using our toy example, the estimated variance-covariance matrix of the residuals can be computed as:

# Create vector of outcomes
y = c(3.0, 3.9, 2.9, 2.7, 3.7, 2.4)

# Create design matrix
X = matrix(
  data = c(
    rep(1, 6),
    560, 780, 620, 600, 720, 380, 
    11, 10, 19, 7, 18, 13
    ),
  ncol = 3
)

# Compute coefficient estimates and residuals
b = solve(t(X) %*% X) %*% t(X) %*% y
e = y - X %*% b

# Compute residual variance: e'e / (n - tr(H))
H = X %*% solve(t(X) %*% X) %*% t(X)
df_resid = 6 - sum(diag(H))
ss_resid = t(e) %*% e
var_e = ss_resid / df_resid

# Compute variance-covariance matrix of residuals
as.numeric(var_e) * diag(6)
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
[1,] 0.0912 0.0000 0.0000 0.0000 0.0000 0.0000
[2,] 0.0000 0.0912 0.0000 0.0000 0.0000 0.0000
[3,] 0.0000 0.0000 0.0912 0.0000 0.0000 0.0000
[4,] 0.0000 0.0000 0.0000 0.0912 0.0000 0.0000
[5,] 0.0000 0.0000 0.0000 0.0000 0.0912 0.0000
[6,] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0912


30.4 Regression Model: Revisited

Using matrix notation, we compactly specify the regression model as:

\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

where \(\boldsymbol{\epsilon}_{\vert\hat{y}}\) is a vector of independent, normally distributed random variables (conditioned on the fitted values) with \(E(\boldsymbol{\epsilon})=\mathbf{0}\) and \(\boldsymbol{\Sigma_\epsilon}= \sigma^2_{\epsilon}\mathbf{I}\).
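To see the matrix form of the model in action, the following simulation sketch (with assumed, illustrative values for the coefficients and the error standard deviation) generates outcomes as \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}\) with \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2_{\epsilon}\mathbf{I})\) and then recovers the coefficients with ordinary least squares.

# Simulation sketch: assumed coefficients and error SD (not estimates from the toy data)
set.seed(1)
n = 1000
X_sim = cbind(1, runif(n, min = 400, max = 800), runif(n, min = 5, max = 20))
beta = c(0.5, 0.004, 0.01)
epsilon = rnorm(n, mean = 0, sd = 0.3)   # independent N(0, sigma^2) errors
y_sim = X_sim %*% beta + epsilon

# The OLS estimates should be close to the assumed beta values
solve(t(X_sim) %*% X_sim) %*% t(X_sim) %*% y_sim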