Chapter 29 Standard Errors and Variance Estimates

In this chapter, you will learn how matrix algebra is used to compute standard errors and variance estimates in regression. These quantities allow us to compute confidence intervals and carry out hypothesis tests.

To illustrate the computations, we will again use the following toy data set, fitting a regression model that uses SAT and self-esteem to predict variation in GPA.

Table 29.1: Example set of education data.
ID   SAT   GPA   Self-Esteem   IQ
1    560   3.0   11            112
2    780   3.9   10            143
3    620   2.9   19            124
4    600   2.7    7            129
5    720   3.7   18            130
6    380   2.4   13             82


29.1 Residual Variance of the Model

Recall that the estimate of the residual variance (i.e., mean squared residuals) for the model is

\[ \hat\sigma^2_{\epsilon} = \frac{\mathrm{SS}_{\mathrm{Residuals}}}{\mathrm{df}_{\mathrm{Residuals}}} \]

Using matrices, the sum of squared error (\(\mathrm{SS}_{\mathrm{Residuals}}\)) is

\[ \mathrm{SS}_{\mathrm{Residuals}} = \mathbf{e}^\intercal\mathbf{e} \]

The degrees of freedom associated with the residuals are \(\mathrm{df}_{\mathrm{Residuals}} = n-k\), where \(k\) is the number of coefficients (including the intercept) estimated in the model. Recall that the value of \(k\) is the trace of the H-matrix. This implies,

\[ \hat\sigma^2_{\epsilon} = \frac{\mathbf{e}^\intercal\mathbf{e}}{n - \mathrm{tr}(\mathbf{H})} \]

Using our toy example,

# Create vector of outcomes
y = c(3.0, 3.9, 2.9, 2.7, 3.7, 2.4)

# Create design matrix
X = matrix(
  data = c(
    rep(1, 6),
    560, 780, 620, 600, 720, 380, 
    11, 10, 19, 7, 18, 13
    ),
  ncol = 3
)

# Compute SS_residual
b = solve(t(X) %*% X) %*% t(X) %*% y
e = y - X %*% b
ss_resid = t(e) %*% e

# Compute df_residual
H = X %*% solve(t(X) %*% X) %*% t(X)
k = sum(diag(H))
df_resid = 6 - k

# Compute estimate of error variance
var_e = ss_resid / df_resid
var_e
       [,1]
[1,] 0.0912

If we take the square root of this estimate, we obtain the residual standard error (RSE) of the model, a.k.a. the root mean square error (RMSE).

# Find RSE/RMSE
sqrt(var_e)
      [,1]
[1,] 0.302
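As a quick check, the sum of squared residuals \(\mathbf{e}^\intercal\mathbf{e}\) can also be computed with crossprod(), which carries out the transpose-and-multiply in a single call. This is simply an alternative sketch using the objects already created above; it should reproduce the same RSE.

# Equivalent computation of e'e using crossprod()
ss_resid_2 = crossprod(e)

# RSE/RMSE; should match the value above (0.302)
sqrt(ss_resid_2 / df_resid)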


29.2 Coefficient Variances and Covariances

In matrix applications, it is typical to report the error variances and covariances for the coefficients in a variance–covariance matrix. In this matrix, the error variances for the coefficients are given along the main diagonal, and the covariances between the coefficients are given in the off-diagonal elements. For example, the variance–covariance matrix for a simple regression model is:

\[ \boldsymbol{\sigma^2_B} = \begin{bmatrix}\mathrm{Var}(b_0) & \mathrm{Cov}(b_0,b_1) \\ \mathrm{Cov}(b_0,b_1) & \mathrm{Var}(b_1) \end{bmatrix} \]

Examining the variance-covariance matrix of the coefficients, we find that it is a square, symmetric matrix with dimensions of \(k \times k\). This matrix is defined as:

\[ \boldsymbol{\sigma^2_B} = \sigma^2_{\epsilon} (\mathbf{X}^{\intercal}\mathbf{X})^{-1} \]

In our toy example,

# Variance-covariance matrix of the coefficients
V_b = as.numeric(var_e) * solve(t(X) %*% X)
V_b
           [,1]       [,2]       [,3]
[1,]  0.4741276 -5.504e-04 -9.477e-03
[2,] -0.0005504  9.501e-07 -2.246e-06
[3,] -0.0094769 -2.246e-06  8.344e-04

We use as.numeric() to convert the estimate of the model-level error variance to a numeric scalar. The estimated variances for the coefficients are:

  • \(\mathrm{Var}(\hat\beta_0) = 0.474\)
  • \(\mathrm{Var}(\hat\beta_{\mathrm{SAT}}) = 0.00000095\)
  • \(\mathrm{Var}(\hat\beta_{\mathrm{Self\mbox{-}Esteem}}) = 0.00083\)

The covariances between these coefficients are also given in this matrix:

  • \(\mathrm{Cov}(\hat\beta_0,\hat\beta_{\mathrm{SAT}}) = -0.00055\)
  • \(\mathrm{Cov}(\hat\beta_0,\hat\beta_{\mathrm{Self\mbox{-}Esteem}}) = -0.0095\)
  • \(\mathrm{Cov}(\hat\beta_{\mathrm{SAT}},\hat\beta_{\mathrm{Self\mbox{-}Esteem}}) = -0.0000022\)

In practice, the variance-covariance matrix of the regression coefficients can be obtained directly from R using the vcov() function.

# Fit model
lm.1 = lm(y ~ 1 + X[ , 2] + X[ , 3])

# Obtain variance-covariance matrix of the coefficients
vcov(lm.1)
            (Intercept)     X[, 2]     X[, 3]
(Intercept)   0.4741276 -5.504e-04 -9.477e-03
X[, 2]       -0.0005504  9.501e-07 -2.246e-06
X[, 3]       -0.0094769 -2.246e-06  8.344e-04


29.3 Standard Errors for the Coefficients

While the variances are mathematically convenient, in applied work the standard errors (SEs) of the coefficients are more commonly presented. Recall that the standard errors are given in the same metric as the coefficient estimates and represent the uncertainty in those estimates. They are also used to construct test statistics (e.g., z or t) and confidence intervals for the estimates. The estimated standard error for each regression coefficient can be found by computing the square root of its variance estimate, that is:

\[ \mathrm{SE}(\beta_j) = \sqrt{\mathrm{Var}(\beta_j)} \]

Thus, we can find the standard errors by computing the square roots of the diagonal elements in the variance–covariance matrix of the coefficients.

# Compute SEs
se = sqrt(diag(V_b))

# View SEs
se
[1] 0.6885692 0.0009747 0.0288855
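If the model has already been fitted with lm() (as in the lm.1 object created earlier), the same standard errors can be extracted directly from the variance–covariance matrix returned by vcov(). This is just a check on the matrix computations; it should reproduce the values above.

# SEs from the fitted lm object; should match the values above
sqrt(diag(vcov(lm.1)))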


29.4 Correlation Between the Coefficients

Recall that a correlation, which is a standardized covariance, is often more interpretable than the covariance itself. We can obtain the correlation between two coefficients, \(b_j\) and \(b_k\), using

\[ \mathrm{Cor}(b_j,b_k) = \frac{\mathrm{Cov}(b_j,b_k)}{\sqrt{\mathrm{Var}(b_j)\times \mathrm{Var}(b_k)}} \]

Using the values from our example, we can compute the correlation between each set of regression coefficients:

\[ \begin{split} \mathrm{Cor}(\hat\beta_0,\hat\beta_{\mathrm{SAT}}) &= \frac{-0.00055}{\sqrt{0.474 \times 0.00000095}} = -0.820 \\[2ex] \mathrm{Cor}(\hat\beta_0,\hat\beta_{\mathrm{Self\mbox{-}Esteem}}) &= \frac{-0.0095}{\sqrt{0.474 \times 0.00083}} = -0.479 \\[2ex] \mathrm{Cor}(\hat\beta_{\mathrm{SAT}},\hat\beta_{\mathrm{Self\mbox{-}Esteem}}) &= \frac{-0.0000022}{\sqrt{0.00000095 \times 0.00083}} = -0.078 \end{split} \]

In R, if we have assigned the variance-covariance matrix to an object, we can use indexing to access the different elements to compute the correlation.

# Compute correlation between b_0 and b_SAT
V_b[1, 2] / sqrt(V_b[1, 1] * V_b[2, 2])
[1] -0.82
# Compute correlation between b_0 and b_selfesteem
V_b[1, 3] / sqrt(V_b[1, 1] * V_b[3, 3])
[1] -0.4765
# Compute correlation between b_SAT and b_selfesteem
V_b[2, 3] / sqrt(V_b[2, 2] * V_b[3, 3])
[1] -0.07976

The correlations indicate that the intercept is strongly and negatively correlated with both of the other coefficients, whereas the effects of SAT and self-esteem are mostly uncorrelated (i.e., nearly independent).
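Rather than indexing individual elements, the entire correlation matrix of the coefficients can be obtained in one step with cov2cor(), which converts a covariance matrix to the corresponding correlation matrix. A minimal sketch using the V_b object created earlier:

# Convert the variance-covariance matrix of the coefficients to a correlation matrix
# The off-diagonal elements should match the correlations computed above
cov2cor(V_b)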


29.5 Inference: Model-Level

The model-level (i.e., omnibus) null hypothesis for the regression model

\[ Y_i = \beta_0 + \beta_1(X_{1_i}) + \beta_2(X_{2_i}) + \beta_3(X_{3_i}) + \ldots + \beta_k(X_{k_i}) + \epsilon_i \]

is commonly expressed as:

\[ H_0: \beta_1 = \beta_2 = \beta_3 = \ldots = \beta_k = 0 \]

If we write the set of predictor coefficients as a vector, \(\boldsymbol{\beta}^\intercal = \begin{bmatrix} \beta_1 & \beta_2 & \beta_3 & \ldots & \beta_k\end{bmatrix}\), then we can write the model-level null hypothesis using vector notation:

\[ H_0: \boldsymbol{\beta} = \boldsymbol{0} \]

Note that we can express the vector of coefficients as either a column vector (\(\boldsymbol{\beta}\)) or as a row vector (\(\boldsymbol{\beta}^\intercal\)). To test this hypothesis, we compute an F-statistic based on the ratio of the mean square for the model to the mean square for the residuals. This is evaluated in an F-distribution having \(k-1\) and \(n-k\) degrees of freedom. (See here for a reminder about the model-level test.) Using our toy example:

# Compute MS_model
mc_y = y - mean(y) # Mean-center y
ss_model = t(mc_y) %*% X %*% solve(t(X) %*% X) %*% t(X) %*% mc_y
ms_model = ss_model / (k - 1)
ms_model
       [,1]
[1,] 0.7132
# Compute MS_residual
ss_resid = t(mc_y) %*% mc_y - t(mc_y) %*% X %*% solve(t(X) %*% X) %*% t(X) %*% mc_y
ms_resid = ss_resid / (6 - k)
ms_resid
       [,1]
[1,] 0.0912
# Compute F
F_stat = ms_model / ms_resid
F_stat
     [,1]
[1,] 7.82
# Compute p-value
1 - pf(F_stat, df1 = (k-1), df2 = (6-k))
        [,1]
[1,] 0.06456
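As a check on these matrix computations, the F-statistic and its degrees of freedom can also be pulled from the summary of the fitted lm object (lm.1 from earlier), and the p-value re-computed with pf().

# F-statistic, numerator df, and denominator df from the fitted model
f = summary(lm.1)$fstatistic
f

# p-value; should match the value above (0.06456)
pf(f[1], df1 = f[2], df2 = f[3], lower.tail = FALSE)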


29.6 Inference: Coefficient-Level

The coefficient-level null hypotheses for the regression model are:

\[ H_0: \beta_j = 0 \]

for each coefficient \(j\). To test these hypotheses, we compute a t-statistic based on the ratio of each coefficient estimate to its associated standard error. This is evaluated in a t-distribution having \(n-k\) degrees of freedom. (See here for a reminder about the coefficient-level tests.) Using our toy example:

# Compute t for all three coefficients
t_stat = b / se
t_stat
       [,1]
[1,] 0.9574
[2,] 3.9041
[3,] 0.3180
# Compute p-value
2 * pt(-abs(t_stat), df = (6-k), lower.tail = TRUE)
        [,1]
[1,] 0.40901
[2,] 0.02984
[3,] 0.77130
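These t-statistics and p-values can be verified against the coefficient table produced by summary() for the fitted lm.1 object; the "Std. Error", "t value", and "Pr(>|t|)" columns should match the matrix-based results above.

# Coefficient table from the fitted model
summary(lm.1)$coefficients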

We could also compute a confidence interval using,

\[ B_j \pm t^*(\mathrm{SE}_{B_j}) \]

where \(B_j\) is the \(j\)th coefficient estimate, \(\mathrm{SE}_{B_j}\) is the standard error associated with that coefficient, and \(t^*\) is the appropriate critical value. Using our toy example,

# Obtain critical value for 95% CI
t_star = abs(qt(.025, df = (6-k)))

# Compute lower limit of CI
b - t_star*se
           [,1]
[1,] -1.5321211
[2,]  0.0007035
[3,] -0.0827395
# Compute upper limit of CI
b + t_star*se
         [,1]
[1,] 2.850548
[2,] 0.006908
[3,] 0.101113
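The same intervals are produced by confint(), which applies this formula to a fitted lm object. A quick check using lm.1:

# 95% CIs for the coefficients; should match the limits computed above
confint(lm.1, level = 0.95)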