Chapter 28 Sums of Squares in Regression

In this chapter, you will learn how matrix algebra is used to compute sums of squares in regression.


28.1 Sums of Squares

Recall that the regression model can be used to partition the total variation in the outcome into the variation that is explained by the model and the variation that is not. To do this, we compute sums of squares such that:

\[ \mathrm{SS}_{\mathrm{Total}} = \mathrm{SS}_{\mathrm{Model}} + \mathrm{SS}_{\mathrm{Residual}} \]

where, algebraically,

\[ \begin{split} \mathrm{SS}_{\mathrm{Total}} &= \sum(Y_i - \bar{Y})^2\\[2ex] \mathrm{SS}_{\mathrm{Model}} &= \sum(\hat{Y}_i - \bar{Y})^2\\[2ex] \mathrm{SS}_{\mathrm{Residual}} &= \sum(Y_i - \hat{Y}_i)^2 \end{split} \]
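Before turning to matrix notation, a minimal sketch of these scalar formulas in R is shown below. It uses the built-in mtcars data purely for illustration; any fitted regression model would work the same way.

# Sketch: scalar sums of squares from a fitted model (built-in mtcars data)
fit = lm(mpg ~ wt, data = mtcars)

y = mtcars$mpg
y_hat = fitted(fit)

ss_total = sum((y - mean(y))^2)
ss_model = sum((y_hat - mean(y))^2)
ss_residual = sum((y - y_hat)^2)

# The decomposition holds: SS_Total = SS_Model + SS_Residual
ss_total - (ss_model + ss_residual)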

We can also express each of these sums of squares using matrix notation. Remember that:

\[ \mathbf{y} = \hat{\mathbf{y}} + \mathbf{e} \]

To make the math easier, we will subtract the mean vector from both sides of the equation.

\[ \mathbf{y} - \bar{\mathbf{y}} = \hat{\mathbf{y}} + \mathbf{e} - \bar{\mathbf{y}} \]

Re-arranging the right-hand side of the equation:

\[ \mathbf{y} - \bar{\mathbf{y}} = (\hat{\mathbf{y}}- \bar{\mathbf{y}}) + \mathbf{e} \]

Now remember that the residuals and the fitted values are independent, which means that the vectors are orthogonal and the Pythagorean Theorem applies. That is:

\[ (\mathbf{y} - \bar{\mathbf{y}})^2 = (\hat{\mathbf{y}}- \bar{\mathbf{y}})^2 + \mathbf{e}^2 \]

Since these are vectors, squaring them corresponds to taking dot products, which are sums; in this case, they are the sums of squares. That is, in matrix notation:

\[ \begin{split} \mathrm{SS}_{\mathrm{Total}} &= (\mathbf{y} - \bar{\mathbf{y}})^2 \\[2ex] &= (\mathbf{y} - \bar{\mathbf{y}})^\intercal(\mathbf{y} - \bar{\mathbf{y}})\\[4ex] \mathrm{SS}_{\mathrm{Model}} &= (\hat{\mathbf{y}}- \bar{\mathbf{y}})^2\\[2ex] &= (\hat{\mathbf{y}}- \bar{\mathbf{y}})^\intercal (\hat{\mathbf{y}}- \bar{\mathbf{y}}) \\[4ex] \mathrm{SS}_{\mathrm{Residual}} &= \mathbf{e}^2 \\[2ex] &= \mathbf{e}^\intercal \mathbf{e} \end{split} \]
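The same decomposition can be reproduced with vector operations. The sketch below (again using mtcars, an illustrative assumption) also verifies that the residual vector is orthogonal to the mean-deviated fitted values.

# Sketch: sums of squares as dot products of deviation vectors (mtcars example)
fit = lm(mpg ~ wt, data = mtcars)

d_total = mtcars$mpg - mean(mtcars$mpg)    # y minus y-bar
d_model = fitted(fit) - mean(mtcars$mpg)   # y-hat minus y-bar
e = resid(fit)                             # residuals

sum(d_model * e)           # Essentially 0: the vectors are orthogonal

t(d_total) %*% d_total     # SS_Total
t(d_model) %*% d_model     # SS_Model
t(e) %*% e                 # SS_Residual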


28.1.1 Mean Centering the Outcome

If the outcome, y, is mean centered (that is, it is a mean deviation vector), then the math is simplified further. In this case, since the mean of a mean centered variable is zero, the quantity \(\mathbf{y} - \bar{\mathbf{y}}=\mathbf{y} - \mathbf{0} =\mathbf{y}\), and the total sum of squares is, simply,

\[ \mathrm{SS}_{\mathrm{Total}} = \mathbf{y}^\intercal\mathbf{y} \]

Similarly, with a mean centered outcome, the model sum of squares simplifies to

\[ \mathrm{SS}_{\mathrm{Model}} = \hat{\mathbf{y}}^\intercal\hat{\mathbf{y}} \]

Finally, the residual sum of squares is

\[ \mathrm{SS}_{\mathrm{Residual}} = \mathbf{e}^\intercal\mathbf{e} \]

This means that for mean centered data,

\[ \begin{split} \mathrm{SS}_{\mathrm{Total}} &= \mathrm{SS}_{\mathrm{Model}} + \mathrm{SS}_{\mathrm{Residual}} \\[2ex] \mathbf{y}^\intercal\mathbf{y} &= \hat{\mathbf{y}}^\intercal\hat{\mathbf{y}} + \mathbf{e}^\intercal\mathbf{e} \end{split} \]
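A brief sketch of this simplification, again using the mtcars data for illustration. Because the model includes an intercept, the fitted values and residuals from the mean-centered outcome are themselves centered.

# Sketch: the decomposition with a mean-centered outcome (mtcars, intercept included)
y_c = mtcars$mpg - mean(mtcars$mpg)
fit_c = lm(y_c ~ mtcars$wt)

t(y_c) %*% y_c                         # SS_Total
t(fitted(fit_c)) %*% fitted(fit_c)     # SS_Model
t(resid(fit_c)) %*% resid(fit_c)       # SS_Residual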


28.2 Sums of Squares as Functions of the Data

The model and residual sums of squares can also be written as products of the design matrix, X, and the vector of outcomes, y. To do this, we will make use of the relationships between \(\hat{\mathbf{y}}\), e, and the H-matrix. Remember that

\[ \begin{split} \hat{\mathbf{y}} &= \mathbf{Hy} \\[2ex] \mathbf{e} &= (\mathbf{I} - \mathbf{H}) \mathbf{y} \end{split} \]

where \(\mathbf{H}=\mathbf{X} (\mathbf{X}^\intercal\mathbf{X})^{-1} \mathbf{X}^\intercal\).
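The sketch below builds H for a small design matrix (mtcars again, an illustrative assumption) and confirms that it reproduces the fitted values and residuals from lm(), and that it is idempotent.

# Sketch: computing the H (hat) matrix directly (mtcars example)
X = cbind(1, mtcars$wt)    # design matrix: intercept and one predictor
y = mtcars$mpg

H = X %*% solve(t(X) %*% X) %*% t(X)

fit = lm(mpg ~ wt, data = mtcars)
all.equal(as.numeric(H %*% y), as.numeric(fitted(fit)))                   # TRUE
all.equal(as.numeric((diag(nrow(X)) - H) %*% y), as.numeric(resid(fit)))  # TRUE
all.equal(H %*% H, H)                                                     # TRUE: idempotent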

We can re-write the model sum of squares using these relationships,

\[ \begin{split} \mathrm{SS}_{\mathrm{Model}} &= \hat{\mathbf{y}}^\intercal\hat{\mathbf{y}} \\[2ex] &= (\mathbf{Hy})^\intercal\mathbf{Hy} \end{split} \]

Using the rules of transposes

\[ \mathrm{SS}_{\mathrm{Model}} = \mathbf{y}^\intercal\mathbf{H}^\intercal\mathbf{Hy} \]

Also, remember that the H matrix is square and symmetric (\(\mathbf{H}=\mathbf{H}^\intercal\)) and that it is idempotent (\(\mathbf{HH} = \mathbf{H}\)). This implies that \(\mathbf{H}^\intercal\mathbf{H} = \mathbf{H}\), and,

\[ \begin{split} \mathrm{SS}_{\mathrm{Model}} &= \mathbf{y}^\intercal\mathbf{H}^\intercal\mathbf{Hy} \\[2ex] &= \mathbf{y}^\intercal\mathbf{Hy} \\[2ex] &= \mathbf{y}^\intercal\mathbf{X} (\mathbf{X}^\intercal\mathbf{X})^{-1} \mathbf{X}^\intercal\mathbf{y} \end{split} \]

We can arrive at the same expression by starting from the coefficient vector, b.

We can write \(\hat{\mathbf{y}}=\mathbf{Xb}\), which means

\[ \begin{split} \mathrm{SS}_{\mathrm{Model}} &= (\mathbf{Xb})^\intercal\mathbf{Xb} \\[2ex] &= \mathbf{b}^\intercal\mathbf{X}^\intercal\mathbf{Xb} \end{split} \]

Then, since \(\mathbf{b}=(\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y}\), we can substitute, getting,

\[ \begin{split} \mathrm{SS}_{\mathrm{Model}} &= \mathbf{b}^\intercal\mathbf{X}^\intercal\mathbf{Xb}\\[2ex] &= \bigg((\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y}\bigg)^\intercal\mathbf{X}^\intercal\mathbf{X}\bigg((\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y}\bigg) \end{split} \]

Then, using the rules of transposes and inverses (and noting that \(\mathbf{X}^\intercal\mathbf{X}\) is symmetric, so \((\mathbf{X}^\intercal\mathbf{X})^\intercal = \mathbf{X}^\intercal\mathbf{X}\)),

\[ \begin{split} \mathrm{SS}_{\mathrm{Model}} &= \bigg((\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y}\bigg)^\intercal\mathbf{X}^\intercal\mathbf{X}\bigg((\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y}\bigg) \\[2ex] &= \mathbf{y}^\intercal (\mathbf{X}^\intercal)^\intercal \bigg((\mathbf{X}^\intercal\mathbf{X})^{-1}\bigg)^\intercal \mathbf{X}^\intercal\mathbf{X}(\mathbf{X}^\intercal\mathbf{X})^{-1}\mathbf{X}^\intercal\mathbf{y} \\[2ex] &= \mathbf{y}^\intercal\mathbf{X} \bigg((\mathbf{X}^\intercal\mathbf{X})^\intercal\bigg)^{-1} \mathbf{I}\mathbf{X}^\intercal\mathbf{y} \\[2ex] &= \mathbf{y}^\intercal\mathbf{X} (\mathbf{X}^\intercal\mathbf{X})^{-1} \mathbf{X}^\intercal\mathbf{y} \end{split} \]
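A short sketch of this equivalence, using the same illustrative mtcars design matrix and a mean-centered outcome:

# Sketch: SS_Model from b'X'Xb equals y'X(X'X)^{-1}X'y (mean-centered outcome)
X = cbind(1, mtcars$wt)
y_c = mtcars$mpg - mean(mtcars$mpg)

b = solve(t(X) %*% X) %*% t(X) %*% y_c    # b = (X'X)^{-1} X'y

t(b) %*% t(X) %*% X %*% b                               # SS_Model via b
t(y_c) %*% X %*% solve(t(X) %*% X) %*% t(X) %*% y_c     # Same value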

Finally, we can also rewrite the residual sum of squares using the rules of matrix algebra. Note that, like \(\mathbf{H}\), the matrix \(\mathbf{I}-\mathbf{H}\) is symmetric and idempotent, so \((\mathbf{I}-\mathbf{H})^\intercal(\mathbf{I}-\mathbf{H}) = \mathbf{I}-\mathbf{H}\).

\[ \begin{split} \mathrm{SS}_{\mathrm{Residual}} &= \mathbf{e}^\intercal\mathbf{e} \\[2ex] &= \bigg((\mathbf{I}-\mathbf{H})\mathbf{y}\bigg)^\intercal (\mathbf{I}-\mathbf{H})\mathbf{y} \\[2ex] &= \mathbf{y}^\intercal (\mathbf{I}-\mathbf{H})^\intercal (\mathbf{I}-\mathbf{H})\mathbf{y} \\[2ex] &= \mathbf{y}^\intercal (\mathbf{I}-\mathbf{H})\mathbf{y} \\[2ex] &= \mathbf{y}^\intercal (\mathbf{y}-\mathbf{Hy}) \\[2ex] &= \mathbf{y}^\intercal\mathbf{y} - \mathbf{y}^\intercal\mathbf{Hy} \\[2ex] &= \mathbf{y}^\intercal\mathbf{y} - \mathbf{y}^\intercal\mathbf{X} (\mathbf{X}^\intercal\mathbf{X})^{-1} \mathbf{X}^\intercal\mathbf{y} \end{split} \]

And, of course, \(\mathrm{SS}_{\mathrm{Total}}=\mathbf{y}^\intercal\mathbf{y}\). Each of the sums of squares is a function only of the design matrix and the vector of outcomes. Don't forget that for these computations, the vector of outcomes needs to be mean-centered!


28.2.1 Example Using R

To illustrate these computations, we will again use the following toy data set to fit a regression model using SAT and Self-Esteem to predict variation in GPA.

Table 28.1: Example set of education data.

ID   SAT   GPA   Self-Esteem   IQ
 1   560   3.0            11   112
 2   780   3.9            10   143
 3   620   2.9            19   124
 4   600   2.7             7   129
 5   720   3.7            18   130
 6   380   2.4            13    82

Fitting the model and partitioning the variation (not shown), we find:

  • \(\mathrm{SS}_{\mathrm{Total}} = 1.7\)
  • \(\mathrm{SS}_{\mathrm{Model}} = 1.426\)
  • \(\mathrm{SS}_{\mathrm{Residual}} = 0.274\)

Using matrix algebra, we can replicate these values.

# Create vector of outcomes
y = c(3.0, 3.9, 2.9, 2.7, 3.7, 2.4)

# Mean-center y
mc_y = y - mean(y)

# Create design matrix (columns filled column-wise: intercept, SAT, Self-Esteem)
X = matrix(
  data = c(
    rep(1, 6),
    560, 780, 620, 600, 720, 380, 
    11, 10, 19, 7, 18, 13
    ),
  ncol = 3
)


# Compute SS_total
t(mc_y) %*% mc_y
     [,1]
[1,]  1.7
# Compute SS_model
t(mc_y) %*% X %*% solve(t(X) %*% X) %*% t(X) %*% mc_y
      [,1]
[1,] 1.426
# Compute SS_residuals
t(mc_y) %*% mc_y - t(mc_y) %*% X %*% solve(t(X) %*% X) %*% t(X) %*% mc_y
       [,1]
[1,] 0.2736
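As a sanity check, the same partitioning can be obtained from lm() and anova(); the sequential sums of squares for the two predictors add up to the model sum of squares. This is a sketch, and the variable names below are mine, not part of the original data set.

# Sketch: verifying the results with lm() and anova()
gpa = c(3.0, 3.9, 2.9, 2.7, 3.7, 2.4)
sat = c(560, 780, 620, 600, 720, 380)
self_esteem = c(11, 10, 19, 7, 18, 13)

lm_1 = lm(gpa ~ sat + self_esteem)
anova(lm_1)    # Sequential 'Sum Sq' for sat and self_esteem add to SS_Model; Residuals row gives SS_Residual

sum((gpa - mean(gpa))^2)    # SS_Total = 1.7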