Chapter 22 Statistical Application: Estimating Regression Coefficients with LU Decomposition

In OLS regression our goal is to estimate the regression coefficients \(\mathbf{b}\) from a data matrix \(\mathbf{X}\) and a vector of outcomes \(\mathbf{y}\). To do this we want to solve the following equation (the normal equations) for \(\mathbf{b}\):

\[ (\mathbf{X}^{\intercal}\mathbf{X})\mathbf{b} = \mathbf{X}^{\intercal}\mathbf{y} \]
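The normal equations arise from minimizing the residual sum of squares, \((\mathbf{y}-\mathbf{Xb})^{\intercal}(\mathbf{y}-\mathbf{Xb})\). Differentiating with respect to \(\mathbf{b}\) and setting the result to zero gives:

\[ -2\mathbf{X}^{\intercal}\mathbf{y} + 2(\mathbf{X}^{\intercal}\mathbf{X})\mathbf{b} = \mathbf{0} \quad\Longrightarrow\quad (\mathbf{X}^{\intercal}\mathbf{X})\mathbf{b} = \mathbf{X}^{\intercal}\mathbf{y} \]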

Pre-multiplying both sides by \((\mathbf{X}^{\intercal}\mathbf{X})^{-1}\) gives us the conventional solution for \(\mathbf{b}\), namely \(\mathbf{b}=(\mathbf{X}^{\intercal}\mathbf{X})^{-1}\mathbf{X}^{\intercal}\mathbf{y}\). However, this requires us to find the inverse of \(\mathbf{X}^{\intercal}\mathbf{X}\), and computing a matrix inverse numerically can be unstable. For example, consider the following simulated data set of \(n=50\) cases:

# Number of cases
n = 50

# Create 50 x-values evenly spread b/w 1 and 500 
x = seq(from = 1, to = 500, len = n)

# Create X matrix
X = cbind(1, x, x^2, x^3)

# Create b matrix
b = matrix(
  data = c(1, 1, 1, 1), 
  nrow = 4
  )

# Create vector of y-values
set.seed(1)
y = X %*% b + rnorm(n, mean = 0, sd = 1)

# Find determinant of (X^T)X
det(t(X) %*% X)
[1] 3.113748e+32

Here we have simulated data using the following statistical model:

\[ y_i = 1 + 1(x_i) + 1(x_i^2) + 1(x_i^3) + e_i \qquad \mathrm{where~} e_i\overset{i.i.d.}{\sim}\mathcal{N}(0,1) \]

The determinant of \(\mathbf{X}^{\intercal}\mathbf{X}\) is not zero, so mathematically an inverse exists. What happens if we try to use solve() to find the inverse of the \(\mathbf{X}^{\intercal}\mathbf{X}\) matrix to obtain the regression coefficient estimates? Here crossprod(X) is equivalent to t(X) %*% X, and crossprod(X, y) is equivalent to t(X) %*% y.
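As a quick sanity check of that equivalence (a one-line sketch; the comparison should return TRUE, up to numerical tolerance):

# Verify that crossprod(X) matches t(X) %*% X
all.equal(crossprod(X), t(X) %*% X)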

# Try to compute b
solve(crossprod(X)) %*% crossprod(X, y)
Error in solve.default(crossprod(X)): system is computationally singular: reciprocal condition number = 2.93617e-17

The standard R function for computing an inverse, solve(), fails because the matrix is computationally singular, that is, singular to the precision of the machine's floating-point arithmetic. To see why this happens, we can take a closer look at the values in the \(\mathbf{X}^{\intercal}\mathbf{X}\) matrix:

# Set the number of digits
options(digits = 4)

# Compute X^T(X) matrix
crossprod(X)
                    x                    
  5.000e+01 1.253e+04 4.217e+06 1.597e+09
x 1.253e+04 4.217e+06 1.597e+09 6.454e+11
  4.217e+06 1.597e+09 6.454e+11 2.716e+14
  1.597e+09 6.454e+11 2.716e+14 1.176e+17

Note that the elements of this matrix span roughly 15 orders of magnitude, from about \(5 \times 10^{1}\) to about \(1.2 \times 10^{17}\). A computer stores numbers with finite precision (double-precision floating point carries roughly 16 significant digits), so when arithmetic mixes values at such different scales, the smaller values effectively behave like 0. During the elimination steps this leads to what is essentially division by 0, which produces errors.
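We can quantify how close to singular the matrix is by computing its condition number. Here is a quick check using base R's kappa() and rcond() functions; rcond() reports the reciprocal condition number, which should be close to the value cited in the error message above.

# Approximate condition number of X^T(X)
kappa(crossprod(X))

# Reciprocal condition number; values near machine precision (~1e-16)
# indicate the matrix is computationally singular
rcond(crossprod(X))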


22.0.1 Estimating Regression Coefficients Using LU Decomposition

Remember, our goal is to solve the following for b.

\[ (\mathbf{X}^{\intercal}\mathbf{X})\mathbf{b} = \mathbf{X}^{\intercal}\mathbf{y} \]

This is exactly the type of problem that LU decomposition can help solve. Note that the first term, \((\mathbf{X}^{\intercal}\mathbf{X})\), is a square matrix. The right-hand side of the equation is known from the data, and we want to solve for the elements of \(\mathbf{b}\). This is exactly the form \(\mathbf{AX}=\mathbf{Y}\). So rather than trying to find the inverse of \(\mathbf{X}^{\intercal}\mathbf{X}\), we can carry out the decomposition on the \(\mathbf{X}^{\intercal}\mathbf{X}\) matrix to produce \(\mathbf{L}\) and \(\mathbf{U}\) matrices that we can then use to solve for \(\mathbf{b}\).

# LU decomposition
library(matrixcalc)
lu_decomp = lu.decomposition(crossprod(X))

# View results
lu_decomp
$L
          [,1]   [,2]  [,3] [,4]
[1,] 1.000e+00      0   0.0    0
[2,] 2.505e+02      1   0.0    0
[3,] 8.435e+04    501   1.0    0
[4,] 3.195e+07 227105 751.5    1

$U
     [,1]    [,2]      [,3]      [,4]
[1,]   50   12525 4.217e+06 1.597e+09
[2,]    0 1079851 5.410e+08 2.452e+11
[3,]    0       0 1.863e+10 1.400e+13
[4,]    0       0 1.953e-03 3.095e+14
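Notice that the \([4,3]\) element of \(\mathbf{U}\) is 1.953e-03 rather than exactly 0. This is floating-point round-off and is negligible relative to the other elements. As a check that the decomposition worked, we can multiply \(\mathbf{L}\) and \(\mathbf{U}\) and compare the product to the original matrix; the largest absolute discrepancy should be tiny relative to the size of the elements.

# Verify that LU reproduces X^T(X)
max(abs(lu_decomp$L %*% lu_decomp$U - crossprod(X)))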

Now we can solve for \(\mathbf{Z}\) in the equation \(\mathbf{LZ}=\mathbf{Y}\). Remember that here the right-hand side of this equation is \(\mathbf{X}^{\intercal}\mathbf{y}\), so we are solving for \(\mathbf{Z}\) in:

\[ \mathbf{LZ}=\mathbf{X}^{\intercal}\mathbf{y} \]

In our example,

\[ \begin{bmatrix} 1 & 0 & 0 & 0\\ 2.505 \times 10^{2} & 1 & 0 & 0\\ 8.435 \times 10^{4} & 5.010 \times 10^{2} & 1 & 0\\ 3.195 \times 10^{7} & 2.271 \times 10^{5} & 7.515 \times 10^{2} & 1 \end{bmatrix}\begin{bmatrix} z_{11} \\ z_{21} \\ z_{31} \\ z_{41} \end{bmatrix} = \begin{bmatrix} 1.601685 \times 10^{9}\\ 6.470034 \times 10^{11}\\ 2.722570 \times 10^{14}\\ 1.178380 \times 10^{17} \end{bmatrix} \]

Then we can solve the equation:

\[ \mathbf{Ub}=\mathbf{Z} \]

which in our example is:

\[ \begin{bmatrix} 5.000 \times 10^{1} & 1.253 \times 10^{4} & 4.217 \times 10^{6} & 1.597 \times 10^{9}\\ 0 & 1.080 \times 10^{6} & 5.410 \times 10^{8} & 2.452 \times 10^{11}\\ 0 & 0 & 1.863 \times 10^{10} & 1.400 \times 10^{13}\\ 0 & 0 & 0 & 3.095 \times 10^{14} \end{bmatrix}\begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \\ b_{41} \end{bmatrix} = \begin{bmatrix} z_{11} \\ z_{21} \\ z_{31} \\ z_{41} \end{bmatrix} \]

where the \(z\)-values on the right-hand side are the elements of \(\mathbf{Z}\) found in the forward-substitution step.
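Because \(\mathbf{L}\) is lower triangular and \(\mathbf{U}\) is upper triangular, both systems can be solved by simple substitution. For intuition, here is a minimal hand-rolled sketch of what forward and back substitution do (the function names manual_forwardsolve() and manual_backsolve() are ours, just for illustration):

# Hand-rolled forward substitution: solve Lz = rhs for z
manual_forwardsolve = function(L, rhs) {
  k = nrow(L)
  z = numeric(k)
  for (i in 1:k) {
    # Subtract the contributions of the already-solved elements,
    # then divide by the diagonal (pivot) element
    solved = if (i > 1) sum(L[i, 1:(i - 1)] * z[1:(i - 1)]) else 0
    z[i] = (rhs[i] - solved) / L[i, i]
  }
  z
}

# Hand-rolled back substitution: solve Ub = z for b
manual_backsolve = function(U, z) {
  k = nrow(U)
  b = numeric(k)
  for (i in k:1) {
    solved = if (i < k) sum(U[i, (i + 1):k] * b[(i + 1):k]) else 0
    b[i] = (z[i] - solved) / U[i, i]
  }
  b
}

In practice we use the built-in forwardsolve() and backsolve() functions: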

# Solve for Z
Z = forwardsolve(lu_decomp$L, crossprod(X, y))

# Solve for b
b = backsolve(lu_decomp$U, Z)

# View b
b
       [,1]
[1,] 0.9038
[2,] 1.0066
[3,] 1.0000
[4,] 1.0000

We have successfully estimated the coefficients, which are all near their true values of 1. Note that fitting the model using lm() yields the same coefficients.

# Fit model using columns of the X matrix
lm.1 = lm(y ~ 1 + X[ , 2] + X[ , 3] + X[ , 4])

# View coefficients
coef(lm.1)
(Intercept)      X[, 2]      X[, 3]      X[, 4] 
     0.9038      1.0066      1.0000      1.0000 

The lm() function (and most other statistical software) uses a matrix decomposition to solve for the coefficients rather than inverting the \(\mathbf{X}^\intercal\mathbf{X}\) matrix. Although the estimates here are the same as those from LU decomposition, the lm() function actually uses QR decomposition. We will examine this method in the next chapter.
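As a rough preview, here is a minimal sketch using base R's qr() and qr.coef() functions. Because QR decomposes \(\mathbf{X}\) itself, it avoids forming the badly scaled \(\mathbf{X}^{\intercal}\mathbf{X}\) matrix entirely:

# Preview: estimate the coefficients via QR decomposition of X
qr.coef(qr(X), y)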