21 Statistical Application: Estimating Regression Coefficients with LU Decomposition
In OLS regression, our goal is to estimate the regression coefficients (\(\mathbf{b}\)) from a data matrix (\(\mathbf{X}\)) and a vector of outcomes (\(\mathbf{y}\)). To do this, we want to solve the following equation (the normal equations) for \(\mathbf{b}\):

\[(\mathbf{X}^{\intercal}\mathbf{X})\mathbf{b}=\mathbf{X}^{\intercal}\mathbf{y}\]
Pre-multiplying both sides by \((\mathbf{X}^{\intercal}\mathbf{X})^{-1}\) gives us the conventional solution for \(\mathbf{b}\), namely \(\mathbf{b}=(\mathbf{X}^{\intercal}\mathbf{X})^{-1}\mathbf{X}^{\intercal}\mathbf{y}\). However, this requires us to find the inverse of \(\mathbf{X}^{\intercal}\mathbf{X}\), and the numerical routines used to compute matrix inverses can be unstable. For example, consider the following simulated data set of \(n=50\) cases:
# Number of cases
n = 50

# Create 50 x-values evenly spread b/w 1 and 500
x = seq(from = 1, to = 500, len = n)

# Create X matrix
X = cbind(1, x, x^2, x^3)

# Create b matrix
b = matrix(data = c(1, 1, 1, 1), nrow = 4)

# Create vector of y-values
set.seed(1)
y = X %*% b + rnorm(n, mean = 0, sd = 1)

# Find determinant of (X^T)X
det(t(X) %*% X)
[1] 3.113748e+32
Here we have simulated data using the following statistical model:

\[y_i = 1 + x_i + x_i^2 + x_i^3 + \epsilon_i, \quad \text{where } \epsilon_i \sim \mathcal{N}(0, 1)\]
The determinant of \(\mathbf{X}^{\intercal}\mathbf{X}\) is not zero, so mathematically we should be able to find an inverse. What happens if we try to use solve() to find the inverse of the \(\mathbf{X}^{\intercal}\mathbf{X}\) matrix to obtain the regression coefficient estimates? Here crossprod(X) is equivalent to t(X) %*% X, and crossprod(X, y) is equivalent to t(X) %*% y.
# Try to compute b
solve(crossprod(X)) %*% crossprod(X, y)
Error in solve.default(crossprod(X)): system is computationally singular: reciprocal condition number = 2.93617e-17
The standard R function for computing an inverse, solve(), reports that the matrix is computationally singular, even though its determinant is nonzero. To see why this happens, we can take a closer look at the values in the \(\mathbf{X}^{\intercal}\mathbf{X}\) matrix:
# Set the number of digits
options(digits = 4)

# Compute X^T X matrix
crossprod(X)
                            x
          50        12525 4.217e+06 1.597e+09
x      12525      4217364 1.597e+09 6.454e+11
     4217364   1597455115 6.454e+11 2.716e+14
  1597455115 645401757068 2.716e+14 1.176e+17
Note that the elements of this matrix span several orders of magnitude, from 50 up to roughly \(10^{17}\). Computers store numbers with finite precision, so when very large and very small values enter the same computation, the small values can effectively behave like zero. This in turn leads to what is essentially division by zero during the inversion, which produces the error we saw.
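As an illustration (this check is not part of the original analysis), we can quantify the problem with base R's rcond() function, which estimates the reciprocal condition number of a matrix. A value near or below machine precision (about 2.2e-16 for double-precision numbers) indicates that solve() will fail; here it should be close to the 2.93617e-17 reported in the error message above:

# Estimate the reciprocal condition number of X^T X
rcond(crossprod(X))

# Machine precision, for comparison
.Machine$double.eps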
21.0.1 Estimating Regression Coefficients Using LU Decomposition
Remember, our goal is to solve the following for \(\mathbf{b}\):

\[(\mathbf{X}^{\intercal}\mathbf{X})\mathbf{b}=\mathbf{X}^{\intercal}\mathbf{y}\]
This is exactly the type of problem that LU decomposition can help solve. Note that the first “term”, \((\mathbf{X}^{\intercal}\mathbf{X})\), is a square matrix. The right-hand side of the equation is known from the data, and we want to solve for the elements of \(\mathbf{b}\). This is the exact format of \(\mathbf{AX}=\mathbf{Y}\). So rather than trying to find the inverse of \(\mathbf{X}^{\intercal}\mathbf{X}\), we can carry out the decomposition on the \(\mathbf{X}^{\intercal}\mathbf{X}\) matrix to produce L and U matrices that we can then use to solve for \(\mathbf{b}\).
# LU decomposition
library(matrixcalc)
lu_decomp = lu.decomposition(crossprod(X))

# View results
lu_decomp
Substituting \(\mathbf{LU}\) for \(\mathbf{X}^{\intercal}\mathbf{X}\) in the normal equations gives \(\mathbf{LUb}=\mathbf{X}^{\intercal}\mathbf{y}\). Defining \(\mathbf{Z}=\mathbf{Ub}\), we can first solve for \(\mathbf{Z}\) in the equation \(\mathbf{LZ}=\mathbf{Y}\), remembering that here the right-hand side is \(\mathbf{X}^{\intercal}\mathbf{y}\). That is, we are solving for \(\mathbf{Z}\) in:

\[\mathbf{LZ}=\mathbf{X}^{\intercal}\mathbf{y}\]

Once \(\mathbf{Z}\) is known, we can then solve \(\mathbf{Ub}=\mathbf{Z}\) for \(\mathbf{b}\) by back substitution.
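Continuing the example, here is a minimal sketch of the two triangular solves, assuming lu_decomp holds the L and U matrices returned by lu.decomposition(). Base R's forwardsolve() and backsolve() take advantage of the triangular structure of each system:

# Right-hand side of the normal equations: X^T y
d = crossprod(X, y)

# Forward substitution: solve LZ = X^T y for Z
Z = forwardsolve(lu_decomp$L, d)

# Back substitution: solve Ub = Z for b
b_est = backsolve(lu_decomp$U, Z)
b_est

Because the data were simulated with every element of \(\mathbf{b}\) equal to 1, these estimates should all be close to 1.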
The lm() function (and most other statistical software) uses a matrix decomposition, rather than inverting the \(\mathbf{X}^\intercal\mathbf{X}\) matrix, to solve for the coefficients. Although the estimates are the same as those from LU decomposition, lm() actually uses QR decomposition, a method we will examine in the next chapter.
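As a quick check, we can verify that lm() recovers the same estimates by fitting the cubic model to the x and y vectors simulated earlier; all four coefficients should be close to 1, matching the LU-based solution:

# Fit the cubic regression model with lm()
coef(lm(y ~ 1 + x + I(x^2) + I(x^3)))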