đź“ť More Path Analysis
In this set of notes, we will use the data from path-model-achievement.csv to fit a path model to explain effects on high school achievement. The data were simulated to include attributes for 1000 students from information provided by Keith (2015).
# Load libraries
library(tidyverse)
library(broom)
library(lavaan)
# Import data
= read_csv("https://raw.githubusercontent.com/zief0002/epsy-8264/master/data/path-model-achievement.csv")
keith keith
Hypothesized Path Model
The hypothesized path model is shown below. This model was based on previous literature related to student achievement and on Keith’s experiences as an educator and educational researcher.
Estimating the Path Coefficients
We need to estimate 10 path coefficients between the measured variables and four additional paths between unmeasured variables and each of the measured effects. To do this we can use a regression model that includes the hypothesized causes to predict variation in each of the effects. The regression coefficients obtained are the path coefficients between each of the hypothesized causes and effect.
For example, the variable high school motivation is an effect in our model (i.e., it has arrows going into it). The hypothesized causes of high school motivation are family background, and academic ability. To obtain those two path coefficients:
# Fit model
= lm(motivation ~ 1 + fam_back + ability, data = keith)
lm.motivation
# Obtain path coefficients
tidy(lm.motivation)
The path coefficient for the path from family background to high school motivation (p4) is 0.127, and that for the path from academic ability to high school motivation (p7) is 0.152. These represent the direct effects of family background and academic ability on high school motivation, respectively. Here we interpret the direct effect of academic ability on high school motivation:
- Given the adequacy of our model, academic ability has a small positive effect on high school motivation. Each one-standard deviation increase in academic ability will result in a 0.127-standard deviation unit change on high school motivation, on average.
We can also estimate the path between the unmeasured variables (\(d_2\)) and high school motivation. This is computed as:
\[ \sqrt{1-R^2} \]
where \(R^2\) is the coefficient of determination from the fitted regression. In our example
# Obtain R2
glance(lm.motivation)
# Compute path
sqrt(1 - 0.0552)
[1] 0.9720082
The path between \(d_2\) and high school motivation is 0.972. This represents the effects of all other causes of high school motivation that we did not include in our hypothesized path model. This suggests that many of the influences of high school motivation are unaccounted for by the path model. Presumably, these are not important in positing the causal model. (Note if they are important, they should be included as measured variables in the path model.) The regression models to estimate the remaining paths are provided below (output not given), as well as the path estimates for the hypothesized path model.
# Effect: Academic Ability
= lm(ability ~ 1 + fam_back, data = sim_dat)
lm.abilty tidy(lm.abilty)
glance(lm.abilty)
sqrt(1 - 0.174)
# Effect: High School Coursework
= lm(coursework ~ 1 + fam_back + ability + motivation, data = sim_dat)
lm.coursework tidy(lm.coursework)
glance(lm.coursework)
sqrt(1 - 0.348)
# Effect: High School Achievement
= lm(achieve ~ 1 + fam_back + ability + motivation + coursework, data = sim_dat)
lm.achieve tidy(achieve)
glance(lm.achieve)
sqrt(1 - 0.629)
Estimating Effects
We can now use the path coefficients to estimate the effects on high school achievement. Based on the hypothesized path model, each of the measured variables has a direct effect on high school achievement.
- The direct effect of family background on high school achievement is 0.069.
- The direct effect of academic ability on high school achievement is 0.551.
- The direct effect of high school motivation on high school achievement is 0.013.
- The direct effect of high school coursework on high school achievement is 0.310.
There are also indirect effects of these four causes of high school achievement. For example high school motivation not only directly influences high school achievement, but also influences it by influencing high school coursework (i.e., more motivated students take higher level courses which results in higher achievement).
Each indirect effect is computed as the product of the path coefficients connecting the potential cause and effect. For example, the indirect effect of high school motivation on high school achievement via high school coursework is:
\[ \begin{split} \mathrm{Indirect~Effect} &= 0.267 \times 0.310 \\[1em] &= 0.083 \end{split} \]
For some of these measured variables there are multiple indirect effects on achievement. The overall indirect effects are computed as the sum of each of the indirect effects. For example the indirect effects of academic ability on high school achievement are:
- Academic ability influences high school coursework which, in turn, influences high school achievement. This indirect effect is \(0.374 \times 0.310 = 0.083\)
- Academic ability influences high school motivation which, in turn, influences high school achievement. This indirect effect is \(0.152 \times 0.013 = 0.002\)
- Academic ability influences high school motivation which, in turn, influences high school coursework, which then influences high school achievement. This indirect effect is \(0.152 \times 0.267 \times 0.310 = 0.013\)
Thus the overall indirect effects of academic ability on high school achievement is:
\[ \begin{split} \mathrm{All~Indirect~Effects} &= (0.374 \times 0.310) + (0.152 \times 0.013) + (0.152 \times 0.267 \times 0.310) \\[1em] &= 0.130 \end{split} \]
The sum total of the direct and indirect effects give us the total effect of each hypothesized cause. For example, the total effect of academic ability on high school achievement is:
\[ \begin{split} \mathrm{Total~Effects} &= 0.551 + 0.130 \\[1em] &= 0.681 \end{split} \]
Putting all of this together,
Given the adequacy of our model, academic ability has a fairly large, positive effect on high school achievement. Each one-standard deviation increase in academic ability will result in a 0.681-standard deviation unit change in high school achievement, on average. Much of that influence is from the direct effect of academic ability on high school achievement (\(\hat\beta=0.551\)). A small part of academic ability’s effect on high school achievement is due to its influence on other factors (e.g., motivation) which, in turn, effect high school achievement.
The table below gives the direct, indirect, and total effects for each of the hypothesized causes of high school achievement.
Measure | Direct | Indirect | Total |
---|---|---|---|
Family Background | 0.069 | 0.348 | 0.417 |
Academic Ability | 0.551 | 0.131 | 0.682 |
High School Motivation | 0.013 | 0.083 | 0.096 |
High School Coursework | 0.310 | --- | 0.310 |
The largest direct influences of high school achievement are from high school coursework and academic ability. Family background also has a large influence on high school achievement, but primarily indirectly. Lastly, high school motivation has a small effect both directly and indirectly on high school achievement.
Fitting the Path Model with lavaan
The {lavaan}
package includes the sem()
function, which can estimate coefficients in a path model. To use this function, we define the path model by identifying each of the effects on a separate line in a character string. This string takes the model formula from each of the lm()
functions we fitted earlier. We can also include the path names as multipliers of the influences in this model formula. Here we use the paths from our hypothesized path model to define the effects in the path model:
# Define path model
= "
path.model ability ~ p1*fam_back
motivation ~ p4*fam_back + p7*ability
coursework ~ p2*fam_back + p5*ability + p9*motivation
achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
"
Then we can give this model as input to the sem()
function along with the data frame that includes the data to estimate the model. We assign this to an object and use summary()
to obtain the results. The rsquare=TRUE
argument in summary()
print the \(R^2\) values, the latter of which we can use to estimate the paths from the unmeasured variables to the respective effects. You can also include other options to output additional estimates (e.g., ci = TRUE
for confidence intervals). See here for additional information.
# Fit model
.1 = sem(path.model, data = keith)
pm
# Results
summary(pm.1, rsquare = TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 14
Number of observations 1000
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
ability ~
fam_back (p1) 0.417 0.029 14.508 0.000
motivation ~
fam_back (p4) 0.127 0.034 3.741 0.000
ability (p7) 0.152 0.034 4.502 0.000
coursework ~
fam_back (p2) 0.165 0.028 5.838 0.000
ability (p5) 0.374 0.028 13.194 0.000
motivatn (p9) 0.267 0.026 10.158 0.000
achieve ~
fam_back (p3) 0.069 0.022 3.202 0.001
ability (p6) 0.551 0.023 23.758 0.000
motivatn (p8) 0.013 0.021 0.604 0.546
courswrk (p10) 0.310 0.024 12.996 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.ability 0.825 0.037 22.361 0.000
.motivation 0.944 0.042 22.361 0.000
.coursework 0.651 0.029 22.361 0.000
.achieve 0.371 0.017 22.361 0.000
R-Square:
Estimate
ability 0.174
motivation 0.055
coursework 0.348
achieve 0.629
We can also estimate and obtain tests for the indirect effects as well. To do this we need to add model syntax defining each indirect effect we wish to include. This model syntax uses the path names to define how to compute that effect. For example to include the indirect effect of academic ability on high school achievement via high school motivation we would add the following line into our character string:
indirect_ability_via_motivation := p7*p8
The path model would then be defined and fitted:
# Define path model
.2 = "
path.model ability ~ p1*fam_back
motivation ~ p4*fam_back + p7*ability
coursework ~ p2*fam_back + p5*ability + p9*motivation
achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
indirect_ability_via_motivation := p7*p8
"
# Fit model
.2 = sem(path.model.2, data = keith)
pm
# Results
summary(pm.2, rsquare = TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 14
Number of observations 1000
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
ability ~
fam_back (p1) 0.417 0.029 14.508 0.000
motivation ~
fam_back (p4) 0.127 0.034 3.741 0.000
ability (p7) 0.152 0.034 4.502 0.000
coursework ~
fam_back (p2) 0.165 0.028 5.838 0.000
ability (p5) 0.374 0.028 13.194 0.000
motivatn (p9) 0.267 0.026 10.158 0.000
achieve ~
fam_back (p3) 0.069 0.022 3.202 0.001
ability (p6) 0.551 0.023 23.758 0.000
motivatn (p8) 0.013 0.021 0.604 0.546
courswrk (p10) 0.310 0.024 12.996 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.ability 0.825 0.037 22.361 0.000
.motivation 0.944 0.042 22.361 0.000
.coursework 0.651 0.029 22.361 0.000
.achieve 0.371 0.017 22.361 0.000
R-Square:
Estimate
ability 0.174
motivation 0.055
coursework 0.348
achieve 0.629
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
indrct_blty_v_ 0.002 0.003 0.599 0.549
Here the indirect effect of academic ability on high school achievement via high school motivation is not statistically significant. This is likely because the direct effect of high school motivation on high school achievement is also not statistically significant (i.e., if that path is 0 then the product we get when computing the indirect effect will also be 0).
Here we define and compute the overall indirect effects for each of the hypothesized causes:
# Define path model
.3 = "
path.model ability ~ p1*fam_back
motivation ~ p4*fam_back + p7*ability
coursework ~ p2*fam_back + p5*ability + p9*motivation
achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
indirect_fam_back := p1*p5*p10 + p1*p6 + p1*p7*p8 + p1*p7*p9*p10 + p4*p8 + p4*p9*p10 + p2*p10
indirect_ability := p7*p8 + p7*p9*p10 + p5*p10
indirect_motivation := p9*p10
"
# Fit model
.3 = sem(path.model.3, data = keith)
pm
# Results
summary(pm.2, rsquare = TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 14
Number of observations 1000
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
ability ~
fam_back (p1) 0.417 0.029 14.508 0.000
motivation ~
fam_back (p4) 0.127 0.034 3.741 0.000
ability (p7) 0.152 0.034 4.502 0.000
coursework ~
fam_back (p2) 0.165 0.028 5.838 0.000
ability (p5) 0.374 0.028 13.194 0.000
motivatn (p9) 0.267 0.026 10.158 0.000
achieve ~
fam_back (p3) 0.069 0.022 3.202 0.001
ability (p6) 0.551 0.023 23.758 0.000
motivatn (p8) 0.013 0.021 0.604 0.546
courswrk (p10) 0.310 0.024 12.996 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.ability 0.825 0.037 22.361 0.000
.motivation 0.944 0.042 22.361 0.000
.coursework 0.651 0.029 22.361 0.000
.achieve 0.371 0.017 22.361 0.000
R-Square:
Estimate
ability 0.174
motivation 0.055
coursework 0.348
achieve 0.629
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
indrct_blty_v_ 0.002 0.003 0.599 0.549
Finally, we can fit a path model and estimate the direct, indirect, and total effects.
# Define path model
.4 = "
path.model ability ~ p1*fam_back
motivation ~ p4*fam_back + p7*ability
coursework ~ p2*fam_back + p5*ability + p9*motivation
achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
indirect_fam_back := p1*p5*p10 + p1*p6 + p1*p7*p8 + p1*p7*p9*p10 + p4*p8 + p4*p9*p10 + p2*p10
indirect_ability := p7*p8 + p7*p9*p10 + p5*p10
indirect_motivation := p9*p10
total_fam_back := p3 + p1*p5*p10 + p1*p6 + p1*p7*p8 + p1*p7*p9*p10 + p4*p8 + p4*p9*p10 + p2*p10
total_ability := p6 + p7*p8 + p7*p9*p10 + p5*p10
total_motivation := p8 + p9*p10
total_coursework := p10
"
# Fit model
.4 = sem(path.model.4, data = keith)
pm
# Results
summary(pm.4, rsquare = TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 14
Number of observations 1000
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
ability ~
fam_back (p1) 0.417 0.029 14.508 0.000
motivation ~
fam_back (p4) 0.127 0.034 3.741 0.000
ability (p7) 0.152 0.034 4.502 0.000
coursework ~
fam_back (p2) 0.165 0.028 5.838 0.000
ability (p5) 0.374 0.028 13.194 0.000
motivatn (p9) 0.267 0.026 10.158 0.000
achieve ~
fam_back (p3) 0.069 0.022 3.202 0.001
ability (p6) 0.551 0.023 23.758 0.000
motivatn (p8) 0.013 0.021 0.604 0.546
courswrk (p10) 0.310 0.024 12.996 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.ability 0.825 0.037 22.361 0.000
.motivation 0.944 0.042 22.361 0.000
.coursework 0.651 0.029 22.361 0.000
.achieve 0.371 0.017 22.361 0.000
R-Square:
Estimate
ability 0.174
motivation 0.055
coursework 0.348
achieve 0.629
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
indirct_fm_bck 0.348 0.024 14.751 0.000
indirect_ablty 0.131 0.013 9.868 0.000
indirect_mtvtn 0.083 0.010 8.003 0.000
total_fam_back 0.417 0.029 14.508 0.000
total_ability 0.682 0.023 29.460 0.000
total_motivatn 0.095 0.021 4.448 0.000
total_courswrk 0.310 0.024 12.996 0.000
Empirically Reducing the Model
The path between high school motivation and and high school achievement was not statistically significant (p = 0.546). This suggests that the effect we saw in the data may be attributable to sampling variation. One possibility is to reduce the model by removing this path.
Note that removing a path can change both direct and indirect effects. In our path model, we would need to remove any path that traverses through path p8. There are two approaches to doing this:
One approach is to remove the path and use the original estimates to re-compute the direct, indirect, and total effects. For example, to compute the indirect effects of academic ability on high school achievement we would use:
\[ \begin{split} \mathrm{Indirect~Effects} &= (0.374 \times 0.310) + (0.152 \times 0.267 \times 0.310) \\[1em] &= 0.128521 \end{split} \]
A second approach is to re-fit the model to the empirical data. The fitted path model is shown below.
# Define path model
.5 = "
path.model ability ~ p1*fam_back
motivation ~ p4*fam_back + p7*ability
coursework ~ p2*fam_back + p5*ability + p9*motivation
achieve ~ p3*fam_back + p6*ability + p10*coursework
indirect_fam_back := p1*p5*p10 + p1*p6 + p1*p7*p9*p10 + p4*p9*p10 + p2*p10
indirect_ability := p7*p9*p10 + p5*p10
indirect_motivation := p9*p10
total_fam_back := p3 + p1*p5*p10 + p1*p6 + p1*p7*p9*p10 + p4*p9*p10 + p2*p10
total_ability := p6 + p7*p9*p10 + p5*p10
total_coursework := p10
"
# Fit model
.5 = sem(path.model.5, data = keith)
pm
# Results
summary(pm.5, rsquare = TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 13
Number of observations 1000
Model Test User Model:
Test statistic 0.365
Degrees of freedom 1
P-value (Chi-square) 0.546
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
ability ~
fam_back (p1) 0.417 0.029 14.508 0.000
motivation ~
fam_back (p4) 0.127 0.034 3.741 0.000
ability (p7) 0.152 0.034 4.502 0.000
coursework ~
fam_back (p2) 0.165 0.028 5.838 0.000
ability (p5) 0.374 0.028 13.194 0.000
motivatn (p9) 0.267 0.026 10.158 0.000
achieve ~
fam_back (p3) 0.070 0.022 3.240 0.001
ability (p6) 0.551 0.023 23.758 0.000
courswrk (p10) 0.314 0.023 13.841 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.ability 0.825 0.037 22.361 0.000
.motivation 0.944 0.042 22.361 0.000
.coursework 0.651 0.029 22.361 0.000
.achieve 0.371 0.017 22.361 0.000
R-Square:
Estimate
ability 0.174
motivation 0.055
coursework 0.348
achieve 0.629
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
indirct_fm_bck 0.347 0.024 14.740 0.000
indirect_ablty 0.130 0.013 9.866 0.000
indirect_mtvtn 0.084 0.010 8.189 0.000
total_fam_back 0.417 0.029 14.508 0.000
total_ability 0.682 0.023 29.460 0.000
total_courswrk 0.314 0.023 13.841 0.000
Computing the indirect effects of academic ability on high school achievement for this re-fitted model we would use:
\[ \begin{split} \mathrm{Indirect~Effects} &= (0.374 \times 0.314) + (0.152 \times 0.267 \times 0.314) \\[1em] &= 0.1301794 \end{split} \]
Or, we could extract the estimate for indirect_ablty
from the output. To get these results to more decimal places, we can use the parameterEstimates()
function, which outputs a data frame of the results. We want to extract the est
column, which includes the estimates. Note that the indirect effects of academic ability on high school achievement is the 16th element of the est
vector.
# Data frame of parameter estimates
parameterEstimates(pm.5)
# Estimates column
parameterEstimates(pm.5)$est
[1] 0.41700000 0.12651448 0.15224346 0.16516282 0.37442021 0.26686292
[7] 0.07021160 0.55114506 0.31441104 0.82528489 0.94380759 0.65137255
[13] 0.37079389 0.99900000 0.34678840 0.13049578 0.08390465 0.41700000
[19] 0.68164084 0.31441104
# Indirect effects of academic ability on high school achievement
parameterEstimates(pm.5)$est[16]
[1] 0.1304958
Whether you reduce the model and re-fit it to the data, or use the estimates from the original fitted model and set non-significant effects to zero, the differences are minimal; in this case only one of the path coefficients changed when rounded to three decimal places.