đź“ť More Path Analysis

Published

August 2, 2023


In this set of notes, we will use the data from path-model-achievement.csv to fit a path model to explain effects on high school achievement. The data were simulated to include attributes for 1000 students from information provided by Keith (2015).

# Load libraries
library(tidyverse)
library(broom)
library(lavaan)

# Import data
keith = read_csv("https://raw.githubusercontent.com/zief0002/redesigned-adventure/main/data/path-model-achievement.csv")
keith


Hypothesized Path Model

The hypothesized path model is shown below. This model was based on previous literature related to student achievement and on Keith’s experiences as an educator and educational researcher.

Figure 1: Hypothesized path model for several predictors of high school achievement.


Estimating the Path Coefficients

We need to estimate 10 path coefficients between the measured variables and four additional paths between unmeasured variables and each of the measured effects. To do this we can use a regression model that includes the hypothesized causes to predict variation in each of the effects. The regression coefficients obtained are the path coefficients between each of the hypothesized causes and effect.

For example, the variable high school motivation is an effect in our model (i.e., it has arrows going into it). The hypothesized causes of high school motivation are family background, and academic ability. To obtain those two path coefficients:

# Fit model
lm.motivation = lm(motivation ~ 1 + fam_back + ability, data = keith)

# Obtain path coefficients
tidy(lm.motivation)

The path coefficient for the path from family background to high school motivation (p4) is 0.127, and that for the path from academic ability to high school motivation (p7) is 0.152. These represent the direct effects of family background and academic ability on high school motivation, respectively. Here we interpret the direct effect of academic ability on high school motivation:

  • Given the adequacy of our model, academic ability has a small positive effect on high school motivation. Each one-standard deviation increase in academic ability will result in a 0.127-standard deviation unit change on high school motivation, on average.

We can also estimate the path between the unmeasured variables (\(d_2\)) and high school motivation. This is computed as:

\[ \sqrt{1-R^2} \]

where \(R^2\) is the coefficient of determination from the fitted regression. In our example

# Obtain R2
glance(lm.motivation)
# Compute path
sqrt(1 - 0.0552)
[1] 0.9720082

The path between \(d_2\) and high school motivation is 0.972. This represents the effects of all other causes of high school motivation that we did not include in our hypothesized path model. This suggests that many of the influences of high school motivation are unaccounted for by the path model. Presumably, these are not important in positing the causal model. (Note if they are important, they should be included as measured variables in the path model.) The regression models to estimate the remaining paths are provided below (output not given), as well as the path estimates for the hypothesized path model.

# Effect: Academic Ability
lm.abilty = lm(ability ~ 1 + fam_back, data = sim_dat)
tidy(lm.abilty)
glance(lm.abilty)
sqrt(1 - 0.174)

# Effect: High School Coursework
lm.coursework = lm(coursework ~ 1 + fam_back + ability + motivation, data = sim_dat)
tidy(lm.coursework)
glance(lm.coursework)
sqrt(1 - 0.348)

# Effect: High School Achievement
lm.achieve = lm(achieve ~ 1 + fam_back + ability + motivation + coursework, data = sim_dat)
tidy(achieve)
glance(lm.achieve)
sqrt(1 - 0.629)

Figure 2: Hypothesized path model for several predictors of high school achievement. The path estimates are also included.


Estimating Effects

We can now use the path coefficients to estimate the effects on high school achievement. Based on the hypothesized path model, each of the measured variables has a direct effect on high school achievement.

  • The direct effect of family background on high school achievement is 0.069.
  • The direct effect of academic ability on high school achievement is 0.551.
  • The direct effect of high school motivation on high school achievement is 0.013.
  • The direct effect of high school coursework on high school achievement is 0.310.

There are also indirect effects of these four causes of high school achievement. For example high school motivation not only directly influences high school achievement, but also influences it by influencing high school coursework (i.e., more motivated students take higher level courses which results in higher achievement).

Each indirect effect is computed as the product of the path coefficients connecting the potential cause and effect. For example, the indirect effect of high school motivation on high school achievement via high school coursework is:

\[ \begin{split} \mathrm{Indirect~Effect} &= 0.267 \times 0.310 \\[1em] &= 0.083 \end{split} \]

For some of these measured variables there are multiple indirect effects on achievement. The overall indirect effects are computed as the sum of each of the indirect effects. For example the indirect effects of academic ability on high school achievement are:

  • Academic ability influences high school coursework which, in turn, influences high school achievement. This indirect effect is \(0.374 \times 0.310 = 0.083\)
  • Academic ability influences high school motivation which, in turn, influences high school achievement. This indirect effect is \(0.152 \times 0.013 = 0.002\)
  • Academic ability influences high school motivation which, in turn, influences high school coursework, which then influences high school achievement. This indirect effect is \(0.152 \times 0.267 \times 0.310 = 0.013\)

Thus the overall indirect effects of academic ability on high school achievement is:

\[ \begin{split} \mathrm{All~Indirect~Effects} &= (0.374 \times 0.310) + (0.152 \times 0.013) + (0.152 \times 0.267 \times 0.310) \\[1em] &= 0.130 \end{split} \]

The sum total of the direct and indirect effects give us the total effect of each hypothesized cause. For example, the total effect of academic ability on high school achievement is:

\[ \begin{split} \mathrm{Total~Effects} &= 0.551 + 0.130 \\[1em] &= 0.681 \end{split} \]

Putting all of this together,

Given the adequacy of our model, academic ability has a fairly large, positive effect on high school achievement. Each one-standard deviation increase in academic ability will result in a 0.681-standard deviation unit change in high school achievement, on average. Much of that influence is from the direct effect of academic ability on high school achievement (\(\hat\beta=0.551\)). A small part of academic ability’s effect on high school achievement is due to its influence on other factors (e.g., motivation) which, in turn, effect high school achievement.

The table below gives the direct, indirect, and total effects for each of the hypothesized causes of high school achievement.

Standardized direct, indirect, and total effects of substantive predictors influencing high school achievment.
Effect
Measure Direct Indirect Total
Family Background 0.069 0.348 0.417
Academic Ability 0.551 0.131 0.682
High School Motivation 0.013 0.083 0.096
High School Coursework 0.310 --- 0.310

The largest direct influences of high school achievement are from high school coursework and academic ability. Family background also has a large influence on high school achievement, but primarily indirectly. Lastly, high school motivation has a small effect both directly and indirectly on high school achievement.


Fitting the Path Model with lavaan

The {lavaan} package includes the sem() function, which can estimate coefficients in a path model. To use this function, we define the path model by identifying each of the effects on a separate line in a character string. This string takes the model formula from each of the lm() functions we fitted earlier. We can also include the path names as multipliers of the influences in this model formula. Here we use the paths from our hypothesized path model to define the effects in the path model:

# Define path model
path.model = "
  ability ~ p1*fam_back
  motivation ~ p4*fam_back + p7*ability
  coursework ~ p2*fam_back + p5*ability + p9*motivation
  achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
"

Then we can give this model as input to the sem() function along with the data frame that includes the data to estimate the model. We assign this to an object and use summary() to obtain the results. The rsquare=TRUE argument in summary() print the \(R^2\) values, the latter of which we can use to estimate the paths from the unmeasured variables to the respective effects. You can also include other options to output additional estimates (e.g., ci = TRUE for confidence intervals). See here for additional information.

# Fit model
pm.1 = sem(path.model, data = keith)

# Results
summary(pm.1, rsquare = TRUE)
lavaan 0.6.15 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  ability ~                                           
    fam_back  (p1)    0.417    0.029   14.508    0.000
  motivation ~                                        
    fam_back  (p4)    0.127    0.034    3.741    0.000
    ability   (p7)    0.152    0.034    4.502    0.000
  coursework ~                                        
    fam_back  (p2)    0.165    0.028    5.838    0.000
    ability   (p5)    0.374    0.028   13.194    0.000
    motivatn  (p9)    0.267    0.026   10.158    0.000
  achieve ~                                           
    fam_back  (p3)    0.069    0.022    3.202    0.001
    ability   (p6)    0.551    0.023   23.758    0.000
    motivatn  (p8)    0.013    0.021    0.604    0.546
    courswrk (p10)    0.310    0.024   12.996    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ability           0.825    0.037   22.361    0.000
   .motivation        0.944    0.042   22.361    0.000
   .coursework        0.651    0.029   22.361    0.000
   .achieve           0.371    0.017   22.361    0.000

R-Square:
                   Estimate
    ability           0.174
    motivation        0.055
    coursework        0.348
    achieve           0.629

We can also estimate and obtain tests for the indirect effects as well. To do this we need to add model syntax defining each indirect effect we wish to include. This model syntax uses the path names to define how to compute that effect. For example to include the indirect effect of academic ability on high school achievement via high school motivation we would add the following line into our character string:

indirect_ability_via_motivation := p7*p8

The path model would then be defined and fitted:

# Define path model
path.model.2 = "
  ability ~ p1*fam_back
  motivation ~ p4*fam_back + p7*ability
  coursework ~ p2*fam_back + p5*ability + p9*motivation
  achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
  indirect_ability_via_motivation := p7*p8
"

# Fit model
pm.2 = sem(path.model.2, data = keith)

# Results
summary(pm.2, rsquare = TRUE)
lavaan 0.6.15 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  ability ~                                           
    fam_back  (p1)    0.417    0.029   14.508    0.000
  motivation ~                                        
    fam_back  (p4)    0.127    0.034    3.741    0.000
    ability   (p7)    0.152    0.034    4.502    0.000
  coursework ~                                        
    fam_back  (p2)    0.165    0.028    5.838    0.000
    ability   (p5)    0.374    0.028   13.194    0.000
    motivatn  (p9)    0.267    0.026   10.158    0.000
  achieve ~                                           
    fam_back  (p3)    0.069    0.022    3.202    0.001
    ability   (p6)    0.551    0.023   23.758    0.000
    motivatn  (p8)    0.013    0.021    0.604    0.546
    courswrk (p10)    0.310    0.024   12.996    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ability           0.825    0.037   22.361    0.000
   .motivation        0.944    0.042   22.361    0.000
   .coursework        0.651    0.029   22.361    0.000
   .achieve           0.371    0.017   22.361    0.000

R-Square:
                   Estimate
    ability           0.174
    motivation        0.055
    coursework        0.348
    achieve           0.629

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    indrct_blty_v_    0.002    0.003    0.599    0.549

Here the indirect effect of academic ability on high school achievement via high school motivation is not statistically significant. This is likely because the direct effect of high school motivation on high school achievement is also not statistically significant (i.e., if that path is 0 then the product we get when computing the indirect effect will also be 0).

Here we define and compute the overall indirect effects for each of the hypothesized causes:

# Define path model
path.model.3 = "
  ability ~ p1*fam_back
  motivation ~ p4*fam_back + p7*ability
  coursework ~ p2*fam_back + p5*ability + p9*motivation
  achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
  indirect_fam_back := p1*p5*p10 + p1*p6 + p1*p7*p8 + p1*p7*p9*p10 + p4*p8 + p4*p9*p10 + p2*p10
  indirect_ability := p7*p8 + p7*p9*p10 + p5*p10
  indirect_motivation := p9*p10
"

# Fit model
pm.3 = sem(path.model.3, data = keith)

# Results
summary(pm.2, rsquare = TRUE)
lavaan 0.6.15 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  ability ~                                           
    fam_back  (p1)    0.417    0.029   14.508    0.000
  motivation ~                                        
    fam_back  (p4)    0.127    0.034    3.741    0.000
    ability   (p7)    0.152    0.034    4.502    0.000
  coursework ~                                        
    fam_back  (p2)    0.165    0.028    5.838    0.000
    ability   (p5)    0.374    0.028   13.194    0.000
    motivatn  (p9)    0.267    0.026   10.158    0.000
  achieve ~                                           
    fam_back  (p3)    0.069    0.022    3.202    0.001
    ability   (p6)    0.551    0.023   23.758    0.000
    motivatn  (p8)    0.013    0.021    0.604    0.546
    courswrk (p10)    0.310    0.024   12.996    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ability           0.825    0.037   22.361    0.000
   .motivation        0.944    0.042   22.361    0.000
   .coursework        0.651    0.029   22.361    0.000
   .achieve           0.371    0.017   22.361    0.000

R-Square:
                   Estimate
    ability           0.174
    motivation        0.055
    coursework        0.348
    achieve           0.629

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    indrct_blty_v_    0.002    0.003    0.599    0.549

Finally, we can fit a path model and estimate the direct, indirect, and total effects.

# Define path model
path.model.4 = "
  ability ~ p1*fam_back
  motivation ~ p4*fam_back + p7*ability
  coursework ~ p2*fam_back + p5*ability + p9*motivation
  achieve ~ p3*fam_back + p6*ability + p8*motivation + p10*coursework
  indirect_fam_back := p1*p5*p10 + p1*p6 + p1*p7*p8 + p1*p7*p9*p10 + p4*p8 + p4*p9*p10 + p2*p10
  indirect_ability := p7*p8 + p7*p9*p10 + p5*p10
  indirect_motivation := p9*p10
  total_fam_back := p3 + p1*p5*p10 + p1*p6 + p1*p7*p8 + p1*p7*p9*p10 + p4*p8 + p4*p9*p10 + p2*p10
  total_ability := p6 + p7*p8 + p7*p9*p10 + p5*p10
  total_motivation := p8 + p9*p10
  total_coursework := p10
"

# Fit model
pm.4 = sem(path.model.4, data = keith)

# Results
summary(pm.4, rsquare = TRUE)
lavaan 0.6.15 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  ability ~                                           
    fam_back  (p1)    0.417    0.029   14.508    0.000
  motivation ~                                        
    fam_back  (p4)    0.127    0.034    3.741    0.000
    ability   (p7)    0.152    0.034    4.502    0.000
  coursework ~                                        
    fam_back  (p2)    0.165    0.028    5.838    0.000
    ability   (p5)    0.374    0.028   13.194    0.000
    motivatn  (p9)    0.267    0.026   10.158    0.000
  achieve ~                                           
    fam_back  (p3)    0.069    0.022    3.202    0.001
    ability   (p6)    0.551    0.023   23.758    0.000
    motivatn  (p8)    0.013    0.021    0.604    0.546
    courswrk (p10)    0.310    0.024   12.996    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ability           0.825    0.037   22.361    0.000
   .motivation        0.944    0.042   22.361    0.000
   .coursework        0.651    0.029   22.361    0.000
   .achieve           0.371    0.017   22.361    0.000

R-Square:
                   Estimate
    ability           0.174
    motivation        0.055
    coursework        0.348
    achieve           0.629

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    indirct_fm_bck    0.348    0.024   14.751    0.000
    indirect_ablty    0.131    0.013    9.868    0.000
    indirect_mtvtn    0.083    0.010    8.003    0.000
    total_fam_back    0.417    0.029   14.508    0.000
    total_ability     0.682    0.023   29.460    0.000
    total_motivatn    0.095    0.021    4.448    0.000
    total_courswrk    0.310    0.024   12.996    0.000


Empirically Reducing the Model

The path between high school motivation and and high school achievement was not statistically significant (p = 0.546). This suggests that the effect we saw in the data may be attributable to sampling variation. One possibility is to reduce the model by removing this path.

Reduced path model for several predictors of high school achievement. In this model the direct effect between high school motivation and high school achievement was removed (path p8).

Note that removing a path can change both direct and indirect effects. In our path model, we would need to remove any path that traverses through path p8. There are two approaches to doing this:

One approach is to remove the path and use the original estimates to re-compute the direct, indirect, and total effects. For example, to compute the indirect effects of academic ability on high school achievement we would use:

\[ \begin{split} \mathrm{Indirect~Effects} &= (0.374 \times 0.310) + (0.152 \times 0.267 \times 0.310) \\[1em] &= 0.128521 \end{split} \]

A second approach is to re-fit the model to the empirical data. The fitted path model is shown below.

# Define path model
path.model.5 = "
  ability ~ p1*fam_back
  motivation ~ p4*fam_back + p7*ability
  coursework ~ p2*fam_back + p5*ability + p9*motivation
  achieve ~ p3*fam_back + p6*ability + p10*coursework
  indirect_fam_back := p1*p5*p10 + p1*p6  + p1*p7*p9*p10 + p4*p9*p10 + p2*p10
  indirect_ability := p7*p9*p10 + p5*p10
  indirect_motivation := p9*p10
  total_fam_back := p3 + p1*p5*p10 + p1*p6  + p1*p7*p9*p10 + p4*p9*p10 + p2*p10
  total_ability := p6 + p7*p9*p10 + p5*p10
  total_coursework := p10
"

# Fit model
pm.5 = sem(path.model.5, data = keith)

# Results
summary(pm.5, rsquare = TRUE)
lavaan 0.6.15 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                 0.365
  Degrees of freedom                                 1
  P-value (Chi-square)                           0.546

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  ability ~                                           
    fam_back  (p1)    0.417    0.029   14.508    0.000
  motivation ~                                        
    fam_back  (p4)    0.127    0.034    3.741    0.000
    ability   (p7)    0.152    0.034    4.502    0.000
  coursework ~                                        
    fam_back  (p2)    0.165    0.028    5.838    0.000
    ability   (p5)    0.374    0.028   13.194    0.000
    motivatn  (p9)    0.267    0.026   10.158    0.000
  achieve ~                                           
    fam_back  (p3)    0.070    0.022    3.240    0.001
    ability   (p6)    0.551    0.023   23.758    0.000
    courswrk (p10)    0.314    0.023   13.841    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ability           0.825    0.037   22.361    0.000
   .motivation        0.944    0.042   22.361    0.000
   .coursework        0.651    0.029   22.361    0.000
   .achieve           0.371    0.017   22.361    0.000

R-Square:
                   Estimate
    ability           0.174
    motivation        0.055
    coursework        0.348
    achieve           0.629

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    indirct_fm_bck    0.347    0.024   14.740    0.000
    indirect_ablty    0.130    0.013    9.866    0.000
    indirect_mtvtn    0.084    0.010    8.189    0.000
    total_fam_back    0.417    0.029   14.508    0.000
    total_ability     0.682    0.023   29.460    0.000
    total_courswrk    0.314    0.023   13.841    0.000

Computing the indirect effects of academic ability on high school achievement for this re-fitted model we would use:

\[ \begin{split} \mathrm{Indirect~Effects} &= (0.374 \times 0.314) + (0.152 \times 0.267 \times 0.314) \\[1em] &= 0.1301794 \end{split} \]

Or, we could extract the estimate for indirect_ablty from the output. To get these results to more decimal places, we can use the parameterEstimates() function, which outputs a data frame of the results. We want to extract the est column, which includes the estimates. Note that the indirect effects of academic ability on high school achievement is the 16th element of the est vector.

# Data frame of parameter estimates
parameterEstimates(pm.5)
# Estimates column
parameterEstimates(pm.5)$est
 [1] 0.41700000 0.12651448 0.15224346 0.16516282 0.37442021 0.26686292
 [7] 0.07021160 0.55114506 0.31441104 0.82528489 0.94380759 0.65137255
[13] 0.37079389 0.99900000 0.34678840 0.13049578 0.08390465 0.41700000
[19] 0.68164084 0.31441104
# Indirect effects of academic ability on high school achievement
parameterEstimates(pm.5)$est[16]
[1] 0.1304958

Whether you reduce the model and re-fit it to the data, or use the estimates from the original fitted model and set non-significant effects to zero, the differences are minimal; in this case only one of the path coefficients changed when rounded to three decimal places.

References

Keith, T. V. (2015). Multiple regression and beyond: An introduction to multiple regression and structural equation modeling (2nd ed.). New York: Routledge.