PSYC 7804 - Regression with Lab
Spring 2025
The lavaan package (Rosseel et al., 2024) is the most popular package for fitting SEM models in R. We will use it to run mediation analysis, which the SEM framework accommodates nicely.
The semTools package (Jorgensen et al., 2025) provides many helpful SEM functions, many of which build on lavaan.
We will look at the data in detail later, but for now run the line of code below. You should see a function called lav_summary() appear in your environment.
lavaan gives a bit too much information when we run summary() (try and see!), so I made a function that selects the information that we need if we are just running mediation.
lavaan for Mediation?
Before we begin, I would be remiss not to mention other packages for running mediation:
PROCESS macro: PROCESS is a macro for SPSS, SAS, and R that runs mediation and moderation analysis. There is no R package for it, so you need to run the code that defines all of its functions before using it. Aside from that, I find it less flexible than lavaan for specifying different models, and the online help and documentation for the R version are very limited.
mediation package (Tingley et al., 2019): This package implements the causal mediation framework described in Imai et al. (2010). It is a very flexible package, and it accommodates both multilevel mediation and non-linear mediation. Tingley et al. (2014) describe many of the mediation package functionalities in detail. The one “problem” is that it is not very easy to specify multiple mediators and moderators. It is also more geared towards experimental designs.
lavaan is a package for structural equation modeling (SEM). SEM is a general framework that aims to explain how the observed correlation matrix among a set of variables arises by essentially running many regressions.
As we will see, mediation is nothing but two or more regression models. You certainly do not need SEM to run mediation; however, lavaan makes it quite straightforward.
Today’s data is adapted from the examples shown here. Let’s say that in our fictional study, participants were told about a crime committed by someone, and were given varying degrees of detail (detail) regarding the crime. The outcome, opinion, is how many months the participants believe the criminal should spend in prison.
gender: Binary variable indicating participant’s gender (0 = male, 1 = female).
detail: How much information participants were given regarding the crime. (\(X\))
feeling: How severe participants felt the crime was after hearing the details. (\(M_1\))
impact: How negative the participants believed the details were. (\(M_2\))
opinion: How many months in prison the participants would give the criminal (measured last). (\(Y\))
lavaan
Because of its flexibility, lavaan’s models require a specific syntax.
lm()
Call: lm(formula = opinion ~ detail, data = dat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.4162 3.1027 5.936 1.29e-08 ***
detail 0.6508 0.0583 11.163 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard deviation: 8.432 on 198 degrees of freedom
Multiple R-squared: 0.3862
F-statistic: 124.6 on 1 and 198 DF, p-value: < 2.2e-16
AIC BIC
1424.38 1434.27
lavaan
First, we need to specify the model as a character string. The syntax is similar to lm():
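As a minimal sketch of what such a model string can look like, the block below fits the same one-predictor regression with lm() and with lavaan's sem(). The data frame sim is simulated as a stand-in for the lab data (an assumption made for illustration), so the numbers will not match the output shown above.

```r
library(lavaan)

# Simulated stand-in for the lab data (NOT the real dataset)
set.seed(7804)
n <- 200
sim <- data.frame(detail = rnorm(n, mean = 50, sd = 10))
sim$opinion <- 18 + 0.65 * sim$detail + rnorm(n, sd = 8)

# lm() takes a formula; lavaan takes the same formula written as a string
fit_lm  <- lm(opinion ~ detail, data = sim)
mod_reg <- "opinion ~ detail"
fit_lav <- sem(mod_reg, data = sim)

# Pull the slope out of each fit
slope_lm  <- coef(fit_lm)[["detail"]]
pe        <- parameterEstimates(fit_lav)
slope_lav <- pe$est[pe$op == "~"]

c(lm = slope_lm, lavaan = slope_lav)
```

For a single regression like this one, the two slopes should agree up to numerical precision; the standard errors can differ slightly because lavaan uses maximum likelihood by default.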
Remember that every arrow in the diagram corresponds to a slope.
Importantly, you should see that arrowheads reach two boxes in the mediation diagram. This means that the mediation diagram implies 2 separate regression models:
M ~ X
Y ~ X + M
This means that \(X\) can influence \(Y\) directly through the \(c'\) path, but also indirectly by influencing \(M\) through the \(a\) path, and subsequently getting to \(Y\) through the \(b\) path.
In mediation, we can decompose the slope between \(X\) and \(Y\) into a direct and an indirect effect:
Total effect: \(c\)
The \(c\) path is the effect of \(X\) on \(Y\) when \(M\) is not accounted for.
Direct effect: \(c'\)
The \(c'\) (reads “c prime”) path is the effect of \(X\) on \(Y\) when \(M\) is accounted for.
Indirect effect: \(a \times b\)
\(a \times b\) represents the change in the effect of \(X\) on \(Y\) when \(M\) is accounted for. That is, how much \(Y\) is influenced by \(X\) indirectly through \(M\). (\(a \times b = c - c'\))
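The decomposition can be checked numerically with three lm() fits. This is only a sketch on simulated data: the variables x, m, and y and the coefficients used to generate them are assumptions, not the lab data.

```r
# Simulated stand-ins for X, M, and Y (NOT the lab data)
set.seed(7804)
n <- 200
x <- rnorm(n, mean = 50, sd = 10)
m <- 0.5 * x + rnorm(n, sd = 8)
y <- 0.4 * x + 0.4 * m + rnorm(n, sd = 8)

fit_total  <- lm(y ~ x)      # total effect: c
fit_m      <- lm(m ~ x)      # a path
fit_direct <- lm(y ~ x + m)  # direct effect c' and b path

c_total <- coef(fit_total)[["x"]]
a       <- coef(fit_m)[["x"]]
c_prime <- coef(fit_direct)[["x"]]
b       <- coef(fit_direct)[["m"]]

indirect <- a * b

# With least squares fitted on the same rows, c - c' equals a*b
# up to floating-point error
all.equal(c_total - c_prime, indirect)
```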
As always, this stuff does not come out of nowhere, and it’s actually quite quick to show why we say that \(c = c'+ a\times b\). We start with the 3 equations implied by the path diagrams:
\(Y = c \times X\)
\(M = a \times X\)
\(Y = c'\times X + b \times M\)
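One way to finish the argument, ignoring intercepts and residuals just as the three equations above do, is to substitute the second equation into the third:

```latex
\[
\begin{aligned}
Y &= c' \times X + b \times M \\
  &= c' \times X + b \times (a \times X) \\
  &= (c' + a \times b) \times X
\end{aligned}
\]
```

Comparing the last line with \(Y = c \times X\) gives \(c = c' + a \times b\), which is the same as \(a \times b = c - c'\).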
feeling (\(M_1\)) mediates the relation between detail (\(X\)) and opinion (\(Y\)).
lavaan
lavaan is more practical because it lets us specify more complex models in one go and generate confidence intervals through bootstrapping.
mod_med <- "opinion ~ c*detail + b*feeling
feeling ~ a*detail
indirect := a*b
total := c + a*b"
# may take 5 to 15 seconds to run (ignore warnings about convergence of bootstraps for regression models, the developer said so)
lav_med <- sem(mod_med, dat,
se = "boot", bootstrap = 2000,
parallel ="snow", ncpus = 4)
lav_summary(lav_med)
lhs op rhs label est ci.lower ci.upper
1 opinion ~ detail c 0.424 0.299 0.558
2 opinion ~ feeling b 0.411 0.243 0.567
3 feeling ~ detail a 0.552 0.466 0.638
4 indirect := a*b indirect 0.227 0.130 0.325
5 total := c+a*b total 0.651 0.548 0.757
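Labels are attached to coefficients with *, as in c*detail.
New parameters are defined from labeled ones with :=. Here I calculate the indirect and total effect.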
se = "boot" tells the function that we want bootstrapped confidence intervals.
bootstrap = 2000 requests 2000 bootstrap draws, but you can get away with 1000 usually.
parallel ="snow" and ncpus = 4 are for splitting the work among 4 CPU cores, making the bootstrap procedure 4 times faster in theory.
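To make the idea of bootstrapped confidence intervals concrete, here is a by-hand sketch in base R. It is not what lavaan runs internally; it only illustrates the general recipe (resample rows, re-estimate \(a \times b\), take percentiles). The data frame sim, the helper indirect_effect(), and the choice of 1000 draws are all assumptions made for illustration.

```r
# Simulated stand-in for the lab data (NOT the real dataset)
set.seed(7804)
n <- 200
sim <- data.frame(x = rnorm(n, mean = 50, sd = 10))
sim$m <- 0.5 * sim$x + rnorm(n, sd = 8)
sim$y <- 0.4 * sim$x + 0.4 * sim$m + rnorm(n, sd = 8)

# Hypothetical helper: indirect effect a*b from the two regressions
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  b <- coef(lm(y ~ x + m, data = d))[["m"]]
  a * b
}

n_boot <- 1000
boot_est <- replicate(n_boot, {
  rows <- sample(nrow(sim), replace = TRUE)  # resample participants
  indirect_effect(sim[rows, ])               # re-estimate a*b
})

# Percentile 95% confidence interval for the indirect effect
boot_ci <- quantile(boot_est, probs = c(0.025, 0.975))
boot_ci
```

Because a bootstrap relies on random resampling, the interval limits change slightly from run to run unless a seed is set.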
feeling mediates the relation between detail and opinion.
lavaan omits intercepts by default in most models.
feeling) after hearing details about the crime (detail), the effect of detail on how many months participants thought the criminal should spend in prison was reduced by \(.23\).
detail causes higher feeling which in turn causes higher opinion (all slopes were positive).
(apologies for the wall of text 😶)
monteCarloCI() function from semTools to get Monte Carlo CIs for parameters generated with the := operator.
semTools accounts for the correlation when sampling values of \(a\) and \(b\) (it samples from a multivariate normal distribution). Still, the results are very close.
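The logic of a Monte Carlo CI can also be sketched by hand. The version below is a simplification: it draws \(a\) and \(b\) independently from normal distributions centered on their estimates, so it ignores any correlation between them, unlike the joint draw described above. The simulated data frame sim and the number of draws are assumptions.

```r
# Simulated stand-in for the lab data (NOT the real dataset)
set.seed(7804)
n <- 200
sim <- data.frame(x = rnorm(n, mean = 50, sd = 10))
sim$m <- 0.5 * sim$x + rnorm(n, sd = 8)
sim$y <- 0.4 * sim$x + 0.4 * sim$m + rnorm(n, sd = 8)

fit_m <- lm(m ~ x, data = sim)
fit_y <- lm(y ~ x + m, data = sim)

# Point estimates and standard errors of the a and b paths
a    <- coef(fit_m)[["x"]]
b    <- coef(fit_y)[["m"]]
se_a <- sqrt(vcov(fit_m)["x", "x"])
se_b <- sqrt(vcov(fit_y)["m", "m"])

# Draw plausible values of a and b (independently, a simplification)
n_draws <- 20000
a_draws <- rnorm(n_draws, mean = a, sd = se_a)
b_draws <- rnorm(n_draws, mean = b, sd = se_b)

# 95% Monte Carlo interval for the product a*b
mc_ci <- quantile(a_draws * b_draws, probs = c(0.025, 0.975))
mc_ci
```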
lavaan is that we can seamlessly calculate effect sizes for mediation and get confidence intervals for them! Some effect sizes are:
lhs op rhs label est ci.lower ci.upper
1 opinion ~ detail c 0.424 0.293 0.549
2 opinion ~ feeling b 0.411 0.258 0.567
3 feeling ~ detail a 0.552 0.460 0.643
4 indirect := a*b indirect 0.227 0.137 0.326
5 total := c+a*b total 0.651 0.548 0.755
6 ind_tot := indirect/total ind_tot 0.349 0.212 0.512
7 ind_dir := indirect/c ind_dir 0.536 0.269 1.047
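detail on opinion was mediated by feeling. This measure usually ranges from \(0\) to \(1\), although it can fall outside that range when the direct and indirect effects have opposite signs.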
These measures have limitations in small sample sizes, so see Preacher & Kelley (2011) for a detailed discussion.
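A model string consistent with the ind_tot and ind_dir rows of the output above might look like the sketch below. The object names mod_es and fit_es and the simulated data frame sim are assumptions, and the fit uses lavaan's default standard errors rather than the bootstrap to keep the example fast.

```r
library(lavaan)

# Simulated stand-in for the lab data (NOT the real dataset)
set.seed(7804)
n <- 200
sim <- data.frame(detail = rnorm(n, mean = 50, sd = 10))
sim$feeling <- 0.5 * sim$detail + rnorm(n, sd = 8)
sim$opinion <- 0.4 * sim$detail + 0.4 * sim$feeling + rnorm(n, sd = 8)

mod_es <- "
  opinion ~ c*detail + b*feeling
  feeling ~ a*detail

  indirect := a*b
  total    := c + a*b
  ind_tot  := indirect/total   # proportion of the total effect that is mediated
  ind_dir  := indirect/c       # ratio of indirect to direct effect
"

fit_es <- sem(mod_es, data = sim)
est <- parameterEstimates(fit_es)

# Keep only the user-defined parameters
est[est$op == ":=", c("label", "est", "ci.lower", "ci.upper")]
```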
impact (\(M_2\)) as a mediator. That is, we believe that detail causes both feeling and impact, which in turn jointly cause opinion. The model implies 3 regressions. (I’ll start taking out the \(\times\) for shorter equations)
\[M_1 = a_1X\]
\[Y = c'X + b_1 M_1 + b_2 M_2\]
\[M_2 = a_2X\]
Thus, we have 2 indirect effects of detail on opinion.
feeling: \(a_1 \times b_1\)
impact: \(a_2 \times b_2\)
And the total indirect effect of detail on opinion is:
\[a_1 \times b_1 + a_2 \times b_2\]
lavaan
lavaan model to add \(M_2\):
lhs op rhs label est ci.lower ci.upper
1 opinion ~ detail c 0.421 0.269 0.581
2 opinion ~ feeling b1 0.410 0.243 0.571
3 opinion ~ impact b2 0.006 -0.146 0.173
4 feeling ~ detail a1 0.552 0.466 0.638
5 impact ~ detail a2 0.609 0.514 0.706
6 indirect1 := a1*b1 indirect1 0.226 0.128 0.328
7 indirect2 := a2*b2 indirect2 0.003 -0.086 0.106
8 total := c+a1*b1+a2*b2 total 0.651 0.548 0.757
impact does not mediate the relation between detail and opinion.
detail on opinion after accounting for both feeling and impact.
detail on opinion is mediated overall. You can also calculate it with total \(- c'\).
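For reference, a model string consistent with the labels in the two-mediator output above (c, a1, b1, a2, b2, indirect1, indirect2, total) might look like the sketch below. The object names, the simulated data frame sim, and the use of default standard errors instead of the bootstrap are assumptions. In the simulation, impact is generated with no effect on opinion, loosely mirroring the pattern in the output.

```r
library(lavaan)

# Simulated stand-in for the lab data (NOT the real dataset)
set.seed(7804)
n <- 200
sim <- data.frame(detail = rnorm(n, mean = 50, sd = 10))
sim$feeling <- 0.5 * sim$detail + rnorm(n, sd = 8)
sim$impact  <- 0.6 * sim$detail + rnorm(n, sd = 8)
sim$opinion <- 0.4 * sim$detail + 0.4 * sim$feeling + rnorm(n, sd = 8)

mod_med2 <- "
  opinion ~ c*detail + b1*feeling + b2*impact
  feeling ~ a1*detail
  impact  ~ a2*detail

  indirect1 := a1*b1
  indirect2 := a2*b2
  total     := c + a1*b1 + a2*b2
"

fit_med2 <- sem(mod_med2, data = sim)
est <- parameterEstimates(fit_med2)

# Keep only the user-defined parameters
est[est$op == ":=", c("label", "est", "ci.lower", "ci.upper")]
```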
gender. This model implies 2 regressions.
\[M = aX + z_1Z + a_zZX\]
\[Y = c'X + bM + z_2Z + b_zZM \]
Male equations: \(M = aX\) and \(Y = c'X + bM\)
We substitute \(0\) for \(Z\): every term containing \(Z\) drops out, and the indirect effect for males is simply \(a \times b\).
Female equations: \(M = aX + z_1 + a_zX\) and \(Y = c'X + bM + z_2 + b_zM\)
We substitute \(1\) for \(Z\): the interaction terms are added to the \(a\) and \(b\) paths, giving \(M = (a + a_z)X + z_1\) and \(Y = c'X + (b + b_z)M + z_2\). So, the indirect effect for females is \((a + a_z) \times (b + b_z)\).
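The two conditional indirect effects can be computed by hand from two lm() fits. The sketch below uses simulated data with x, m, y, and a 0/1 moderator z, all of which are assumptions rather than the lab data.

```r
# Simulated stand-ins for X, M, Y and a binary moderator Z (NOT the lab data)
set.seed(7804)
n <- 200
sim <- data.frame(x = rnorm(n, mean = 50, sd = 10),
                  z = rbinom(n, size = 1, prob = 0.5))
sim$m <- 0.6 * sim$x + 5 * sim$z - 0.1 * sim$x * sim$z + rnorm(n, sd = 8)
sim$y <- 0.4 * sim$x + 0.5 * sim$m + 3 * sim$z -
  0.1 * sim$m * sim$z + rnorm(n, sd = 8)

# M = aX + z1*Z + az*Z*X
fit_m <- lm(m ~ x + z + x:z, data = sim)
# Y = c'X + bM + z2*Z + bz*Z*M
fit_y <- lm(y ~ x + m + z + m:z, data = sim)

a  <- coef(fit_m)[["x"]]
az <- coef(fit_m)[["x:z"]]
b  <- coef(fit_y)[["m"]]
bz <- coef(fit_y)[["m:z"]]

indirect_z0 <- a * b                # group coded 0
indirect_z1 <- (a + az) * (b + bz)  # group coded 1

c(z0 = indirect_z0, z1 = indirect_z1)
```

Because the \(M\) equation interacts every predictor with \(Z\), \(a\) equals the slope obtained by fitting lm(m ~ x) on the \(Z = 0\) rows only, and \(a + a_z\) equals the slope from the \(Z = 1\) rows.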
lavaan
lhs op rhs label est ci.lower ci.upper
1 opinion ~ detail c 0.421 0.289 0.551
2 opinion ~ feeling b 0.477 0.291 0.669
3 opinion ~ gender z2 6.716 -5.640 20.394
4 opinion ~ feeling:gender bz -0.136 -0.388 0.093
5 feeling ~ detail a 0.636 0.513 0.775
6 feeling ~ gender z1 12.491 3.125 22.215
7 feeling ~ detail:gender az -0.134 -0.305 0.030
8 indirect_female := (a+az)*(b+bz) indirect_female 0.172 0.055 0.294
9 indirect_male := a*b indirect_male 0.304 0.178 0.441
lavaan, we use the : operator. Unlike lm(), you have to specify all the terms manually.
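A model string consistent with the labels in the moderated mediation output above might look like the sketch below, with every main effect and interaction term written out by hand. The object names and the simulated data frame sim are assumptions; the simulated variables are kept on a small scale only to keep the example well behaved, and default standard errors are used instead of the bootstrap.

```r
library(lavaan)

# Simulated stand-in for the lab data (NOT the real dataset)
set.seed(7804)
n <- 200
sim <- data.frame(detail = rnorm(n, mean = 5, sd = 1),
                  gender = rbinom(n, size = 1, prob = 0.5))
sim$feeling <- 0.6 * sim$detail + 1 * sim$gender -
  0.1 * sim$detail * sim$gender + rnorm(n)
sim$opinion <- 0.4 * sim$detail + 0.5 * sim$feeling + 0.5 * sim$gender -
  0.1 * sim$feeling * sim$gender + rnorm(n)

# Each interaction is spelled out with ":" next to its main effects
mod_modmed <- "
  opinion ~ c*detail + b*feeling + z2*gender + bz*feeling:gender
  feeling ~ a*detail + z1*gender + az*detail:gender

  indirect_female := (a + az)*(b + bz)
  indirect_male   := a*b
"

fit_modmed <- sem(mod_modmed, data = sim)
est <- parameterEstimates(fit_modmed)

# Keep only the two conditional indirect effects
est[est$op == ":=", c("label", "est", "ci.lower", "ci.upper")]
```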
lavaan, is where things start to get interesting in my opinion.

The value in mastering more advanced methods such as path analysis lies in the freedom that you gain as a researcher. You no longer have to fit your hypotheses and variables to some basic analysis such as a \(t\)-test or ANOVA; you can create a unique model that tests your unique and creative hypotheses 😄
PSYC 7804 - Lab 12: Mediation Analysis