Time Varying Mediation Function: Binary Outcome and Two Treatment Groups

Yajnaseni Chakraborti

yajnaseni.chakraborti@temple.edu

Donna L. Coffman

dcoffman@temple.edu

Harry Zobel

harryzobeljr@gmail.com

2021-03-05

Introduction

The purpose of this vignette is to provide users with a step-by-step guide for performing and interpreting the results of a time varying mediation analysis with two treatment groups and a binary outcome. Please note, that this package has been built considering the structure of a panel data, where each subject/participant has repeated responses (collected over time) for the outcome and mediator variables. However, we do not address dynamic treatment regimens in this package. Therefore, we assume the scenario where the treatment (exposure) is time-invariant (i.e., does not change over time).

Data

For illustration, we rely on an example dataset, simulated based on the Wisconsin Smokers’ Health Study 2 data (Baker et. al., 2016) which includes 1086 individuals assigned to one of three treatment conditions. One-third of participants received only a nicotine patch; another one-third received varenicline, and the final third of participants received a combination nicotine replacement therapy (NRT) which included nicotine patch + nicotine mini-lozenge. The outcome of interest is daily smoking status, a binary variable derived from the number of cigarettes smoked by the participants each day. If participants smoked any cigarettes that day, then daily smoking status = 1, otherwise it equals 0. In addition, mediator variables were measured by asking participants if they felt a negative mood in the last fifteen minutes, whether they wanted to smoke in the last 15 minutes, and how tired they felt of trying to quit smoking (i.e., cessation fatigue), all recorded on a 7-point Likert scale. Both the outcome and mediator variables were assessed two times per day for the first two weeks post-quit day (rendering 30 time points of response since assessments start on day 0 post target quit day), and every other day (2x per day) for weeks 3 and 4 (rendering 14 time points of response).

A traditional approach to analyzing this type of data would be to use mediation analysis in which the effects are assumed to not vary as a function of time. First, a single (i.e., time-invariant) direct effect would be estimated by regressing the outcome on the treatment condition and mediator. Next, a time-invariant indirect effect would be computed by multiplying the effect of treatment condition on the mediator by the effect of the mediator on the outcome. However, this method potentially misses important information about the dynamic effect that a mediator may have over time. Specifically, we hypothesize that mood changes across and within days and thus, its mediating effect on one’s success of quitting smoking is likely to vary over time. We therefore propose a time varying mediation analysis which estimates the mediation effect as a function that varies over time.

The tvmb function, like tvma, was developed for estimating the mediation effect of two treatment (exposure) groups.

Getting started

To use the time varying mediation analysis package in R, you must first install the package and load it. Before that, make sure you have R version 4.0.3 or greater. There are two ways to install the package from the CRAN (Comprehensive R Archive Network) repository, by using install.packages or the devtools function.

install.packages("tvmediation", dependencies = TRUE)
library(tvmediation)

The equivalent code using devtools is:

devtools::install_cran("tvmediation", dependencies = TRUE) 
library(tvmediation)

If you do not have devtools installed and loaded, you can do so using the following code:

install.packages("devtools", dependencies = TRUE)
library(devtools)

Alternatively, if you want to install the package directly from the GitHub repository to access new or revised functions in development, use the following code:

devtools::install_github("dcoffman/tvmediation", dependencies = TRUE) 
library(tvmediation)

Formatting your data before calling the tvmb function

Once installed, you can type ?tvmediation in the console to view the package documentation, as well as links to the important functions and data included in the package. The time-varying mediation analysis for the binary outcome and 2 exposure groups, relies on 2 user functions tvmb and LongToWide as well as a number of internal functions of the tvmediation package.

The tvmb function has four required and four optional arguments.

  1. treatment A binary vector indicating the treatment group
  2. t.seq A numeric vector of the time sequence of the measures
  3. mediator The matrix of mediator values in wide format
  4. outcome The matrix of outcome values in wide format

The optional arguments are:

  1. plot TRUE or FALSE for plotting the mediation effect. The default value is “FALSE”.
  2. CI “none” or “boot” method of deriving confidence intervals (CIs). The default value is “boot”.
  3. replicates Number of replicates for bootstrapping CIs. The default value is 1000.
  4. verbose TRUE or FALSE for printing results to screen. The default value is “FALSE”.

Note that, unlike the tvma and tvma_3trt functions, the time points at which mediation effects are estimated cannot be different than the actual recorded response time points t.seq. Thus, tvma and tvma_3trt have the provision of a different t.est than t.seq, but tvmb does not have that provision.

The dataset we will use for our illustration is named smoker and is included in the package.

To load the simulated dataset smoker.rda, type:

data(smoker)

The smoker data frame is organized in long format with SubjectID repeating over multiple rows for each participant. The tvmb function requires that the data be in wide format to estimate the time varying coefficients. The tvmediation package includes a useful function LongToWide to help users properly format their data for analysis.

LongToWide has three required arguments and a fourth optional argument.

  1. subject.id specifies the column of subject identifiers.
  2. time.sequences specifies the column of measurement times.
  3. outcome specifies the column for the variable (either the outcome or the mediator) that will be transposed.
  4. verbose is an optional argument that if “TRUE” prints the output of LongToWide to the console. The default value is “FALSE”.

The output of LongToWide is a matrix of data in wide format where columns represent the subjects and rows represent the time sequence. Thus, each cell contains the j-th subject’s response at the i-th time point.

The tvmb function requires two matrices, one for the mediator, and one for the outcome. Thus, we use the LongToWide function twice as illustrated below:

mediator <- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$NegMoodLst15min)
outcome <- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$smoke_status)
class(mediator)
## [1] "matrix" "array"
mediator[1:16, 1:10]
##     1 2  3  4  5  6  7  8  9 10
## 0   7 1  1  1  1 NA  4  1  1  3
## 0.5 2 1  1  6  1 NA NA  1  1  1
## 1   1 1  2 NA NA  1  3  1  1  1
## 1.5 1 1  1  1  1  1  2  1 NA  1
## 2   1 1  1  1 NA  1  1  1  1  1
## 2.5 3 1  1  1  1  3  1  1  1  1
## 3   1 6 NA  1  1  1  1  2 NA  1
## 3.5 1 1  2  6  3  7  1  6  1  1
## 4   1 1  1  1  1  1  1  1  2  4
## 4.5 1 1  1  5  1  5  2 NA  1  3
## 5   4 1  2  1 NA  5  2  1  1  1
## 5.5 1 1  2  1  1  2 NA  1  1  2
## 6   2 1  1  4  1  7  1 NA  2  3
## 6.5 1 1  1  1  1  1  1  5  3  1
## 7   1 2  1 NA  1  1 NA  1  1  4
## 7.5 2 2  1  1  1 NA  1  1  1  6
class(outcome)
## [1] "matrix" "array"
outcome[1:16, 1:10]
##     1 2  3  4  5  6  7  8  9 10
## 0   0 0  0  0  0 NA  0  1  1  0
## 0.5 1 0  0  1  1 NA NA  1  1  1
## 1   0 1  0 NA NA  0  0  0  0  0
## 1.5 1 1  1  0  0  0  0  0 NA  0
## 2   0 0  0  0 NA  1  0  0  0  0
## 2.5 0 0  0  0  0  1  1  0  0  1
## 3   0 0 NA  0  0  0  0  0 NA  0
## 3.5 0 1  0  0  0  1  1  1  1  1
## 4   0 0  0  0  0  1  0  1  0  0
## 4.5 1 1  0  0  0  1  1 NA  1  0
## 5   1 0  0  0 NA  0  0  0  0  0
## 5.5 1 0  0  1  0  1 NA  0  0  0
## 6   0 0  0  1  0  0  0 NA  0  1
## 6.5 0 0  0  1  0  1  1  1  0  1
## 7   0 0  0 NA  0  0 NA  0  0  0
## 7.5 0 1  0  0  1 NA  0  1  0  0

If your data are already in wide format, there is no need to use the LongToWide function and you can simply subset your dataset. However, mediator and outcome must be of class matrix; hence make sure you convert the class of the subsetted mediator and outcome objects to matrix before proceeding. This can be done using the R function as.matrix.

The tvmb function requires two more variables that we have not yet created:

  1. treatment A binary numeric vector indicating the treatment group
  2. t.seq A numeric vector of the time sequence of the measures

If there are two treatment groups, create a treatment variable with the following assignment: 1 for the comparator group and 0 for the placebo group. If there are three treatment groups, two dummy variables clearly indicating treatment1, treatment2 and placebo need to be created. An example is given as follows. In the smoker.rda dataset, three columns indicating the assignment of the three treatments are present.

Creating two dummy variables to indicate whether a participant was given varenicline or combination NRT. NRT1 indicates whether a subject received varenicline or not. NRT2 indicates whether a subject received combination NRT or not. Each subject’s response for the treatment group (e.g., varenicline) was converted to a numeric value, and 1 was subtracted to yield a vector of zeros and ones.

# Step 1: Since each subject has multiple rows of data, extract the unique response of each subject to receiving varenicline. The data is still in dataframe format.
trtv <- unique(smoker[ , c("SubjectID","varenicline")])
trtv[1:10,]
##    SubjectID varenicline
## 1          1          No
## 2          2         Yes
## 3          3         Yes
## 4          4          No
## 5          5         Yes
## 6          6         Yes
## 7          7         Yes
## 8          8         Yes
## 9          9         Yes
## 10        10         Yes
# Step 2: `2` to those subjects who received varenicline and `1` to the rest. The data is now in vector format.
trtv2 <- as.numeric(trtv[ , 2])
trtv2[1:10]
##  [1] 2 1 1 2 1 1 1 1 1 1
# Step 3: subtract 1 from these numeric responses and procure a vector of zeros and ones
NRT1 <- trtv2 -1
NRT1[1:10]
##  [1] 1 0 0 1 0 0 0 0 0 0
# Step 1: Since each subject has multiple rows of data, extract the unique response of each subject to receiving combination NRT. The data is still in dataframe format.
trtc <- unique(smoker[ , c("SubjectID","comboNRT")])
trtc[1:10,]
##    SubjectID comboNRT
## 1          1      Yes
## 2          2       No
## 3          3      Yes
## 4          4      Yes
## 5          5       No
## 6          6      Yes
## 7          7      Yes
## 8          8      Yes
## 9          9      Yes
## 10        10      Yes
# Step 2: `2` to those subjects who received combination NRT and `1` to the rest. The data is now in vector format.
trtc2 <- as.numeric(trtc[ , 2])
trtc2[1:10]
##  [1] 1 2 1 1 2 1 1 1 1 1
# Step 3: subtract 1 from these numeric responses and procure a vector of zeros and ones
NRT2 <- trtc2 -1
NRT2[1:10]
##  [1] 0 1 0 0 1 0 0 0 0 0

This steps can be alternatively collated into a single step and written as follows:

NRT1 <- as.numeric(unique(smoker[ ,c("SubjectID","varenicline")])[,2])-1
NRT2 <- as.numeric(unique(smoker[ ,c("SubjectID","comboNRT")])[,2])-1
NRT1[1:10]
##  [1] 1 0 0 1 0 0 0 0 0 0
NRT2[1:10]
##  [1] 0 1 0 0 1 0 0 0 0 0

Depending on our exposure of interest, i.e. whether we want to compare the mediation effects observed among participants with varenicline vs. placebo (nicotine patch), or combination NRT vs. placebo, or varenicline vs. combination NRT, a final variable was derived based on these two variables.

For example: The following codes are to derive the final variable to compare the mediation effects observed among participants with varenicline vs. placebo (nicotine patch).

treatment <- rep(NA, 1086) # replace 1086 with the number of subjects in your study #
treatment <- ifelse(NRT1==1 & NRT2==0, 1, ifelse(NRT1==0 & NRT2==0, 0, NA))
treatment[1:10]
##  [1]  1 NA  0  1 NA  0  0  0  0  0

The current version of the tvmb function only supports two treatment options. Therefore, in order to estimate the mediation effects observed among participants with combination NRT vs. placebo, the above code would be changed to,

# treatment <- rep(NA, 1086)
# treatment <- ifelse(NRT1==0 & NRT2==1, 1, ifelse(NRT1==0 & NRT2==0, 0, NA))

In case the user is interested in the mediation effects observed among participants with varenicline vs. not varenicline or combination NRT vs. not combination NRT1 where the comparator group is a collective of the remaining two treatment groups, refer to the tvma function vignette for guidance.

To generate t.seq we found the unique instance of each time point and then sorted from smallest to largest. There are 44 unique time points in the dataset where 0 after the decimal indicates the morning measurement and 5 after the decimal indicates the evening measurement recorded for that day.

t.seq <- sort(unique(smoker$timeseq))
t.seq
##  [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
## [16]  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5
## [31] 16.0 16.5 18.0 18.5 20.0 20.5 22.0 22.5 24.0 24.5 26.0 26.5 28.0 28.5

We are now ready to perform the time varying mediation analysis.

Calling the tvmb function

As discussed earlier, the tvmb function has four required and four optional arguments.

  1. treatment A binary vector indicating the treatment group
  2. t.seq A numeric vector of the time sequence of the measures
  3. mediator The matrix of mediator values in wide format
  4. outcome The matrix of outcome values in wide format

The optional arguments are:

  1. plot TRUE or FALSE for plotting the mediation effect. The default value is “FALSE”.
  2. CI “none” or “boot” method of deriving CIs. The default value is “boot”.
  3. replicates Number of replicates for bootstrapping CIs. The default value is 1000.
  4. verbose TRUE or FALSE for printing results to screen. The default value is “FALSE”.

We will call the function with additional optional arguments plot=TRUE and replicates = 250. We decreased the number of bootstrap replicates so that this vignette compiles faster but we suggest using at least 500 replicates in an actual analysis. The remaining optional arguments are left to their respective default values.

results_tvmb <- tvmb(treatment, t.seq, mediator, outcome, 
                     plot = TRUE, replicates = 250)

Results

The tvmb function returns a list of results including:

  1. alpha_hat estimated time-varying treatment effect on mediator
  2. CI.lower.a CI lower limit for coefficient alpha_hat
  3. CI.upper.a CI upper limit for coefficient alpha_hat
  4. gamma_hat estimated time-varying direct effect of the treatment on the outcome
  5. CI.lower.g CI lower limit for coefficient gamma_hat
  6. CI.upper.g CI upper limit for coefficient gamma_hat
  7. beta_hat estimated time-varying effect of the mediator on the outcome
  8. CI.lower.b CI lower limit for coefficient beta_hat
  9. CI.upper.b CI upper limit for coefficient beta_hat
  10. tau_hat estimated time-varying total effect of the treatment the on outcome
  11. CI.lower.t CI lower limit for coefficient tau_hat
  12. CI.upper.t CI upper limit for coefficient tau_hat
  13. medEffect time varying mediation effect of treatment on the outcome

Optional returns based on argument CI = "boot" include:

  1. CI.low CI lower limit of the time varying mediation effect medEffect
  2. CI.upper CI upper limit of the time varying mediation effect medEffect

The above estimates are compiled in a single dataframe which can be accessed using nameOfStoredResultsObjb$Estimates. The following line of code displays only the estimates at the first 6 time-points.

head(results_tvmb$Estimates)
##   timeseq    alpha_hat  CI.lower.a CI.upper.a   gamma_hat  CI.lower.g
## 1     0.0  0.060626731 -0.06059372 0.19978603          NA          NA
## 2     0.5  0.039845226 -0.06319683 0.15544015  0.08192034 -0.09545530
## 3     1.0  0.021490259 -0.06666099 0.11663719  0.04782434 -0.08812955
## 4     1.5  0.004829876 -0.07350330 0.08580284  0.01214661 -0.09410940
## 5     2.0 -0.009597311 -0.08239532 0.06160989 -0.02211706 -0.11272710
## 6     2.5 -0.023431999 -0.09166583 0.04540165 -0.03354099 -0.12381776
##   CI.upper.g     beta_hat  CI.lower.b CI.upper.b     tau_hat  CI.lower.t
## 1         NA           NA          NA         NA          NA          NA
## 2 0.24753082  0.035533997 -0.01103305 0.08081638  0.08504658 -0.06909640
## 3 0.17956777  0.021386588 -0.01995044 0.05860413  0.05420161 -0.06988970
## 4 0.12183839  0.007952471 -0.03304772 0.04025825  0.02047805 -0.08242043
## 5 0.08304135 -0.005250423 -0.04727972 0.02731000 -0.01275758 -0.10249298
## 6 0.07590032 -0.015531236 -0.05533601 0.01970559 -0.02528449 -0.11669428
##   CI.upper.t    medEffect     CI.lower    CI.upper
## 1         NA           NA           NA          NA
## 2 0.23850197 0.0018216948 -0.004581924 0.008151216
## 3 0.17495340 0.0015600361 -0.004369652 0.007242845
## 4 0.12202362 0.0012603747 -0.004264155 0.006454978
## 5 0.08935782 0.0009495644 -0.004275970 0.005806570
## 6 0.08483855 0.0005979124 -0.004325543 0.005257822

At each time point of interest timeseq = t.seq, the effects of the treatment on the mediator, the treatment on the outcome (adjusted and not adjusted for mediator), and the mediator on the outcome are estimated along with the respective 95% CIs. The CIs are computed via a non-parametric bootstrap method (Efron and Tibshirani, 1986), drawing samples of size 1086 from the original sample with replacement, estimating the sample mean and then applying the percentile method to compute the 95% CIs. Note that the CIs for the alpha, gamma, beta and tau coefficients (alpha_hat, gamma_hat, beta_hat, tau_hat) are computed regardless of the value of CI argument in the function. In the above results, medEffect is the estimated mediation effect of varenicline compared to nicotine patch only, that varies over timeseq. For CI = "boot" (which is the default option unless the user chooses otherwise) the 95% CI (CI.low, CI.upper) of medEffect is estimated via a similar bootstrapping and percentile technique described earlier for the coefficients.

If plot = TRUE argument is passed, the results will also include the following figures:

  1. plot1_a plot for alpha_hat with 95% CIs across timeseq
  2. plot2_g plot for gamma_hat with 95% CIs across timeseq
  3. plot3_b plot for beta_hat with 95% CIs across timeseq
  4. plot4_t plot for tau_hat with 95% CIs across timeseq
  5. MedEff plot for medEffect across timeseq
  6. MedEff_CI plot for medEffect with 95% CIs across timeseq
  7. bootstrap plot for estimated medEffect(s) from bootstrapped samples across timeseq

We recommend using the plots to interpret your findings as it may be difficult to derive meaning from the numerical values alone. To display the plots, use nameOfStoredResultsObj$ followed by the name of the plot to access the required plot accordingly. For example:

results_tvmb$plot1_a

In the above plot, the time-varying effect of varenicline on subjects’ negative mood in the last fifteen minutes is not statistically significant. The estimated 95% CI covers 0 (no effect).

results_tvmb$plot2_g

The above plot shows the time-varying direct effect of varenicline on subjects’ daily smoking status which is not statistically significant. The estimated 95% CI covers 0 (with the possible exception of days 5 and 6).

results_tvmb$plot3_b

In the above plot, the time-varying effect of subjects’ negative mood in the last fifteen minutes on subjects’ daily smoking status is not statistically significant as indicated by the estimated 95% CI covering 0.

results_tvmb$plot4_t

The above plot shows the time-varying total effect of varenicline on subjects’ daily smoking status which is not statistically significant. The estimated 95% CI covers 0 (no effect).

results_tvmb$MedEff

In the above plot, the time-varying effect of varenicline on daily smoking status that is mediated by the negative mood in the last fifteen minutes (compared to nicotine patch only) decreases in the first 2 weeks hovering mostly around 0 (no effect), then decreases sharply around day 14 before increasing again until day 21 and decreasing thereafter. However, the CIs include zero thus, we can conclude that this effect is not statistically significant (see the below plot MedEff_CI for the CIs).

results_tvmb$MedEff_CI

results_tvmb$bootstrap

The above plot shows the estimated mediation effect from bootstrapping the original sample 250 times with 1086 sample size.

The tvmb function computes bootstrap CIs by default. Therefore, the user may decide not to bootstrap CIs for the mediation effect by specifying CI = "none". However, if by mistake the user specifies CI = "none" and replicates = 500 simultaneously, the function will not display an error, but will simply execute without computing the CIs for mediation effect. Note that the CIs for the effects of the treatment on the mediator and the mediator on the outcome, and for the direct and total effects are computed even if the user passes the argument CI = "none".

Summary

The tvmediation package provides a set of functions for estimating mediation effects that vary over time for both binary and continuous time-varying outcomes. Currently, the package only allows for a time-invariant treatment. The mediator and outcome are assumed to be time-varying, such as intensive longitudinal measurements obtained via Ecological Momentary Assessment or via wearable and mobile devices. The development of this tool has widespread application for use in human behavior research, clinical trials, addiction research, and others by allowing specification of mediation effects that vary as a function of time.

References

  1. Baker, T. B., Piper, M. E., Stein, J. H., Smith, S. S., Bolt, D. M., Fraser, D. L., & Fiore, M. C. (2016). Effects of nicotine patch vs varenicline vs combination nicotine replacement therapy on smoking cessation at 26 weeks: A randomized clinical trial. JAMA, 315(4), 371-379.}

  2. Efron, B. & Tibshirani, R. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statist. Sci. 1 (1986), no. 1, 54–75.}