Introduction to metapack

Daeyoung Lim


Table of Contents

The metapack package contains two main functions interfacing C++ using Rcpp (Eddelbuettel and Balamuta 2017) and RcppArmadillo (Eddelbuettel and Sanderson 2014): bayes.parobs and bayes.nmr. bayes.parobs fits multivariate meta-regression models with a partially observed sample covariance matrix (Yao et al. 2015), and bayes.nmr fits the network meta-regression models. We refer the readers to the Appendix for the mathematical formulations of the models.

To use either of our functions, bayes.parobs and bayes.nmr, the data set must be preprocessed into the following format, with the only exception that bayes.nmr has ZCovariate instead of WCovariate to be consistent with the mathematical notation in the original paper.

Outcome SD XCovariate WCovariate Trial (k) Treat (t) Npt
\(\bar{y}_{\cdot 13}\) \(S_{13}\) \(\boldsymbol{x}_{13}^\prime\) \(\boldsymbol{w}_{13}^\prime\) 1 3 1000
\(\bar{y}_{\cdot 10}\) \(S_{10}\) \(\boldsymbol{x}_{10}^\prime\) \(\boldsymbol{w}_{20}^\prime\) 1 0 1000
\(\bar{y}_{\cdot 20}\) \(S_{20}\) \(\boldsymbol{x}_{20}^\prime\) \(\boldsymbol{w}_{20}^\prime\) 2 0 1000
\(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\) \(\vdots\)

The functions requires the data to be formatted in a way that every row corresponds to a unique pair of \((t,k)\) which stands for \(t\)th treatment and \(k\)th trial. Therefore, \(\bar{y}_{\cdot tk}\) and \(S_{tk}\) are \(J\)-dimensional row vectors for bayes.parobs, and scalars for bayes.nmr.

Fitting multivariate meta-regression with bayes.parobs

The variables included in the Table are required for bayes.parobs. The key argument in bayes.parobs is fmodel which specifies the model for the error covariance matrix. The objects enclosed in parentheses at the end of every bullet point are the hyperparameters associated with each model.

The remaining optional arguments are


bayes.parobs returns


The following boilerplate code demonstrates how bayes.parobs can be used:

Rho_init <- diag(1, nrow = J)
Rho_init[upper.tri(Rho_init)] <- Rho_init[lower.tri(Rho_init)] <- 0.5
bayes.parobs(Outcome, SD, XCovariate, WCovariate, Trial, Treat, Npt,
          fmodel = 4, prior = list(c0 = 1e6),
          mcmc = list(ndiscard = 1, nskip = 1, nkeep = 1),
          control = list(Rho_stepsize=0.05, R_stepsize=0.05),
          group = Group,
          scale_x = TRUE, verbose = TRUE)

fmodel can be a different number from 1 to 5.

Fitting network meta-regression with bayes.nmr

Unlike bayes.parobs, the aggregate design matrix for random effects, ZCovariate, is optional. If missing, bayes.nmr will assign a vector of ones for ZCovariate, i.e., \(\boldsymbol{Z}_k = \boldsymbol{1}\) for all \(k\). This reduces the model to have \(\mathrm{Var}(\bar{y}_{tk}) = \sigma_{tk}^2/n_{tk} + \exp(2\phi)\) for every \((t,k)\).

The remaining optional arguments are


bayes.nmr returns


The following boilerplate code demonstrates how bayes.nmr can be used:

fit <- bayes.nmr(Outcome, SD, XCovariate, ZCovariate, Trial, Treat, Npt,
        prior = list(c01 = 1.0e05, c02 = 4, df = 3),
        mcmc = list(ndiscard = 1, nskip = 1, nkeep = 1),
        Treat_order = c("PBO", "S", "A", "L", "R", "P",
                        "E", "SE", "AE", "LE", "PE"),
        scale_x = TRUE, verbose = TRUE)


Model descriptions for bayes.parobs

When there are multiple endpoints, the correlations thereof are oftentimes unreported. In a meta-regression setting, the correlations are something we want to know. In the case of subject-level meta-analyses (where individual participant/patient data are available), the correlations may only be one function call away. However, for study-level meta-analyses, the correlations must be estimated, which is not an easy task. bayes.parobs provides five different models to estimate and recover the missing correlations.

Since the aforementioned setting regards the correlations as missing, the reported data are \((\overline{\boldsymbol{y}}_{\cdot tk}, \boldsymbol{s}_{tk}) \in \mathbf{R}^J \times \mathbf{R}^J\) for the \(t\)th treatment arm and \(k\)th trial. Every trial out of \(K\) trials includes \(T\) treatment arms. The sample size of the \(t\)th treatment arm and \(k\)th trial is denoted by \(n_{tk}\). If we write \(\boldsymbol{x}_{tkj}\in \mathbf{R}^{p_j}\) to be the treatment-within-trial level regressor corresponding to the \(j\)th response, reflecting the fixed effects of the \(t\)th treatment arm, and \(\boldsymbol{w}_{tkj}\in \mathbf{R}^{q_j}\) to be the same for the random effects, the model becomes

\[\begin{equation}\label{eq:reduced-model} \overline{\boldsymbol{y}}_{\cdot tk} = \boldsymbol{X}_{tk}\boldsymbol{\beta} + \boldsymbol{W}_{tk}\boldsymbol{\gamma}_k + \overline{\boldsymbol{\epsilon}}_{\cdot tk}, \end{equation}\]

and \((n_{tk}-1)S_{tk} \sim \mathcal{W}_{n_{tk}-1}(\Sigma_{tk})\) where \(\boldsymbol{X}_{tk} = \mathrm{blockdiag}(\boldsymbol{x}_{tk1}^\prime,\boldsymbol{x}_{tk2}^\prime,\ldots,\boldsymbol{x}_{tkJ}^\prime)\), \(\boldsymbol{\beta} = (\boldsymbol{\beta}_1^\prime, \cdots, \boldsymbol{\beta}_J^\prime)^\prime\), \(\boldsymbol{W}_{tk}=\mathrm{blockdiag}(\boldsymbol{w}_{tk1}^\prime, \ldots,\boldsymbol{w}_{tkJ}^\prime)\), \(\boldsymbol{\gamma}_k = (\boldsymbol{\gamma}_{k1}^\prime,\boldsymbol{\gamma}_{k2}^\prime, \ldots,\boldsymbol{\gamma}_{kJ}^\prime)^\prime\), \(\overline{\boldsymbol{\epsilon}}_{\cdot tk} \sim \mathcal{N}(\boldsymbol{0}, \Sigma_{tk}/n_{tk})\), \(S_{tk}\) is the full-rank covariance matrix whose diagonal entries are the observed \(\boldsymbol{s}_{tk}\), and \(\mathcal{W}_\nu(\Sigma)\) is the Wishart distribution with \(\nu\) degrees of freedom and a \(J\times J\) scale matrix \(\Sigma\) whose density function is

\[ f(X\mid \nu,\Sigma) = \dfrac{1}{2^{J\nu}|\Sigma|^{\nu/2}\Gamma_J(\nu/2)}|X|^{(\nu-J-1)/2}\exp\left(-\dfrac{1}{2}\mathrm{tr}(\Sigma^{-1}X) \right)I(X\in \mathcal{S}_{++}^J), \]

where \(\Gamma_p\) is the multivariate gamma function defined by

\[ \Gamma_p(z) = \pi^{p(p-1)/4}\prod_{j=1}^p \Gamma[z+(1-j)/2], \]

and \(\mathcal{S}_{++}^J\) is the space of \(J\times J\) symmetric positive definite matrices. The statistical independence of \(\overline{\boldsymbol{\epsilon}}_{\cdot tk}\) and \(S_{tk}\) follows naturally from the Basu’s theorem.

The patients can sometimes be grouped by a factor that will generate disparate random effects. Although an arbitrary number of groups can exist in theory, metapack restricts the number of groups to two for practicality. Denoting the binary group indicates by \(u_{tk}\) yields

\[ \overline{y}_{\cdot tkj} = \boldsymbol{x}_{tkj}^\prime\boldsymbol{\beta} + (1-u_{tk})\boldsymbol{w}_{tkj}^\prime \boldsymbol{\gamma}_{kj}^0 + u_{tk}\boldsymbol{w}_{tkj}^\prime \boldsymbol{\gamma}_{kj}^1 + \overline{\epsilon}_{\cdot tkj}. \]

The random effects are modeled as \(\boldsymbol{\gamma}_{kj}^l \overset{\text{ind}}{\sim}\mathcal{N}(\boldsymbol{\gamma}_j^{l*},\Omega_j^l)\) and \((\Omega_j^l)^{-1} \sim \mathcal{W}_{d_{0j}}(\Omega_{0j})\). Stacking the vectors, \(\boldsymbol{\gamma}_k^l = ((\boldsymbol{\gamma}_{k1}^l)^\prime, \ldots, (\boldsymbol{\gamma}_{kJ}^l)^\prime)^\prime \sim \mathcal{N}(\boldsymbol{\gamma}^{l*},\Omega^l)\) where \(\boldsymbol{\gamma}^{l*} = ((\boldsymbol{\gamma}_{1}^{l*})^\prime,\ldots,(\boldsymbol{\gamma}_{J}^{l*})^\prime)^\prime\), \(\Omega_j = \mathrm{blockdiag}(\Omega_j^0,\Omega_j^l)\), and \(\Omega = \mathrm{blockdiag}(\Omega_1,\ldots,\Omega_J)\) for \(l \in \{0,1\}\). Adopting the non-centered reparametrization (Bernardo et al. 2003), define \(\boldsymbol{\gamma}_{k,o}^l = \boldsymbol{\gamma}_k^l - \boldsymbol{\gamma}^{l*}\). Denoting \(\boldsymbol{W}_{tk}^* = [(1-u_{tk})\boldsymbol{W}_{tk}, u_{tk}\boldsymbol{W}_{tk}]\), \(\boldsymbol{X}_{tk}^* = [\boldsymbol{X}_{tk},\boldsymbol{W}_{tk}^*]\), and \(\boldsymbol{\theta} = (\boldsymbol{\beta}^\prime, {\boldsymbol{\gamma}^{0*}}^\prime, {\boldsymbol{\gamma}^{1*}}^\prime)^\prime\), the model is written as follows:

\[ \overline{\boldsymbol{y}}_{\cdot tk} = \boldsymbol{X}_{tk}^*\boldsymbol{\theta}+\boldsymbol{W}_{tk}^*\boldsymbol{\gamma}_{k,o} + \overline{\boldsymbol{\epsilon}}_{\cdot tk}, \] where \(\boldsymbol{\gamma}_{k,o} = ((\boldsymbol{\gamma}_{k,o}^0)^\prime, (\boldsymbol{\gamma}_{k,o}^1)^\prime)^\prime\). If there is no grouping in the patients, setting \(u_{tk}=0\) for all \((t,k)\) reduces the model back to \(\eqref{eq:reduced-model}\).

The conditional distribution of \((R_{tk} \mid V_{tk}, \Sigma_{tk})\) where \(R_{tk} = V_{tk}^{-\frac{1}{2}}S_{tk}V_{tk}^{-\frac{1}{2}}\) and \(V_{tk} = \mathrm{diag}(S_{tk11},\ldots,S_{tkJJ})\) becomes

\[ f(R_{tk}\mid V_{tk},\Sigma_{tk}) \propto |R_{tk}|^{(n_{tk}-J-2)/2}\exp\left\{-\dfrac{(n_{tk}-1)}{2}\mathrm{tr}\left(V_{tk}^{\frac{1}{2}}\Sigma_{tk}^{-1}V_{tk}^{\frac{1}{2}}R_{tk} \right) \right\}. \]

Model descriptions for bayes.nmr

Network meta-analysis is an extension of meta-analysis where more than two treatments are compared. Unlike the traditional meta-analyses that restrict the number of treatments to be equal across trials, network meta-analysis allows varying numbers of treatments. This achieves a unique benefit that two treatments that have not been compared head-to-head can be assessed as a pair.

Start by denoting the comprehensive list of treatments in all \(K\) trials by \(\mathcal{T}=\{1,\ldots,T\}\). It is rarely the case that all \(T\) treatments are included in the data but we drop the subscripts \(t_k\) and replace it with \(t\) for notational simplicity. Now, consider the model

\[\begin{equation}\label{eq:nmr-basic} \bar{y}_{\cdot tk} = \boldsymbol{x}_{tk}^\prime\boldsymbol{\beta} + \tau_{tk}\gamma_{tk} + \bar{\epsilon}_{\cdot tk}, \quad \bar{\epsilon}_{\cdot tk} \sim \mathcal{N}(0,\sigma_{tk}^2/n_{tk}), \end{equation}\] where \(\bar{y}_{\cdot tk}\) is the univariate aggregate response of the \(k\)th trial for which treatment \(t\) was assigned, \(\boldsymbol{x}_{tk}\) is the aggregate covariate vector for the fixed effects, and \(\gamma_{tk}\) is the random effects term. The observed standard deviation, \(s_{tk}^2\) is modeled by

\[ \dfrac{(n_{tk}-1)s_{tk}^2}{\sigma_{tk}^2} \sim \chi_{n_{tk}-1}^2. \]

\(\tau_{tk}\) in Equation \(\eqref{eq:nmr-basic}\) encapsulates the variance of the random effect for the \(t\)th treatment in the \(k\)th trial, which is modeled as a deterministic function of a related covariate. That is,

\[ \log \tau_{tk} = \boldsymbol{z}_{tk}^\prime\boldsymbol{\phi}, \]

where \(\boldsymbol{z}_{tk}\) is the \(q\)-dimensional aggregate covariate vector and \(\boldsymbol{\phi}\) is the corresponding coefficient vector.

For the \(k\)th trial, we define a selection/projection matrix \(E_k = (e_{t_{1k}},e_{t_{2k}},\ldots, e_{t_{T_k k}})\), where \(e_{t_{lk}} = (0,\ldots,1,\ldots,0)^\prime\), \(l=1,\ldots,T_k\), with \(t_{lk}\)th element set to 1 and 0 otherwise, and \(T_k\) is the number of treatments included in the \(k\)th trial. Let the scaled random effects \(\boldsymbol{\gamma}_k = (\gamma_{1k},\ldots,\gamma_{Tk})^\prime\). Then, \(\boldsymbol{\gamma}_{k,o}=E_k^\prime\boldsymbol{\gamma}_k\) is the vector of \(T_k\)-dimensional scaled random effects for the \(k\)th trial. The scaled random effects \(\boldsymbol{\gamma}_k \sim t_T(\boldsymbol{\gamma},\boldsymbol{\rho},\nu)\) where \(t_T(\boldsymbol{\mu},\Sigma,\nu)\) denotes a multivariate \(t\) distribution with \(\nu\) degrees of freedom, a location parameter vector \(\boldsymbol{\mu}\), and a scale matrix \(\Sigma\).

The non-centered reparametrization (Bernardo et al. 2003) gives \(\boldsymbol{\gamma}_{k,o} = E_k^\prime(\boldsymbol{\gamma}_k - \boldsymbol{\gamma})\). Then, with \(\bar{\boldsymbol{y}}_k = (\bar{y}_{kt_{k1}},\ldots,\bar{y}_{kt_{kT_k}})^\prime\), \(\boldsymbol{X}_k = (\boldsymbol{x}_{kt_{k1}},\ldots, \boldsymbol{x}_{kt_{kT_k}})\), and \(\boldsymbol{Z}_k(\boldsymbol{\phi}) = \mathrm{diag}(\exp(\boldsymbol{z}_{kt_{k1}}^\prime \boldsymbol{\phi}),\ldots, \exp(\boldsymbol{z}_{kt_{kT_k}}^\prime \boldsymbol{\phi}))\), the model is recast as

\[ \bar{\boldsymbol{y}}_k = \boldsymbol{X}_k^* \boldsymbol{\theta} + \boldsymbol{Z}_k(\boldsymbol{\phi}) \boldsymbol{\gamma}_{k,o} + \bar{\boldsymbol{\epsilon}}_k, \] where \(\boldsymbol{X}_k^* = (\boldsymbol{X}_k, E_k^\prime)\), \(\boldsymbol{\theta} = (\boldsymbol{\beta}^\prime, \boldsymbol{\gamma}^\prime)^\prime\), and \(\bar{\boldsymbol{\epsilon}}_k \sim \mathcal{N}_{T_k}(\boldsymbol{0},\Sigma_k)\), \(\Sigma_k = \mathrm{diag}(\sigma_{kt_{k1}}^2/n_{kt_{k1}}, \ldots, \sigma_{kt_{kT_k}}^2/n_{kt_{kT_k}})\). This allows the random effects \(\boldsymbol{\gamma}_{k,o} \sim t_{T_k}(\boldsymbol{0},E_k^\prime \boldsymbol{\rho}E_k, \nu)\) to be centered at zero.

Since the multivariate \(t\) random effects are not analytically marginalizable, we represent it as a scale mixture of normals as follows:

\[ (\boldsymbol{\gamma}_{k,o}\mid \lambda_k) \overset{\text{ind}}{\sim} \mathcal{N}_{T_k}\left(\boldsymbol{0}, \lambda_k^{-1}(E_k^\prime\boldsymbol{\rho}E_k) \right), \quad \lambda_k \overset{\text{iid}}{\sim}\mathcal{G}a\left(\dfrac{\nu}{2},\dfrac{\nu}{2} \right), \] where \(\mathcal{G}a(a,b)\) indicates the gamma distribution with mean \(a/b\) and variance \(a/b^2\).


Bernardo, JM, MJ Bayarri, JO Berger, AP Dawid, D Heckerman, AFM Smith, and M West. 2003. “Non-Centered Parameterisations for Hierarchical Models and Data Augmentation.” In Bayesian Statistics 7: Proceedings of the Seventh Valencia International Meeting. Vol. 307. Oxford University Press, USA.

Eddelbuettel, Dirk, and James Joseph Balamuta. 2017. “Extending R with C++: A Brief Introduction to Rcpp.” PeerJ Preprints 5 (August): e3188v1.

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics and Data Analysis 71: 1054–63.

Yao, Hui, Sungduk Kim, Ming-Hui Chen, Joseph G Ibrahim, Arvind K Shah, and Jianxin Lin. 2015. “Bayesian Inference for Multivariate Meta-Regression with a Partially Observed Within-Study Sample Covariance Matrix.” Journal of the American Statistical Association 110 (510): 528–44.