Sample size calculation should reflect the complexity of the survey design by accounting for the weighting, stratification, and clustering in the survey design. A shortcut method of computing a total sample size that will yield a target level of precision is to compute a sample size using a simple random sampling (SRS) formula and then adjust by a design effect. Design effects may also be of interest to analysts to better understand how much the standard errors of their estimates are affected by complex design features.

The **design effect** (*deff*) is the ratio of the variance of a survey statistic under the complex design to the variance of the survey statistic under an SRS. The SRS variance can be for with- or without-replacement sampling; both usages can be found in the literature. Represented as a formula, the design effect is:

\[\small deff(\hat{\theta}) = \frac{V_{complex}(\hat{\theta})}{V_{SRS}(\hat{\theta})}\]

where \(\small \hat{\theta}\) is an estimator of a chosen parameter.

Given a survey dataset, *deff*’s can be computed directly by
estimating both the complex sample variance and the SRS variance and
taking the ratio. In specific cases, formulas have been derived that
account for different features of sample designs and analysis variables.
Some of these are available in the PracTools package (Valliant, Dever, and Kreuter 2023).

A *deff* is specific to a particular estimator. The
*deff*’s for estimated mean per capita income and the estimated
proportion of persons who have received the latest coronavirus booster
shot may be quite different even if both come from the same survey.

The **effective sample size** is the sample size for an
estimator \(\small \hat{\theta}\) from
a complex sample divided by the design effect of \(\small \hat{\theta}\), i.e. \(\small n_{eff}= n/deff(\hat{\theta})\).

In other words, the effective sample size is the size of an SRS that would yield the same variance as that produced by the complex design. For example, if a complex sample of n=1000 elements has a design effect of 1.25 for some estimate, the effective sample size is \(\small n_{eff} = 1000/1.25 = 800\). That is, the complex sample of 1000 is only as precise as an SRS of 800 given the design effect formula implicit in this example. Different design effect formulas may be derived for different sample designs and different covariate data, as described below.
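The arithmetic in this example can be reproduced in a few lines of base R. This is only an illustrative sketch: `n` and `deff` are the values from the text, while `n_srs` is a hypothetical SRS sample size introduced here to show the shortcut of inflating an SRS sample size by a design effect.

```r
## Effective sample size and related quantities for the example above
n    <- 1000   # complex sample size from the text
deff <- 1.25   # design effect from the text

n_eff        <- n / deff     # effective sample size
se_inflation <- sqrt(deff)   # ratio of the complex-design SE to the SRS SE

## Shortcut sample size: inflate an SRS sample size by the deff.
## n_srs = 400 is a hypothetical value, not taken from any survey.
n_srs     <- 400
n_complex <- ceiling(n_srs * deff)

c(n_eff = n_eff, se_inflation = se_inflation, n_complex = n_complex)
```

Because the variance ratio is 1.25, the standard error is inflated by only about 12 percent; design effects act on variances, not on standard errors.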

This vignette provides an overview of design effect components and formulas, discusses the PracTools functions that estimate design effects, and gives examples of when and how to apply them.

Complex sample variances can be affected by three components:

- Weighting
- Stratification
- Clustering

In general, clustering increases the design effect (and decreases the effective sample size) while stratification decreases the design effect. Weighting can either increase or decrease complex sample variances, depending on how the weights are derived. Different design effect formulas applied to these components will result in different final design effects. These are reviewed below in the context of the design effect components.

The most common formula for the weighting component of the design effect is the one proposed by Kish (1965), where:

\[\small deff_K = 1 + relvar(\mathbf{w}) = 1+ CV^2(\mathbf{w}) = 1 + \frac{1}{n} \sum_{i=1}^n (w_i- \bar{w})^2/ \bar{w}^2,\]

where \(\small \mathbf{w}\) is the vector of sample weights and *relvar* denotes relvariance. Kish’s formula only accounts for an increase in variance due to having unequal weights and is derived under some extremely restrictive assumptions. It applies to a stratified, simple random sample (STSRS). If all strata population variances are equal, a proportional allocation is optimal for estimating a mean. Since all weights are equal in a proportionally allocated STSRS, any departure from that where the relvariance of the weights is non-zero will be suboptimal, leading to \(\small deff_K > 1\). Practitioners often use Kish’s *deff* even when the sample is more complicated than STSRS because \(\small deff_K\) is so easy to compute.

However, \(\small deff_K\) is not always relevant in surveys where variances differ across strata, where subgroups are intentionally sampled at different rates, and/or where different subgroups have substantially different response rates. Having unequal weights in those situations is desirable and can more nearly meet analytic goals than having equal weights will.
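Kish's formula is simple enough to evaluate by hand. The sketch below implements it in base R. The helper name `kish_deff` and both weight vectors are made up for illustration; in practice the PracTools function `deffK` would be used.

```r
## Kish design effect computed directly from a weight vector.
## kish_deff() is a hypothetical helper name, not a PracTools function.
kish_deff <- function(w) {
  stopifnot(is.numeric(w), length(w) > 0, all(w > 0))
  n    <- length(w)
  wbar <- mean(w)
  1 + sum((w - wbar)^2) / n / wbar^2
}

w_equal   <- rep(10, 5)   # equal weights: relvariance is zero
w_unequal <- c(1, 3)      # made-up unequal weights

kish_deff(w_equal)     # 1
kish_deff(w_unequal)   # 1.25

## Algebraically equivalent form: n * sum(w^2) / (sum(w))^2
length(w_unequal) * sum(w_unequal^2) / sum(w_unequal)^2
```

Note that rescaling all weights by a constant leaves the result unchanged, since only the relative variation of the weights enters the formula.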

Chen and Rust (2017) formulate a design effect that can be broken into the three components listed above. The Chen-Rust formula for the stratum \(\small h\) component of the design effect is:

\[\small deff_{strat,h} = W_h^2 \frac{n}{n_h}\frac{\sigma_h^2}{\sigma^2}\]

where \(\small W_h = N_h/N\) and \(\small \sigma^2\) is the population unit variance. Stratification with an efficient allocation to strata will reduce the design effect (i.e., increase the effective sample size). When the sample is allocated optimally over the strata, the design effect is necessarily less than or equal to one (Cochran (1977), Section 5.6). Stratification is most effective when the \(\small y\)’s for elements within each stratum are homogeneous and the \(\small y\)’s for elements in different strata are heterogeneous. As Kish states, the variance decreases “to the degree that the stratum means diverge and that homogeneity exists within strata” (Kish (1965), p. 76). In sum, good stratification design increases the effective sample size.
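A short worked case may help connect the formula to Kish's remark. It is a sketch under simplifying assumptions that are not part of the Chen-Rust derivation itself: proportional allocation, so that \(\small n_h/n = W_h\); equal weights and no clustering, so that only the stratification component matters; and strata large enough that finite population corrections can be ignored. Summing the stratum components then gives:

\[\small \sum_{h=1}^H deff_{strat,h} = \sum_{h=1}^H W_h^2 \frac{1}{W_h}\frac{\sigma_h^2}{\sigma^2} = \frac{\sum_{h=1}^H W_h \sigma_h^2}{\sigma^2}\]

Under the same assumptions the population unit variance splits into within- and between-stratum parts, \(\small \sigma^2 = \sum_{h} W_h \sigma_h^2 + \sum_{h} W_h (\mu_h - \mu)^2\), where \(\small \mu_h\) and \(\small \mu\) are symbols introduced here for the stratum and overall population means. Substituting:

\[\small \sum_{h=1}^H deff_{strat,h} = 1 - \frac{\sum_{h=1}^H W_h (\mu_h - \mu)^2}{\sigma^2} \le 1\]

The reduction below one is exactly the share of the unit variance that lies between the stratum means.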

The clustering formula takes into account the variability in each PSU cluster. The greater the homogeneity of the cluster, the more the design effect increases. The stratum \(\small h\) formula is:

\[\small deff_{clus,h} = 1 + \rho_h (n_h^{*}-1) \]

where \(\small \rho_h\) is the intraclass correlation coefficient in stratum \(\small h\), which measures the homogeneity within clusters, and \(\small n_h^{*}\) is a type of weighted average number of sampling elements taken from each cluster. As the homogeneity \(\small \rho_h\) increases and as the cluster sample size \(\small n_h^{*}\) increases, the design effect increases; consequently, clustering typically increases the design effect. In some special cases, \(\small n_h^{*}\) reduces to the unweighted average number of sample elements per cluster, \(\small \bar{n}\).
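To get a feel for how the two inputs interact, the cluster component can be tabulated over a small grid. This is an illustrative sketch only: `deff_clus` is a hypothetical helper and the values of `rho` and `n_star` are made up.

```r
## Cluster component of the design effect: 1 + rho * (n_star - 1).
## deff_clus() is a hypothetical helper; the rho and n_star values are made up.
deff_clus <- function(rho, n_star) {
  1 + rho * (n_star - 1)
}

rho    <- c(0.01, 0.05, 0.10)   # intraclass correlations
n_star <- c(5, 20, 50)          # elements sampled per cluster

## Rows index rho, columns index n_star
tab <- outer(rho, n_star, deff_clus)
dimnames(tab) <- list(rho = rho, n_star = n_star)
tab
```

Reading across a row shows the effect of taking more elements per cluster; reading down a column shows the effect of greater within-cluster homogeneity.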

Chen and Rust (2017) combine the above formulas across strata so that:

\[\small deff = \sum_{h=1}^H deff_{strat,h} \times deff_{wts,h} \times deff_{clus,h} = \sum_{h=1}^H W_h^2 \frac{n}{n_h} \frac{\sigma_h^2}{\sigma^2} [1+ relvar_h (\mathbf{w})][1 + \rho_h (n_h^{*}-1)]\]

where \(\small deff_{wts,h} = 1 + relvar_h(\mathbf{w})\) is the Kish *deff* for weights within stratum \(\small h\).

This formula has the advantage that the component parts of the design effect can be individually calculated and understood. However, as discussed above, the Kish design effect equal to \(1 + relvar(\mathbf{w})\) does not fully account for the possibility that the estimators may be more efficient than the relvariance of their weights implies. This is particularly true for probability proportional to size sampling or for the general regression estimator.
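The combined formula can be evaluated component by component. The sketch below does so for a hypothetical three-stratum design. Every input value is made up for illustration, and the object names are not PracTools arguments; `deffCR` estimates these quantities from sample data instead.

```r
## Combine stratum-level components into an overall Chen-Rust design effect.
## All inputs are made-up values for a hypothetical 3-stratum design.
W_h      <- c(0.5, 0.3, 0.2)      # stratum population shares N_h / N
n_h      <- c(200, 150, 150)      # stratum sample sizes
sigma2_h <- c(90, 110, 130)       # stratum unit variances
sigma2   <- 120                   # overall population unit variance
relvar_w <- c(0.20, 0.35, 0.10)   # relvariance of weights within each stratum
rho_h    <- c(0.02, 0.05, 0.01)   # intraclass correlations
n_star_h <- c(10, 10, 15)         # (weighted) average cluster sample sizes

n <- sum(n_h)

deff_strat <- W_h^2 * (n / n_h) * (sigma2_h / sigma2)
deff_wts   <- 1 + relvar_w
deff_clus  <- 1 + rho_h * (n_star_h - 1)

## One row per stratum; the overall deff is the sum of the row products
components <- data.frame(deff_strat, deff_wts, deff_clus,
                         product = deff_strat * deff_wts * deff_clus)
components
overall_deff <- sum(components$product)
overall_deff
```

Laying the components out by stratum makes it easy to see which stratum and which factor contribute most to the overall value.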

Several more sophisticated design effects, described below, have been developed by later authors. Spencer (2000) derived a *deff* for an estimated total (not a mean) of \(\small y\)’s assuming that the \(\small y\) variable can be modeled by the linear regression model:

\[\small {{y}_{i}}=\alpha +\beta {{P}_{i}}+{{\varepsilon }_{i}}\]

where \(\small P_i\) is the 1-draw selection probability of sample element \(\small i\), \(\small w_i = 1/(nP_i)\) is the weight for element \(\small i\), and the \(\small \varepsilon_i\)’s are independent errors with mean 0 and common variance. The estimator of the total used by Spencer was the Horvitz-Thompson estimator (or \(\small \pi\)-estimator). Spencer’s design effect formula then adjusts the design effect of the weights by the model \(\small R^2\), regression coefficients, and variance of the model. When the model variance is large and the \(\small R^2\) is low, Kish’s design effect and Spencer’s design effect are close.

Henry and Valliant (2015) generalized the *deff* to a general regression estimator (GREG) that includes auxiliary variables under the model:

\[\small y_i=\alpha +\mathbf{x}_i^T\boldsymbol{\beta}+\varepsilon_i\]

where \(\small \mathbf{x}_i\) is a vector of covariates used in the GREG.

Unlike the Spencer and Henry design effects, the Chen and Rust (2017) design effect considers clustering but does not account for any covariates. The model they assume includes strata and clustering and is an extension of work by Gabler, Haeder, and Lahiri (1999):

\[\begin{equation} \small E_M\left(y_{hij}\right)=\mu_h, \qquad Cov_M\left(y_{hij}, y_{h'i'j'}\right)= \begin{cases} \sigma_h^2 & h=h',\ i=i',\ j=j', \\ \rho_h \sigma_h^2 & h=h',\ i=i',\ j\ne j', \\ 0 & \text{otherwise}. \end{cases} \end{equation}\]

They do provide a useful decomposition of the *deff* for a
weighted mean estimator into factors due to stratification, clustering,
and unequal weighting that we listed in the *Combined Formula*
section above.

In sum, the survey practitioner needs to understand the complexity of the survey design and the potential value in modeling for estimating the design effect. The above is intended to frame and assist in this understanding.

PracTools has the following design effect functions: `deff`, `deffCR`, `deffH`, `deffK`, and `deffS`. `deff` is a wrapper function in that it calls `deffCR`, `deffH`, `deffK`, or `deffS` depending on the `type=` option. The following table compares the PracTools design effect functions:

Function name | Parameters | Description |
---|---|---|
deffCR | w = vector of weights for a sample; strvar = vector of stratum identifiers; clvar = vector of cluster identifiers; Wh = vector of the proportions of elements that are in each stratum; nest = whether cluster IDs are numbered within strata; y = vector of sample values of an analysis variable | Chen-Rust design effect for an estimated mean from a stratified, clustered, two-stage sample. Produces design effects by weights, strata, and cluster at the stratum level and overall. |
deffH | w = vector of inverses of selection probabilities for a sample; y = vector of the sample values of an analysis variable; x = matrix of covariates used to construct a GREG estimator of the total | Henry design effect for single-stage samples when a general regression estimator is used for a total. |
deffK | w = vector of weights for a sample | Kish design effect due to unequal weights. |
deffS | p = vector of 1-draw selection probabilities; w = vector of weights for a sample; y = vector of the sample values of an analysis variable | Spencer design effect for an estimated total from a single-stage sample selected by PPS. |

**Example 1: smho.N874 dataset** PracTools comes with the `smho.N874` data set, which is a data frame of 874 observations from the 1998 Survey of Mental Health Organizations (SMHO). It contains two variables of importance here: EXPTOTAL, which is a variable of analytic interest, and BEDS, which may be used as a measure of size in probability proportional to size (PPS) sampling. The following code is used to create the inputs for `deffK`, `deffS`, and `deffH`.

```
## libraries needed for example
library(PracTools)
library(sampling)
library(ggplot2)
## Use PracTools smho.N874 data set
data(smho.N874)
## Remove hosp.type == 4 as it is out-patient and has no BEDS
smho <- smho.N874[smho.N874$hosp.type != 4, ]
## Use co-variate BEDS for MOS
## Re-code BEDS to have a minimum MOS of 5
smho$BEDS[smho$BEDS <= 5] <- 5
## Create 1-draw probability vector based on BEDS
smho$pi1 <- inclusionprobabilities(smho$BEDS, 1)
## Create vector for sampling n=50
pik <- inclusionprobabilities(smho$BEDS, 50)
## Create sample
seed <- 20230802
set.seed(seed)
sample <- UPrandomsystematic(pik = pik)
smho.samp <- smho[sample == 1, ]
## Create vector of weights
wgt <- 1/pik[sample == 1]
```

The Kish design effect requires only the weights, while the Spencer design effect requires the 1-draw probabilities for the sample elements, the sample weights, and the analytic variable that is regressed on the 1-draw probabilities.

```
## Kish
deffK(wgt)
## [1] 6.263141
## Spencer
deffS(p = smho.samp$pi1,
      w = wgt,
      y = smho.samp$EXPTOTAL)
## [1] 0.7130517
```

As can be seen, Spencer’s *deff* is substantially below Kish’s *deff* and, more crucially, below 1 (i.e., the effective sample size is larger than the actual sample size). Spencer’s *deff* suggests that PPS sampling on BEDS is more efficient than simple random sampling for the analytic variable of interest. On the other hand, the Kish *deff* is completely misleading since it says that SRS would be a much more efficient design. A regression of EXPTOTAL on the 1-draw probabilities derived from BEDS confirms this point.

```
summary(lm(smho$EXPTOTAL ~ smho$pi1))

Call:
lm(formula = smho$EXPTOTAL ~ smho$pi1)

Residuals:
      Min        1Q    Median        3Q       Max
-67989801  -4123670  -1897337   1655133 138295163

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.628e+06  5.742e+05   6.318 4.62e-10 ***
smho$pi1    6.144e+09  2.304e+08  26.670  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12880000 on 723 degrees of freedom
Multiple R-squared: 0.4959, Adjusted R-squared: 0.4952
F-statistic: 711.3 on 1 and 723 DF, p-value: < 2.2e-16
```

The plot with a regression line with an intercept shows a fairly strong relationship and suggests that PPS sampling with respect to BEDS is efficient. (In this example, a no-intercept model would be a better fit, but the Spencer deff is derived with the assumption that the model has an intercept.)

Henry’s design effect uses a matrix of covariates to build the model. The covariates used below for a GREG estimator of the total of EXPTOTAL are described in the help file for `smho.N874`.

```
## Create matrix of covariates
x <- cbind(smho.samp$BEDS,
           smho.samp$SEENCNT,
           smho.samp$EOYCNT,
           as.factor(smho.samp$FINDIRCT),
           as.factor(smho.samp$hosp.type))
deffH(w = wgt,
      x = x,
      y = smho.samp$EXPTOTAL)
## [1] 0.6549418
```

This is consistent with Spencer’s design effect in saying that the design is more efficient than SRS. It also indicates that the GREG estimator is more efficient than the \(\small \pi\)-estimator.

**Example 2: NHANES 2017-2018** The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. While the COVID pandemic caused a break in operations, it has typically run on a two-year schedule. The data used here are from the 2017-2018 NHANES (https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017). The datasets DEMO_J.XPT, BMX_J.XPT, and BPX_J.XPT used below can be downloaded from that site. For the example here, the following code recreates the data set:

```
library(haven)
nhanes.demo <- read_xpt("C:/NHANES/DEMO_J.XPT")
nhanes.bm <- read_xpt("C:/NHANES/BMX_J.XPT")
nhanes.bp <- read_xpt("C:/NHANES/BPX_J.XPT")
library(dplyr)
nhanes.demo <- nhanes.demo %>% select(SEQN, SDMVPSU, SDMVSTRA, WTMEC2YR)
nhanes.bm <- nhanes.bm %>% select(SEQN, BMXHT, BMXWT, BMXBMI)
nhanes.bp <- nhanes.bp %>% select(SEQN, BPXSY1, BPXDI1)
## Combine data sets
nhanes <- inner_join(nhanes.demo, nhanes.bm, by = "SEQN")
nhanes <- inner_join(nhanes, nhanes.bp, by = "SEQN")
```
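The calls that follow refer to a data frame named `nhanes.na.rm`, which the code above does not create. One plausible construction, stated here as an assumption because the original filtering step is not shown, is to drop the rows where the analysis variable is missing. The sketch uses a toy stand-in (`nhanes.toy`, with made-up values) so that it runs without the NHANES files.

```r
## Toy stand-in for the merged NHANES data frame; all values are made up.
nhanes.toy <- data.frame(
  SEQN     = 1:6,
  SDMVSTRA = c(134, 134, 135, 135, 136, 136),
  SDMVPSU  = c(1, 2, 1, 2, 1, 2),
  WTMEC2YR = c(12000, 8500, 20100, 15300, 9800, 11050),
  BMXHT    = c(170.2, NA, 158.4, 181.0, NA, 165.5)
)

## Keep only the rows with an observed analysis variable
nhanes.toy.na.rm <- nhanes.toy[!is.na(nhanes.toy$BMXHT), ]
nrow(nhanes.toy.na.rm)   # 4 rows remain
```

Applied to the real file, the analogous step would be `nhanes[!is.na(nhanes$BMXHT), ]`, repeated for each analysis variable since the pattern of missing values differs by variable.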

Using the `deffCR` function in PracTools on BMXHT (Standing Height in centimeters):

```
deffCR(w = nhanes.na.rm$WTMEC2YR,
       strvar = nhanes.na.rm$SDMVSTRA,
       Wh = NULL,
       clvar = nhanes.na.rm$SDMVPSU,
       nest = TRUE,
       y = nhanes.na.rm$BMXHT) ## Note: Missing BMXHT values have been removed
$`strata components`
stratum nh rhoh cv2wh deff.w deff.c deff.s
[1,] 134 434 -0.0104060072 1.3777530 2.377753 5.358045e-02 0.06902480
[2,] 135 548 -0.0063819241 1.2210056 2.221006 1.793669e-01 0.08717926
[3,] 136 595 -0.0059338153 1.5680748 2.568075 3.162617e-01 0.06054293
[4,] 137 489 -0.0109846523 1.6919849 2.691985 1.156024e-05 0.05199803
[5,] 138 543 -0.0078028704 1.4940016 2.494002 1.506828e-01 0.04321209
[6,] 139 563 -0.0081864680 1.2916004 2.291600 2.348559e-03 0.06166823
[7,] 140 531 -0.0010210084 1.1654563 2.165456 8.755365e-01 0.06117508
[8,] 141 609 -0.0067846425 1.1206941 2.120694 3.259820e-02 0.09565024
[9,] 142 633 -0.0045075356 0.7718338 1.771834 1.889764e-01 0.13253047
[10,] 143 459 -0.0083420387 1.2122912 2.212291 1.259705e-01 0.10672679
[11,] 144 608 0.0043127129 1.5729769 2.572977 1.517876e+00 0.05435944
[12,] 145 525 -0.0006598233 1.4624627 2.462463 9.296262e-01 0.05791430
[13,] 146 519 -0.0073484461 0.9075594 1.907559 7.618830e-03 0.09556860
[14,] 147 513 -0.0099422966 2.2498915 3.249891 1.529503e-01 0.04799347
[15,] 148 447 0.0064413472 1.1482058 2.148206 1.675286e+00 0.01386007
$`overall deff`
[1] 0.7259837
```

As the example shows, `deffCR` provides the design effects by weights, cluster, and strata, and includes the overall design effect. As the overall design effect is less than one, the benefits of stratification are obvious here.

For BMXBMI (Body Mass Index), a very different overall deff is calculated.

```
$`strata components`
stratum nh rhoh cv2wh deff.w deff.c deff.s
[1,] 134 433 -0.011048345 1.3841270 2.384127 0.000807066 0.05788729
[2,] 135 547 0.016890014 1.2234169 2.223417 3.164102451 0.07694603
[3,] 136 595 -0.005566258 1.5680748 2.568075 0.358614475 0.04902415
[4,] 137 487 -0.010918010 1.7022363 2.702236 0.014372671 0.04966114
[5,] 138 542 0.002112249 1.4931232 2.493123 1.229605157 0.03612878
[6,] 139 562 -0.003298061 1.2890828 2.289083 0.598349310 0.06717454
[7,] 140 531 -0.004601001 1.1654563 2.165456 0.439126324 0.06773169
[8,] 141 609 0.013328540 1.1206941 2.120694 2.900476535 0.08671762
[9,] 142 631 -0.005571733 0.7734050 1.773405 0.002095250 0.14670977
[10,] 143 458 0.105498156 1.2145247 2.214525 12.012659877 0.09520918
[11,] 144 608 0.012497688 1.5729769 2.572977 2.500738233 0.06217566
[12,] 145 524 -0.005311942 1.4622187 2.462219 0.434586798 0.07055294
[13,] 146 518 0.080413254 0.9046178 1.904618 11.854614598 0.10125102
[14,] 147 513 -0.008847636 2.2498915 3.249891 0.246211683 0.04876794
[15,] 148 447 0.018572947 1.1482058 2.148206 2.947117560 0.01564027
$`overall deff`
[1] 6.822108
```

Here, the design effect is much greater than 1. However, this is driven largely by high `deff.c` values in rows 10 and 13 of the output (strata 143 and 146). An advantage to using `deffCR` is that it allows the researcher to explore differences in the design effect at the stratum level and look for unusual situations.

The `survey` package also calculates design effects, using the definitional formula given at the beginning of this vignette, \(\small deff(\hat{\theta}) = \frac{V_{complex}(\hat{\theta})}{V_{SRS}(\hat{\theta})}\), by directly estimating the numerator and denominator given the input data. This is a common approach among packages that handle survey data. \(\small V_{complex}(\hat{\theta})\) is calculated using all features of a design (weights, clusters, strata) while \(\small V_{SRS}(\hat{\theta})\) is an estimate of variance of the same parameter as it would be estimated from an SRS. While there is other literature available on the `survey` package (e.g., see Lumley (2010), Lumley (2020)), the basics are illustrated here using the 2017-2018 NHANES data file.

```
## Call survey library
library(survey)
## Create survey design object
## Note: ids = ~SEQN treats each respondent as its own sampling unit, so PSU
## clustering (SDMVPSU) does not enter the variance estimates shown below
deff.dsn <- svydesign(ids = ~SEQN,
                      strata = ~SDMVSTRA,
                      weights = ~WTMEC2YR,
                      nest = TRUE,
                      data = nhanes)
## Calculate mean and total for BMI, after removing missing values
## deff=TRUE option provides the design effect
svymean(~BMXBMI, deff.dsn, na.rm = TRUE, deff = TRUE)
##            mean       SE   DEff
## BMXBMI 27.67122  0.12744 2.0359
svytotal(~BMXBMI, deff.dsn, na.rm = TRUE, deff = TRUE)
##             total        SE   DEff
## BMXBMI 8554169950 125687156 20.723

Note the difference in design effect for the mean and the total, illustrating that the design effect is specific to the parameter being estimated. Here, the only difference is mean vs. total of BMXBMI. Also, different design effect formulas will yield different design effect calculations. The Chen-Rust *deff* of about 6.82 for mean BMXBMI is considerably different from the `survey` package *deff* of 2.04. The Chen-Rust *deff* is computed assuming model (1) for \(\small y\) while the `survey` package directly estimates the *deff* with no model assumptions. Since model (1) may be incorrect, the two are not guaranteed to be equal; both are greater than 1, conveying the point that the design is less precise than an SRS of the same size would be. It should be noted that the `survey` package provides only the final summary design effect based on the formula above with no decomposition into components for stratification, clustering, and weighting.

Given design effects from previous surveys, the intraclass correlation coefficient \(\small \rho\), which measures the homogeneity within clusters, can be approximated from the formula \(\small deff_{clus}\left( {\hat{\theta }} \right) = 1 + \rho \left( \bar{n}-1 \right)\), which comes from the variance for an estimated mean or total. Solving that equation for \(\small \rho\) gives:

\[\begin{equation} \small \rho = \frac{deff_{clus}(\hat{\theta})-1}{\bar{n}-1} \end{equation}\]

where \(\small deff_{clus}(\hat{\theta})\) is the cluster design effect for the estimator \(\small \hat{\theta}\) and \(\small \bar{n}\) is the average number of elements sampled from each cluster. (Note that since an estimator is also based on a specific \(\small y\), we could subscript \(\small \rho\) with a \(\small y\) to emphasize that dependence.)
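Expression (2) is a one-line calculation. The sketch below wraps it in a hypothetical helper, `rho_from_deff`, and applies it to made-up inputs (a cluster design effect of 2 with an average of 21 elements per cluster); neither the helper nor the numbers come from PracTools or from any survey.

```r
## Back out rho from a cluster design effect: rho = (deff_clus - 1) / (nbar - 1).
## rho_from_deff() is a hypothetical helper; the inputs are made up.
rho_from_deff <- function(deff_clus, nbar) {
  stopifnot(all(nbar > 1))
  (deff_clus - 1) / (nbar - 1)
}

rho <- rho_from_deff(deff_clus = 2.0, nbar = 21)
rho                    # 0.05

## Round trip: plugging rho back into 1 + rho * (nbar - 1) recovers the deff
1 + rho * (21 - 1)     # 2
```

A value backed out this way is only as good as the cluster design effect fed into it, so it is best treated as a planning approximation.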

`deffCR` evaluates a more elaborate estimate than expression (2) of \(\small \rho_h\) for each stratum of a stratified, two-stage design. Consequently, the output from `deffCR` (rhoh in the example above) will differ from values computed from the simpler formula in (2).

Kish suggests a mnemonic for \(\small \rho\): pronounced “roh,” it can be remembered as “rate of homogeneity.” He observes that in practice, “the distribution of the population in those clusters is generally not random. Instead, it is characterized by some homogeneity that tends to increase the variance of the sample.” The measure of that homogeneity is roh. Because the design effect in a complex survey is typically greater than one, negative values of roh are uncommon and only occur when the cluster means are more uniform than they would be under random assignment to clusters. Also note that even a small positive roh can have a big impact on the design effect if \(\small \bar{n}\) is large.

Recall that the design effect is computed for a specific estimate. The design effect for different estimates can be very different, even if they are from the same sample and use the same weights. When developing a sample size, it is good to look at the design effects of several of the variables of interest, and not just one. This is often done using the design effects from similar or previous surveys.

Looking at the 2017-2018 NHANES data set again, we can calculate the overall *deff* for the following estimated means: height (BMXHT), weight (BMXWT), body mass index (BMXBMI), systolic blood pressure (BPXSY1), diastolic blood pressure (BPXDI1), and hypertension (BPXSY1 > 130 or BPXDI1 > 80). The following table shows the results:

2017-2018 NHANES Variable | Design Effect (deffCR) |
---|---|
Height | 0.7259837 |
Weight | 3.8322210 |
Body Mass Index | 6.8221081 |
Systolic Blood Pressure | 2.5758779 |
Diastolic Blood Pressure | 7.1351104 |
Hypertension | 2.4553125 |

As can be seen, there is a lot of variation in the design effects, even with such related variables as height and weight. The survey practitioner is advised to calculate design effects on many variables and get a sense of the overall design effects and potential risks to the analytic objectives if the sample size is too small for many, but not all, variables.
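To see what this variation means for planning, the shortcut of inflating an SRS sample size by the design effect can be applied to each variable in turn. In the sketch below the design effects are copied from the table above, while `n_srs = 400` is a hypothetical SRS sample size chosen only for illustration.

```r
## Overall deffCR values copied from the table above
deffs <- c(Height = 0.7259837, Weight = 3.8322210, BMI = 6.8221081,
           Systolic = 2.5758779, Diastolic = 7.1351104,
           Hypertension = 2.4553125)

## Hypothetical SRS sample size that would meet the precision target
n_srs <- 400

## Total sample size needed under the complex design, variable by variable
n_needed <- ceiling(n_srs * deffs)
n_needed
range(n_needed)
```

The spread between the smallest and largest requirement shows why a sample size based on a single variable's design effect can leave other estimates far short of their precision targets.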

Using deffs from earlier surveys can have serious limitations even if
the new sample is for an updated version of the previous one. If the new
design will have different strata or cluster definitions than the last,
*deff*’s from the previous survey may not apply to the new
design. If response rates in the new sample are likely to be
substantially different from those of the earlier survey, this must also
be considered when determining a sample size. Finally, an overall
*deff* for a two-stage (or more than two-stage) sample does not
separate the first- and second-stage sample sizes. Additional
calculations are needed for those; e.g., see Valliant, Dever, and Kreuter (2018), ch. 9.

Sample size formulas exist which account for the survey design
complexity. (See the vignette, “Selection of Appropriate PracTools
Sample Size Function” at https://CRAN.R-project.org/package=PracTools.) While
these sample size formulas can account for design complexities (like
stratification and clustering), they do not typically account for the
effect of weighting. Using a *deff* to compute a sample size is a
shortcut method to getting the total sample size, but it does not tell
the sample designer how to allocate the sample to strata or to different
stages in a multistage sample. It should also be remembered that the
design effect can vary from one variable to the next. Modeling
approaches to estimating the design effect, as found in Chen and Rust (2017) and Henry and Valliant (2015), may be helpful. The
survey practitioner is advised to use the design-appropriate sample size
formula and compute design effects on important variables and consider
modeling so as to ensure that the effective sample size will meet the
analytic requirements.

Chen, S., and K. F. Rust. 2017. “An Extension of Kish’s Formula
for Design Effects to Two- and Three-Stage Designs with
Stratification.” *Journal of Survey Statistics and
Methodology* 5 (2): 111–30.

Cochran, W. G. 1977. *Sampling Techniques*. New York: John Wiley
& Sons, Inc.

Gabler, S., S. Haeder, and P. Lahiri. 1999. “A Model Based
Justification of Kish’s Formula for Design Effects for
Weighting and Clustering.” *Survey Methodology* 25 (1):
105–6.

Henry, K. A., and R. Valliant. 2015. “A Design Effect Measure for
Calibration Weighting in Single–Stage Samples.” *Survey
Methodology* 41: 315–31.

Kish, L. 1965. *Survey Sampling*. New York: John Wiley &
Sons, Inc.

Lumley, T. 2010. *Complex Surveys: A Guide to Analysis Using
R*. New York: John Wiley & Sons, Inc.

———. 2020. *Survey: Analysis of Complex Survey Samples, Version
4.2-1*. https://CRAN.R-project.org/package=survey.

Spencer, B. D. 2000. “An Approximate Design Effect for Unequal
Weighting When Measurements May Correlate with Selection
Probabilities.” *Survey Methodology* 26 (2): 137–38.

Valliant, R., J. A. Dever, and F. Kreuter. 2018. *Practical Tools for
Designing and Weighting Survey Samples*. 2nd ed. New York:
Springer-Verlag.

———. 2023. *PracTools: Tools for Designing and Weighting Survey
Samples, Version 1.4*. https://CRAN.R-project.org/package=PracTools.